Forums / Developing with Sitefinity / Daily Exports of News, Images & Documents from DB as HTML, Image & Document Files

Daily Exports of News, Images & Documents from DB as HTML, Image & Document Files

14 posts, 1 answered
  1. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    26 Jan 2010
    Link to this post
    Hi,
        We have a News listing on our home Intranet page that must be exported and replicated to a server overseas that has no connection to Sitefinity or the database server. So we must copy HTML, image and document (pdf, docx, pptx, etc.) files to an FTP site and let our partner organization grab them from there.

        Our challenge is this: we need to save the last 14 news entries to an HTML file; we need to scan the content of the news entries for any links to documents or <img> tags, extract those files from the database, save them to the file system, and change the links in the articles to point to the new files instead of Sitefinity ashx references.

        I'm thinking we can make an RSS feed for the news and use XSLT to transform it into HTML. However, I haven't found any examples in the forum for scanning content and extracting referenced images and files from Sitefinity's Images & Documents module and saving them to the file system.

        Any ideas? Is there a better/easier way to do this?

    Thanks,
    VectorLord
  2. Georgi
    Georgi avatar
    3583 posts
    Registered:
    28 Oct 2016
    26 Jan 2010
    Link to this post
    Hi VectorLord,

    Thank you for posting your question. 

    Normally I would recommend your approach, since the RSS feed already uses the API to get the content and serve it in an XML-like format. The remaining problem is how to extract the links from the content and download the referenced items.

    You can try to parse the content field and perform regular expression search for links/images. 

    I would go with the API directly though. 

    First of all, you can use the NewsManager class to work with the News items. NewsManager.Content.GetContent() can return the last 14 news items added. Then you have access to each item.Content property, which holds the actual news text, and this is where you can run your regex parser. Once you have all the links, just make web requests to them and save the responses.
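    A rough sketch of that flow in C#, assuming Sitefinity 3.x API names from the description above (the exact GetContent overload and the IContent member names vary between versions and should be checked against yours):

```csharp
// Sketch only: NewsManager/ContentManager member names are assumed from
// the description above and differ between Sitefinity 3.x versions.
using System;
using System.IO;
using System.Net;
using System.Text.RegularExpressions;

public class NewsExporter
{
    // Matches src/href attributes pointing at Sitefinity's .ashx handlers
    static readonly Regex AshxLink = new Regex(
        "(?:src|href)=\"(?<url>[^\"]*\\.ashx[^\"]*)\"",
        RegexOptions.IgnoreCase);

    public void ExportLatestNews(string outputFolder, string siteRoot)
    {
        Telerik.News.NewsManager manager = new Telerik.News.NewsManager();

        // The exact overload varies by version; the intent is
        // "the 14 most recently published items".
        System.Collections.IList items = manager.Content.GetContent();

        int exported = 0;
        foreach (Telerik.Cms.Engine.IContent item in items)
        {
            if (exported++ == 14)
                break;

            string html = item.Content.ToString();
            foreach (Match m in AshxLink.Matches(html))
            {
                string url = m.Groups["url"].Value;
                string localName =
                    Path.GetFileName(new Uri(siteRoot + url).LocalPath);

                // Download the referenced image/document next to the HTML
                using (WebClient client = new WebClient())
                {
                    client.DownloadFile(siteRoot + url,
                        Path.Combine(outputFolder, localName));
                }

                // Re-point the reference at the exported file
                html = html.Replace(url, localName);
            }

            File.WriteAllText(
                Path.Combine(outputFolder, item.ID + ".html"), html);
        }
    }
}
```

    The output folder then holds one HTML file per news item plus the downloaded images and documents, ready to push to the FTP site.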

    I hope this sounds like a workable solution as well.

    Best wishes,
    Georgi
    the Telerik team

    Instantly find answers to your questions on the new Telerik Support Portal.
    Watch a video on how to optimize your support resource searches and check out more tips on the blogs.
    Answered
  3. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    26 Jan 2010
    Link to this post
    Ah, I knew there was probably an API solution to this. This sounds like the right direction to go. Thank you so much!
  4. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    28 Jan 2010
    Link to this post
    The API approach is coming along just fine for the News and documents; however, I just learned today that the home page is not the only content I need to export.

    It turns out, we have to export about 50% of the site content (not all site sections are included and, in fact, do not show up in the navigation on our overseas servers), images, and documents! We need the ability to export select sections of the site to flat HTML, image, and document files (with working links to the new file locations) and then copy that file structure to the overseas servers on a nightly basis. The site navigation would have to leave out the links to sections that are not included in the export process.

    Is there a magical "Publish Static Copy" button somewhere that I've foolishly overlooked? (Please say, "Yes!")

    If not, how on Earth could we do this without it becoming a huge programming project in and of itself? Perhaps a 3rd party "Spider" solution is needed?

    BTW: It looks like "sarath" might be seeking a similar solution... here.
  5. Pierre
    Pierre avatar
    433 posts
    Registered:
    16 Feb 2006
    28 Jan 2010
    Link to this post
    Hi Team,
    Hi VectorLord,

    Telerik.Cms.Engine.ContentProviderBase contains some methods to export or import a DataSet of content. Maybe you can find some complementary explanation of how to use them and some best-use cases. After serializing and encrypting, you can send the data around the world for later uploading and integration. It will take a lot of time, VectorLord, but it's possible.

    Thanks.


  6. Georgi
    Georgi avatar
    3583 posts
    Registered:
    28 Oct 2016
    29 Jan 2010
    Link to this post
    Hello,

    Unfortunately, we do not have such an option yet, though I see why it would be helpful. I definitely think a 3rd party spider would be a good solution. There are free tools that will download an entire web site, or a section of it, together with the links and images, so it can be browsed offline.

    You can implement your own spider, though. Take the sitemap object and make a web request to each of its pages (or only the ones you need); each HTTP response will contain the HTML of the page. Save the HTML, and you already have the functionality for exporting the Images & Documents module. When you put the files together, you should get a workable offline copy of the site.
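    A minimal version of that spider, assuming the pages are reachable over plain HTTP without authentication (section filtering and image/document handling are left out):

```csharp
// Minimal spider sketch: walks an ASP.NET SiteMapNode tree and saves the
// rendered HTML of each page to the output folder.
using System.IO;
using System.Net;
using System.Web;

public class SiteExporter
{
    public void Export(SiteMapNode node, string siteRoot, string outputFolder)
    {
        using (WebClient client = new WebClient())
        {
            // Request the published page and keep its rendered HTML
            string html = client.DownloadString(siteRoot + node.Url);
            string fileName = node.Url.Trim('/').Replace('/', '_');
            if (fileName.Length == 0)
                fileName = "default";
            File.WriteAllText(
                Path.Combine(outputFolder, fileName + ".html"), html);
        }

        // Recurse into child pages; skip here any sections that should
        // not be included in the export
        foreach (SiteMapNode child in node.ChildNodes)
            Export(child, siteRoot, outputFolder);
    }
}
```

    Calling Export with SiteMap.RootNode walks the whole site; starting from a section's node exports just that branch.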

    Regards,
    Georgi
    the Telerik team

  7. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    26 May 2010
    Link to this post
    Well, we solved the problem of how to select the content needed for export. We simply created a duplicate site and pointed the content provider to the same database, but with a separate sitemap. So we built the pages we needed and inserted the content as shared content from the original site. If we make updates in the main site, the content changes are reflected in both sites.

    After my original post, and your subsequent suggestion, I built a DLL that would iterate through the news postings and save them as static HTML, complete with downloaded files and modified image and link paths. Because we wanted this done site-wide, we started looking at some 3rd party site download tools. Our problem now is that the active directory security on our intranet pages is preventing the site download tools from accessing any of the pages.

    What I want to do is point the download tool to our publicLogin.aspx page and let it pass its credentials in the queryString. The publicLogin.aspx.cs code would then authenticate it using those credentials and redirect it to the default.aspx page where it could then crawl the site. However, a call to FormsAuthentication.Authenticate(user, pass) always returns False (even if I use my own account). Is there something else I need to call or set to make this work?

    Thanks,
    VectorLord
  8. Radoslav Georgiev
    Radoslav Georgiev avatar
    3370 posts
    Registered:
    01 Feb 2016
    27 May 2010
    Link to this post
    Hello VectorLord,

    You can try something like this:
    UserManager userManager = new UserManager("YourAD");
    if (ADManager.ValidateUser(userName, password))
    {
            FormsAuthentication.SetAuthCookie(userName, true);
            var aCookie = FormsAuthentication.GetAuthCookie(userName, true);
            userManager.SetAuthenticationCookie(aCookie);
    }


    Regards,
    Radoslav Georgiev
    the Telerik team

    Do you want to have your say when we set our development plans? Do you want to know when a feature you care about is added or when a bug fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.
  9. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    27 May 2010
    Link to this post
    Is the second line supposed to read?...

    if (ADManager.ValidateUser(userName, password))

    If not, how should the ADManager object be created?

    Thanks,
    VectorLord
  10. Radoslav Georgiev
    Radoslav Georgiev avatar
    3370 posts
    Registered:
    01 Feb 2016
    27 May 2010
    Link to this post
    Hello VectorLord,

    Please see the first line. The ADManager object is an instance of the UserManager class, instantiated with the provider name of your membership provider.

    Best wishes,
    Radoslav Georgiev
    the Telerik team

  11. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    27 May 2010
    Link to this post
    But, the first line creates an object called userManager, not ADManager. Is this just a typo?
  12. Radoslav Georgiev
    Radoslav Georgiev avatar
    3370 posts
    Registered:
    01 Feb 2016
    27 May 2010
    Link to this post
    Hello VectorLord,

    My bad, it's a typo. Substitute ADManager with userManager.
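
    For later readers, the corrected snippet from the earlier reply would then read (UserManager is Sitefinity's class, constructed with your membership provider's name):

```csharp
UserManager userManager = new UserManager("YourAD");
if (userManager.ValidateUser(userName, password))
{
    // Issue the forms authentication ticket and hand the same
    // cookie to Sitefinity's user manager
    FormsAuthentication.SetAuthCookie(userName, true);
    var aCookie = FormsAuthentication.GetAuthCookie(userName, true);
    userManager.SetAuthenticationCookie(aCookie);
}
```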

    Sincerely yours,
    Radoslav Georgiev
    the Telerik team

  13. VectorLord
    VectorLord avatar
    24 posts
    Registered:
    28 Sep 2009
    01 Jun 2010
    Link to this post
    Okay, that works on my local VS2008 server, but not on our dev server. If I paste the following into my browser, it logs me right into my local copy and redirects to the default.aspx page...
               
    localhost:2027/AppDir/publicLogin.aspx?u=username&p=password&ReturnUrl=/AppDir/Default.aspx

    If I feed this same URL to the 3rd party site downloader, it also logs in and starts downloading the site with no problem.

    ------

    Likewise, if I paste the following into my browser, it logs me into our development server copy and redirects to the default.aspx page...
               
    devserver:200/publicLogin.aspx?u=username&p=password&ReturnUrl=/Default.aspx

    Unfortunately, when I use this same URL in the site downloader, it receives this error from the web server...
               
    "Unauthorized" (401) at link devserver:200/publicLogin.aspx?u=username&p=password&ReturnUrl=/Default.aspx

    The tool is running under my user credentials (the same as the browser), and they're both using the same username and password to authenticate to the Sitefinity site. Again, both methods work fine on my local copy (which uses the same database and content as our development server).

    Is there a difference in the way VS2008's development server handles authentication versus how IIS6.0 handles it? Is there something in the login code or web.config I could change to treat the servers differently?

    Thanks again!
    VectorLord
  14. Radoslav Georgiev
    Radoslav Georgiev avatar
    3370 posts
    Registered:
    01 Feb 2016
    03 Jun 2010
    Link to this post
    Hello VectorLord,

    Thank you for getting back to me.

    Could you run the code in debug mode and see whether the ValidateUser method returns true and execution enters the if statement's code block?

    Greetings,
    Radoslav Georgiev
    the Telerik team
