
Does Search index PDFs and DOC files?

36 posts, 0 answered
  1. James Greaves
    25 posts
    Registered:
    21 Oct 2009
    10 May 2012
    I am really just bumping this thread.

    As Shae said, we are approaching 5 years now. Is there any movement on this issue?
  2. Stanislav Velikov
    1113 posts
    Registered:
    06 Dec 2016
    15 May 2012
    Hi,

    The feature is currently on the roadmap for Sitefinity 5.1. Until then, you can modify the TxtDocumentSearchInboundPipe pipe and use it to search through other types of content. I've attached a sample that is modified to search through .pdf files. For this purpose we use a third-party library, iTextSharp (the iTextSharp.text namespace). The sample pipe also searches in one library only (take a look at the PushData() method), so it suits your second requirement. The difference from the sample in my colleague's blog post is that the CanProcessItem() method checks whether the item being processed has a .pdf extension instead of a .txt extension:
    if (documentType.IsAssignableFrom(item.GetType()))
    {
        var docItem = (Telerik.Sitefinity.Libraries.Model.Document)item;
        // Accept the extension with or without the leading dot.
        return docItem.Extension == "pdf" || docItem.Extension == ".pdf";
    }
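    Note that the comparison above is case-sensitive, so an extension stored as ".PDF" would be skipped. A minimal sketch of a more tolerant check follows, assuming a hypothetical helper named IsPdfExtension (it is not part of Sitefinity or of the attached sample) that you would call with docItem.Extension:

```csharp
using System;

public static class PdfExtensionCheck
{
    // Hypothetical helper, not part of Sitefinity: returns true for
    // "pdf", ".pdf", ".PDF" and similar, and false for anything else.
    public static bool IsPdfExtension(string extension)
    {
        if (string.IsNullOrEmpty(extension))
        {
            return false;
        }

        // Drop any leading dot, then compare without regard to case.
        return string.Equals(
            extension.TrimStart('.'),
            "pdf",
            StringComparison.OrdinalIgnoreCase);
    }
}
```

    With this in place, the check in CanProcessItem() could become: return PdfExtensionCheck.IsPdfExtension(docItem.Extension);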
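    Then, in the GetFileLink() method, we build the absolute URL of the document:
    var manager = LibrariesManager.GetManager();
    // Combine the app-relative item URL with the file extension.
    var docUrl = String.Concat("~", manager.Provider.GetItemUrl(doc), doc.Extension);
    // Resolve the app-relative path to an absolute URL.
    docUrl = Telerik.Sitefinity.Web.RouteHelper.ResolveUrl(docUrl, UrlResolveOptions.Absolute);
    return docUrl;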
    The last thing we do is pass that URL to the openPDF() method, which retrieves the .pdf content using a sample method from the iTextSharp library page:
    // Requires: using System.Text; using iTextSharp.text.pdf;
    private string openPDF(string fileUrl)
    {
        var text = new StringBuilder();
        PdfReader reader = new PdfReader(fileUrl);
        try
        {
            // iTextSharp page numbers are 1-based.
            for (int i = 1; i <= reader.NumberOfPages; i++)
            {
                byte[] pageBytes = reader.GetPageContent(i);
                // ExtractTextFromPDFBytes is a helper that is not shown here.
                text.Append(ExtractTextFromPDFBytes(pageBytes));
            }
        }
        finally
        {
            // Release the reader even if extraction fails.
            reader.Close();
        }
        return text.ToString();
    }
    I have attached the modifications made to the project from this blog post.

    Regards,
    Stanislav Velikov
    the Telerik team
    Do you want to have your say in the Sitefinity development roadmap? Do you want to know when a feature you requested is added or when a bug is fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.
  3. Don
    1 posts
    Registered:
    02 Aug 2011
    05 Mar 2013

    Version 5.1 has been out a while.  Was PDF functionality added to the search?

  4. Stanislav Velikov
    1113 posts
    Registered:
    06 Dec 2016
    08 Mar 2013
    Hello,

    Yes, PDF file contents are searchable in Sitefinity 5.1.

    Regards,
    Stanislav Velikov
    the Telerik team
  5. Tyler
    7 posts
    Registered:
    21 Aug 2012
    15 Jul 2013
    Hello.

    I am currently using SF 5.4.

    I have uploaded a handful of documents. I have created an index for documents only.

    When I search, the results do not show anything from within the PDFs. Do I need to enable this functionality somehow?

    Thanks.

    -Tyler
  6. Patrick Dunn
    237 posts
    Registered:
    03 Nov 2014
    16 Jul 2013
    Hi Tyler,

    Visit Administration > Search Indexes and reindex your search index. Give it some time to finish: if you have a lot of files, or a few large ones, reindexing can take a while.

    Try searching again afterwards. If you are still having problems, clear your log files and reindex again. If you notice anything in the logs, feel free to attach them to a support ticket so we can help you analyze them.

    Regards,
    Patrick Dunn
    Telerik