Document indexing for Search

30 posts, 0 answered
  1. apollo
    6 posts
    Registered:
    20 Nov 2006
    28 Jul 2010
    Will version 4.0 be able to index the content of documents such as PDF or Word files so that they are searchable? If someone runs a search from the live site and there is a match in a PDF file that is linked from a page within the site, will it show in the results?
  2. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    28 Jul 2010
    Hello apollo,

    There will be a search engine for all content items. The exact content of a given file will not be indexed in 4.0; we will try to provide this functionality out of the box in 4.1.

    Greetings,
    Ivan Dimitrov
    the Telerik team
    Do you want to have your say when we set our development plans? Do you want to know when a feature you care about is added or when a bug is fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.
  3. apollo
    6 posts
    Registered:
    20 Nov 2006
    29 Jul 2010
    Thanks for the response Ivan.

    Do you know of a solution that we could integrate in the meantime that will provide this functionality? Or do you have an idea when 4.1 will be available?
  4. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    29 Jul 2010
    Hi apollo,

    The search engine for Sitefinity 4 has not been implemented yet; we will do this after the BETA. We will use the Lucene engine. To search inside document content, you have to use a third-party framework to extract the content of the files. Most probably we will provide an API that allows you to extract the content from the doc files. For other file types you could use open source libraries such as Apache PDFBox or iTextSharp.

    If you use PDFBox, you can read the stream with PDDocument.load(stream) and then call getText on a PDFTextStripper instance.

    You can use Apache PDFBox or iTextSharp with a custom provider in 3.x editions as well.

    Greetings,
    Ivan Dimitrov
    the Telerik team
  5. Paolo
    147 posts
    Registered:
    11 Jun 2009
    10 Jan 2011
    Hello,
    Can you please update me on this? Will I be able to search inside PDF files with the release of 14/01/2011?
    Thanks
  6. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    10 Jan 2011
    Hello,

    In the official version of Sitefinity 4.0 we will not have PDF indexing. This will be implemented at a later stage, but we have not scheduled a time frame for the implementation yet.

    Kind regards,
    Ivan Dimitrov
    the Telerik team
  7. Paolo
    147 posts
    Registered:
    11 Jun 2009
    10 Jan 2011
    And as for DOC/DOCX files, will indexing be available? Could I implement my own PDF search service and integrate it with Sitefinity?
    Thanks
  8. Jean
    90 posts
    Registered:
    06 Nov 2008
    10 Jan 2011
    Hi,

    Can you provide us with a sample of implementing a custom search provider in SF 4.0? We have gone through extensive development for document indexing in SF 3.7.

    Regards,
    Jean
  9. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    11 Jan 2011
    Hi apollo,

    We do not have a sample that shows how to create a custom index. The index is based on pipes, so you can check this post.

    Regards,
    Ivan Dimitrov
    the Telerik team
  10. Paolo
    147 posts
    Registered:
    11 Jun 2009
    30 Mar 2011
    Hello Ivan,
    When will 4.1 be released? Thanks
  11. Paolo
    147 posts
    Registered:
    11 Jun 2009
    01 Apr 2011
    Ivan, where can I find a sample that uses PagePipe to develop my custom index?
    Thanks
  12. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    04 Apr 2011
    Hello Paolo,

    We made a sample that shows how to create custom pipes, and it will be included in the Q1 release scheduled for the middle of April.

    Best wishes,
    Ivan Dimitrov
    the Telerik team

  13. Paolo
    147 posts
    Registered:
    11 Jun 2009
    04 Apr 2011
    Hello Ivan,
    Since I need to finish the first part of the project by the end of April, is it possible to have it sooner? I don't want to miss the deadline for the search index. Thanks
  14. Kalina
    176 posts
    Registered:
    27 Oct 2016
    07 Apr 2011
    Hi Paolo,
     
    The sample will be released next week. We will post a link in this forum thread when we are done.
    We are sorry for not being able to speed up our delivery. We are currently focused on the coming Q1 release next week and all our efforts go in this direction.

    I hope the suggested timing can work for you.

    All the best,
    Kalina
    the Telerik team

  15. Paolo
    147 posts
    Registered:
    11 Jun 2009
    20 Apr 2011
    Hello Telerik,
    Can you please provide me with a working sample of pipes for search?
    Thanks
    Paolo
  16. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    20 Apr 2011
    Hi Paolo,

    We will have a sample with the SDK release that will be available next week.


    All the best,
    Ivan Dimitrov
    the Telerik team

  17. Paolo
    147 posts
    Registered:
    11 Jun 2009
    20 Apr 2011
    Hello Telerik,
    Can you just give me some pointers to work from? Waiting for next week means I'll have only about 4 days to develop my part of the solution with search.
    Thanks
  18. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    22 Apr 2011
    Hi Paolo,

    The implementation consists of about 5 specific classes. I suggest that you wait for the SDK release.

    Kind regards,
    Ivan Dimitrov
    the Telerik team

  19. Paolo
    147 posts
    Registered:
    11 Jun 2009
    29 Apr 2011
    Hello Ivan,
    I've downloaded the SDK, can you please tell me which example I should look at?
    Thanks
    Paolo
  20. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    29 Apr 2011
    Hello Paolo,

    We removed the index from the SDK because we are going to change the publishing API this quarter, and the entire code of the pipe will have to be rewritten. If you want, you can open a support request and I will send you the current implementation, but you should know that this code will not work once the Q2 release is out, and you will have to create your custom pipe again from scratch. There are currently known issues related to the pipes, and customizing the index at this stage is a difficult task.

    Kind regards,
    Ivan Dimitrov
    the Telerik team

  21. Paolo
    147 posts
    Registered:
    11 Jun 2009
    29 Apr 2011
    Hello Ivan,
    Since I'm helping our customer, who holds the license, and I don't hold one myself, how can I open a ticket with you? Can I send one under the .NET suite, specifying that it's for you?
    Thanks
  22. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    03 May 2011
    Hello Paolo,

    I will send a sample in the general feedback request you opened, but please keep in mind that it will not work in the 4.2 release, since we are going to change the indexing API in order to improve it.

    Kind regards,
    Ivan Dimitrov
    the Telerik team
    Do you want to have your say in the Sitefinity development roadmap? Do you want to know when a feature you requested is added or when a bug is fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items.
  23. Paolo
    147 posts
    Registered:
    11 Jun 2009
    04 May 2011
    Hello Ivan,
    I've tried for the last two days to implement a sample, but with no luck. Can you please tell me where I should start? I have no idea what those pipes are used for. Thanks
    Paolo
  24. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    09 May 2011
    Hi Paolo,

    Please check this article:

    http://www.sitefinity.com/40/help/developers-guide/t_telerik_sitefinity_publishing_pipes_publishingpipebase.html

    Best wishes,
    Ivan Dimitrov
    the Telerik team
  25. Paolo
    147 posts
    Registered:
    11 Jun 2009
    16 May 2011
    Hello Ivan,
    I've tried, but with no success. Here are my questions: do I need to develop a custom search provider that inherits from LuceneSearchProvider? If so, where do I tell the search index to use this provider?
    I really need to develop a solution by the end of the week that looks inside PDF files. Can you please tell me how I can do this, bearing in mind that things will change with 4.2?

    Thanks
    Paolo
  26. Paolo
    147 posts
    Registered:
    11 Jun 2009
    17 May 2011
    Hello Ivan,
    Your link doesn't lead me anywhere. Where should I register a pipe?
  27. Ivan Dimitrov
    16072 posts
    Registered:
    12 Sep 2017
    17 May 2011
    Hello Paolo,

    You can try to extend the index pipe that I sent you. You need to include an external library that will get the item content. Another option is not to use a custom pipe, but to hack into the publishing point. Below is some sample code. You need an instance of the SearchIndex pipe, and on it you should call HandleItemAction, passing an IEnumerable of your content objects.

    // Look up the settings of the SearchIndex pipe on the search publishing point.
    var pipesettings = PublishingManager.GetManager("Search")
        .GetPublishingPoints()
        .Where(pp => pp.Name == "MySearchPublishingPoint")
        .ToList()
        .First()
        .PipeSettings.Where(ps => ps.PipeName == "SearchIndex")
        .First();

    // Resolve the pipe and push items into it; replace new Content() with your own content objects.
    var pipe = PipeFactory.ResolvePipe2("SearchIndex").Initialize(pipesettings);
    pipe.HandleItemAction(new List<HandleActionArgs>() { new HandleActionArgs() { Item = new Content() } });

    Regards,
    Ivan Dimitrov
    the Telerik team
  28. Paolo
    147 posts
    Registered:
    11 Jun 2009
    17 May 2011
    Hello Ivan,
    Thanks for your reply. I've tried extending your example, but when I add the Products file to get a working sample, I get this error:

    Could not find the specified key "ProductsLandingPageTitle" or class id "ProductsResources".

    In the alternative example you gave me, where and how do I hack into the publishing? For you, who developed SF, it's easy; for me it is not! And how does hacking into the publishing of an item let me search for it?

  29. Paolo
    147 posts
    Registered:
    11 Jun 2009
    17 May 2011
    Please, Ivan,
    tell me how I can achieve this. I'm really in a hurry for this. Thanks
  30. Boyan Barnev
    1429 posts
    Registered:
    30 Oct 2017
    20 May 2011
    Hello Paolo,

    I have replied to you in the support ticket you have opened - can you please verify if the Products module runs fine before implementing the custom pipe in it, so that we can be sure what might be the cause of this issue. It would also help if you could send over your implementation, so that we can give you a more focused response.

    Best wishes,
    Boyan Barnev
    the Telerik team