

Search Indexes TOO Much

63 posts, 0 answered
  1. Ben Alexandra
    215 posts
    Registered: 15 Sep 2012
    Posted: 17 Oct 2007
    Hi,

    Thanks so much for getting search working. It's great to have. I was, however, a little surprised by how you implemented a couple of things. The biggest is that it seems to actually spider the built pages (instead of just reading the data stored in the database). This has some advantages (like reading custom WebUserControls as rendered) but also has at least one big problem.

    The biggest problem is that if someone searches for something that is in the menu, it returns EVERY PAGE! For example, on my site a lot of people will be searching for CampusTrakker or Web Hosting, and even though those are two very different things, both return every page on the site.

    One idea I had for avoiding indexing the template would be for you to try to recognize template info and links vs. content links, as Google must (if you search Google for Trakkware Pricing, it only returns my pricing page, not every page, even though the word Pricing is on every page). Of course, I realize you are not Google and don't have their resources, but I'm wondering if you could either look for template data or, as an easier (and maybe temporary) solution, support an exclude tag.

    I guess what I'm thinking of is something like:
    <html> 
    <head> 
        <header info....> 
    </head> 
    <body> 
    <!-- BEGIN_IGNORE_FOR_SITEFINITY_SEARCH --> 
        <template data, menus, etc......> 
    <!-- END_IGNORE_FOR_SITEFINITY_SEARCH --> 
          
        regular content, editable regions, etc....  
     
    <!-- BEGIN_IGNORE_FOR_SITEFINITY_SEARCH --> 
        <more template data, mainly ending tags.....> 
    <!-- END_IGNORE_FOR_SITEFINITY_SEARCH --> 
    </body> 
    </html> 

    The nice thing about something like that is that people don't have to use it, but if they are having trouble, they could just drop a couple of begin and end ignore tags on their master page and they'd be set.
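Ben's marker idea can be sketched as a pre-indexing filter. This is a hypothetical C# illustration, not how Sitefinity's indexer works: it assumes the crawler has the rendered page HTML as a string and drops everything between each BEGIN_IGNORE_FOR_SITEFINITY_SEARCH / END_IGNORE_FOR_SITEFINITY_SEARCH comment pair before indexing. The class name IgnoreRegionFilter is made up for this example.

```csharp
using System.Text.RegularExpressions;

// Hypothetical pre-indexing filter for the ignore markers proposed above.
// The markers are a forum suggestion, not a Sitefinity feature.
public static class IgnoreRegionFilter
{
    // Non-greedy match from each BEGIN marker to the nearest END marker;
    // Singleline lets '.' span line breaks in the rendered page.
    private static readonly Regex IgnoredRegion = new Regex(
        @"<!--\s*BEGIN_IGNORE_FOR_SITEFINITY_SEARCH\s*-->.*?<!--\s*END_IGNORE_FOR_SITEFINITY_SEARCH\s*-->",
        RegexOptions.Singleline | RegexOptions.Compiled);

    // Returns the HTML with every ignored region removed, so only the
    // remaining content would be handed to the indexer.
    public static string StripIgnoredRegions(string html) =>
        IgnoredRegion.Replace(html, string.Empty);
}
```

The non-greedy `.*?` pairs each BEGIN marker with its nearest END marker, so content sitting between two ignored regions survives. A BEGIN marker without a matching END marker is left untouched.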

    Does that make any sense?  Does that seem like it would be a good thing to do?  Would other people be interested in having a feature like that? Would that be something doable by SP1?

    Ben
  2. Bob
    330 posts
    Registered: 24 Sep 2012
    Posted: 17 Oct 2007
    Hello Ben Alexandra,

    You will have full control over what is and isn't indexed in v3.2. Unfortunately, we will not be able to make it for SP1.

    Our approach is a bit different. You will be able to specify filter fields for each index catalogue and set a weight for each of them. The weight will be used to set the rank of the page within the result set, so a field with a negative weight will not be indexed. You will be able to filter by tags and attributes. Here is an example of such a configuration:
    <?xml version="1.0" encoding="utf-8"?>   
    <Fields>   
        <field name="text" filterTag="body" filterAttrebutes="" weight="1.0" indexAttrebute="" />   
        <field name="keywords" filterTag="meta" filterAttrebutes="name:keywords" weight="2.0" indexAttrebute="content" />   
        <field name="description" filterTag="meta" filterAttrebutes="name:description" weight="1.5" indexAttrebute="content" />   
        <field name="header" filterTag="h1" filterAttrebutes="" weight="1.6" /> 
        <field name="noindex" filterTag="div" filterAttrebutes="class:noindex" weight="-1" /> 
    </Fields>  

    Of course, we will provide a user interface for these settings in the Search/Index section.
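To make the proposed weighting concrete, here is a minimal C# sketch of how per-field weights could turn into a page score. It assumes the score is simply weight × number of occurrences summed over fields, and that negative-weight fields are skipped entirely. FieldMatch and WeightedRanking are hypothetical names; Sitefinity's real ranking (which runs on Lucene.Net) is not shown in this thread.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical: one search term's hits inside one configured field of a page.
public sealed record FieldMatch(string FieldName, double Weight, int Occurrences);

public static class WeightedRanking
{
    // Assumed scoring rule: sum of weight * occurrences over all fields.
    // A negative weight means "do not index", so those hits never count.
    public static double Score(IEnumerable<FieldMatch> matches) =>
        matches.Where(m => m.Weight >= 0)
               .Sum(m => m.Weight * m.Occurrences);

    // Orders page URLs so the highest-scoring pages come first.
    public static IEnumerable<string> Rank(IDictionary<string, List<FieldMatch>> pages) =>
        pages.OrderByDescending(p => Score(p.Value))
             .Select(p => p.Key);
}
```

Under this assumed rule, a keywords hit (weight 2.0) counts twice as much as a body hit (weight 1.0), and anything inside the noindex div contributes nothing, no matter how often the term appears there.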

    Furthermore, we will provide a way for controls to determine whether the current request comes from a crawler, so you can decide what to render. For example, the navigation controls will render an empty string by default in this case.

    Do you think this is flexible enough? Let me know what you think.

    Best wishes,
    Bob
    the Telerik team

    Instantly find answers to your questions at the new Telerik Support Center
  3. Ben Alexandra
    215 posts
    Registered: 15 Sep 2012
    Posted: 18 Oct 2007
    Hi Bob,

    I think that would work well, but it's a long way off. Are there any temporary workarounds? Unfortunately, I find search pretty much useless at this point. Your solution sounds good, but I need to decide whether to remove search functionality from my sites (all my recent sites have been built with a search box in the template, based on it being available in 3.1).

    I know my Ignore suggestion is pretty hacky, but it seems like an easy temporary solution to the problem of too much being indexed. Or at least give us the option of ignoring RadMenus and RadPanelBars.

    Thanks.  Just let me know what you decide and if there are any temporary fixes I can do for now.

    Thanks a ton!

    Ben

    PS Attribute is spelled with an i, not an e ;)
  4. Bob
    330 posts
    Registered: 24 Sep 2012
    Posted: 18 Oct 2007
    Hello Ben,

    Thank you for correcting my spelling. I should take another English course:)

    Most of the functionality will be available in SP1. Menus and PanelBars will handle that. Keywords, Title and Description will be predefined, and you will be able to set their weights, but you will not be able to add your own tags and therefore will not be able to specify areas that should not be indexed.

    Greetings,
    Bob
    the Telerik team

  5. Ben Alexandra
    215 posts
    Registered: 15 Sep 2012
    Posted: 18 Oct 2007
    Hi Bob,

    No, your English is fantastic. I only mention it because it's source code, and we wouldn't want any typos in Sitefinity's source code, right? ;)

    OK, so I'm confused.  I thought you said: "You will have full control of what is indexed and what not in v3.2. Unfortunately, we will not be able to make it for SP1."  Now you're saying "Most of the functionality will be available in SP1."

    I guess I'm wondering if you're going to have initial functionality in SP1, then more full features in 3.2 or did you decide to move up the functionality from 3.2 to SP1?

    I guess the main question is, will the big bug (too much being indexed) be resolved by SP1?  Even if it's not perfect and not ranking (which sounds cool), will I at least get decent results?  What functionality will you provide in SP1, what will be added to 3.2?

    Thanks so much!  Keep up the great work!

    Ben

    PS Is there a tentative date for 3.2?  Sitefinity kicks ass!  And with every version it kicks more ass and kicks it harder, so of course I'm anxious for each new release!
  6. Bob
    330 posts
    Registered: 24 Sep 2012
    Posted: 18 Oct 2007
    Hi Ben,

    Thank you for the nice words.

    So, we are going to have initial functionality in v3.1 SP1 and full features in v3.2. The big bug (too much to index) will be solved. What you won’t be able to do in SP1 is define your own areas to affect the page ranking.

    A release date for version 3.2 has not been set yet, but it will be in January. We pushed it back a little, as we were about two weeks late with 3.1. Soon we will publish the road map for 3.2 and 4.0. Some very exciting features are in the works.
     
    Sincerely yours,
    Bob
    the Telerik team

  7. Ben Alexandra
    215 posts
    Registered: 15 Sep 2012
    Posted: 18 Oct 2007
    That sounds perfect.  Yeah, as long as it's at least workable, that's great.  I can wait till Jan or Feb for full features.  I understand SP1 will be out in a week or so, is that right? 

    The features for 3.2 sound sweet.  I can't even imagine what you're cooking up for 4.0 ;)

    Thanks

    Ben

    PS Is there an API for search?
  8. Bob
    330 posts
    Registered: 24 Sep 2012
    Posted: 18 Oct 2007
    Hi Ben,

    The service pack should be out next week.

    Yes, there is an API for search. In fact, you can provide your own sources for indexing. I hope we will be able to provide examples soon.

    Sincerely yours,
    Bob
    the Telerik team

  9. Ben Alexandra
    215 posts
    Registered: 15 Sep 2012
    Posted: 27 Feb 2008
    Hi,

    Is it set up to search only certain regions yet? Do you have documentation on how to set that up?

    http://cms.newcenturybank.com/search/index.aspx?IndexCatalogue=web&SearchQuery=community

    Also, have you looked at the issue of duplicate pages showing up due to multiple page addresses? It seems it should only find the primary page, not the other URLs, no? Look at the link above: you'll see it returns /index/search.aspx and /search/index.aspx, which are the same page.
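The duplicate-URL problem can be illustrated with a small post-processing step: group hits by page and keep one URL per page, preferring the primary one. SearchHit (with its PageId and IsPrimaryUrl properties) and DuplicateFilter are hypothetical; this thread doesn't show how Sitefinity represents page URLs internally, so treat this as a sketch of the idea rather than the fix that shipped.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical search hit: the same page can be reachable through several URLs.
public sealed record SearchHit(string PageId, string Url, bool IsPrimaryUrl);

public static class DuplicateFilter
{
    // Collapses hits that point to the same page, keeping the primary URL
    // when one is present and otherwise the first URL seen.
    // GroupBy preserves the order in which pages first appear.
    public static List<SearchHit> KeepPrimaryOnly(IEnumerable<SearchHit> hits) =>
        hits.GroupBy(h => h.PageId)
            .Select(g => g.FirstOrDefault(h => h.IsPrimaryUrl) ?? g.First())
            .ToList();
}
```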

    Thanks

    Ben
  10. Bob
    330 posts
    Registered: 24 Sep 2012
    Posted: 04 Mar 2008
    Hi Ben Alexandra,

    Unfortunately none of these issues could make it to this release. A lot of changes and optimizations have started on this front and I hope we will be able to finish them for the SP1 of v3.2.

    I’m sorry for not being able to help this time.

    Kind regards,
    Bob
    the Telerik team

  11. Brook
    39 posts
    Registered: 21 Mar 2007
    Posted: 18 Mar 2008
    Any information on what may or may not make it into 3.2 SP1 with regard to the search engine?

    Thanks in Advance...
  12. Ben Alexandra
    215 posts
    Registered: 15 Sep 2012
    Posted: 18 Mar 2008
    Yes, we're anxiously waiting, as the current model really doesn't work. The two biggest problems are 1) duplicate page URLs being returned (results should be based on the sitemap, not on all URLs for a page) and 2) extraneous words in the template (not the content of pages) being matched, so if you search for something that exists in the template, such as Products, EVERY page is returned, and each page is returned multiple times due to the first problem.

    If you solve those two problems, I think it'll work really well. Hopefully creating sections to be ignored in the template will be easy.

    Thanks a lot!

    Ben
  13. Nikifor
    232 posts
    Registered: 18 May 2013
    Posted: 19 Mar 2008
    Hi Ben, Brook

    We are working hard on improving the Search module for SP1. We found several other issues with the search functionality, as well as possible areas for improvement, all of which will be addressed in Sitefinity 3.2 Service Pack 1.

    We apologize for any caused inconvenience.

    All the best,
    the Telerik team



  14. Paul Dain
    7 posts
    Registered: 15 Mar 2006
    Posted: 28 Apr 2008
    It looks like this was actually implemented in SP1 -- can you verify?
  15. Georgi
    3583 posts
    Registered: 28 Oct 2016
    Posted: 29 Apr 2008
    Hello Paul Dain,

    The issue with duplicated search results is fixed in SP1. There are still issues with the search engine that we are working on, and they will be fixed for the next service release (in May).

    Greetings,
    Georgi
    the Telerik team

  16. Paul Dain
    7 posts
    Registered: 15 Mar 2006
    Posted: 29 Apr 2008
    Thanks for the info.

    We have been trying fieldsInfoProvider.xml, but there does not appear to be any documentation for it: specifically, which attributes/values are allowed and what their effects are. Is this something you can provide?

    Thanks,

    - Paul
  17. Georgi
    3583 posts
    Registered: 28 Oct 2016
    Posted: 30 Apr 2008
    Hi Paul Dain,

    Sure, we can provide that information.

    This is a new feature introduced in Service Pack 1. The search engine uses this file to handle content better while indexing. Here is an example taken from that file:
    <fields> 
      <field name="title" weight="1" indexAttribute="" filterTag="title" filterAttributes="" /> 
      <field name="keywords" weight="1" indexAttribute="content" filterTag="meta" filterAttributes="name:keywords;" /> 
      <field name="description" weight="1" indexAttribute="content" filterTag="meta" filterAttributes="name:description;" /> 
    </fields> 

    As you can see, there are three fields here: title, keywords and description. These correspond to the title and meta tags found in every HTML page. Every field has a weight property. The search engine spiders the pages, indexing their content and assigning a weight to each part of the page's content according to the values set in this file. Later, when you search for something, the results are sorted by that weight: pages where your search term appears in a field with a higher weight come first in the list. The search engine also takes into account how often the search term is repeated in a page.

    This way you can exclude certain content on a page from indexing, or give higher priority to a page's keywords, even though the keywords are not displayed on the page (the keywords field is a meta tag). By certain content I mean that you can even exclude content within a given tag, or within a tag with a specified class.

    For example, excluding the page's title tag from indexing would look like this:

    <field name="title" weight="-1"  
     indexAttribute="" filterTag="title" filterAttributes="" />  

    Please note that the weight property has a negative value. Content in any field with such a weight will not appear in the search results at all.
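To see how the entries above map onto data, here is a hedged C# sketch that reads a fieldsInfoProvider.xml-style document with System.Xml.Linq and flags negative-weight fields as excluded. FieldRule and FieldsInfoReader are hypothetical names, and the attribute names are taken from the sample file in this post; Sitefinity's own reader may handle missing or malformed values differently.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory form of one <field> entry.
public sealed record FieldRule(string Name, double Weight, string FilterTag,
                               string FilterAttributes, string IndexAttribute)
{
    // A negative weight means the field is excluded from indexing.
    public bool IsExcluded => Weight < 0;
}

public static class FieldsInfoReader
{
    // Reads every <field> child of the root element. Missing string
    // attributes default to "", a missing weight defaults to 0.
    public static List<FieldRule> Parse(string xml) =>
        XElement.Parse(xml)
            .Elements("field")
            .Select(f => new FieldRule(
                f.Attribute("name")?.Value ?? "",
                double.Parse(f.Attribute("weight")?.Value ?? "0", CultureInfo.InvariantCulture),
                f.Attribute("filterTag")?.Value ?? "",
                f.Attribute("filterAttributes")?.Value ?? "",
                f.Attribute("indexAttribute")?.Value ?? ""))
            .ToList();
}
```

Parsing the weight with the invariant culture keeps values like 1.5 from being misread on servers with a comma decimal separator.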

    It is true that our documentation lacks this information. We will definitely work to change this and provide full information on that file.

    All the best,
    Georgi
    the Telerik team

  18. SelAromDotNet
    912 posts
    Registered: 18 Jul 2012
    Posted: 30 Apr 2008
    I really love the way this was implemented; it is very intuitive (once you figure it out, of course). You can restrict any element from being indexed, not just meta tags; you just have to set the appropriate fields.

    For example, my search results were bringing up every single page because certain terms are in the navigation menu. So I set the search to exclude everything in the RadMenu div (using its class) and in another menu (using its id) so that they are not indexed. Now only my actual page content is scanned and searched. Just set filterTag to div, and then set filterAttributes using an attribute:value syntax:

    <field name="navigation" weight="-1" indexAttribute="" filterTag="div" filterAttributes="id:navigation" />

    <field name="header" weight="-1" indexAttribute="" filterTag="div" filterAttributes="class:RadMenu" />
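The attribute:value filters above can be modeled as a tag-plus-attributes match. This is a minimal C# sketch under two assumptions: pairs are semicolon-separated (as in name:keywords;), and class: matches any single class token on the element while other attributes must match exactly. FilterMatcher is a hypothetical name; Sitefinity's actual matching rules are not documented in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FilterMatcher
{
    // Splits "name:keywords;" or "class:RadMenu" into (attribute, value) pairs.
    public static List<(string Attr, string Value)> ParsePairs(string filterAttributes) =>
        filterAttributes.Split(';', StringSplitOptions.RemoveEmptyEntries)
            .Select(p => p.Split(':', 2))
            .Where(kv => kv.Length == 2)
            .Select(kv => (Attr: kv[0].Trim(), Value: kv[1].Trim()))
            .ToList();

    // True when the element's tag equals filterTag and every pair matches.
    // Assumption: "class" matches any single class token; others match exactly.
    public static bool Matches(string tag, IDictionary<string, string> attributes,
                               string filterTag, string filterAttributes)
    {
        if (!string.Equals(tag, filterTag, StringComparison.OrdinalIgnoreCase))
            return false;
        foreach (var (attr, value) in ParsePairs(filterAttributes))
        {
            if (!attributes.TryGetValue(attr, out var actual))
                return false;
            bool ok = attr == "class"
                ? actual.Split(' ', StringSplitOptions.RemoveEmptyEntries).Contains(value)
                : actual == value;
            if (!ok)
                return false;
        }
        return true;
    }
}
```

Token matching for class matters because a rendered menu div may carry more than one class; an exact string comparison against "RadMenu" would then miss it. An empty filterAttributes (as in the title field) matches on the tag alone.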


    Very cool! I would assume that you could also boost pages in your search results by giving extra weight to a certain div when it is present, like:

    <field name="header" weight="5" indexAttribute="" filterTag="div" filterAttributes="class:importantstuff" />


    so that these results come up first. Not too shabby. Man, I love Sitefinity more every day!

  19. Brook
    39 posts
    Registered: 21 Mar 2007
    Posted: 30 Apr 2008
    This is great news. I wish there were a better way to communicate when these new features are implemented. Perhaps there could be a section in the clients area organized by functionality (Search, Blogs, etc.) in which the development team could post announcements of new or changed features, linking to the documentation?
  20. Ivan
    478 posts
    Registered: 16 Jun 2015
    Posted: 03 May 2008
    Hi Brook,

    We are putting a lot of thought and effort into making the communication process (as well as the communication infrastructure) better, faster and more accurate. We fully understand that you need this kind of information in order to plan your own activities and the work on your projects.

    I just wanted to let you know that this need has been recognized and we are already taking steps in improving this area. Thank you for all your great inputs and the patience you've demonstrated.

    Greetings,
    Ivan
    the Telerik team

  21. Zubair
    142 posts
    Registered: 26 Dec 2007
    Posted: 15 May 2008
    OK, I came here looking for a way to stop page titles from being indexed. It looks like I've found it here, and I'm going to implement it (and come back and post any issues).

    But for now I'll just second Brook's opinion. I've also noticed that a lot of improvements and bug fixes go undocumented, and nobody knows what's available out of the box in a new version or service pack (like this one). So please, please add a section where you make all the announcements, and I'd suggest putting a link to it on the homepage. Thanks.
  22. Zubair
    142 posts
    Registered: 26 Dec 2007
    Posted: 15 May 2008
    I have been facing this problem for some time, and even more so now because I need to exclude the title from search.

    I noticed that after running the indexing once, I cannot run it again; it gives me the following error:

    Could not find file '....\Index\segments'.

    Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.

    Exception Details: System.IO.FileNotFoundException: Could not find file '.............\Index\segments'.

    Source Error:

    An unhandled exception was generated during the execution of the current web request. Information regarding the origin and location of the exception can be identified using the exception stack trace below.

    Stack Trace:

    [FileNotFoundException: Could not find file 'D:\Web\DIC.website\App_Data\Search\DubaiInternetCity\Index\segments'.]
       System.IO.__Error.WinIOError(Int32 errorCode, String maybeFullPath) +1971213
       System.IO.FileStream.Init(String path, FileMode mode, FileAccess access, Int32 rights, Boolean useRights, FileShare share, Int32 bufferSize, FileOptions options, SECURITY_ATTRIBUTES secAttrs, String msgPath, Boolean bFromProxy) +998
       System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share) +114
       Lucene.Net.Store.FSIndexInput..ctor(FileInfo path) +70
       Lucene.Net.Store.FSDirectory.OpenInput(String name) +66
       Lucene.Net.Index.SegmentInfos.Read(Directory directory) +44
       Lucene.Net.Index.AnonymousClassWith.DoBody() +40
       Lucene.Net.Store.With.Run() +56
       Lucene.Net.Index.IndexReader.Open(Directory directory, Boolean closeDirectory) +102
       Telerik.Search.Engine.SearchManager.GetIndexingStatistics(String Provider) +144
       Telerik.Search.WebControls.Admin.ControlPanel.Indexes_ItemDataBound(Object sender, RepeaterItemEventArgs e) +

    Also, when I set the title field's weight to -1 in fieldsInfoProvider.xml, I notice that some pages with the search keyword in the content area don't appear in the results, and when I try to index again I get the above error. Previously I was able to recover from this error by deleting the Search folder under App_Data; now even that doesn't work.

    Please tell me what's going on. Thanks
  23. SelAromDotNet
    912 posts
    Registered: 18 Jul 2012
    Posted: 15 May 2008
    Hmm, I seem not to have this down after all. I thought maybe it was just that I had indexed everything in full, so I deleted the index and created a new one. Unfortunately, the new one is also indexing the whole page, including the navigation. I've included my filter below; can you tell me if I've done anything wrong?

    <?xml version="1.0" encoding="utf-8"?>  
    <fields> 
        <field name="title" weight="3" indexAttribute="content" filterTag="title" filterAttributes="" /> 
        <field name="keywords" weight="2" indexAttribute="content" filterTag="meta" filterAttributes="name:keywords" /> 
        <field name="description" weight="1" filterTag="meta" filterAttributes="name:description" indexAttribute="" /> 
        <field name="navigation" weight="-1" filterTag="div" filterAttributes="class:nav" indexAttribute="" /> 
        <field name="header" weight="-1" filterTag="div" filterAttributes="class:topnav" indexAttribute="" /> 
    </fields> 
  24. Bob
    330 posts
    Registered: 24 Sep 2012
    Posted: 17 May 2008
    Hello Zubair,

    Unfortunately, a few bugs were discovered with improperly locked or deleted index files. To fix your problem, delete the entire index folder (~/App_Data/Search/[Index Name]) and then reindex the site. This issue has been fixed for v3.2 SP2.
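If you need to apply this workaround repeatedly, the folder deletion can be scripted. This is a hypothetical C# helper, not part of Sitefinity: it assumes the App_Data/Search/[Index Name] layout mentioned here (and visible in the stack trace above), and that no indexing job or worker process is holding the files open. Reindexing itself is still done from the Search administration.

```csharp
using System.IO;

public static class SearchIndexReset
{
    // Deletes <appDataPath>/Search/<indexName> and everything under it.
    // Returns false when the folder does not exist (nothing to clean up).
    // Assumes the index files are not locked by a running indexing job;
    // Directory.Delete throws an IOException if they are.
    public static bool DeleteIndexFolder(string appDataPath, string indexName)
    {
        string indexFolder = Path.Combine(appDataPath, "Search", indexName);
        if (!Directory.Exists(indexFolder))
            return false;
        Directory.Delete(indexFolder, recursive: true);
        return true;
    }
}
```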

    Josh,

    Your filter is correct and should work just fine. Note that the filter does not work in versions prior to v3.2 SP1, although the file is present in them. If you are using the latest version and still have this problem, could you please send us your project so we can examine it? There is also an alternative way to prevent navigation from being indexed. Please consider the code below:
        protected override void Render(HtmlTextWriter writer)  
        {  
            // Checks if this is called by the Search Indexer and does not render anything if so.  
            // Navigation controls are present in every page and should NOT be indexed multiple times.  
            if (!CmsContext.IsCrawlerRequest)  
                base.Render(writer);  
        } 
    Please take a look at the implementation of the navigational controls in ~/Sitefinity/UserControls/Navigation/.

    The second approach will work for previous versions as well.

    There are a lot of improvements and bug fixes in Search for SP2 so stay tuned.


    All the best,
    Bob
    the Telerik team

  25. Zubair
    142 posts
    Registered: 26 Dec 2007
    Posted: 18 May 2008
    Hi,

    I'm facing some problems with Search and Search paging.

    I've set PostsPerPage to 6, but I noticed that when I get more than 6 results, the repeater sometimes shows only 5 results on a page, skipping the <AlternateItem> template for the 6th result and showing that result on the next page instead. This happens on all pages; however, reindexing the site solves the issue.

    • I also noticed that sometimes, when I search for something and get 23 results while showing only 6 posts per page, the total page count returned is 4, which is fine. Here's the problem: the 2nd page only shows me 4 results. Pages 1 and 3 show 6 results and page 4 shows 5, so where did my 2 results go?
    • Another issue is that the description of some pages is sometimes not shown; this happens randomly for some of the pages.
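A quick way to tell mis-paging from missing results is to compute what each page should contain. This C# sketch uses ceiling division for the page count and fills full pages before the remainder; PagingCheck is a hypothetical helper, independent of Sitefinity's Search Results control.

```csharp
using System;
using System.Collections.Generic;

public static class PagingCheck
{
    // Number of pages needed for totalResults at pageSize results per page
    // (integer ceiling division).
    public static int PageCount(int totalResults, int pageSize) =>
        (totalResults + pageSize - 1) / pageSize;

    // Expected number of results on each page: full pages, then the remainder.
    public static List<int> ExpectedPageSizes(int totalResults, int pageSize)
    {
        if (pageSize <= 0)
            throw new ArgumentOutOfRangeException(nameof(pageSize));
        var sizes = new List<int>();
        for (int start = 0; start < totalResults; start += pageSize)
            sizes.Add(Math.Min(pageSize, totalResults - start));
        return sizes;
    }
}
```

For 23 results at 6 per page this gives 4 pages holding 6, 6, 6 and 5 results. The observed 6, 4, 6, 5 adds up to 21, so two results are genuinely missing rather than merely shifted to another page.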

    I think there are a lot of issues with the search, and I'm hoping they're addressed in SP2.

    (I can send you a URL to test these issues; please post your email.)

    Please let me know what's going wrong. Thanks

  26. Nikifor
    232 posts
    Registered: 18 May 2013
    Posted: 19 May 2008
    Hello Zubair,

    Unfortunately, we could not manage to reproduce the reported behavior using Sitefinity's Search Results control. We tried the same scenario with 23 search results - 6 per page, with no effect. Can you please provide us with a link where we can see the exact issue and troubleshoot it further. Also, please elaborate on the <AlternateItem> property and where exactly you are setting it.

    Thank you for your cooperation in advance.

    Best wishes,
    Nikifor
    the Telerik team

  27. SelAromDotNet
    912 posts
    Registered: 18 Jul 2012
    Posted: 19 May 2008
    I am using the latest version and am still getting everything indexed; however, your workaround works PERFECTLY. This will do for now until I have some time to troubleshoot the filters. Maybe the new SP will clear it up.

    Thanks!
  28. Zubair
    142 posts
    Registered: 26 Dec 2007
    Posted: 20 May 2008
    Thanks Nikifor, please provide me your email or shall I send it to support@telerik.com ?
  29. Zubair
    142 posts
    Registered: 26 Dec 2007
    Posted: 20 May 2008
    Hi Nikifor,

    I've just sent an email to support@telerik.com with the details of the issue.
  30. Nikifor
    232 posts
    Registered: 18 May 2013
    Posted: 20 May 2008
    Hi Zubair,

    Thank you for providing the information. We will get on this and as soon as we have any result we will update this forum thread.

    Thank you for your time.

    Greetings,
    Nikifor
    the Telerik team
