Custom Search Index

12 posts, 0 answered
  1. Fernando
    31 posts
    Registered: 08 Jun 2009
    05 Aug 2010
    Hi,
    I'm developing an index for the Sitefinity Libraries that will index the name, description and other information about the library items. I found a webinar on the topic and it was very helpful. I created my own IndexInfo, IndexProvider, IndexSettings and IndexViewControl, and they all seem to be working fine. But when I try to index the files in the Libraries, some items are missing from the index. For example, the GetContentToIndex() method on my IndexProvider returns an array of IIndexerInfo with 200 items, but only a few of them are indexed.
    Is there a reason why some of the returned items are not indexed, or is it a bug?

    Thanks
  2. Radoslav Georgiev
    3370 posts
    Registered: 01 Feb 2016
    09 Aug 2010
    Hi Fernando,

    Thank you for using our services.

    We do not have any reports of bugs that would affect the index. After all, library items are IContent objects, the same as News, Blogs, etc. Could you please check whether your index breaks somewhere? Or, if you are indexing metadata, could you check whether the items that are not indexed actually lack the respective meta fields?
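
    One way to check for unset meta fields is to read them through a null-safe helper instead of calling ToString() directly on the value. The following is only a sketch, assuming an unset field comes back as null; the MetaFieldHelper class and its method names are hypothetical and not part of Sitefinity:

```csharp
using System;

public static class MetaFieldHelper
{
    // Returns the text of a meta field value, or an empty string when the field is unset.
    public static string MetaText(object value)
    {
        return value == null ? string.Empty : value.ToString();
    }

    // Returns the numeric value of a meta field, or 0 when it is unset or not convertible.
    public static long MetaLong(object value)
    {
        if (value == null) return 0L;
        try { return Convert.ToInt64(value); }
        catch (FormatException) { return 0L; }
        catch (InvalidCastException) { return 0L; }
        catch (OverflowException) { return 0L; }
    }
}
```

    In an index provider this might be used as fileInfo.Name = MetaFieldHelper.MetaText(content.GetMetaData("Name")); so that a missing field produces an empty value rather than an exception.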

    Sincerely yours,
    Radoslav Georgiev
    the Telerik team
    Do you want to have your say when we set our development plans? Do you want to know when a feature you care about is added or when a bug fixed? Explore the Telerik Public Issue Tracking system and vote to affect the priority of the items
  3. Fernando
    31 posts
    Registered: 08 Jun 2009
    24 Sep 2010
    Hi Radoslav,
    I was away working on other things, which is why it took me so long to continue this topic, but I still have problems here.

    I will post my classes here and then describe my problem in as much detail as possible.
    I'm indexing metadata, as you can see in the GetData method of my IndexerInfo:

    public class FileIndexerInfo : IIndexerInfo
        {
            public long Size { get; set; }
            public string Name { get; set; }
            public string Extension { get; set; }
            public string Author { get; set; }
            public string Description { get; set; }
            public string Category { get; set; }
     
            #region IIndexerInfo Members
     
            public string Culture {
                get { return fCulture; }
                set { fCulture = value; }
            }
            private string fCulture;
     
            public Encoding Encoding {
                get { return fEncoding; }
                set { fEncoding = value; }
            }
            private Encoding fEncoding;
     
            public byte[] GetData() {
                 return Encoding.GetBytes(string.Format("<author>{0}</author><size>{1}</size><title>{2}</title><extension>{3}</extension><description>{4}</description><category>{5}</category>",
                     System.Web.HttpContext.Current.Server.HtmlEncode(Author),
                     System.Web.HttpContext.Current.Server.HtmlEncode(Size.ToString()),
                     System.Web.HttpContext.Current.Server.HtmlEncode(Name),
                     System.Web.HttpContext.Current.Server.HtmlEncode(Extension),
                     System.Web.HttpContext.Current.Server.HtmlEncode(Description),
                     System.Web.HttpContext.Current.Server.HtmlEncode(Category)));
            }
     
            public Guid ItemID {
                get { return Guid.Empty; }
                set { fItemID = value; }
            }
            private Guid fItemID;
     
            public string MimeType {
                get { return fMimeType; }
                set { fMimeType = value; }
            }
            private string fMimeType;
     
            public string Path {
                get { return fPath; }
                set { fPath = value; }
            }
            private string fPath;
     
            public string ResolveIndexPath() {
                return Path;
            }
     
            #endregion
        }

    According to my tests everything is right with my metadata, but if you see something wrong here, it might be the problem.

    This is my IndexProvider:

    public class FileIndexProvider : IIndexingServiceClient
        {
            #region IIndexingServiceClient Members
     
            public string Description {
                get { return "This is the index of all files on the Sitefinity's libraries"; }
            }
     
            public IIndexerInfo[] GetContentToIndex() {
                LibraryManager manager = new LibraryManager();
                IList list = manager.GetContent();
                List<FileIndexerInfo> info = new List<FileIndexerInfo>();
     
                foreach (CmsContentBase content in list) {
                    FileIndexerInfo fileInfo = new FileIndexerInfo();
                    fileInfo.Culture = CultureInfo.CurrentCulture.Name;
                    fileInfo.Encoding = Encoding.Default;
                    fileInfo.ItemID = content.ID;
                    fileInfo.MimeType = content.MimeType;
                    fileInfo.Path = content.UrlWithExtension;
                    fileInfo.Name = content.GetMetaData("Name").ToString();
                    fileInfo.Size = (long)content.GetMetaData("Size");
                    fileInfo.Extension = content.GetMetaData("Extension").ToString();
                    fileInfo.Author = content.GetMetaData("Author").ToString();
                    fileInfo.Description = content.GetMetaData("Description").ToString();
                    fileInfo.Category = content.GetMetaData("Category").ToString();
                    info.Add(fileInfo);
                }
                return info.ToArray();
            }
     
            public string[] GetUrlsToIndex() {
                return new string[0];
            }
     
            public event EventHandler<IndexEventArgs> Index;
     
            public void Initialize(IDictionary<string, string> settings) {
            }
     
            public string Name {
                get { return "FileIndex"; }
            }
     
            #endregion
        }

    All I do here is return what I want to index. In the GetContentToIndex() method I populate a List of IIndexerInfo with one item for each file in my Library.

    I don't think the following classes could cause my problems, but I'll post them too:

    My IndexSettingsControl:

    public class FileIndexSettings : CompositeControl, ISettingsControl
        {
            #region ISettingsControl Members
     
            public IDictionary<string, string> GetSettings() {
                return settings;
            }
     
            public void InitSettings(IDictionary<string, string> indexSettings) {
                settings = indexSettings;
            }
     
            private IDictionary<string, string> settings;
     
            #endregion
     
            protected override void Render(HtmlTextWriter writer) {
                writer.Write("<div><strong>FileIndex</strong></div>");
            }
        }

    And my IndexViewControl:

    public class FileIndexViewControl : CompositeControl, ISearchViewControl
        {
            #region ISearchViewControl Members
     
            public void InitializeSettings(IDictionary<string, string> indexSettings) {
            }
     
            #endregion
     
            protected override void Render(HtmlTextWriter writer) {
                writer.Write("<div><strong>FileIndex</strong></div>");
            }
        }

    I have described one of my problems: not all items are indexed. I return an array with x items and only y items are indexed (where x > y).
    But that problem only shows up on the occasions when indexing succeeds at all. Depending on what I try to index, I receive messages like these:

    doc counts differ for segment _0: fieldsReader shows 5 but segmentInfo shows 10

    or

    Could not find file '[mypath]\App_Data\Search\Testing-docs\Index\_j.cfs'.

    Testing these classes in a lot of ways, I noticed some things. If I don't try to index all my files, I may not receive error messages like the two above. For example, if I change the loop in my FileIndexProvider as follows, I don't receive the errors:

    int i = 0;
    foreach (CmsContentBase content in list) {
        if (i++ >= 18) break;
        FileIndexerInfo fileInfo = new FileIndexerInfo();
        fileInfo.Culture = CultureInfo.CurrentCulture.Name;
        fileInfo.Encoding = Encoding.Default;
        fileInfo.ItemID = content.ID;
        fileInfo.MimeType = content.MimeType;
        fileInfo.Path = content.UrlWithExtension;
        fileInfo.Name = content.GetMetaData("Name").ToString();
        fileInfo.Size = (long)content.GetMetaData("Size");
        fileInfo.Extension = content.GetMetaData("Extension").ToString();
        fileInfo.Author = content.GetMetaData("Author").ToString();
        fileInfo.Description = content.GetMetaData("Description").ToString();
        fileInfo.Category = content.GetMetaData("Category").ToString();
        info.Add(fileInfo);
    }

    In this case I send an array with 18 items, but only 10 are indexed. That is not the point at the moment, though. The problem here is that if I change that 18 to 19, I receive the first error message (doc counts differ for segment _0: fieldsReader shows 5 but segmentInfo shows 10).

    I wanted to see if that was caused by the quantity of items, so I changed again my loop:

    for (int i = list.Count - 1; i >= 345; i--) {
        CmsContentBase content = (CmsContentBase)list[i];
        FileIndexerInfo fileInfo = new FileIndexerInfo();
        fileInfo.Culture = CultureInfo.CurrentCulture.Name;
        fileInfo.Encoding = Encoding.Default;
        fileInfo.ItemID = content.ID;
        fileInfo.MimeType = content.MimeType;
        fileInfo.Path = content.UrlWithExtension;
        fileInfo.Name = content.GetMetaData("Name").ToString();
        fileInfo.Size = (long)content.GetMetaData("Size");
        fileInfo.Extension = content.GetMetaData("Extension").ToString();
        fileInfo.Author = content.GetMetaData("Author").ToString();
        fileInfo.Description = content.GetMetaData("Description").ToString();
        fileInfo.Category = content.GetMetaData("Category").ToString();
        info.Add(fileInfo);
    }

    This time I'm iterating over my List from the end to the beginning. Doing that, I can return an array with 197 items and all of them are indexed. But if I change the 345 to 344, I start to receive the two error messages I wrote about.

    I really don't understand why this is happening (both the errors and the fact that some items are not indexed). I've tested on other websites, some with more files in the Libraries (the one I described has about 540) and some with fewer, and these problems are always present.
    Do you know what the reasons could be and how I could resolve the problems?

    Thanks,
    Fernando Melo
  4. Fernando
    31 posts
    Registered: 08 Jun 2009
    27 Sep 2010
    I still don't know why my problems happen, but now I know what causes them.
    When I set MimeType to an empty string in my FileIndexProvider (fileInfo.MimeType = ""; instead of fileInfo.MimeType = content.MimeType;), the errors stop and I can index all my files.
  5. Radoslav Georgiev
    3370 posts
    Registered: 01 Feb 2016
    27 Sep 2010
    Hello Fernando,

    Thank you for getting back to me.

    Could you please try the following: put the code in the foreach loop inside a try/catch statement to capture the doc counts exception. When the exception is caught, see which particular content item is being indexed, whether there is a problem with that item, and whether it is always this item (or set of items). Then try to create a list containing only this item and run it through the indexer to see if it breaks. It seems that there is some pattern in the mime types of the items that break the indexer.
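
    The per-item try/catch could look roughly like the sketch below. It is written against generic types so it stands on its own; the IndexDiagnostics class and ConvertAll method are hypothetical names. Note that it only catches exceptions thrown while the info objects are being built; if the error is raised later, inside the indexer, narrowing down the list of items is still needed.

```csharp
using System;
using System.Collections.Generic;

public static class IndexDiagnostics
{
    // Converts each source item with 'convert', returning the successful results and
    // recording every item that throws together with its exception.
    public static List<TOut> ConvertAll<TIn, TOut>(
        IEnumerable<TIn> source,
        Func<TIn, TOut> convert,
        IList<KeyValuePair<TIn, Exception>> failures)
    {
        List<TOut> converted = new List<TOut>();
        foreach (TIn item in source)
        {
            try
            {
                converted.Add(convert(item));
            }
            catch (Exception ex)
            {
                // Keep going so that one bad item does not hide the others.
                failures.Add(new KeyValuePair<TIn, Exception>(item, ex));
            }
        }
        return converted;
    }
}
```

    After the loop, the failures list shows which items could not be converted and why, which makes it easier to test those items on their own.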

    Best wishes,
    Radoslav Georgiev
    the Telerik team
  6. Coskun SUNALI
    7 posts
    Registered: 16 Aug 2007
    05 Oct 2010
    Hi Radoslav,

    I agree with what Fernando says here. A custom indexer used to index files does not work properly as long as the indexer info class returns each item's correct mimeType.

    I can add some information on top of what Fernando said. As he already mentioned, returning an empty string for the mimeType helps, but returning the same mimeType value for every item works too, e.g. "application/pdf".

    I tested that by uploading only PDF files to a library and indexing it and it worked fine but when I uploaded a Word document (application/vnd.openxmlformats-officedocument.wordprocessingml.document), the indexer stopped working. And then I deleted all the PDF files and only left the Word document and things started working again.

    It looks like the indexing client or manager (I am not sure about the exact term you use for it) stops working when it has to index different types of files with different mimeType values.

    Looking forward to hearing from you.

    Best regards,
    Coskun Sunali
  7. Radoslav Georgiev
    3370 posts
    Registered: 01 Feb 2016
    05 Oct 2010
    Hello Coskun SUNALI,

    Indeed, there seems to be a problem with the mime types. The mime type of the IIndexerInfo objects is not used by the indexer when it constructs the indexes, so it is safe to simply set the mime type to an empty string or to remove the MimeType assignment from this loop:
    foreach (CmsContentBase content in list)
    {
        FileIndexerInfo fileInfo = new FileIndexerInfo();
        fileInfo.Culture = CultureInfo.CurrentCulture.Name;
        fileInfo.Encoding = Encoding.Default;
        fileInfo.ItemID = content.ID;
        fileInfo.MimeType = content.MimeType;
        fileInfo.Path = content.UrlWithExtension;
        fileInfo.Name = content.GetMetaData("Name").ToString();
        fileInfo.Size = (long)content.GetMetaData("Size");
        fileInfo.Extension = content.GetMetaData("Extension").ToString();
        fileInfo.Author = content.GetMetaData("Author").ToString();
        fileInfo.Description = content.GetMetaData("Description").ToString();
        fileInfo.Category = content.GetMetaData("Category").ToString();
        info.Add(fileInfo);
    }

    If you need to include the document's mime type in the index, you should handle it the same way as the item's meta fields.
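
    As a rough sketch of that approach, the mime type can be written into the payload returned by GetData() as one more field, while the MimeType property itself stays empty. The IndexPayload class and the <mimetype> element name are assumptions for illustration, and WebUtility.HtmlEncode assumes .NET 4 or later (Server.HtmlEncode, as used earlier in this thread, does the same job):

```csharp
using System;
using System.Net;
using System.Text;

public static class IndexPayload
{
    // Builds the byte payload for GetData(), carrying the MIME type as an ordinary field.
    // The <mimetype> element name is an assumption, not a documented Sitefinity field.
    public static byte[] Build(Encoding encoding, string title, string mimeType)
    {
        string xml = string.Format(
            "<title>{0}</title><mimetype>{1}</mimetype>",
            WebUtility.HtmlEncode(title ?? string.Empty),
            WebUtility.HtmlEncode(mimeType ?? string.Empty));
        return encoding.GetBytes(xml);
    }
}
```

    Whether the extra field becomes searchable depends on how the index is configured, so treat this as a starting point rather than a confirmed recipe.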
     
    Kind regards,
    Radoslav Georgiev
    the Telerik team
  8. Coskun SUNALI
    7 posts
    Registered: 16 Aug 2007
    05 Oct 2010
    Thank you for the clarification.

    Best regards,
    Coskun Sunali
  9. David Martinez
    13 posts
    Registered: 24 May 2010
    29 Nov 2010
    Hi Fernando, could you give me some tips on implementing a custom search provider? I read your post and see that your search covers Sitefinity files; in my case it would cover an alternative database. I would be grateful for any help (if you speak Spanish, that would be excellent).
  10. Fernando
    31 posts
    Registered: 08 Jun 2009
    29 Nov 2010
    I don't speak Spanish, but maybe I can help you, depending on what you want to know. Describe your problem or doubts.
    If that helps, I speak Portuguese*
  11. David Martinez
    13 posts
    Registered: 24 May 2010
    29 Nov 2010
    Thanks. I see in your post that you implemented a custom provider for Sitefinity files. I need the search to run not over files or pages of the site but over a database. My question is: in which classes and methods should I call the database? I think it should be in the GetData method of the class that implements IIndexerInfo (in my case, CustomIIndexerInfo). Is that correct?
    thanks again
  12. Fernando
    31 posts
    Registered: 08 Jun 2009
    30 Nov 2010
    You can use the same basic structure as I did. All you have to change is the GetContentToIndex() method on your IIndexingServiceClient.

    There you select all your data and create one IIndexerInfo per record. GetData() is very important, but I think it shouldn't access anything external; it is best for it to use only the data already stored on its own object.
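
    To illustrate the idea, here is a minimal sketch of loading database rows up front into plain objects, so that each object carries its own data. The RecordInfo and RecordLoader names and the Title, Body and Path columns are hypothetical; a real implementation would make RecordInfo implement IIndexerInfo and return the list from GetContentToIndex():

```csharp
using System;
using System.Collections.Generic;
using System.Data;

// Plain data holder; in a real provider this class would also implement IIndexerInfo.
public class RecordInfo
{
    public string Title;
    public string Body;
    public string Path;
}

public static class RecordLoader
{
    // Reads every row up front so the resulting objects carry their own data
    // and GetData() never needs to go back to the database.
    public static List<RecordInfo> Load(IDataReader reader)
    {
        List<RecordInfo> items = new List<RecordInfo>();
        while (reader.Read())
        {
            RecordInfo info = new RecordInfo();
            // Convert.ToString turns DBNull into an empty string.
            info.Title = Convert.ToString(reader["Title"]);
            info.Body = Convert.ToString(reader["Body"]);
            info.Path = Convert.ToString(reader["Path"]);
            items.Add(info);
        }
        return items;
    }
}
```

    The reader can come from any ADO.NET command; GetData() then only formats the fields already stored on the object.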