
Accent-Insensitive Search in Sitefinity

by Stefani Tacheva
One way to implement accent-insensitive search is to replace the default analyzer that Lucene uses in Sitefinity with one that converts accented characters to their unaccented equivalents. Fortunately, Lucene provides this functionality out of the box through the ASCIIFoldingFilter class. Below is an example of an analyzer that applies this filter.

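To make Lucene use your custom analyzer in Sitefinity, register it in the ObjectFactory. The analyzer is then used both during indexing and during search. The index stores every term with its accents removed, and accents are also removed from the search query, so a search for "cafe" matches content containing "café" and vice versa. Content indexed before the analyzer was registered still holds accented terms, so reindex your existing search indexes after making this change.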
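The following standalone C# sketch illustrates what folding does to text. It is not part of the Sitefinity setup and does not call ASCIIFoldingFilter. The class AccentFolding and its Fold method are hypothetical names used only for this illustration. The sketch approximates the filter with a different technique: .NET Unicode decomposition (NormalizationForm.FormD) followed by dropping the combining accent marks. Unlike ASCIIFoldingFilter, it leaves characters that have no decomposition unchanged, such as "ß", "æ" or "ø". ASCIIFoldingFilter maps those characters to "ss", "ae" and "o".

```csharp
using System.Globalization;
using System.Text;

// Hypothetical helper for illustration only; NOT part of Sitefinity or Lucene.Net.
// It approximates accent folding via Unicode decomposition instead of ASCIIFoldingFilter.
public static class AccentFolding
{
    public static string Fold(string text)
    {
        // Decompose characters such as "é" into a base letter "e" plus a combining accent.
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep base characters and drop the combining (non-spacing) accent marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose whatever is left; plain ASCII text is returned unchanged.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, AccentFolding.Fold("Crème brûlée") returns "Creme brulee". That is the same form a typed query "Creme brulee" already has, which is why folding both the indexed text and the query makes them match.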

Altogether, you need the following code in your Global.asax.cs:

using System;
using Telerik.Microsoft.Practices.Unity;
using Telerik.Sitefinity.Abstractions;
using Telerik.Sitefinity.Data;
using Telerik.Sitefinity.Utilities.Lucene.Net.Analysis;
using Telerik.Sitefinity.Utilities.Lucene.Net.Analysis.Standard;
   
namespace SitefinityWebApp
{
    public class Global : System.Web.HttpApplication
    {
        protected void Application_Start(object sender, EventArgs e)
        {
            Bootstrapper.Initialized += this.Bootstrapper_Initialized;
        }
   
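        private void Bootstrapper_Initialized(object sender, ExecutedEventArgs e)
        {
            // Wait until Sitefinity has finished bootstrapping before overriding its registrations.
            if (e.CommandName != "Bootstrapped")
                return;

            // Register the custom analyzer as the Lucene Analyzer, so it is used for both
            // indexing and searching. ContainerControlledLifetimeManager keeps a single shared
            // instance, and null is passed for the analyzer's stopWords constructor parameter.
            ObjectFactory.Container.RegisterType<Analyzer, AccentInsensitiveAnalyzer>(
                new ContainerControlledLifetimeManager(),
                new InjectionConstructor(new InjectionParameter<string[]>(null)));
        }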
    }
   
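    public class AccentInsensitiveAnalyzer : StandardAnalyzer
    {
        public AccentInsensitiveAnalyzer(string[] stopWords)
            : base(stopWords)
        {
        }

        public override TokenStream TokenStream(string fieldName, System.IO.TextReader reader)
        {
            // Split the text into tokens, then strip possessive 's and dots from acronyms.
            TokenStream stream = new StandardTokenizer(reader);
            stream = new StandardFilter(stream);
            // Replace accented characters with their unaccented ASCII equivalents (e.g. "é" -> "e").
            stream = new ASCIIFoldingFilter(stream);
            // Note: unlike the base StandardAnalyzer, this chain does not lowercase tokens or
            // remove stop words. Add "stream = new LowerCaseFilter(stream);" for case-insensitive search.
            return stream;
        }
    }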
}

We have also logged a feature request in our Feedback portal to make accent-insensitive search available in Sitefinity out of the box. You can review its description and vote for it at the following URL:

http://feedback.telerik.com/Project/153/Feedback/Details/99130-to-be-able-to-perform-insensitive-accent-search-in-sitefinity


Comments

  1. Emmanuel May 27, 2014

    Hi Stefani,

    Very useful post. Is it possible to set up case-insensitive search as well?

    Thank you.

  2. Mark May 27, 2014
    Emmanuel, Lucene also has a LowerCaseFilter. I imagine you could add stream = new LowerCaseFilter(stream); directly below the other filter.
  3. Stefani Tacheva May 28, 2014

    Hi Emmanuel,

    As Mark said, you need to use the LowerCaseFilter.

    Regards,

    Stefani Tacheva

  4. Emmanuel May 28, 2014

    Thank you both for your answers. 

    It works.

    Regards,

    Emmanuel

  5. Siva Kumar K Dec 05, 2014
    So the document should be saved in ASCII, right?
