How to: Exclude a Page from Sitefinity Internal Search

by Veselin Vasilev
As you probably know, in Sitefinity CMS it is easy to exclude a page from indexing by external search crawlers (such as Googlebot) by unchecking the "Allow search engines to index this page" property. However, that page will still be indexed by the internal Sitefinity search engine and will appear in the search results on your website.

Use the steps below to gain more control over which pages Sitefinity indexes automatically.

1. In Visual Studio, create a class that inherits from the PageInboundPipe class in the Telerik.Sitefinity.Publishing.Pipes namespace, and override its LoadPageNodes and CanProcessItem methods:

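using System.Collections.Generic;
using System.Linq;
using Telerik.Sitefinity.Pages.Model;
using Telerik.Sitefinity.Publishing.Pipes;

public class PagePipeNoIndex : PageInboundPipe
{
    // Only pass on the page nodes that CanProcessItem accepts.
    protected override IEnumerable<PageNode> LoadPageNodes()
    {
        return base.LoadPageNodes().Where(n => this.CanProcessItem(n));
    }

    public override bool CanProcessItem(object item)
    {
        if (item == null)
            return false;

        if (item is PageData)
        {
            var pageData = (PageData)item;

            // Never index backend (administration) pages.
            if (pageData.NavigationNode.IsBackend)
                return false;

            // Skip pages whose "Allow search engines to index this page" box is unchecked.
            if (!pageData.Crawlable)
                return false;
        }

        if (item is PageNode)
        {
            var pageNode = (PageNode)item;

            if (pageNode.IsBackend)
                return false;

            // Only standard and external nodes are indexed. A node without page data
            // cannot be checked for Crawlable, so it is skipped instead of throwing.
            if ((pageNode.NodeType != NodeType.Standard && pageNode.NodeType != NodeType.External)
                || pageNode.Page == null
                || !pageNode.Page.Crawlable)
            {
                return false;
            }
        }

        // Let the default page pipe apply its own checks.
        return base.CanProcessItem(item);
    }
}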

The pipe runs every time Sitefinity needs to update the search index for its pages (e.g. when a page is created or updated). CanProcessItem checks the value of the Crawlable property, which corresponds to the "Allow search engines to index this page" checkbox, and does not add the item to the index if that checkbox is unchecked. Backend pages and nodes whose type is neither Standard nor External are skipped as well.
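To see at a glance which pages end up in the index, the decision rules of CanProcessItem can be modelled outside Sitefinity. The C# sketch below is illustrative only: PageStub, PageKind and IndexFilterSketch are hypothetical stand-ins, not Sitefinity types, and the sketch leaves out whatever additional checks base.CanProcessItem performs in the real pipe.

```csharp
// Hypothetical stand-ins for the Sitefinity page model, used only to
// illustrate the checks in PagePipeNoIndex.CanProcessItem. The real
// PageNode/PageData types have many more members.
public enum PageKind { Standard, Group, External }

public sealed class PageStub
{
    public bool IsBackend { get; set; }
    public PageKind Kind { get; set; }

    // Mirrors the "Allow search engines to index this page" checkbox.
    public bool Crawlable { get; set; }
}

public static class IndexFilterSketch
{
    // Returns true when the page would be passed on to the search index.
    // Assumption: the checks done by the base pipe are not modelled here.
    public static bool ShouldIndex(PageStub page)
    {
        if (page == null)
            return false;

        // Backend (administration) pages are never indexed.
        if (page.IsBackend)
            return false;

        // Only standard and external pages are considered at all.
        if (page.Kind != PageKind.Standard && page.Kind != PageKind.External)
            return false;

        // The search-engine checkbox now also governs the internal index.
        return page.Crawlable;
    }
}
```

For example, ShouldIndex(new PageStub { Kind = PageKind.Standard, Crawlable = false }) returns false, while the same page with Crawlable = true returns true; a group node is rejected regardless of the checkbox.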

2. Replace the built-in page pipe with the custom pipe from step 1. This is done in the Global.asax.cs file as follows:

using System;
using Telerik.Sitefinity.Abstractions;
using Telerik.Sitefinity.Publishing;
using Telerik.Sitefinity.Publishing.Pipes;

public class Global : System.Web.HttpApplication
{
    protected void Application_Start(object sender, EventArgs e)
    {
        Bootstrapper.Initialized += Bootstrapper_Initialized;
    }

    void Bootstrapper_Initialized(object sender, Telerik.Sitefinity.Data.ExecutedEventArgs e)
    {
        // Swap the pipe only after Sitefinity has finished bootstrapping.
        if (e.CommandName == "Bootstrapped")
        {
            ReplacePagePipeWithCustomPagePipe();
        }
    }

    private void ReplacePagePipeWithCustomPagePipe()
    {
        //Remove the default page pipe
        PublishingSystemFactory.UnregisterPipe(PageInboundPipe.PipeName);

        //Register PagePipeNoIndex under the original page pipe name,
        //so whenever the publishing system tries to use the page pipe, it gets the custom one
        PublishingSystemFactory.RegisterPipe(PageInboundPipe.PipeName, typeof(PagePipeNoIndex));
    }

    // ... (other members of Global.asax.cs)
}
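Registering the custom type under the original pipe name is what makes the swap transparent: anything that asks for the page pipe by name now gets PagePipeNoIndex. The toy registry below sketches that idea with a plain dictionary. It is a hypothetical illustration (PipeRegistrySketch, DefaultPagePipe, CustomPagePipe and the "PagePipe" name used in the usage note are made up), not how PublishingSystemFactory is actually implemented.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified pipe registry that only illustrates name-based
// lookup. This is NOT Sitefinity's PublishingSystemFactory.
public static class PipeRegistrySketch
{
    private static readonly Dictionary<string, Type> Pipes =
        new Dictionary<string, Type>(StringComparer.Ordinal);

    public static void RegisterPipe(string name, Type pipeType)
    {
        Pipes[name] = pipeType;
    }

    public static void UnregisterPipe(string name)
    {
        Pipes.Remove(name);
    }

    // Consumers resolve pipes by name only, so whichever type is currently
    // registered under that name is the one they receive.
    public static Type ResolvePipe(string name)
    {
        Type pipeType;
        return Pipes.TryGetValue(name, out pipeType) ? pipeType : null;
    }
}

// Hypothetical placeholders for the built-in and the custom page pipe.
public class DefaultPagePipe { }
public class CustomPagePipe : DefaultPagePipe { }
```

Mirroring the unregister-then-register sequence from Global.asax.cs, calling UnregisterPipe("PagePipe") followed by RegisterPipe("PagePipe", typeof(CustomPagePipe)) makes ResolvePipe("PagePipe") return typeof(CustomPagePipe), without any caller having to change the name it asks for.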

That's it. Build the project, and from now on, if you uncheck the "Allow search engines to index this page" checkbox, the page will be hidden from both external search crawlers and the internal Sitefinity search.

To learn more about the Publishing system in Sitefinity CMS, please check this blog post or the online documentation.

7 comments

  1. John Giovine Jul 31, 2013
    Veselin,

    I have used your code in our site to prevent certain pages from being included in the search index. It worked perfectly.
    Now we are upgrading to SF 6.0, and I'm getting the error below after saving content. I'm not sure how to fix it. The error happens on the last return statement: return base.CanProcessItem(item);

    Cannot infer manager type, because content type `Telerik.Sitefinity.Ecommerce.Catalog.Model.ProductUrlData` is not mapped to a manager type. You can do so in configuration, via the ManagerTypeAttribute, or in code subscribing to ManagerBase.NeedsManagerType.

    Here are the values of item:
    item = {Telerik.Sitefinity.Publishing.WrapperObjectWithDataItemLoader} (object)
        base = {Telerik.Sitefinity.Publishing.WrapperObjectWithDataItemLoader} (Telerik.Sitefinity.Publishing.WrapperObject)
        ItemId = {eefbbb01-29a9-6bf2-86a2-ff0000e78017} (System.Guid)
        ManagerType = null (System.Type)
        parent = {Telerik.Sitefinity.Publishing.PublishingSystemEventInfo} (Telerik.Sitefinity.Publishing.PublishingSystemEventInfo)
        ProviderName = "OpenAccessDataProvider" (string)
        TransactionName = null (string)
        WrappedObject: '(((Telerik.Sitefinity.Publishing.WrapperObjectWithDataItemLoader)(item))).WrappedObject' threw an exception of type 'System.NotSupportedException'

  2. Vesselin Aug 09, 2013
    Hi John,

    Have you followed the upgrade instructions?
    http://www.sitefinity.com/documentation/documentationarticles/installation-and-administration-guide/upgrade

    It looks like this error is caused by a problem that occurred while upgrading the Ecommerce module, not by the search.
  3. Bert Dec 12, 2013
    Hi, is the PageInboundPipe only used for indexing then? Just to make sure this doesn't have any other consequences...
  4. STB Feb 05, 2014
    Is this still the best (only) way to exclude pages from search indexes in Sitefinity? This isn't a viable alternative for us. We need to exclude ALL pages from web crawler search, and exclude a SUBSET of pages from internal search.

    How is this possible?
  5. Vesselin Feb 06, 2014
    @STB: No, this is not the only way; it is just an illustration of how you can achieve that.

    Instead of using the Crawlable property, you can use a list of pages you want to exclude from the internal search.
    Alternatively, you can use custom attributes for pages: for example, add an attribute "Include in internal search" and then, based on its value, either add the page to the search index or remove it from there.
    More on custom attributes for pages here: http://www.youtube.com/watch?v=kQgj_f_Wbl8

  6. Charl Feb 18, 2014
    We have started a discussion on this feature and would like to invite you to share your feedback here: 

    http://www.sitefinity.com/developer-network/forums/sitefinity-preview/upcoming-features/exclude-pages-from-sitefinity-search

    We will be happy to hear your specific cases and scenarios with regard to excluding pages from Sitefinity search.

    Thank you in advance.
    Charl

  7. Muhammed Abdusalam Sep 11, 2014

    Hi Vesselin,

    It works fine. Thank you very much.

    Regards

    Muhammed Abdussalam
