404 errors from crawler stats (Part 2)

8 posts, 0 answered
  1. Duncan Evans
    122 posts
    Registered:
    07 Jul 2009
    24 Sep 2009
    I am doing some SEO for our website. I ran our home page through http://validator.w3.org/checklink because our SEO application was picking up tons of broken links (which on the surface looked fine).

    If you run http://www.therecoveryplace.net/home.aspx through the link checker at w3.org, you will see what I am talking about.

    All our .sflb.ashx links are coming back as 404 Not Found for these crawlers, e.g.:

    http://www.therecoveryplace.net/Libraries/Logos/logo_bluecrossshield.sflb.ashx

    Navigating to this link in a browser produces the image as expected. Can someone tell me why these crawlers see it as a 404 broken link? What's going on here, and how can I fix it?

    Duncan
  2. Georgi
    3583 posts
    Registered:
    28 Oct 2016
    26 Sep 2009
    Hi Duncan Evans,

    Thank you for posting your question. I am not sure yet why this is the case. I see that the link checker requests the .sflb.ashx items with a HEAD request, while browsers use GET.
    I also suspect that the link checker expects a regular web page, while the server returns an item with a different MIME type. With HEAD, the server should return the headers only, and in this case the Content-Type is not text/xml or text/html but, for example, image/jpeg.
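
    To see the GET/HEAD difference in isolation, here is a minimal Python sketch. It is only an illustration and not Sitefinity code: `GetOnlyHandler`, `status_for`, `compare_verbs` and `demo` are hypothetical names, and the toy local server merely imitates a handler that answers GET but returns 404 for HEAD. The path is a made-up stand-in for a library item.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class GetOnlyHandler(BaseHTTPRequestHandler):
    """Toy handler (assumption): serves GET, answers HEAD with 404."""

    def do_GET(self):
        body = b"fake image bytes"
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_HEAD(self):
        # Imitates a server that has no handler mapped for the HEAD verb.
        self.send_response(404)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def status_for(method, host, port, path):
    """Send one request with the given verb and return the status code."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        resp.read()  # drain the body (empty for HEAD)
        return resp.status
    finally:
        conn.close()


def compare_verbs(host, port, path):
    """Return the status code seen for GET and for HEAD on the same URL."""
    return {verb: status_for(verb, host, port, path) for verb in ("GET", "HEAD")}


def demo():
    """Start the toy server on a free local port and compare both verbs."""
    server = HTTPServer(("127.0.0.1", 0), GetOnlyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return compare_verbs("127.0.0.1", server.server_port,
                             "/Libraries/Logos/example.sflb.ashx")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

    Pointing `compare_verbs` at a real host (port 80) and a real library item path should show whether that site answers the two verbs differently.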

    I will check further on Monday. 

    Sincerely yours,
    Georgi
    the Telerik team

    Instantly find answers to your questions on the new Telerik Support Portal.
    Watch a video on how to optimize your support resource searches and check out more tips on the blogs.
  3. Duncan Evans
    122 posts
    Registered:
    07 Jul 2009
    04 Oct 2009
    Any update on this, Georgi?
  4. Georgi
    3583 posts
    Registered:
    28 Oct 2016
    07 Oct 2009
    Hi Duncan Evans,

    Apologies for the late reply.

    Yes, I found the issue, or at least I found why it is reproducible on my side.
    The cause is the "verb" setting of the handler. By default, if you are running in classic mode, the handlers are most probably registered for the GET verb only. Browsers send GET, so their requests are processed, while the W3C link checker sends HEAD, which is not defined for the handler, so the server returns 404 (the request never reaches the handler).

    Please check the web.config file:
    integrated mode:
    <handlers> 
    .. 
    <add name="SitefinityLibraryAdd" path="*.sflb.ashx" verb="*" preCondition="integratedMode"  

    verb="*" enables all verbs (GET, POST, DELETE, HEAD, etc.)

    classic mode:
     <httpHandlers> 
         <add verb="*" path="*.sflb.ashx" type="Telerik.Cms.Engine.ContentHttpHandler, Telerik.Cms.Engine" /> 

    I am almost sure that you are running the web site in classic mode, with the verb set to GET.

    Let me know if I am correct.

    Sincerely yours,
    Georgi
    the Telerik team

  5. Duncan Evans
    122 posts
    Registered:
    07 Jul 2009
    07 Oct 2009
    <handlers> 
    ... 
    <add name="SitefinityLibrary" path="*.sflb" verb="*" preCondition="integratedMode" type="Telerik.Cms.Engine.ContentHttpHandler, Telerik.Cms.Engine"/> 
    <httpHandlers> 
    ... 
    <add verb="GET" path="*.sflb" type="Telerik.Cms.Engine.ContentHttpHandler, Telerik.Cms.Engine"/> 
    <add verb="GET" path="*.sflb.ashx" type="Telerik.Cms.Engine.ContentHttpHandler, Telerik.Cms.Engine"/> 

    Looks like you might be right. Should I change both of the GET entries to *?

    Duncan
  6. Duncan Evans
    122 posts
    Registered:
    07 Jul 2009
    07 Oct 2009
    PS: This is the default setting out of the box. Maybe it would be a good idea to default it to * for SEO purposes? Just a thought.
  7. Duncan Evans
    122 posts
    Registered:
    07 Jul 2009
    07 Oct 2009
    That seems to do the trick. Now the only 404 errors I am getting are for the web resources:

    http://www.therecoveryplace.net/WebResource.axd?d=WufYt30cmyBwcArJZEKjmw2&t=633765184547189986

    I don't suppose there is a trick to get around these? :)

    Duncan
  8. Georgi
    3583 posts
    Registered:
    28 Oct 2016
    09 Oct 2009
    Hello Duncan,

    You are quite right. We will make these changes to the web.config file, so that the handlers become active for HEAD as well. We will most probably make them active for all verbs, just like the ones in Integrated Pipeline mode. This will be good for SEO, since some crawlers send a HEAD request first to check how old the items are, and then proceed with indexing (with GET).
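
    The HEAD-then-GET pattern mentioned above can be sketched roughly as follows. This is an assumption-laden illustration in Python, not how any particular crawler works: `ImageHandler`, `fetch_if_changed`, `head_then_get_demo`, the timestamp and the path are all made up, and the toy local server simply answers both verbs with a `Last-Modified` header.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

LAST_MODIFIED = "Wed, 07 Oct 2009 12:00:00 GMT"  # made-up timestamp
BODY = b"fake image bytes"                       # made-up content


class ImageHandler(BaseHTTPRequestHandler):
    """Toy handler (assumption): answers both HEAD and GET."""

    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Type", "image/jpeg")
        self.send_header("Content-Length", str(len(BODY)))
        self.send_header("Last-Modified", LAST_MODIFIED)
        self.end_headers()

    def do_HEAD(self):
        self._send_headers()  # headers only, no body

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def _request(method, host, port, path):
    """Send one request; return (status, Last-Modified header, body)."""
    conn = http.client.HTTPConnection(host, port, timeout=5)
    try:
        conn.request(method, path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Last-Modified"), resp.read()
    finally:
        conn.close()


def fetch_if_changed(host, port, path, known_last_modified=None):
    """HEAD first; GET the body only when Last-Modified differs from the known value."""
    status, stamp, _ = _request("HEAD", host, port, path)
    if status != 200 or stamp == known_last_modified:
        return stamp, None  # HEAD failed or item unchanged: skip the GET
    _, _, body = _request("GET", host, port, path)
    return stamp, body


def head_then_get_demo():
    """First visit downloads the item; second visit sees it is unchanged."""
    server = HTTPServer(("127.0.0.1", 0), ImageHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        args = ("127.0.0.1", server.server_port, "/Libraries/Logos/example.sflb.ashx")
        first = fetch_if_changed(*args)
        second = fetch_if_changed(*args, known_last_modified=first[0])
        return first, second
    finally:
        server.shutdown()
        server.server_close()
```

    If the HEAD request comes back as 404, as it did for the GET-only handlers, a client written this way never reaches the GET step at all.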

    As for ScriptResource, the situation is the same :)
    <add verb="GET,HEAD" path="ScriptResource.axd"

    We have updated your Telerik account with some points. Thank you for bringing this up!

    Regards,
    Georgi
    the Telerik team
