
Weird Database design in news module

7 posts, 0 answered
  1. fregas baratis
    fregas baratis avatar
    185 posts
    Registered:
    14 Nov 2002
    12 Mar 2007
    Link to this post

    All,

    I'm sorry but I am going to have to criticize.  I like a lot of stuff that i've seen in sitefinity so far, but I came across something just now that made me reconsider using it.  I noticed that the News module does not seem to have its own table.  Instead, it stores its data using the [sf_CmsContentBase], [sf_CmsTextContent] and [sf_CmsContentMetaData] tables.  Furthermore almost all the article's fields are stored as separate ROWS in the CmsContentMetaData table.  So Title becomes a row, summary becomes a row, Expiration becomes a row, etc.  This design has several implications:


    1. For me, it makes automating a data import into this module a lot more complicated.  I have old news releases from an old classic ASP site that I need to migrate over to the new Sitefinity 3.0 site, and I don't want to enter them all by hand.  The news release table in the old site is simple and straightforward, but to plug this data into the Sitefinity tables, I cannot simply run a query or a DTS package.  I will have to design something that inserts the data into all three tables, and worse, fields such as Author, Expiration Date, etc. will each have to be inserted as separate rows with foreign key references back to the parent tables.

    2. Reporting on modules that use this data structure will obviously be very complicated, as the developer will have to join all the related tables to get the relevant data, then merge the separate rows back into one row as separate fields, essentially doing a "pivot" like one does in Excel.  This is not trivial.

    3. Performance could suffer quite a bit.  It appears that the News module lazy loads all those "meta data" fields such as Author, Expiration, Summary, etc.  Let's say your news module has 100 news items to display.  I know that's a lot on one page, but it's not a huge amount.  What Sitefinity is doing is running one query for the content of the news module (apparently just the html body) and then running a query for each news item it finds.  So in this example, it will run 100 additional queries due to lazy loading.  This is a common problem in O/R mapping and can be solved by pre-fetching (I think Nolics calls them "Groups", but I could be wrong), but mostly this lazy loading is unnecessary: Author, Expiration, and Summary should all be fields in one "News" table, not in a separate one.  You're creating a one-to-many relationship for these values where only a one-to-one is needed.  They should all be mapped to one table and one class in the News module engine, avoiding the need to lazy load at all.
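
    To make the shape concrete, here is roughly what one news item seems to look like under this design, and what an import has to do about it.  The column names here are my guesses from browsing the tables, not your actual schema:

        -- One logical news item, spread across three tables (names illustrative):
        --   sf_CmsContentBase      -> the base row (id, type, dates)
        --   sf_CmsTextContent      -> the html body
        --   sf_CmsContentMetaData  -> one row PER FIELD, e.g. for item 42:
        --       (42, 'Title',          'Board Names New CEO')
        --       (42, 'Summary',        'The board today announced...')
        --       (42, 'ExpirationDate', '2007-06-01')
        -- So instead of one INSERT per article, a migration script has to
        -- repeat something like this once per field:
        INSERT INTO sf_CmsContentMetaData (ContentID, Name, StringValue)
        VALUES (@NewContentID, 'Title', @OldTitle)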


    I have also seen many places in various tables where one column SEEMS to be storing multiple values separated by semi-colons.  This is a major database "no-no", as it violates First Normal Form: multiple values should not be stored in one column.  I suspect that the weird meta-data approach in the News module above runs afoul of normalization principles as well.


    I realize that there is more than one way to skin a cat, but by not structuring your data in a standard way, you are losing many of the benefits of using a relational DBMS like SQL Server!  Having a well-normalized data structure is essential.  As I mentioned above, querying, automation, integration, and reporting all become more complicated when you violate these basic relational database principles.

    As I said, I have been mostly happy with the improvements in Sitefinity 3.0, and your company has been very responsive and courteous to us developers.  But this data structure really scares me.  I am hoping that this is an exception rather than the rule in how Sitefinity and its modules are built.  I don't mean to be harsh, but you are really painting yourselves (and us as developers) into a corner when you build tables in this manner.  I don't know how much of this is Sitefinity and how much is the Nolics O/R mapper.  I'm all for using ORMs to simplify data access and object mapping, but they shouldn't cause you to create a sub-standard database design.

    Sorry to be so blunt.  I appreciate your team's openness to developer input; you have always been responsive when we've had a question or comment.  I hope this will be heard and changed in future releases.

    Thank you,

    Craig

  2. Bob
    Bob avatar
    330 posts
    Registered:
    24 Sep 2012
    13 Mar 2007
    Link to this post
    Hi Craig,

    I agree with you that dynamic data structures will always be slower than the traditional design for relational databases, but let's face it: retrieving data from the database is just a fraction of the time needed to render a page.  Besides, with support for output caching, performance should not be a concern at all.

    It is also true that it will be much more difficult to use DTS, but why would you?  It will take less than ten lines of code to import or export data using the Content Manager.  Yes, SQL's built-in reporting will not do you much good, but most contemporary reporting tools support object data sources.

    With upcoming technologies like LINQ and lambda expressions, object databases are advancing quickly, and with computing power increasing every day, dynamism seems more important to me than saving a few milliseconds per request.  This design allows fields to be added and removed dynamically, simply by declaring them in the web.config.

    Of course, if you are not happy with this approach, it is not that difficult and not that much work to create your own content provider that handles data storage differently.  All you need to do is inherit from ContentProviderBase and implement about twenty methods for querying and storage.

    We may supply different data providers in the future, for example providers specifically designed to handle millions of records, but this is not a priority for now, as the current provider implementation is just fine for most real-life applications.

    Best wishes,
    Bob
    the telerik team
  3. Bob
    Bob avatar
    330 posts
    Registered:
    24 Sep 2012
    14 Mar 2007
    Link to this post
    Craig,

    I do hope my previous post didn't sound too assertive.  Coming back to it now, I think that in my attempt to explain the motives for our approach, I failed to point out how important your feedback is to all of us.  It's really crucial for us to hear all your opinions and comments.  All criticism is good criticism for us, as it gives us a new perspective and helps us make our next choices.  And our ultimate goal is to meet your demands.

    Thanks again,
    Bob
    the telerik team
  4. fregas baratis
    fregas baratis avatar
    185 posts
    Registered:
    14 Nov 2002
    15 Mar 2007
    Link to this post

    Bob,


    I appreciate you coming back and saying that.  Your company has been very open to our input.  However, I respectfully disagree with almost everything you've said in your original post.  Below are my responses:

    “I agree with you that dynamic data structures will always be slower than the traditional design for relational databases, but let's face it: retrieving data from the database is just a fraction of the time needed to render a page.  Besides, with support for output caching, performance should not be a concern at all.”

    I think you're dead wrong here.  I would argue that the GREAT MAJORITY of performance issues in any application (desktop or web) come from remote calls over the wire: remote procedures, web services, and so on.  That includes database calls, which are the most common remote calls web pages make.  Excluding bandwidth issues due to lots of html, images, or large viewstate, rendering a page on the server takes very little time.  Retrieving database records over and over, however, can kill performance.  We're not talking milliseconds here…we're talking seconds or even minutes.  Note that this may have nothing to do with how long the database takes to process each query; that may in fact be very little time.  It has to do with the time it takes to return those records over the wire from the database server to the application, because the application is asking for the data over and over instead of in one batch.

    Many of the books on O/R mapping will tell you that one of the major performance killers is a large number of unneeded lazy loads, or other data-centric retrieval problems.  This matches my own experience.  A few extra lazy loads or database calls, even 10 or 20 on a page, will not hurt.  But a hundred or a thousand will.  Note that I'm not against O/R mapping or the lazy-load features within it; they just need to be used judiciously.  And in your case, you're lazy loading child records that should be fields in the parent.
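
    Even without a schema change, one batched query per page would avoid this.  Something like the following, where I'm guessing at the column names:

        -- Fetch the metadata for ALL of the page's items in one round trip,
        -- instead of issuing one lazy-load query per news item:
        SELECT m.ContentID, m.Name, m.StringValue, m.DateTimeValue
        FROM   sf_CmsContentMetaData m
        WHERE  m.ContentID IN (@Id1, @Id2, @Id3 /* ...the ids already loaded */)
        ORDER BY m.ContentID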

    “Yes, SQL's built-in reporting will not do you much good, but most contemporary reporting tools support object data sources.”

    Can you name some reporting tools that support calling .NET objects without a lot of code wrangling?  I'm not being aggressive here…I really would like to know.  I've mostly used SQL Server Reporting Services, and I don't think it will natively use a business object or collection as a data source, but I could be wrong.

    But let's assume it does.  Say a year from now my client has 5000 records of some sort, whether News, Blogs, or some other module using the metadata structure you currently have in News.  Say they want a report with some simple aggregation (sums, averages) on this data.  If I run the report by tapping into the Nolics business objects, I will not be able to use the built-in Sum or Avg functions in SQL Server, because I'm calling your objects to get my data.  I will instead have to load the entire result set to compute the totals.  As I get each object, it will then have to lazy load all its related fields, such as Title, Summary, etc., to retrieve the fields I want a count, sum, or average on.  This could mean 5000 queries for one report, each call going back and forth over the wire to the SQL Server (a call out to send the SQL or stored proc name, and data coming back in), and each query returning 5-10 records of its own, for a total of 25,000-50,000 records.  This is going to run abysmally slowly.  The same thing could happen on the pages that use the module.

    If all the fields were instead in one "News" table or "Blog" table, 5000 records (or objects, if the report package supports them) could be returned in one database call (one query).  Maybe not lightning fast, but acceptable performance given the amount of data.  My point is that the database should be doing the work in these set-based and batch operations, because that's what it's designed to do.  C# and most third-party reporting software are not designed to do this work; they rely on the database.
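
    Contrast that with a normalized table, where the whole report is one set-based call and the server does the aggregation (a hypothetical "News" table, of course):

        SELECT COUNT(*)                                          AS TotalItems,
               AVG(DATEDIFF(day, DatePublished, ExpirationDate)) AS AvgDaysLive
        FROM   News   -- one query, one round trip; SQL Server does the math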

    But even after I get all the data, you have not addressed the problem that my report will then have to take all the rows and pivot them, as I mentioned in my earlier post.  I will have to take each batch of 5-10 rows (which really represent fields such as Title, Summary, ExpirationDate, etc.) and flip them to display left to right on the report.  I'm sure this can be done somehow, but it's not exactly simple.
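
    For the record, here is roughly what that pivot looks like in SQL Server (the metadata column names are my guesses, not your actual schema):

        -- Collapse the one-row-per-field shape back into one row per item:
        SELECT   m.ContentID,
                 MAX(CASE WHEN m.Name = 'Title'          THEN m.StringValue   END) AS Title,
                 MAX(CASE WHEN m.Name = 'Summary'        THEN m.StringValue   END) AS Summary,
                 MAX(CASE WHEN m.Name = 'ExpirationDate' THEN m.DateTimeValue END) AS ExpirationDate
        FROM     sf_CmsContentMetaData m
        GROUP BY m.ContentID

    Doable, but it is boilerplate that this design forces onto every report.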

    “It is also true that it will be much more difficult to use DTS, but why would you?  It will take less than ten lines of code to import or export data using the Content Manager.”

    I WOULD like to look at the content import feature and the ability to change the News items data provider.  Can you point me in the right direction?  I agree your data structure will work fine for a small number of news items.  But if I have to import thousands or hundreds of thousands of records into one of your modules, DTS or SSIS will be much, much faster than a content importer written in C#.  Also, most of us developer types are used to working with normalized data and tools like SQL, SSIS, and DTS.

    “With upcoming technologies like LINQ and lambda expressions, object databases are advancing quickly, and with computing power increasing every day, dynamism seems more important to me than saving a few milliseconds per request.  This design allows fields to be added and removed dynamically, simply by declaring them in the web.config.”

    You mentioned LINQ and lambda expressions.  I'm confused about what lambda expressions have to do with this discussion, since they are just anonymous delegates and don't really affect database access.  Again, I'm not trying to be mean; I just really don't know how they are related…it could just be my ignorance.

    I can understand, as you said, that you did this for flexibility…I looked in the web.config, I can see where you are adding the columns, and indeed it is very flexible.  But for large projects or larger amounts of data it will cause problems.  Also, it seems to me that you are basically recreating, in the web.config and in your framework, what the database itself was designed for!  Relational databases were designed to hold things like "Title", "Summary", and "DateExpired" as fields, not rows.  One of the developers at my company argued that your team is well on its way to the "inner platform" anti-pattern if you keep creating modules this way:

    http://fucoder.com/2007/02/the-inner-platform-effect/

    http://worsethanfailure.com/Articles/The_Inner-Platform_Effect.aspx


    I think you were also referring to my comment about multiple values being stored in a single column, delimited by semi-colons.  LINQ may have better ways of querying unusual data structures, but it's not out yet, and it will still rely to a great degree on a properly normalized relational database.  It's supposed to ease querying and mapping between objects and a relational database, not "fix" a non-standard relational design.  Again, columns were designed to hold ONE value, not several.  You should split those semi-colon delimited values into separate columns or rows (I can't say which, because I don't fully understand what you're using them for).  What you're doing in the database would be like taking a C# class, creating a field, and storing a comma- or semi-colon-delimited string in it instead of just using an array or collection.
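
    For example, the usual normalized alternative to a delimited column is a child table with one value per row (hypothetical names, since I don't know what your values represent):

        CREATE TABLE NewsItemTag (
            NewsItemID int          NOT NULL,  -- FK back to the parent item
            Tag        nvarchar(50) NOT NULL,  -- one value per row, not 'a;b;c'
            PRIMARY KEY (NewsItemID, Tag)
        )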

    Again, I'm sorry for the harshness and length of this post.  As you can probably tell, I'm very passionate about proper database design.  I have read about, and worked for, companies that ignore good design, and I have seen them pay for it.  I appreciate your company's openness, and I see a lot of great improvements in Sitefinity.  I just think you are ignoring some very basic database design principles.

    Thanks,

    Craig


  5. Bob
    Bob avatar
    330 posts
    Registered:
    24 Sep 2012
    15 Mar 2007
    Link to this post
    Hello Craig,

    I created a quick implementation of a data provider for the News module that stores data fully normalized.  Here is a link to the solution with the source code for the new provider and a test web site demonstrating both providers: http://www.sitefinity.com/sf3/StaticNewsProvider.zip

    Next week I will post a project with performance tests on both approaches and then I will answer your comments.

    By the way, where did you see multiple values separated by semi-colons within a single column?  We do not use this approach, although I wouldn't stigmatize it; it is actually quite widely used.

    Sincerely yours,
    Bob
    the telerik team
  6. fregas baratis
    fregas baratis avatar
    185 posts
    Registered:
    14 Nov 2002
    15 Mar 2007
    Link to this post
    Thank you so much.  I'll download that and take a look.  I'd like to emphasize that your approach will work just fine for small amounts of data.  However, it's not just a performance issue; it's the extra complexity of reinventing something the DB was designed to do in a very reasonable and logical way, along with the tools surrounding it (SQL, reporting, DTS, SSIS, etc.).

    I should also have mentioned that there are two distinct issues here.  One is the design of the articles data structure, where fields are turned into rows.  The other is the repeated lazy load that happens for every news article on the listing page.  I think Nolics can avoid the lazy loads by doing some sort of "group" or "pre-fetch" where it gets the related meta data in one query, so that should be an easy optimization to make later on.  The repeated lazy load is a very common and understandable problem in any database design where you need related data all in one batch.

    As for the semi-colon delimited approach, I wasn't 100% sure that's what you were doing.  I was watching SQL Query Profiler to see what queries were running, and it appeared you were returning semi-colon-delimited results through your stored procs (or Nolics'), but now I can't find any semi-colon data in the actual table columns, so that was a good discovery.  It's probably something you or Nolics returns hardcoded to the application, not actual data.  No big deal.

    Sadly, bad database design such as putting delimited values in a column IS very common.  Unfortunately, common is not the same thing as "right" or "good".  Database design principles such as normalization and the normal forms (1NF, 2NF, etc.) exist for good reason.  They also coincide with the D.R.Y. principle, which applies to IT in general.
  7. fregas baratis
    fregas baratis avatar
    185 posts
    Registered:
    14 Nov 2002
    15 Mar 2007
    Link to this post
    Not to add insult to injury, but I posted in this forum that the News list is currently not sortable:
    http://www.sitefinity.com/support/forums/support-forum-thread/b1043S-tekgc.aspx

    I could be wrong, but I SUSPECT that your database design will make it somewhat more difficult to sort these news items.  First, you will have to do an inner join on the meta data table.  Then you'll have to figure out which row has the field you want to sort on.  Then you'll also have to find the right column that holds the value: DateTimeValue, IntegerValue, etc.
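
    Something roughly like this, I would guess (the column names are my guesses again):

        SELECT   b.*
        FROM     sf_CmsContentBase b
                 INNER JOIN sf_CmsContentMetaData m
                     ON  m.ContentID = b.ContentID
                     AND m.Name = 'DatePublished'  -- find the row holding the sort field
        ORDER BY m.DateTimeValue                   -- then the typed column with its value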

    Unless Nolics is doing this for you, it will be a much more complicated query than:

    SELECT *
    FROM News
    ORDER BY DatePublished






