Exploring the world of software development, technology, data science, and life

Big Data at Strange Loop

OK, time to finally review the talks I attended at last month’s Strange Loop conference in St. Louis. The last two weeks were a bit busy (more on that later), so this post was delayed. But let’s start with the sessions on big data, beginning with the first keynote of the conference. Data was a common theme at the conference and had one of the conference’s dedicated tracks. That shouldn’t be much of a surprise to anyone following the software industry these days, as analyzing huge amounts of data is becoming essential for businesses. Erik Meijer (architect at Microsoft) kicked things off with his talk, “Category Theory, Monads, and Duality in (Big) Data.” I can’t find a link to the slides of the talk, but here is the paper it is based on.

Even though the title referenced category theory and monads, you didn’t need a PhD in mathematics to get what Erik Meijer was getting at. And that was a very good thing, since his talk was very useful. The essence of it was a comparison between traditional table-based SQL databases and the new breed of so-called NoSQL object databases, and specifically the claim that they are not as different as we tend to think. In fact, he proposes replacing the ambiguous term NoSQL with CoSQL, to reflect the mathematical duality between the two. Basically, in table-based databases, you have entities (each of which can stand on its own) whose foreign keys point to the primary keys of their parents. Meanwhile, in object-based CoSQL databases, you have parent entities pointing to their children, which really have no context outside of their parents.
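
To make the "arrows point the other way" idea concrete, here is a minimal sketch in Python. It is my own illustration, not code from the talk or the paper, and every name in it (TalkRow, SlideRow, Talk, Slide, and the helper functions) is hypothetical. The first half models data the table-based way, with children holding a foreign key to their parent; the second half models the same data the object-based way, with the parent holding its children.

```python
from dataclasses import dataclass, field

# --- Table-based (SQL-style) modeling --------------------------------------
# Every row stands on its own. A child row carries a foreign key that
# points at its parent's primary key. (All names here are hypothetical.)

@dataclass(frozen=True)
class TalkRow:
    talk_id: int   # primary key
    title: str

@dataclass(frozen=True)
class SlideRow:
    slide_id: int  # primary key
    talk_id: int   # foreign key pointing at TalkRow.talk_id
    text: str

talks = [TalkRow(1, "Duality in (Big) Data")]
slides = [SlideRow(10, 1, "SQL"), SlideRow(11, 1, "CoSQL")]

def slides_for(talk_id):
    """Find a talk's children by matching on the foreign key (a tiny join)."""
    return [s.text for s in slides if s.talk_id == talk_id]

# --- Object-based (CoSQL-style) modeling -----------------------------------
# The parent points at its children. A Slide has no key of its own and
# is only reachable through the Talk that contains it.

@dataclass
class Slide:
    text: str

@dataclass
class Talk:
    title: str
    slides: list = field(default_factory=list)

talk = Talk("Duality in (Big) Data", [Slide("SQL"), Slide("CoSQL")])

def slide_texts(t):
    """Find a talk's children by following the parent's own references."""
    return [s.text for s in t.slides]
```

Both slides_for(1) and slide_texts(talk) return the same slide texts. The only thing that changes is which side of the relationship holds the pointer, which is roughly the duality being described.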

It was a really interesting talk, and not just because it had a lot of abstract math in it (I guess that may be an odd phrase for most of the world to hear). He finished with a plea for developers to base their design decisions not on emotion or on what appears hot today, but on which design better models their data. Both have advantages. While object-based CoSQL databases are more open and composable and tend to be horizontally scalable, the rigidity of table-based SQL databases offers plenty of advantages as well, when the problem domain calls for it.
