NoSQL and Data Management – More on MarkLogic
It has been a couple of years since I last caught up with MarkLogic and their brand of NoSQL database (see my blog post on their 2014 event). Given some of my colleagues’ recent work with NoSQL databases, I thought I would catch up at their event in London this week. Many of the big themes have evolved but remain broadly consistent. In particular, they made three main points about data warehousing projects:
- New input source fields – should not require expensive ETL reworks
- New user queries – should not require database re-indexing
- Higher workloads – should not require expensive new hardware
As with a lot of NoSQL-related companies, MarkLogic paints relational databases as legacy technology. Firstly, there was some use of the old (but, in application-specific cases, valid) object-database argument: unfolding all that relational schema to rehydrate your entity object is expensive, and a more direct mapping from business object to application object is both easier to understand and quicker to code. Put another way, it is easier and faster to build applications if the business data you need in your app looks like the data you see in the database. This argument formed the basis of MarkLogic’s pitch for their document database capabilities.
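To make the "rehydration" argument concrete, here is a minimal sketch in Python using only the standard library. It is not MarkLogic code: the `trade` object, the table names and the helper functions are all hypothetical, invented purely for illustration, and SQLite stands in for both styles of storage. The relational path splits the object across two tables and has to reassemble it on read, whereas the document path stores and reads it back in the same shape the application uses.

```python
import json
import sqlite3

# Hypothetical business object, invented for illustration only.
trade = {
    "id": 1,
    "counterparty": "ACME",
    "legs": [
        {"ccy": "USD", "notional": 100.0},
        {"ccy": "EUR", "notional": 90.0},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE trade (id INTEGER PRIMARY KEY, counterparty TEXT);
    CREATE TABLE leg (trade_id INTEGER, seq INTEGER, ccy TEXT, notional REAL);
    CREATE TABLE trade_doc (id INTEGER PRIMARY KEY, body TEXT);
    """
)


def save_relational(conn, t):
    """Unfold the object into one parent row and one row per leg."""
    conn.execute("INSERT INTO trade VALUES (?, ?)", (t["id"], t["counterparty"]))
    conn.executemany(
        "INSERT INTO leg VALUES (?, ?, ?, ?)",
        [(t["id"], i, leg["ccy"], leg["notional"]) for i, leg in enumerate(t["legs"])],
    )


def load_relational(conn, trade_id):
    """Rehydrate the object by querying both tables and stitching the rows together."""
    row = conn.execute(
        "SELECT id, counterparty FROM trade WHERE id = ?", (trade_id,)
    ).fetchone()
    legs = conn.execute(
        "SELECT ccy, notional FROM leg WHERE trade_id = ? ORDER BY seq", (trade_id,)
    ).fetchall()
    return {
        "id": row[0],
        "counterparty": row[1],
        "legs": [{"ccy": ccy, "notional": notional} for ccy, notional in legs],
    }


def save_document(conn, t):
    """Store the object as a single JSON document, in the shape the app uses."""
    conn.execute("INSERT INTO trade_doc VALUES (?, ?)", (t["id"], json.dumps(t)))


def load_document(conn, trade_id):
    """Read the document back; no reassembly step is needed."""
    (body,) = conn.execute(
        "SELECT body FROM trade_doc WHERE id = ?", (trade_id,)
    ).fetchone()
    return json.loads(body)


save_relational(conn, trade)
save_document(conn, trade)
print(load_relational(conn, 1) == trade)
print(load_document(conn, 1) == trade)
```

Both paths return the same object; the difference is how much mapping code sits between the database and the application. The sketch says nothing about performance, and a new field on a leg would need a schema change on the relational side but not on the document side.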
There was a lot of criticism of the relational data model, its rigidity and how systems integration leads to ever more complicated schemas, which in turn leads to the creation of more and ever-larger data silos (silos loomed large, see event photo above!). On a related note (no pun intended) they then described how the relationships in a relational database were hard to understand and lacked meaning, which led into their pitch around the triplestore capabilities of MarkLogic and the benefits of a semantic representation of data: less data integration effort and more meaning attached to the data within databases. Semantics have long been a source of (ironic?) misunderstanding for me (see past post on this), but they are certainly of interest given the potential industry adoption of the EDM Council’s FIBO. Anyway, I am not all of the way there (some ideas take my brain a long time to assimilate!), but they at least showed some reasonable examples of semantic querying and the inference of relationships between data.
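For anyone else still assimilating the idea, the following is a toy sketch of what "triples plus inference" means, written in plain Python rather than in SPARQL or any MarkLogic API. Everything in it is an assumption made for illustration: the entity names, the `isA` and `subClassOf` predicates and the `find` and `infer_types` helpers are hypothetical and are not taken from FIBO or from the examples shown at the event. The data is a set of (subject, predicate, object) facts, and one simple rule derives facts that were never stored explicitly.

```python
# Each fact is a (subject, predicate, object) triple. All names are hypothetical.
triples = {
    ("AcmeBank", "isA", "Bank"),
    ("Bank", "subClassOf", "FinancialInstitution"),
    ("FinancialInstitution", "subClassOf", "LegalEntity"),
    ("AcmeBank", "issued", "Bond123"),
}


def find(facts, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t
        for t in facts
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


def infer_types(facts):
    """Apply one rule until nothing new appears:
    if X isA C and C subClassOf D, then X isA D."""
    inferred = set(facts)
    while True:
        new = set()
        for x, _, c in find(inferred, p="isA"):
            for _, _, d in find(inferred, s=c, p="subClassOf"):
                candidate = (x, "isA", d)
                if candidate not in inferred:
                    new.add(candidate)
        if not new:
            return inferred
        inferred |= new


expanded = infer_types(triples)

# Which things are legal entities? Only answerable after inference.
print(find(triples, p="isA", o="LegalEntity"))
print(find(expanded, p="isA", o="LegalEntity"))
```

The first query finds nothing because no stored fact says so directly; the second finds `AcmeBank` because the relationship follows from the class hierarchy. A real triplestore would express the query in SPARQL and use a standard vocabulary such as RDFS for the rule, but the principle is the same.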
In summary, one of the main points about MarkLogic seems to be that they conform with the standards of document databases (JSON, XML, etc.) and the standards of semantic databases (RDF triples, etc.) but offer a powerful and non-standard combination of the two. So they cover both a flexible way of representing data and a flexible way of representing relationships between data, in a database that can scale out as workload rises. They continued to do a downer on open source (“what version do you have installed?” aimed mainly at MongoDB) and pushed their own ACID compliance (again aimed at MongoDB, from what I can tell).
I thought that the dismissal of relational databases was slightly over-cooked: the relational data model is a proven solution to known-schema relational problems, and relational databases are not standing still either. From speaking with some of the users at the event, the MarkLogic sweet spot seemed to be an extremely flexible, scalable database that copes well with both structured and unstructured data within a single platform. I have heard some question marks over the ease of use of the product, but it would seem that the next release is looking to provide more sophisticated database management tools, so maybe that is being addressed. Overall a good day from what I saw of it. Shame I couldn’t stay for drinks, but the lunch was good if that was any indication!