Graph database

In one of the side projects I am involved in during my free time, which helps me learn a bunch of things outside my normal workflow, I ran into the problem of working with some big data sets (millions of rows per table, multiple tables, a bunch of relations). The old ways of improving performance, like tuning DB engines (MySQL, MariaDB), optimising code to use low-level queries, multi-row inserts, tweaks to the data models, etc., didn’t give the desired results, so I went googling and discovered graph databases. Nothing new in general, as graph theory is well known, but the use case is pretty interesting.
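
To make that a bit more concrete, here is a rough sketch in plain Python, not tied to any particular graph database. The idea it shows is that relations are kept as edges right next to each record, so following them is a walk over neighbours rather than a join across tables. The toy data and the names (`edges`, `reachable_within`) are made up purely for illustration.

```python
from collections import deque

# Hypothetical toy data: each key is a record, each value lists the
# records it is directly related to (the "edges").
edges = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
    "erin": [],
    "frank": [],
}


def reachable_within(graph, start, max_hops):
    """Return every node reachable from `start` in at most `max_hops` hops."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, hops + 1))
    seen.discard(start)  # the starting record is not its own relation
    return seen


print(sorted(reachable_within(edges, "alice", 2)))
```

In a relational schema, each extra hop would typically mean one more self-join on a relations table. Here it is just one more pass of the loop, which is roughly the kind of query graph databases are built around.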

Not that I am deep into it yet, but it feels like I will spend some time looking into the technology. I first got a hint about Neo4j somewhere on Stack Overflow, but something about it put me off, so I kept googling the subject. I ended up at "Top 15 Free Graph Databases", where I first stopped at OrientDB. I installed it for a test and played around, and while it looks very promising, I have a couple of issues with it:

  • Java: this is a personal issue, as I am not in love with Java at all. Installing a JDK on the server just to run a DB is something I would do only if it were absolutely required for my complete happiness. Not that I had any issues during testing, but somehow I still don’t trust it.
  • Poor documentation: there is pretty extensive documentation on their website, but it is a bit hard to navigate and find things in, and when you ask Google for what you need, you mostly end up on a 404. So either Google still has links to an old version that are no longer on the site, or something else weird is going on.
  • Driver interfaces (outside of the Java world) are badly documented and/or badly implemented, at least for PHP. Both the official PHPOrient and the Doctrine ODM show only small usage snippets, with no clear overview of what is possible (apart from the basics).

Despite the cons above, there are obviously some pros, like almost native SQL, an easy install (even with Java involved), a nice tool set, etc.

After reviewing the list of 15 databases, my second choice was ArangoDB, which:

  • Is written in C++
  • Has very good and solid documentation with lots of examples (even comparisons for people who come from a traditional SQL background)
  • Has lots of pre-built packages for different operating systems and a YUM repo for Red Hat followers
  • Offers a convincing benchmark comparison of different DB engines and scenarios (I won’t vouch for its accuracy, as benchmarks are always tricky, but who doesn’t like graphs?)

I still need to get my hands on it, but I think this will be a nice journey.

If you know the topic, please leave your thoughts and ideas in the comments, or send them to me via any other channel, to save me time and effort :-)