This post was supposed to be about graph databases and key-value stores, but it's going to be only about key-value stores because I got more interested in trying out Redis than Neo4j.
Redis
Redis is a key-value store that keeps its database in memory, but it also saves the database to disk after a predefined time and number of changes. By default the values are like this:
- 900 seconds and at least 1 change
- 300 seconds and at least 10 changes
- 60 seconds and at least 10 000 changes
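The way these rules combine can be sketched in a few lines of Java. This is only my reading of the rule "enough time has passed and enough changes have been made", not how Redis implements it internally, and the class and method names are made up for this example.

```java
class SavePointSketch {
    // Each row is {seconds, minimum changes}, taken from the default values listed above.
    static final long[][] DEFAULTS = {
        {900, 1},
        {300, 10},
        {60, 10_000}
    };

    // True when at least one rule has both its time and its change count satisfied.
    static boolean shouldSave(long secondsSinceLastSave, long changesSinceLastSave) {
        for (long[] rule : DEFAULTS) {
            if (secondsSinceLastSave >= rule[0] && changesSinceLastSave >= rule[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSave(120, 5));      // false: no rule has both conditions met
        System.out.println(shouldSave(310, 12));     // true: the 300 second rule applies
        System.out.println(shouldSave(61, 10_000));  // true: the 60 second rule applies
    }
}
```

So a busy database gets saved often, while a quiet one still gets saved once 900 seconds have passed, as long as something has changed.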
More on Redis can be found at their website http://redis.io/ and if you're interested in giving it a quick try I suggest their online tutorial at http://try.redis.io/.
Important about querying
This is an important detail with key-value stores. In a key-value store the data can be searched only by the key. There are solutions that enable searching by the data, like Lucene or Solr, but those are separate search engines and not part of the actual key-value store.
It might appear strange or constraining, but it just means that key-value stores don't suit every use case and that the key must be chosen with care.
Key and values
Keys and values sound simple and are actually pretty familiar to software developers. A key-value store is basically a map, something like this in Java:
Map<String, String> myMap = new HashMap<String, String>();
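In the value part one can store simple data like the name of a user, "John Doe", or an email address, "johndoe@foobar.com", but these small bits of information aren't necessarily enough. Another approach is to store json data that could be something like this: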
{ "name":"John Doe", "email":"johndoe@foobar.com", "nick":"JD" }
With this kind of data structure it's possible to save all sorts of stuff, but for the data to be searchable the key has to be something meaningful. If the keys are just a sequence of numbers like [1,2,3,4,5,6...], searching for "John Doe" in a database with thousands of key-value pairs wouldn't be efficient, as the values would have to be fetched and parsed one by one until John is found.
Let's pretend that the json data above is user data for an online service where users log in with their email address and a password. To choose something unique and searchable I would use the email address, and to make it even more specific I would use a key that looks something like this:
"user:email:johndoe@foobar.com"
Now all we need to know is the email address (which we get at login) and all of the user's data can be fetched with it.
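To make the idea concrete, here is a small sketch where a plain Java HashMap stands in for the key-value store. The class name and the userKey helper are made up for this example, and no real Redis client is involved.

```java
import java.util.HashMap;
import java.util.Map;

class UserLookupSketch {
    // Builds a key in the format used in this post: user:email:<address>
    static String userKey(String email) {
        return "user:email:" + email;
    }

    public static void main(String[] args) {
        // A plain HashMap stands in for the key-value store.
        Map<String, String> store = new HashMap<>();
        store.put(userKey("johndoe@foobar.com"),
                "{ \"name\":\"John Doe\", \"email\":\"johndoe@foobar.com\", \"nick\":\"JD\" }");

        // At login we know the email address, so one direct lookup is enough.
        System.out.println(store.get(userKey("johndoe@foobar.com")));

        // An unknown address gives null straight away, with no fetching and parsing of other values.
        System.out.println(store.get(userKey("nobody@foobar.com")));
    }
}
```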
Values as hash maps
This is something I really like about Redis: the value can be a map of values. It sounds a bit bizarre but is actually pretty simple once you get the hang of it.
Let's say I've created a simple blog platform where each blog post is stored under a key made up of the word post (meaning this is a blog post), the email of the user and a random uuid, with a json data set as the value:
"post:johndoe@foobar.com:dsada23132" "{"title":"first post", "date":"20130101","text":"lorem ipsum...."}"
As a new feature the platform gets a commenting option. I want the comments to be under the same post key so that they can be fetched at the same time as the post, but I don't want to put them in the same json data. The new data structure would be something like this:
"post:johndoe@foobar.com:dsada23132" "post" "{"title":"first post", "date":"20130101","text":"lorem ipsum...."}"
"post:johndoe@foobar.com:dsada23132" "comments" "[{"name":"Jane Doe", "date":"20130102","text":"Nice one!"}, {"name":"Jack Doe", "date":"20130102","text":"Boring..."}]"
With the fields post and comments I separated the data from each other but kept it under the same key.
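In Java terms this structure is roughly a map inside a map. The sketch below uses nested HashMaps as a stand-in. The method names are my own and are only loosely modeled on the Redis hash commands HSET, HGET and HGETALL, so treat it as an illustration rather than client code.

```java
import java.util.HashMap;
import java.util.Map;

class HashValueSketch {
    // key -> (field -> value); the inner map plays the role of the hash stored under one key
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Sets one field under a key, creating the inner map on first use.
    void setField(String key, String field, String value) {
        store.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    // Returns one field of a key, or null if the key or the field is missing.
    String getField(String key, String field) {
        Map<String, String> fields = store.get(key);
        return fields == null ? null : fields.get(field);
    }

    // Returns a copy of all fields of a key, or an empty map if the key is missing.
    Map<String, String> getAll(String key) {
        Map<String, String> fields = store.get(key);
        return fields == null ? new HashMap<>() : new HashMap<>(fields);
    }

    public static void main(String[] args) {
        HashValueSketch db = new HashValueSketch();
        String key = "post:johndoe@foobar.com:dsada23132";
        db.setField(key, "post",
                "{\"title\":\"first post\", \"date\":\"20130101\",\"text\":\"lorem ipsum....\"}");
        db.setField(key, "comments",
                "[{\"name\":\"Jane Doe\", \"date\":\"20130102\",\"text\":\"Nice one!\"}]");

        System.out.println(db.getField(key, "post"));  // just the post
        System.out.println(db.getAll(key).keySet());   // both fields, in no particular order
    }
}
```

Adding the comments later does not touch the post field at all, which is the whole point of keeping them as separate fields.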
Searching data
Data can be searched only by the keys, so if we know the key we can search with it, as in the simpler key-value example with the email address. In the blog example the searching could also be done with part of the key. If we wanted to get all of John's blog posts we would do a search like this:
"post:johndoe@foobar.com:*"
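What such a pattern search does can be sketched with a sorted set of keys in Java. This is a simplified stand-in that only understands a single trailing *, while the Redis KEYS command accepts fuller glob patterns. The class name, the method name and the second post id are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

class KeySearchSketch {
    // Returns the keys matching a pattern; only a trailing '*' is treated as a wildcard.
    static List<String> keysMatching(TreeSet<String> keys, String pattern) {
        List<String> result = new ArrayList<>();
        if (!pattern.endsWith("*")) {
            // No wildcard: the pattern is an exact key.
            if (keys.contains(pattern)) {
                result.add(pattern);
            }
            return result;
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        // In a sorted set all keys sharing a prefix sit next to each other,
        // so we start at the prefix and stop at the first key that does not match.
        for (String key : keys.tailSet(prefix, true)) {
            if (!key.startsWith(prefix)) {
                break;
            }
            result.add(key);
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> keys = new TreeSet<>();
        keys.add("post:johndoe@foobar.com:dsada23132");
        keys.add("post:johndoe@foobar.com:fe2190ab77"); // hypothetical second post
        keys.add("post:janedoe@foobar.com:aa11");       // hypothetical other user
        keys.add("user:email:johndoe@foobar.com");

        // Prints only John's two post keys.
        System.out.println(keysMatching(keys, "post:johndoe@foobar.com:*"));
    }
}
```

Note that the search still works only on keys. The values are never looked at, which is why the email address had to be part of the key in the first place.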
After finding the keys, I could get all the data of a specific entry with a get all command:
"post:johndoe@foobar.com:dsada23132"
Or, if I wanted just the post and not the comments, the search would include a field:
"post:johndoe@foobar.com:dsada23132" "post"
Summary
There's much more to key-value stores and Redis than I covered here, and it can all be found on their website, but these are the important bits of my post:
- Searching only by the key
- Choose the key with care
- Data can be simple... or not
I've done some brief experimenting with Java and Redis, and some of the results can be found under my GitHub account: https://github.com/jorilytter/redis-test