Key-value stores from Redis point of view

This post was supposed to be about graph databases and key-value stores but it's going to be only about key-value stores because I got more interested in trying out Redis than Neo4J.

Redis

Redis is a key-value store that keeps its database in memory, but it also writes the database to disk once a predefined amount of time has passed and a predefined number of changes have been made. By default the thresholds are:

  • 900 seconds and at least 1 change
  • 300 seconds and at least 10 changes
  • 60 seconds and at least 10 000 changes
More on Redis can be found at their website http://redis.io/ and if you're interested in giving it a quick try I suggest their online tutorial at http://try.redis.io/.
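
The snapshot thresholds listed above can be read as "save when enough time has passed and enough changes have piled up". Here is a minimal sketch of that rule in plain Java. The class and method names are made up for illustration, and this is only a model of the rule, not how Redis implements it.

```java
// Hypothetical model of the default snapshot thresholds; not Redis source code.
class SnapshotRules {
    // Each entry is {seconds since last save, minimum number of changes}.
    static final long[][] DEFAULTS = { {900, 1}, {300, 10}, {60, 10_000} };

    // True when at least one threshold pair is satisfied.
    static boolean shouldSave(long secondsSinceSave, long changesSinceSave) {
        for (long[] rule : DEFAULTS) {
            if (secondsSinceSave >= rule[0] && changesSinceSave >= rule[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(shouldSave(60, 10_000)); // busy database: 60 s / 10 000 changes
        System.out.println(shouldSave(120, 50));    // too few changes for any pair so far
        System.out.println(shouldSave(900, 1));     // quiet database: 900 s / 1 change
    }
}
```

So a busy database is saved often, while a quiet one is still saved eventually as long as something changed.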

Important about querying

This is an important detail with key-value stores: the data can be searched only by the key. There are solutions that enable searching by the data, like Lucene or Solr, but those are separate search engines and not part of the actual key-value store.

It might appear strange or constraining, but it just means that key-value stores don't suit every use case and that the key must be chosen with care.

Key and values

Keys and values sound simple and should be pretty familiar to software developers. A key-value store is basically a map, something like this in Java:
Map<String, String> myMap = new HashMap<String, String>();

In the value part one can store simple data like the name of a user "John Doe" or an email address "johndoe@foobar.com", but these small bits of information aren't necessarily enough. Another approach is to store JSON data that could be something like this:
{ "name":"John Doe", "email":"johndoe@foobar.com", "nick":"JD" }

With this kind of data structure it's possible to save all sorts of stuff, but for the data to be searchable the key has to be something meaningful. If the keys are just a sequence of numbers like [1,2,3,4,5,6...], searching for "John Doe" in a database with thousands of key-value pairs wouldn't be efficient, as the values would have to be fetched and parsed one by one until John is found.
Let's pretend that the JSON data above is user data for an online service and users log in with their email address and a password. To choose something unique and searchable I would use the email address, and to make it even more specific I would use a key that looks something like this:
"user:email:johndoe@foobar.com"

Now all we need to know is the email address (which we get at login) and all of the user's data can be fetched with it.
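
To make the difference concrete, here is a small sketch that uses a plain Java HashMap as a stand-in for the store (it is not a Redis client). The class name UserStore and its methods are hypothetical. It contrasts a direct lookup by the email-based key with the value scan that numeric keys would force.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for a key-value store; a HashMap, not a Redis client.
class UserStore {
    private final Map<String, String> store = new HashMap<>();

    // Builds the namespaced key described in the text.
    static String userKey(String email) {
        return "user:email:" + email;
    }

    void save(String email, String json) {
        store.put(userKey(email), json);
    }

    // Direct lookup: a single get by key, no scanning.
    Optional<String> findByEmail(String email) {
        return Optional.ofNullable(store.get(userKey(email)));
    }

    // What numeric keys would force: walk the values until one mentions the email.
    static Optional<String> findByScanning(Map<String, String> numbered, String email) {
        for (String json : numbered.values()) {
            if (json.contains("\"email\":\"" + email + "\"")) {
                return Optional.of(json);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String john = "{\"name\":\"John Doe\",\"email\":\"johndoe@foobar.com\",\"nick\":\"JD\"}";

        UserStore users = new UserStore();
        users.save("johndoe@foobar.com", john);
        System.out.println(users.findByEmail("johndoe@foobar.com").orElse("not found"));

        Map<String, String> numbered = new HashMap<>();
        numbered.put("1", john);
        System.out.println(findByScanning(numbered, "johndoe@foobar.com").isPresent());
    }
}
```

The scan works, but its cost grows with the number of stored users, while the lookup by key does not.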


Values as hash maps

This is something I really like about Redis: the value can itself be a map of values. It sounds a bit bizarre but is actually pretty simple once you get the hang of it.

Let's say I've created a simple blog platform and the blog posts are stored in the following structure. The key comes first and consists of "post" (meaning this is a blog post), the email of the user and a random UUID. The value is a JSON data set:
"post:johndoe@foobar.com:dsada23132" "{"title":"first post", "date":"20130101","text":"lorem ipsum...."}"

As a new feature the platform gets a commenting option. I want the comments to be under the same post key so that they can be fetched at the same time as the post, but I don't want to put them in the same JSON data. The new data structure would be something like this:
"post:johndoe@foobar.com:dsada23132" "post" "{"title":"first post", "date":"20130101","text":"lorem ipsum...."}"
"post:johndoe@foobar.com:dsada23132" "comments" "[{"name":"Jane Doe", "date":"20130102","text":"Nice one!"}, {"name":"Jack Doe", "date":"20130102","text":"Boring..."}]"

With the field values post and comments I separated the data from each other but kept it under the same key.
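
The key / field / value layout above can be pictured as a map of maps. The sketch below models it in plain Java. The class HashValueStore is hypothetical, and its hset and hget methods only imitate what the Redis commands HSET and HGET do, so treat it as an illustration of the data shape and not as client code.

```java
import java.util.HashMap;
import java.util.Map;

// Models "key -> (field -> value)" in memory; an illustration, not a Redis client.
class HashValueStore {
    private final Map<String, Map<String, String>> store = new HashMap<>();

    // Sets one field under a key, creating the inner map on first use.
    void hset(String key, String field, String value) {
        store.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    // Reads one field, or null when the key or the field is missing.
    String hget(String key, String field) {
        Map<String, String> fields = store.get(key);
        return fields == null ? null : fields.get(field);
    }

    public static void main(String[] args) {
        String key = "post:johndoe@foobar.com:dsada23132";
        HashValueStore posts = new HashValueStore();

        posts.hset(key, "post",
                "{\"title\":\"first post\",\"date\":\"20130101\",\"text\":\"lorem ipsum....\"}");
        posts.hset(key, "comments",
                "[{\"name\":\"Jane Doe\",\"date\":\"20130102\",\"text\":\"Nice one!\"}]");

        // The two JSON documents stay separate but live under the same key.
        System.out.println(posts.hget(key, "post"));
        System.out.println(posts.hget(key, "comments"));
    }
}
```

Adding a comment later only means rewriting the "comments" field; the "post" field is left untouched.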


Searching data

Data can be searched only by the keys, so if we know the key we can fetch with it, as in the simpler key-value example with the email address. In the blog example the search can also be done with just part of the key. If we wanted to get the keys of all of John's blog posts we would search with a pattern like this (in Redis, with the KEYS command):
"post:johndoe@foobar.com:*"

And after that I could get all the data of a specific entry with a get-all command (HGETALL in Redis):
"post:johndoe@foobar.com:dsada23132"


Or if I wanted to get just the post and not the comments, the request would include the field name (HGET in Redis):
"post:johndoe@foobar.com:dsada23132" "post"
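
The three lookups above can be sketched in plain Java as follows. Everything here is an in-memory model with made-up names (PostSearch, the second post id "example2" and the Jane entry are invented for the demo), and the keys method only understands a trailing "*", which is a simplification of the glob patterns that the Redis KEYS command accepts.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory model of pattern search plus hash reads; not a Redis client.
class PostSearch {
    // TreeMap keeps keys sorted so the demo output is stable.
    private final Map<String, Map<String, String>> store = new TreeMap<>();

    void hset(String key, String field, String value) {
        store.computeIfAbsent(key, k -> new HashMap<>()).put(field, value);
    }

    // Simplified pattern match: only a trailing '*' is treated as a wildcard.
    List<String> keys(String pattern) {
        List<String> matches = new ArrayList<>();
        boolean wildcard = pattern.endsWith("*");
        String prefix = wildcard ? pattern.substring(0, pattern.length() - 1) : pattern;
        for (String key : store.keySet()) {   // note: this walks every key
            if (wildcard ? key.startsWith(prefix) : key.equals(prefix)) {
                matches.add(key);
            }
        }
        return matches;
    }

    // All fields of one entry, like a get-all on the hash.
    Map<String, String> hgetAll(String key) {
        return store.getOrDefault(key, Collections.emptyMap());
    }

    // A single field of one entry.
    String hget(String key, String field) {
        return hgetAll(key).get(field);
    }

    public static void main(String[] args) {
        PostSearch db = new PostSearch();
        db.hset("post:johndoe@foobar.com:dsada23132", "post", "{\"title\":\"first post\"}");
        db.hset("post:johndoe@foobar.com:dsada23132", "comments", "[]");
        db.hset("post:johndoe@foobar.com:example2", "post", "{\"title\":\"second post\"}");
        db.hset("post:janedoe@foobar.com:example3", "post", "{\"title\":\"not John's\"}");

        System.out.println(db.keys("post:johndoe@foobar.com:*"));
        System.out.println(db.hgetAll("post:johndoe@foobar.com:dsada23132").keySet());
        System.out.println(db.hget("post:johndoe@foobar.com:dsada23132", "post"));
    }
}
```

Note that the pattern search has to walk every key in the store. The same is true of KEYS in Redis, so on a large database it is much more expensive than a lookup with the full key.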

Summary

There's much more to key-value stores and Redis than I mentioned here, and it can all be found at the Redis website, but these are the important bits of my post:
  • Searching only by the key
  • Choose the key with care
  • Data can be simple... or not
I've done some brief experimenting with Java and Redis and some of the results can be found under my GitHub account https://github.com/jorilytter/redis-test
