Thursday, 13 October 2011

First look at Apache Cassandra

I just took a 10,000 m, bird's-eye look at Cassandra, and it is impressive.
The fact that it scales out natively is great.
What I don't understand is why they say it is so different from table-based databases.
Sure, it is non-relational, but so what? You can get used to that.

OK, I am actually used to the App Engine Datastore, and that helps a lot; but still, for something that is supposedly so "non-table", it is very table-like.

I'm going to give you a quick overview of how I visualize the Cassandra structure.

- Keyspace
Think of it as a "database" in MySQL
- Column Family
Works almost like a table in MySQL
- Row
Works almost like a row in MySQL; it is identified by its row key
- Column
Like a field in MySQL
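To make that mapping a bit more concrete, here is a minimal sketch that models the nesting as plain Python dictionaries. It is only an in-memory analogy, not Cassandra's API: the keyspace name "Blog", the helper get_column and the sample data are made up for illustration, and real Cassandra also stores a timestamp with every column value.

```python
# Hypothetical in-memory stand-in for a Cassandra keyspace.
# Nesting: keyspace -> column family -> row key -> column name -> value.
keyspaces = {
    "Blog": {                      # keyspace, roughly a MySQL database
        "users": {                 # column family, roughly a MySQL table
            "CVi": {               # row, addressed by its row key
                "name": "Christoffer Viken",  # column: name -> value
                "email": "...",
            },
        },
    },
}


def get_column(keyspace, column_family, row_key, column):
    """Walk the hierarchy one name at a time; every level is addressed by a key."""
    return keyspaces[keyspace][column_family][row_key][column]


print(get_column("Blog", "users", "CVi", "name"))
```

Every level is just a name you look up, which is exactly why it feels so table-like.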

Now the cool stuff starts arriving.
- Super Column Family
This is like a table where every field can itself be a small table. You create a super column family in Cassandra instead of a normal column family; each of its rows then holds a number of super columns, and each super column is a named group of ordinary columns.
- Keys
In Cassandra everything is about keys; almost nothing can be retrieved without one. Every (super) column family has a name, every row has a key, and in essence every column has a key too: its name.
So if I, CVi, were stored in the database as a user, I would be located in the column family "users", under the row key "CVi", with a number of columns such as email, phone number, real name, etc.
{
    "users":{
        "CVi":{
            "name":"Christoffer Viken",
            "email":"...",
            "Phone number":"..."
        }
    }
}
In MySQL terms, think of a table with one row whose primary key is "CVi"; now remember that the only way to access that row is through its primary key.
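Since every lookup starts from a key, it helps to see the "users" column family as a plain key-value map. The sketch below is a hypothetical Python model, not Cassandra client code: get_row mirrors a primary-key lookup, while find_rows_by_column shows the full scan you are left with in this model when all you know is a column value.

```python
# Hypothetical model of the "users" column family: row key -> columns.
users = {
    "CVi": {"name": "Christoffer Viken", "email": "...", "Phone number": "..."},
}


def get_row(column_family, row_key):
    """Fetch a whole row by its key, like SELECT * ... WHERE primary_key = ?."""
    return column_family.get(row_key)


def find_rows_by_column(column_family, column, value):
    """Without a key, the only option in this model is to scan every row."""
    return [key for key, columns in column_family.items()
            if columns.get(column) == value]


print(get_row(users, "CVi")["name"])
print(find_rows_by_column(users, "name", "Christoffer Viken"))
```

With one row the scan is harmless; with millions of rows spread over a cluster it is exactly what you want to avoid, which is why you design around your keys.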
Now, to really mess things up, let us introduce the Super Column Family. The data then looks something like this:
{
    "users":{
        "CVi":{
            "Profile":{
                "name":"Christoffer Viken",
                "email":"..."
            },
            "Contact":{
                "email":"Contact Email"
            },
            "logon":{
                "password":"jdKTyNXN9ah$nyqpWaTvMWnTeS2z5ThYxqQAyuZ$W29HhnDdBtw7HPvHSA6aCMvR",
                "twitter":"90989364"
            }
        }
    }
}
Super Column Family: users
Row key: CVi
Super Columns: Profile, Contact, logon
Columns (Profile): name, email
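Here is the same super column family as a hypothetical in-memory Python model (again not Cassandra's API; get_sub_column and flatten are made-up helpers, and the password is replaced by "..."). It shows that reaching a value now takes three names, and that the same column name, like "email", can live in two different super columns of one row.

```python
# Hypothetical in-memory model of the "users" super column family:
# row key -> super column name -> column name -> value.
users = {
    "CVi": {
        "Profile": {"name": "Christoffer Viken", "email": "..."},
        "Contact": {"email": "Contact Email"},
        "logon": {"password": "...", "twitter": "90989364"},
    },
}


def get_sub_column(super_cf, row_key, super_column, column):
    """A value needs three names: the row key, the super column and the column."""
    return super_cf[row_key][super_column][column]


def flatten(super_cf):
    """List every (row key, super column, column, value) path in the family."""
    return [
        (row_key, sc_name, col_name, value)
        for row_key, super_columns in super_cf.items()
        for sc_name, columns in super_columns.items()
        for col_name, value in columns.items()
    ]


print(get_sub_column(users, "CVi", "Contact", "email"))
for path in flatten(users):
    print(path)
```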

Now remember, this is schemaless. Beyond the keyspaces and column families you define, there is no structure whatsoever; you need to enforce it yourself. That is the hardest thing to wrap your head around.
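One way to "enforce it yourself" is to check rows in application code before writing them. This is a minimal sketch under my own assumptions: REQUIRED_COLUMNS, validate_row, insert and SchemaError are hypothetical names, and a plain dict stands in for the database.

```python
# Hypothetical application-side "schema": the store will not enforce it,
# so the code checks every row before writing it.
REQUIRED_COLUMNS = {
    "users": {"name", "email"},
}


class SchemaError(ValueError):
    """Raised when a row does not match what the application expects."""


def validate_row(column_family, columns):
    """Raise SchemaError if a required column is missing or a value is not a string."""
    required = REQUIRED_COLUMNS.get(column_family, set())
    missing = required - set(columns)
    if missing:
        raise SchemaError(f"{column_family}: missing columns {sorted(missing)}")
    for name, value in columns.items():
        if not isinstance(value, str):
            raise SchemaError(f"{column_family}.{name}: expected a string value")


def insert(store, column_family, row_key, columns):
    """Validate first, then store the row in a plain dict standing in for the database."""
    validate_row(column_family, columns)
    store.setdefault(column_family, {})[row_key] = dict(columns)


store = {}
insert(store, "users", "CVi", {"name": "Christoffer Viken", "email": "..."})
try:
    insert(store, "users", "nobody", {"name": "No Email"})
except SchemaError as err:
    print(err)
```

Keeping the rules in one place like this means every writer in your code base agrees on what a "user" row looks like, even though the database itself does not care.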

Anyway, I'm having fun playing with it and I'm looking forward to playing some more; I'll report back later.