Alexander Holbreich

Introduction to the data model of Cassandra DB

My last post was about setting up Cassandra. This article discusses Cassandra's data model and objects. In essence, Cassandra is a hybrid between a key-value and a column-oriented NoSQL database. The key-value nature is represented by a row object, whose value is generally organized in columns. In short, Cassandra knows the following objects:

Keyspace

Keyspaces are easy to understand: they are the first-level container for all other objects. Every model begins with a keyspace.

Rows and Columns

Cassandra organizes data in columns and in rows of columns. Rows are collected in an object called a column family.

A similarity to SQL tables is noticeable here. Looking at columns, we see that each of them has an implicit, externally supplied timestamp (“ts”). Further, rows in the same column family are under no rigid obligation to have the same set of columns or column types. There is also no obligation to provide a value for a column; it can be just a name (and timestamp). Moreover, Cassandra allows additional per-column settings, such as a TTL, but these are not essential for understanding the model in general.
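The properties above can be sketched in a few lines of Python. This is only an illustration of the idea, not Cassandra's actual storage format: the `Column` class, the `users` column family, and all field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Column:
    """Illustrative column: a name, a timestamp, an optional value, an optional TTL."""
    name: str
    timestamp: int                      # supplied from outside, e.g. by the client
    value: Optional[bytes] = None       # a column may consist of name and timestamp only
    ttl_seconds: Optional[int] = None   # optional per-column setting


# Two rows of the same (hypothetical) "users" column family.
# Nothing forces them to share the same set of columns.
users = {
    "alice": [
        Column("email", 1_000, b"alice@example.com"),
        Column("age", 1_000, b"30"),
    ],
    "bob": [
        Column("email", 2_000, b"bob@example.com"),
        Column("is_admin", 2_000),                            # name only, no value
        Column("session", 2_500, b"abc", ttl_seconds=3600),   # column with a TTL
    ],
}

# Collect the column names per row to show that the rows differ.
column_names = {key: sorted(c.name for c in cols) for key, cols in users.items()}
print(column_names)
```

The two rows live side by side in one column family even though only `email` is common to both.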

Super Column

As we see, a super column is a group of simple columns under one single name. Such nesting provides an additional level of abstraction and access, but it also adds unnecessary complexity.

Hence super columns are no longer favoured. Nowadays it is recommended to work with the C* data model through CQL and to use composite keys instead of super columns (more on this in the next tutorial).
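To get a feeling for why composite keys can replace super columns, here is a rough Python sketch. It only illustrates the idea of folding the extra nesting level into the column name; it does not show how CQL stores data, and the `flatten` helper and the sample row are made up for this example.

```python
# A row of a hypothetical super column family:
# super column name -> {column name -> value}
row_with_super_columns = {
    "address": {"city": "Berlin", "zip": "10115"},
    "contact": {"email": "alice@example.com"},
}


def flatten(super_columns):
    """Turn {super: {column: value}} into {(super, column): value}."""
    return {
        (super_name, column_name): value
        for super_name, columns in super_columns.items()
        for column_name, value in columns.items()
    }


row_with_composite_names = flatten(row_with_super_columns)

# Sorting the composite names keeps columns of one former super column together.
for name in sorted(row_with_composite_names):
    print(name, "->", row_with_composite_names[name])
```

The information is the same, but the row is back to a single level of columns whose names happen to have two parts.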

Column families

As a typical NoSQL database, Cassandra does not enforce relationships between column families the way relational databases do between tables. Therefore Apache Cassandra has no notion of foreign keys. Each column family has a self-contained set of columns that are intended to be accessed together to satisfy the queries of your application. In addition, there is no rigid schema, so don't think of a column family as some sort of relational table; it's better to think of it as a structure like

Map<RowKey, SortedMap<ColumnKey, ColumnValue>>

and in the case of a super column family as:

Map<RowKey, SortedMap<SuperColumnKey, SortedMap<ColumnKey, ColumnValue>>>
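The same two structures can be written down in Python as a small, non-authoritative sketch. Plain dicts stand in for the maps, and sorting by column name emulates the `SortedMap` part. All names and sample values (`column_family`, `sorted_columns`, `column_slice`) are assumptions made for this example.

```python
# Map<RowKey, SortedMap<ColumnKey, ColumnValue>>
column_family = {
    "alice": {"email": "alice@example.com", "age": "30"},
    "bob": {"email": "bob@example.com"},
}

# Map<RowKey, SortedMap<SuperColumnKey, SortedMap<ColumnKey, ColumnValue>>>
super_column_family = {
    "alice": {
        "address": {"zip": "10115", "city": "Berlin"},
        "contact": {"email": "alice@example.com"},
    },
}


def sorted_columns(row):
    """Return (name, value) pairs ordered by column name, emulating a SortedMap."""
    return sorted(row.items())


def column_slice(row, start, end):
    """Return the columns whose names fall in [start, end], in sorted order."""
    return [(name, value) for name, value in sorted_columns(row) if start <= name <= end]


print(sorted_columns(column_family["alice"]))
print(column_slice(column_family["alice"], "a", "b"))
print(sorted_columns(super_column_family["alice"]["address"]))
```

Because the columns inside a row are kept in order, asking for a contiguous range of column names is a natural operation on this structure.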

Data Types

And of course there are predefined data types in Cassandra.

You can assign predefined data types when you create your column family (which is recommended), but Cassandra does not require it. Internally, Cassandra stores column names and values as hex byte arrays (BytesType). This is the default client encoding.
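What "stored as byte arrays" means in practice can be shown with a minimal Python sketch. The two encodings used here (UTF-8 for text, a 32-bit big-endian integer) are assumptions chosen for illustration, and the helper names are hypothetical.

```python
import struct


def encode_text(value: str) -> bytes:
    """Encode a string as UTF-8 bytes."""
    return value.encode("utf-8")


def encode_int(value: int) -> bytes:
    """Encode an integer as a 32-bit signed big-endian value."""
    return struct.pack(">i", value)


def decode_int(raw: bytes) -> int:
    """Inverse of encode_int."""
    return struct.unpack(">i", raw)[0]


column_name = encode_text("age")
column_value = encode_int(30)

# Without type information all we can show is the hex form of the bytes.
print(column_name.hex(), column_value.hex())  # 616765 0000001e

# With type information the bytes can be interpreted again.
print(column_name.decode("utf-8"), decode_int(column_value))  # age 30
```

This is why assigning data types is recommended: the raw bytes alone do not say whether `0000001e` is a number, a piece of text, or something else.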

The following table shows the built-in Cassandra types:

Indexes

Understanding indexes in Cassandra is essential. There are two kinds of them.

The primary index determines cluster-wide row distribution. Secondary indexes are very important for custom queries. Cassandra's native index is like a hashed index and has limitations on range queries.
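Both kinds of index can be approximated in a toy Python sketch. It is heavily simplified and rests on several assumptions: real Cassandra places rows on a token ring rather than using a modulo, and the node names, the MD5 hash, the sample rows, and both helper functions are invented for this example.

```python
import hashlib
from collections import defaultdict

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster


def node_for(row_key: str, nodes=NODES) -> str:
    """Pick a node from a hash of the row key (simplified stand-in for a token ring)."""
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    token = int.from_bytes(digest, "big")
    return nodes[token % len(nodes)]


def build_secondary_index(rows, column):
    """Map each value of `column` to the set of row keys that contain it."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index


users = {
    "alice": {"city": "Berlin"},
    "bob": {"city": "Hamburg"},
    "carol": {"city": "Berlin"},
}

# Primary index: the row key alone decides where a row lives.
for key in users:
    print(key, "->", node_for(key))

# Secondary index: equality lookup on a column value.
by_city = build_secondary_index(users, "city")
print(sorted(by_city["Berlin"]))  # ['alice', 'carol']
```

An equality lookup such as `by_city["Berlin"]` is a single hash access. A range query, for example all cities between "A" and "C", would have to inspect every key of the index, which mirrors the limitation mentioned above.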

Let me know if you would like to read more on the topic.

