Web Development: Getting Started With Cassandra: CQL Data Types and Using GoCQL

Getting Started With Cassandra: CQL Data Types and Using GoCQL

In the first part of this tutorial series, I covered the very basics of Cassandra and used CQLSH to communicate with the database system via shell. In this second part, I will cover in brief the major datatypes available in CQL. Then I will cover the essentials of gocql, a Golang client package which implements the Cassandra driver for Golang. I will cover how to create a session connection with Cassandra with some configuration options and then how to run various queries using the session.

Cassandra provides support for basic datatypes which are available in almost all database systems. Apart from this, it also provides for complex collection types which can store combinations of simple data in the form of list, set, and map. Apart from this, CQL also has support for user-defined types, allowing developers to have their own datatypes which are easy to read and understand.

Basic Data Types

ascii: Represents a string of ASCII characters. Insertion of any non-ASCII character into a column of this type would result in an error.

bigint: Represents a 64-bit signed long. Used to store long numbers. This should be used only when we are sure we need such long numbers because this occupies more space as compared to int.

blob: Used to store arbitrary bytes. This is represented as hexadecimal, and any data without any validation can be stored in this field.

boolean: Stores true or false.

counter: Represents a 64-bit signed integer, but the value of this column cannot be set. There are only two operations on this column, increment and decrement. In a table with a counter column, only counter types and primary key are allowed. There are no INSERT statements allowed in a table with counter column(s); only UPDATE can be used. For example:

> CREATE TABLE website_tracker (
    id int PRIMARY KEY,
    url text,
    visitor_count counter
    );
    
> UPDATE website_tracker
    SET visitor_count = visitor_count + 1
    WHERE id = 1;
    
> SELECT * FROM website_tracker;
id | url   | count
----+------+------
 1 | a.com | 1

(1 rows)

date: Represents a date value without a time value. Cassandra encodes the same as an integer value since epoch. Dates can be represented as strings in format yyyy-mm-dd.

decimal: Represents a variable-precision decimal value. Best for storing currency or financial values.

double: Stores a 64-bit floating point value.

float: Stores a 32-bit floating point value.

inet: Represents an IP address string in IPv4 or IPv6 format.

int: Represents a 32-bit signed integer. Used mostly when storing integer values.

smallint: Represents a 2-byte (16-bit) integer. Can be preferred over int for storing small integer values to save space.

text: Represents a UTF-8 encoded string. Should be used when we want to store non-ASCII characters.

time: Represents a time value. Represented as a string in the format 01:02:03.123 and stored 64-bit signed integer which represents nanoseconds elapsed since midnight.

timestamp: Stores both date and time components with millisecond precision. Can be represented as text in the format 2016-12-01 01:02:03.123.

tinyint: Represents a 1-byte (8-bit) integer. Can be preferred over int or smallint for storing small integer values to save space.

timeuuid: Stores version 1 UUID.

uuid: UUID in standard format. This is a larger value as compared to timeuuid.

varchar: Similar to text. Both can be used interchangeably.

variant: An integer value with arbitrary precision. It is advised to use a datatype with required precision.

Collection Data Types

set: This type stores a collection of values. The values are stored as unordered, but CQLSH would return them in a sorted manner. For example, strings would be sorted alphabetically. Let's modify the table we created above:

> ALTER TABLE website_tracker ADD tagsSet set<text>;

> UPDATE website_tracker SET tagsSet = {'tag1'} WHERE id = 1;

> SELECT tagsSet FROM website_tracker WHERE id = 1;

 tagsSet
----------
 {'tag1'}
 
> UPDATE website_tracker SET tagsSet = tagsSet + {'gat2'} WHERE id = 1;

> SELECT tagsSet FROM website_tracker WHERE id = 1;

 tagsSet
------------------
 {'gat2', 'tag1'}

You can use the usual set operations like difference to remove elements. To clear out or replace the complete set, do SET tags = {<something>}.

list: A list also stores a collection of values but stores them in ordered fashion, which is by the order of insertion by default. Let's try to do the same thing that we did above with sets with a list now:

> ALTER TABLE website_tracker ADD tagsList list<text>;

> UPDATE website_tracker SET tagsList = ['tag1'] WHERE id = 1;

> SELECT tagsList FROM website_tracker WHERE id = 1;

 tagsList
----------
 ['tag1']
 
> UPDATE website_tracker SET tagsList = tagsList + ['gat2'] WHERE id = 1;

> SELECT tagsList FROM website_tracker WHERE id = 1;

 tagsList
------------------
 ['tag1', 'gat2']

In a list, values can be prepended, subtracted (as in sets), inserted/replaced/deleted by index value (SET tags[1] = '<somevalue>'), etc.

map: A map contains a collection of key-value pairs. These can be anything except a counter type. Let's have a small description for each tag.

> ALTER TABLE website_tracker ADD tagsMap map<text, text>;

> UPDATE website_tracker SET tagsMap = {'tag1': 'Tag One'} WHERE id = 1;

> SELECT tagsMap FROM website_tracker WHERE id = 1;

 tagsMap
----------------------
 {'tag1': 'Tag One'}
 
> UPDATE website_tracker SET tagsMap['tag2'] = 'Tag Two' WHERE id = 1;

> SELECT tagsMap FROM website_tracker WHERE id = 1;

 tagsMap
------------------
 {'tag1': 'Tag One', 'tag2': 'Tag Two'}

User-Defined Data Types

It is possible in Cassandra to define our own types. This gives a lot of flexibility and makes overall maintenance of data easier. Let's say we want to store the registration address of the website.

> CREATE TYPE address (
        ... street text,
        ... city text,
        ... state text);
        
> ALTER TABLE website_tracker ADD reg_address address;

In order to use a user-defined type in a nested collection, we need to specify it as a frozen collection.

> ALTER TABLE website_tracker ADD reg_addresses map<text, frozen<address>>;

Using GoCQL

I am assuming that you have some knowledge of using Golang and configuring and installing packages.

Installation

To install the gocql package, run the following command from shell:

$ go get http://ift.tt/1cXjEki

Now I will create a Go script which will explain the concepts needed to understand gocql.

Writing the Script

main.go

package main

import (
    "http://ift.tt/1cXjEki"
        "log"
        "time"
)

func PerformOperations() {
        // Provide the cassandra cluster instance here.
        cluster := gocql.NewCluster("127.0.0.1")

        // The authenticator is needed if password authentication is
        // enabled for your Cassandra installation. If not, this can
        // be removed.
        cluster.Authenticator = gocql.PasswordAuthenticator{
                Username: "some_username",
                Password: "some_password",
        }

        // gocql requires the keyspace to be provided before the session is created.
        // In future there might be provisions to do this later.
        cluster.Keyspace = "keyspace_name"

        // This is time after which the creation of session call would timeout.
        // This can be customised as needed.
        cluster.Timeout = 5 * time.Second

        cluster.ProtoVersion = 4
        session, err := cluster.CreateSession()
        if err != nil {
                log.Fatalf("Could not connect to cassandra cluster: %v", err)
        }

        // Check if the table already exists. Create if table does not exist
        keySpaceMeta, _ := session.KeyspaceMetadata("keyspace_name")

        if _, exists := keySpaceMeta.Tables["person"]; exists != true {
                // Create a table
                session.Query("CREATE TABLE person (" +
                        "id text, name text, phone text, " +
                        "PRIMARY KEY (id))").Exec()
        }

        // DIY: Update table with something if it already exist.

        // Insert record into table using prepared statements
        session.Query("INSERT INTO person (id, name, phone) VALUES (?, ?, ?)",
                "shalabh", "Shalabh Aggarwal", "1234567890").Exec()

        // DIY: Update existing record

        // Select record and run some process on data fetched
        var name string
        var phone string
        if err := session.Query(
                "SELECT name, phone FROM person WHERE id='shalabh'").Scan(
                &name, &phone); err != nil {
                if err != gocql.ErrNotFound {
                        log.Fatalf("Query failed: %v", err)
                }
        }
        log.Printf("Name: %v", name)
        log.Printf("Phone: %v", phone)

        // Fetch multiple rows and run process over them
        iter := session.Query("SELECT name, phone FROM person").Iter()
        for iter.Scan(&name, &phone) {
                log.Printf("Iter Name: %v", name)
                log.Printf("Iter Phone: %v", phone)
        }

        // DIY: Delete record
}

func main() {
        PerformOperations()
}

Most of the working concepts are explained in the above code itself. Some points worth noting are the different operations used along with session.Query(). Apart from the three operations below, there are many more supported which can be seen in the documentation.

Exec(): This would just execute the query without returning any rows. Returns error if any.
Scan(): This would execute the query while copying the values of columns from the first row matched in the query to the variables passed. It would discard any rows apart from the first one.
Iter(): This would execute the query and return an iterator which would then just work like how Scan() works for each row fetched.

Running the Script

To run the script, execute the command below in shell.

$ go run main.go

2017/02/03 12:53:40 Name: Shalabh Aggarwal
2017/02/03 12:53:40 Phone: 1234567890
2017/02/03 12:53:40 Iter Name: Shalabh Aggarwal
2017/02/03 12:53:40 Iter Phone: 1234567890

Conclusion

In this second part of this tutorial series, we covered various built-in data types available with Cassandra. We also saw how collection types work and how user-defined types can be used to make an overall schema flexible. We also saw how we can interact with Cassandra programmatically in Golang using gocql. This package offers a lot more functionality which can be explored on your own.

Web Development

Tuesday, February 21, 2017