Tuesday, February 21, 2017

Getting Started With Cassandra: CQL Data Types and Using GoCQL

Getting Started With Cassandra: CQL Data Types and Using GoCQL

In the first part of this tutorial series, I covered the very basics of Cassandra and used CQLSH to communicate with the database system via shell. In this second part, I will cover in brief the major datatypes available in CQL. Then I will cover the essentials of gocql, a Golang client package which implements the Cassandra driver for Golang. I will cover how to create a session connection with Cassandra with some configuration options and then how to run various queries using the session.

Cassandra provides support for basic datatypes which are available in almost all database systems. Apart from this, it also provides for complex collection types which can store combinations of simple data in the form of list, set, and map. Apart from this, CQL also has support for user-defined types, allowing developers to have their own datatypes which are easy to read and understand.

Basic Data Types

  • ascii: Represents a string of ASCII characters. Insertion of any non-ASCII character into a column of this type would result in an error.
  • bigint: Represents a 64-bit signed long. Used to store long numbers. This should be used only when we are sure we need such long numbers because this occupies more space as compared to int.
  • blob: Used to store arbitrary bytes. This is represented as hexadecimal, and any data without any validation can be stored in this field.
  • boolean: Stores true or false.
  • counter: Represents a 64-bit signed integer, but the value of this column cannot be set. There are only two operations on this column, increment and decrement. In a table with a counter column, only counter types and primary key are allowed. There are no INSERT statements allowed in a table with counter column(s); only UPDATE can be used. For example:
  • date: Represents a date value without a time value. Cassandra encodes the same as an integer value since epoch. Dates can be represented as strings in format yyyy-mm-dd.
  • decimal: Represents a variable-precision decimal value. Best for storing currency or financial values.
  • double: Stores a 64-bit floating point value.
  • float: Stores a 32-bit floating point value.
  • inet: Represents an IP address string in IPv4 or IPv6 format.
  • int: Represents a 32-bit signed integer. Used mostly when storing integer values.
  • smallint: Represents a 2-byte (16-bit) integer. Can be preferred over int for storing small integer values to save space.
  • text: Represents a UTF-8 encoded string. Should be used when we want to store non-ASCII characters.
  • time: Represents a time value. Represented as a string in the format 01:02:03.123 and stored 64-bit signed integer which represents nanoseconds elapsed since midnight.
  • timestamp: Stores both date and time components with millisecond precision. Can be represented as text in the format 2016-12-01 01:02:03.123.
  • tinyint: Represents a 1-byte (8-bit) integer. Can be preferred over int or smallint for storing small integer values to save space.
  • timeuuid: Stores version 1 UUID.
  • uuid: UUID in standard format. This is a larger value as compared to timeuuid.
  • varchar: Similar to text. Both can be used interchangeably.
  • variant: An integer value with arbitrary precision. It is advised to use a datatype with required precision.

Collection Data Types

  • set: This type stores a collection of values. The values are stored as unordered, but CQLSH would return them in a sorted manner. For example, strings would be sorted alphabetically. Let's modify the table we created above:

You can use the usual set operations like difference to remove elements. To clear out or replace the complete set, do SET tags = {<something>}.

  • list: A list also stores a collection of values but stores them in ordered fashion, which is by the order of insertion by default. Let's try to do the same thing that we did above with sets with a list now:

In a list, values can be prepended, subtracted (as in sets), inserted/replaced/deleted by index value (SET tags[1] = '<somevalue>'), etc.

  • map: A map contains a collection of key-value pairs. These can be anything except a counter type. Let's have a small description for each tag.

User-Defined Data Types

It is possible in Cassandra to define our own types. This gives a lot of flexibility and makes overall maintenance of data easier. Let's say we want to store the registration address of the website.

In order to use a user-defined type in a nested collection, we need to specify it as a frozen collection.

Using GoCQL

I am assuming that you have some knowledge of using Golang and configuring and installing packages.

Installation

To install the gocql package, run the following command from shell:

Now I will create a Go script which will explain the concepts needed to understand gocql.

Writing the Script

main.go

Most of the working concepts are explained in the above code itself. Some points worth noting are the different operations used along with session.Query(). Apart from the three operations below, there are many more supported which can be seen in the documentation.

  • Exec(): This would just execute the query without returning any rows. Returns error if any.
  • Scan(): This would execute the query while copying the values of columns from the first row matched in the query to the variables passed. It would discard any rows apart from the first one.
  • Iter(): This would execute the query and return an iterator which would then just work like how Scan() works for each row fetched.

Running the Script

To run the script, execute the command below in shell.

Conclusion

In this second part of this tutorial series, we covered various built-in data types available with Cassandra. We also saw how collection types work and how user-defined types can be used to make an overall schema flexible. We also saw how we can interact with Cassandra programmatically in Golang using gocql. This package offers a lot more functionality which can be explored on your own.


No comments:

Post a Comment