Understanding Cassandra’s Thrift API in PHP

For the past few months I’ve been focused on developing a simple and fast abstraction layer on top of Cassandra’s Thrift API. One of my priorities was ease of use for other developers at my company, which meant hiding Thrift from them. To do that, I had to understand how the Thrift API calls are made for each request. While watching Cassandra’s IRC channel, I noticed that many newcomers have trouble understanding the API calls. This post shows how you can read Thrift’s auto-generated code and work out how to formulate the calls and parameters for each Cassandra API call, so that you get up to speed faster. I am using PHP in my examples and I have been developing on Cassandra 0.7 trunk.

Update: I have updated this post with the changes in Cassandra 0.7 beta2 and verified that the Thrift API version works.

If you’ve used Thrift yourself to generate the interface code, you’ll have noticed that a folder called gen-php gets created where you run the thrift -gen command. Inside that folder, there are 3 files:

Cassandra.php is the main point of entry for studying the Cassandra Thrift API calls. You’ll see that it begins with this interface:

interface CassandraIf {...}

followed by a class implementing that interface:

class CassandraClient implements CassandraIf {...}

In your program, you will instantiate CassandraClient probably this way:

//TSocket takes a single host and port (arrays are for TSocketPool)
$socket = new TSocket('127.0.0.1', 9160, TRUE);
$transport = new TFramedTransport($socket);
$client = new CassandraClient(new TBinaryProtocolAccelerated($transport));

I said “probably” because, depending on your Cassandra configuration, you may want to instantiate TBufferedTransport instead of TFramedTransport, or, depending on your PHP setup, you may want to use TBinaryProtocol instead of TBinaryProtocolAccelerated. Anyway, that is not the point of this article, so shift your focus back to those three files we’ve auto-generated with Thrift.
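One way to make the protocol choice at runtime is to check whether the Thrift PHP extension is loaded. This is only a sketch: pick_protocol_class() is a hypothetical helper name of my own, and it assumes the accelerated protocol depends on the extension function thrift_protocol_write_binary(), which is what the generated code checks for.

```php
<?php
// Hypothetical helper (not part of Thrift): returns the name of the protocol
// class to instantiate, based on whether the Thrift C extension is available.
function pick_protocol_class() {
    // Assumption: the accelerated protocol relies on this extension function.
    if (function_exists('thrift_protocol_write_binary')) {
        return 'TBinaryProtocolAccelerated';
    }
    return 'TBinaryProtocol';
}

$protocol_class = pick_protocol_class();
echo "Using protocol class: $protocol_class\n";

// With the Thrift library loaded you could then write:
// $protocol = new $protocol_class($transport);
```

Returning a class name keeps the sketch free of any Thrift includes, so you can drop it into a bootstrap file before the library is loaded.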

So, this file will be our starting point of study. The next file will be:

cassandra_types.php, which defines many classes, all prefixed with the package keyword cassandra_. These are the object types you’ll need to construct and pass to the API calls in CassandraClient.

And finally the last file is:

cassandra_constants.php, which I will give the least attention, as you won’t need to interact with it at all except when you want to check the API version, which is the most important line there. The API version tells you which version of the generated code you’re using. This version has to match the Thrift server’s version to be sure that the method definitions and their functionality are the same, so that the API calls make sense to the server and the server’s responses make sense to the API client.
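A version check along those lines might look like the sketch below. The helper name api_major_matches() and both version strings are made up for illustration; in a real client the first value would come from cassandra_constants.php and the second from the server (the Thrift interface has a describe_version() call for that). Treating "same major number" as compatible is my assumption; the strictest check is plain string equality.

```php
<?php
// Hypothetical helper: compares the major component of two API version
// strings such as "19.2.0". Treating "same major" as compatible is an
// assumption, not something the generated code enforces.
function api_major_matches($client_version, $server_version) {
    $client = explode('.', $client_version);
    $server = explode('.', $server_version);
    return $client[0] === $server[0];
}

// Example values only, not read from a real server.
$client_version = '19.2.0';
$server_version = '19.4.0';

if ($client_version === $server_version) {
    echo "Exact match\n";
} elseif (api_major_matches($client_version, $server_version)) {
    echo "Same major version, minor or patch level differs\n";
} else {
    echo "Mismatch: regenerate the Thrift code\n";
}
```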

Now that we know where to look for the resources we need for this exercise, let’s start by scratching our heads over the most confusing API call, batch_mutate.

Step 1: Let’s take a look at the definition of CassandraClient->batch_mutate() in Cassandra.php:

public function batch_mutate($mutation_map, $consistency_level)  {
    $this->send_batch_mutate($mutation_map, $consistency_level);
    $this->recv_batch_mutate();
}

Step 2: If you follow the code inside send_batch_mutate, you’ll see the arguments are mapped to a class named cassandra_Cassandra_batch_mutate_args. Looking inside the same file, you’ll find its definition. Let’s just focus on its constructor:

class cassandra_Cassandra_batch_mutate_args {
  static $_TSPEC;

  public $mutation_map = null;
  public $consistency_level =   1;

  public function __construct($vals=null) {
    if (!isset(self::$_TSPEC)) {
      self::$_TSPEC = array(
        1 => array(
          'var' => 'mutation_map',
          'type' => TType::MAP,
          'ktype' => TType::STRING,
          'vtype' => TType::MAP,
          'key' => array(
            'type' => TType::STRING,
          ),
          'val' => array(
            'type' => TType::MAP,
            'ktype' => TType::STRING,
            'vtype' => TType::LST,
            'key' => array(
              'type' => TType::STRING,
            ),
            'val' => array(
              'type' => TType::LST,
              'etype' => TType::STRUCT,
              'elem' => array(
                'type' => TType::STRUCT,
                'class' => 'cassandra_Mutation',
                ),
              ),
            ),
          ),
        2 => array(
          'var' => 'consistency_level',
          'type' => TType::I32,
          ),
        );
    }
    if (is_array($vals)) {
      if (isset($vals['mutation_map'])) {
        $this->mutation_map = $vals['mutation_map'];
      }
      if (isset($vals['consistency_level'])) {
        $this->consistency_level = $vals['consistency_level'];
      }
    }
  }
  // ... read() and write() methods omitted
}

Thrift translates data structures defined for a specific system into something called _TSPEC. In this case, the system Thrift talks to is Cassandra, and the specific data structure is the one that carries the arguments of batch_mutate.
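If the nested arrays are hard to read, a small recursive function can flatten a _TSPEC entry into a one-line type description. This is a sketch of my own, not something Thrift generates: describe_tspec() is a hypothetical helper, the TType class below is only a stand-in holding the constants this example needs (it is skipped when the real Thrift class is loaded), and the spec array is a trimmed copy of the mutation_map entry from the constructor.

```php
<?php
// Stand-in for Thrift's TType constants so the sketch runs without the
// Thrift library; the real class is used when it is already loaded.
if (!class_exists('TType')) {
    class TType {
        const I32 = 8;
        const I64 = 10;
        const STRING = 11;
        const STRUCT = 12;
        const MAP = 13;
        const LST = 15;
    }
}

// Hypothetical helper: turns one _TSPEC entry into a readable type string.
function describe_tspec(array $spec) {
    switch ($spec['type']) {
        case TType::MAP:
            return 'map<' . describe_tspec($spec['key']) . ', '
                          . describe_tspec($spec['val']) . '>';
        case TType::LST:
            return 'list<' . describe_tspec($spec['elem']) . '>';
        case TType::STRUCT:
            return $spec['class'];
        case TType::STRING:
            return 'string';
        case TType::I32:
            return 'i32';
        case TType::I64:
            return 'i64';
        default:
            return 'type#' . $spec['type'];
    }
}

// Trimmed copy of the mutation_map entry (ktype/vtype/etype left out,
// since the nested 'key', 'val' and 'elem' arrays carry the same information).
$mutation_map_spec = array(
    'var' => 'mutation_map',
    'type' => TType::MAP,
    'key' => array('type' => TType::STRING),
    'val' => array(
        'type' => TType::MAP,
        'key' => array('type' => TType::STRING),
        'val' => array(
            'type' => TType::LST,
            'elem' => array('type' => TType::STRUCT, 'class' => 'cassandra_Mutation'),
        ),
    ),
);

// Prints: map<string, map<string, list<cassandra_Mutation>>>
echo describe_tspec($mutation_map_spec), "\n";
```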

Step 3: Here comes the difficult part, which is understanding Thrift types. Thrift’s Wiki has a decent explanation of the types, so I recommend a visit there before proceeding, or alternatively you can read my post about interpreting Thrift’s Data Types and TSPEC. Let’s now focus on the batch_mutate args structure. From reading the code above, you can see that mutation_map is a map of maps of lists of cassandra_Mutation. Confusing enough; what does that mean? Maps in PHP are equivalent to associative arrays, which are arrays with unique string keys, and lists are arrays with numeric indexes. But what would the keys of the outer arrays be? That is where I got very confused and made a trip to the Cassandra API wiki, which says:

the outer map key is a row key, the inner map key is the column family name

So, I figured out what both keys are. Code-wise, it will look something like this:

//The inner keys are column family names, not keyspace names
$mutation_map =
    array('row_key1'=>array('ColumnFamily1'=>
                array($cassandra_Mutation1,$cassandra_Mutation2 /* , ... */)),
         'row_key2'=>array('ColumnFamily2'=>
                array($cassandra_Mutation3,$cassandra_Mutation4 /* , ... */)));
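Before sending a mutation map to the server, it can help to check that it really has the row key => column family name => list shape. The sketch below is my own illustration: count_mutations() is a hypothetical helper, the row key and column family names are placeholders, and plain stdClass objects stand in for cassandra_Mutation instances so it runs without Thrift.

```php
<?php
// Hypothetical helper: walks a mutation map shaped as
// row key => column family name => list of mutations
// and returns the total number of mutations, or throws on a wrong shape.
function count_mutations(array $mutation_map) {
    $total = 0;
    foreach ($mutation_map as $row_key => $by_column_family) {
        if (!is_array($by_column_family)) {
            throw new InvalidArgumentException("Row '$row_key' must map to an array");
        }
        foreach ($by_column_family as $cf_name => $mutations) {
            if (!is_array($mutations)) {
                throw new InvalidArgumentException("'$row_key'/'$cf_name' must map to a list");
            }
            $total += count($mutations);
        }
    }
    return $total;
}

// Placeholder objects stand in for cassandra_Mutation instances.
$m1 = new stdClass();
$m2 = new stdClass();
$m3 = new stdClass();

// Appending with [] builds the nested arrays on the fly.
$mutation_map = array();
$mutation_map['row_key1']['ColumnFamily1'][] = $m1;
$mutation_map['row_key1']['ColumnFamily1'][] = $m2;
$mutation_map['row_key2']['ColumnFamily2'][] = $m3;

// Prints: 3 mutations in 2 rows
echo count_mutations($mutation_map), " mutations in ", count($mutation_map), " rows\n";
```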

Step 4: Cassandra’s actual data types are all prefixed with the keyword cassandra_ and are defined inside the file I previously mentioned, cassandra_types.php. In this step we will look at the class cassandra_Mutation:

class cassandra_Mutation {
  static $_TSPEC;

  public $column_or_supercolumn = null;
  public $deletion = null;

  public function __construct($vals=null) {
    if (!isset(self::$_TSPEC)) {
      self::$_TSPEC = array(
        1 => array(
          'var' => 'column_or_supercolumn',
          'type' => TType::STRUCT,
          'class' => 'cassandra_ColumnOrSuperColumn',
          ),
        2 => array(
          'var' => 'deletion',
          'type' => TType::STRUCT,
          'class' => 'cassandra_Deletion',
          ),
        );
    }
    // ... copying of $vals into the properties omitted
  }
  // ... read() and write() methods omitted
}

We trace this down to the other Cassandra types we need in the same file:

class cassandra_ColumnOrSuperColumn {
  static $_TSPEC;

  public $column = null;
  public $super_column = null;

  public function __construct($vals=null) {
    if (!isset(self::$_TSPEC)) {
      self::$_TSPEC = array(
        1 => array(
          'var' => 'column',
          'type' => TType::STRUCT,
          'class' => 'cassandra_Column',
          ),
        2 => array(
          'var' => 'super_column',
          'type' => TType::STRUCT,
          'class' => 'cassandra_SuperColumn',
          ),
        );
    }
    // ... copying of $vals into the properties omitted
  }
  // ... read() and write() methods omitted
}

class cassandra_Deletion {
  static $_TSPEC;

  public $timestamp = null;
  public $super_column = null;
  public $predicate = null;

  public function __construct($vals=null) {
    if (!isset(self::$_TSPEC)) {
      self::$_TSPEC = array(
        1 => array(
          'var' => 'timestamp',
          'type' => TType::I64,
          ),
        2 => array(
          'var' => 'super_column',
          'type' => TType::STRING,
          ),
        3 => array(
          'var' => 'predicate',
          'type' => TType::STRUCT,
          'class' => 'cassandra_SlicePredicate',
          ),
        );
    }
    // ... copying of $vals into the properties omitted
  }
  // ... read() and write() methods omitted
}
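All of these classes are filled in the same way as the batch_mutate args class from Step 2: the constructor copies only the keys it knows out of $vals. One consequence worth seeing in action is that a misspelled key is silently ignored. The class below is a hypothetical stand-in I wrote to mimic that pattern; it is not the real cassandra_Deletion, and the timestamp is an arbitrary example value.

```php
<?php
// Hypothetical stand-in that mimics how the generated constructors copy
// values out of $vals. It is not the real cassandra_Deletion class.
class example_Deletion {
    public $timestamp = null;
    public $super_column = null;
    public $predicate = null;

    public function __construct($vals = null) {
        if (is_array($vals)) {
            if (isset($vals['timestamp'])) {
                $this->timestamp = $vals['timestamp'];
            }
            if (isset($vals['super_column'])) {
                $this->super_column = $vals['super_column'];
            }
            if (isset($vals['predicate'])) {
                $this->predicate = $vals['predicate'];
            }
        }
    }
}

// Correct key: the property is set.
$ok = new example_Deletion(array('timestamp' => 1286000000000000));

// Misspelled key: no error is raised and the property stays null.
$typo = new example_Deletion(array('timestmap' => 1286000000000000));

var_dump($ok->timestamp !== null);   // the value was copied
var_dump($typo->timestamp === null); // the typo was silently ignored
```

If a call fails with a complaint about a missing field, a typo in one of these array keys is a good first thing to check.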

OK, you get it now? You can even dive deeper into cassandra_SlicePredicate and others, but I think I have made my point and don’t need to copy-paste more code from Thrift.

Step 5: Now, in the last step, we need to create these structures bottom-up and pass the final result to the batch_mutate method. So, let’s create some example columns and insert them into Cassandra:

//This function is very important in generating correct timestamps for Cassandra
//Read my other post about Cassandra timestamps and PHP
function cass_time() {
     return intval(microtime(true)*1000000);
}

//Let's produce some columns we want to insert
$columnA = new cassandra_Column(array('name'=>'column a','value'=>'column a value','timestamp'=> cass_time()));
$columnB = new cassandra_Column(array('name'=>'column b','value'=>'column b value','timestamp'=> cass_time()));

//In our design we will use one super column which holds the columns
$columns = array($columnA,$columnB);
$sc = new cassandra_SuperColumn(array('name'=>'super column a','columns'=>$columns)); //a super column needs a name; this one is just an example

//We need to form this object, giving it our super column instance because it is what mutation object wants
$c_or_sc = new cassandra_ColumnOrSuperColumn(array('super_column'=>$sc));

//Now create a mutation and give it our ColumnOrSuperColumn object
$mutation = new cassandra_Mutation(array('column_or_supercolumn'=>$c_or_sc));

//Now we create the mutation map as shown in Step 3
$mutation_map = array();
$mutation_map['row_key']['Super1'][] = $mutation;

//Voilà! Let's create a client and call batch_mutate()
$transport = new TFramedTransport(new TSocketPool(array('127.0.0.1'), array(9160)));
$client = new CassandraClient(new TBinaryProtocolAccelerated($transport));
$transport->open(); //the connection must be opened before the first call

$client->set_keyspace('Keyspace1');
$client->batch_mutate($mutation_map,cassandra_ConsistencyLevel::ONE);

Here are the noteworthy facts about the code snippet above:

    I am using the default keyspace Keyspace1 and column family Super1, which ship with the default .yaml configuration in Cassandra 0.7.
    The Thrift API version I have is 19.2.0, which ships with Cassandra 0.7 beta2. You can check this inside the cassandra_constants.php file.
    In line 13 of the code above (where $sc is created), I chose to use Super1 and thus had to create a super column instance to pass to cassandra_ColumnOrSuperColumn. If you were to use a non-super column family, you would have to create one cassandra_ColumnOrSuperColumn per column, set its column property, and map each one to its own mutation object.
    A batch_mutate call either saves all the mutations inside the mutation_map into Cassandra or reports a failure.
    In line 26 (the line that uses TSocketPool), I use TSocketPool() because I usually have a cluster of more than one node, and it lets me keep a pool of the nodes in my cluster. Otherwise, TSocket will do the same for a single node.
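For the non-super column family case mentioned in the notes above, the wrapping could be sketched as follows. Treat it as an illustration under assumptions: mutations_for_columns() is a hypothetical helper of mine, Standard1 is assumed to be a standard column family in your keyspace, and the three classes at the top are minimal stand-ins that only mimic the $vals constructors so the example runs without cassandra_types.php (they are skipped when the real classes are loaded).

```php
<?php
// Minimal stand-ins so the sketch runs without cassandra_types.php loaded.
// They only mimic the $vals constructors of the generated classes.
if (!class_exists('cassandra_Column')) {
    class cassandra_Column {
        public $name = null;
        public $value = null;
        public $timestamp = null;
        public function __construct($vals = null) {
            foreach ((array) $vals as $key => $val) { $this->$key = $val; }
        }
    }
    class cassandra_ColumnOrSuperColumn {
        public $column = null;
        public $super_column = null;
        public function __construct($vals = null) {
            foreach ((array) $vals as $key => $val) { $this->$key = $val; }
        }
    }
    class cassandra_Mutation {
        public $column_or_supercolumn = null;
        public $deletion = null;
        public function __construct($vals = null) {
            foreach ((array) $vals as $key => $val) { $this->$key = $val; }
        }
    }
}

// Hypothetical helper: builds one Mutation per column, each wrapped in its
// own ColumnOrSuperColumn with the 'column' property set.
function mutations_for_columns(array $name_to_value) {
    // Microsecond timestamp, computed the same way as cass_time() above.
    $timestamp = intval(microtime(true) * 1000000);
    $mutations = array();
    foreach ($name_to_value as $name => $value) {
        $column = new cassandra_Column(array(
            'name' => $name, 'value' => $value, 'timestamp' => $timestamp));
        $c_or_sc = new cassandra_ColumnOrSuperColumn(array('column' => $column));
        $mutations[] = new cassandra_Mutation(array('column_or_supercolumn' => $c_or_sc));
    }
    return $mutations;
}

// 'Standard1' is assumed to be a non-super column family in the keyspace.
$mutation_map = array();
$mutation_map['row_key']['Standard1'] = mutations_for_columns(array(
    'column a' => 'column a value',
    'column b' => 'column b value',
));

echo count($mutation_map['row_key']['Standard1']), " mutations built\n";
```

Note that PHP turns numeric-looking array keys into integers, so this helper is only suitable for column names that are plain strings.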

Hope this helps you navigate some Thrift code more efficiently and build your client faster. Comments are welcome, and I’ll use your feedback to improve this post for everyone.
