Cassandra storage type

This documentation points out specifics of the cassandra storageType. Cassandra is not a general-purpose database; it aims to provide efficient access to Big Data by requiring the data and its specific access patterns to be described carefully. Zendro provides a standardized API for all defined models and makes certain assumptions about how you define your data model. For the cassandra storageType it is important that the model has a unique primary key attribute that provides access to the specific record identified by that key. Besides simple primary keys, Cassandra offers compound primary keys to define efficient access on a cluster of servers. For now Zendro does not support this kind of primary key definition: a primary key must refer to exactly one attribute.

Restrictions

Compared to relational databases, accessing and querying data stored in a Cassandra database is subject to several restrictions.

Operators

Zendro queries the Cassandra database directly via CQL (Cassandra Query Language) statements using the DataStax cassandra-driver. CQL implements only a subset of the operators available in, for example, relational databases. The operators included in Zendro are:

zendro-operator	operation
eq	`=`
lt	`<`
gt	`>`
lte	`<=`
gte	`>=`
ne	`!=`
in	`in`
contains	`contains`*
and	`and`

* `contains` applies to Cassandra collections.

Note that CQL does not implement the logical `OR` operator.
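To illustrate how the operator table above constrains what a search can express, here is a minimal sketch that turns a nested search object into a CQL WHERE clause. The dict shape (`field`, `operator`, `value`, `search`) and the function name `to_cql_where` are assumptions made for this example, not Zendro's actual implementation.

```python
# Hypothetical sketch: translate a search tree into a CQL WHERE clause.
# Operator names follow the table above; the dict shape is an assumption.

CQL_OPERATORS = {
    "eq": "=", "lt": "<", "gt": ">", "lte": "<=", "gte": ">=",
    "ne": "!=", "in": "IN", "contains": "CONTAINS",
}


def to_cql_where(search):
    """Return (clause, params); values are bound through '?' placeholders."""
    operator = search["operator"]
    if operator == "and":
        clauses, params = [], []
        for child in search["search"]:
            clause, child_params = to_cql_where(child)
            clauses.append(clause)
            params.extend(child_params)
        return " AND ".join(clauses), params
    if operator not in CQL_OPERATORS:
        # 'or' ends up here, because CQL has no logical OR.
        raise ValueError(f"operator '{operator}' is not supported for cassandra")
    # Real code should validate the field name against the model's attributes.
    return f"{search['field']} {CQL_OPERATORS[operator]} ?", [search["value"]]


clause, params = to_cql_where({
    "operator": "and",
    "search": [
        {"field": "name", "operator": "eq", "value": "Rex"},
        {"field": "age", "operator": "gte", "value": 3},
    ],
})
```

Because `and` is the only logical connective, nested conditions can be flattened into a single AND chain without parentheses.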

Pagination

Due to its distributed nature, Cassandra does not implement traditional limit-offset pagination. As a consequence, only the cursor-based GraphQL Connection type query readAllCursor is supported. Zendro implements cursor-based pagination by using the base64-encoded record as the cursor.

Note that Cassandra does not support backward pagination, so the only valid pagination arguments are first and after.
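The cursor mechanism described above can be sketched as follows. This is only an illustration under stated assumptions: the records live in an in-memory list that stands in for a Cassandra table, the cursor is the JSON-serialized record encoded as base64, and the helper names and the `pageInfo` shape are hypothetical.

```python
import base64
import json


def encode_cursor(record):
    """Serialize a record to JSON and base64-encode it."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Recover the record that a cursor was built from."""
    return json.loads(base64.b64decode(cursor).decode("utf-8"))


def page_forward(records, id_field, first, after=None):
    """Forward-only pagination over records that are already in storage order."""
    start = 0
    if after is not None:
        after_id = decode_cursor(after)[id_field]
        ids = [r[id_field] for r in records]
        start = ids.index(after_id) + 1  # continue right behind the cursor record
    page = records[start:start + first]
    return {
        "edges": [{"cursor": encode_cursor(r), "node": r} for r in page],
        "pageInfo": {
            "hasNextPage": start + first < len(records),
            "endCursor": encode_cursor(page[-1]) if page else None,
        },
    }
```

A client would pass the `endCursor` of one page as the `after` argument of the next request. There is no counterpart for walking backwards.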

Sorting

Cassandra can only sort query results when the ordering is defined up front through a compound primary key, that is, a partition key plus the clustering column(s) by which the rows are ordered. Since Zendro does not yet implement this kind of primary key, Cassandra results cannot be sorted. The default order is determined by Cassandra's internal token function.

Associations

Associations with the targetStorageType set to Cassandra have some restrictions on searching. An association is resolved by adding a search to the query: either eq on the respective foreign key or, if the association is of type many_to_many, in on the foreign key array. Since Cassandra does not allow multiple equality restrictions on the id field, the driver throws an error when such a search is combined with another search on that field. To circumvent this, Zendro implements a workaround that merges searches on the idField with the search on the foreign key(s).

Be aware that the workaround only works because Cassandra does not support the OR operator. There are also the following pitfalls to consider:

Cassandra does not allow SELECT queries on indexed columns with an IN clause for the PRIMARY KEY.
If there are multiple equality restrictions on the id field, Cassandra throws an error.
Multiple searches on the idAttribute field still throw the above error, since search nodes are merged one by one with the foreign key(s).
The workaround only applies to associations where the foreign key is stored on the side of the Cassandra model, since IN clauses are only allowed on the primary key column, not on any foreign key column.
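The idea behind merging restrictions can be sketched as follows. This is not Zendro's actual merge code: the function name and the dict shape are hypothetical, and the sketch only covers the case of two AND-ed restrictions (`eq` or `in`) on the same id field. Since all conditions are combined with AND, a record must satisfy both restrictions, so they can be replaced by a single restriction on the intersection of the allowed values.

```python
def merge_id_restrictions(id_search, fk_search):
    """Combine two AND-ed restrictions on the same id field into one.

    Each argument is a dict with 'field', 'operator' ('eq' or 'in') and 'value'.
    """
    def allowed(node):
        if node["operator"] == "eq":
            return [node["value"]]
        if node["operator"] == "in":
            return list(node["value"])
        raise ValueError(f"cannot merge operator '{node['operator']}'")

    id_values = set(allowed(id_search))
    # Keep only ids that satisfy both restrictions, in foreign-key order,
    # and drop duplicates.
    merged = [v for v in dict.fromkeys(allowed(fk_search)) if v in id_values]
    if len(merged) == 1:
        return {"field": fk_search["field"], "operator": "eq", "value": merged[0]}
    # An empty list means that no record can match both restrictions.
    return {"field": fk_search["field"], "operator": "in", "value": merged}
```

The merge is only sound because the restrictions are joined by AND. With an OR operator the allowed values would have to be united rather than intersected.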

Access Control

CQL provides an optional ALLOW FILTERING clause that permits server-side filtering of the result set. Since such queries may exhibit unpredictable performance, depending on factors such as secondary indexes on columns, server-side filtering has to be allowed explicitly in the query itself.

Zendro implements this via the access control of the data models. Only users with the editor role have privileged access to send queries that require server-side filtering.
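A minimal sketch of this role check, assuming the user's roles are available as a list of strings. The function name `build_select` and the choice to silently omit the clause for other roles are assumptions for illustration. In that case Cassandra itself rejects any query that would need filtering.

```python
def build_select(table, where_clause, user_roles):
    """Build a SELECT statement, adding ALLOW FILTERING only for editors."""
    allow_filtering = "editor" in user_roles
    query = f"SELECT * FROM {table}"
    if where_clause:
        query += f" WHERE {where_clause}"
    if allow_filtering:
        query += " ALLOW FILTERING"
    return query
```

For example, `build_select("dogs", "name = ?", ["reader"])` yields a plain SELECT, while the same call with `["editor"]` appends ALLOW FILTERING.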

Collection types

Cassandra implements several collection data types, namely sets, lists and maps, each with different characteristics. Zendro makes use of sets and lists, depending on the use case.

Foreign Key arrays

Zendro implements many_to_many relations between models via paired-end foreign keys. In a Cassandra data model this is solved via a set, which represents a sorted collection of unique values of a specific data type.

Array type attributes

Zendro supports array type attributes, which are defined in the JSON data model definition within square brackets, e.g. [String]. In Cassandra models Zendro implements these internally via the list data type, which represents a (sorted) collection of non-unique values.
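The distinction between the two collection types can be sketched as a small type mapping. The scalar mapping below (`String` to `text`, and so on) and the function name are assumptions for illustration and may differ from the mapping Zendro actually uses.

```python
# Hypothetical mapping from data model types to CQL column types.
SCALAR_TO_CQL = {"String": "text", "Int": "int", "Boolean": "boolean"}


def cql_column_type(model_type, is_foreign_key_array=False):
    """Map a type such as 'String' or '[String]' to a CQL column type."""
    if model_type.startswith("[") and model_type.endswith("]"):
        inner = SCALAR_TO_CQL[model_type[1:-1].strip()]
        # Foreign key arrays hold unique values (set); plain arrays may repeat (list).
        return f"set<{inner}>" if is_foreign_key_array else f"list<{inner}>"
    return SCALAR_TO_CQL[model_type]
```

Under these assumptions, `cql_column_type("[String]")` gives `list<text>`, and the same type flagged as a foreign key array gives `set<text>`.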

Distributed data models

Because of the restrictions on the ordering of result sets in Cassandra (see above), it is not possible to define a distributed data model that is stored in both a relational and a Cassandra database, i.e. one that has both sql and cassandra adapters. It is, however, possible to define distributed data models that live only in Cassandra databases. These need to set the cassandraRestrictions flag to ensure the correct behaviour of the distributed setup:

"model": "dog",
"storageType" : "distributed-data-model",
"registry": ["dog_instance1", "dog_instance2"],
"cassandraRestrictions": true,
...
