Cassandra storage type
This documentation points out specifics of the cassandra storageType. Cassandra is not a general-purpose database; it aims to provide efficient access to big data, which requires carefully describing the data and its specific access patterns. Zendro provides a standardized API for all defined models and therefore makes certain assumptions about how you define your data model. For the cassandra storageType, each model must have a unique primary key attribute that gives access to the specific record identified by that key. Besides simple primary keys, Cassandra offers compound primary keys to define efficient access on a cluster of servers. For now, Zendro does not support this kind of primary key definition: a primary key must refer to exactly one attribute.
Restrictions
Compared to relational databases, accessing and querying data stored in a Cassandra database is subject to several restrictions.
Operators
Zendro queries the Cassandra database directly via CQL (Cassandra Query Language) statements, using the DataStax cassandra-driver. CQL implements only a subset of the operators available in, for example, relational databases. Zendro supports the following operators:
zendro-operator | operation |
---|---|
eq | = |
lt | < |
gt | > |
lte | <= |
gte | >= |
ne | != |
in | in |
contains | contains * |
and | and |
* contains relates to cassandra Collections.
Note that Cassandra specifically does not implement the logical `or` operator.
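The operator table above can be read as a simple lookup from Zendro operator names to CQL. The following is a minimal sketch of that idea, not Zendro's actual implementation: the search-tree shape (`field`, `operator`, `value`, nested `search`) and the function name `search_to_cql` are assumptions made for illustration.

```python
# Mapping taken from the operator table above.
CQL_OPERATORS = {
    "eq": "=",
    "lt": "<",
    "gt": ">",
    "lte": "<=",
    "gte": ">=",
    "ne": "!=",
    "in": "IN",
    "contains": "CONTAINS",
}


def search_to_cql(search):
    """Translate a (hypothetical) search tree into a WHERE clause and parameters.

    Returns a tuple (clause, params); values are passed as bind markers ("?").
    """
    operator = search["operator"]
    if operator == "and":
        clauses, params = [], []
        for child in search["search"]:
            clause, child_params = search_to_cql(child)
            clauses.append(clause)
            params.extend(child_params)
        return " AND ".join(clauses), params
    if operator not in CQL_OPERATORS:
        # Covers "or" and any other operator without a CQL counterpart.
        raise ValueError(f"operator '{operator}' is not supported for Cassandra")
    return f"{search['field']} {CQL_OPERATORS[operator]} ?", [search["value"]]
```

With this sketch, a search combining `name eq "rex"` and `age gte 3` under `and` yields the clause `name = ? AND age >= ?`, while an `or` node raises an error before any query is sent.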
Pagination
Due to its distributed nature, Cassandra does not implement traditional limit-offset pagination. This means that only the cursor-based GraphQL Connection-type `readAllCursor` query is supported. Zendro implements cursor-based pagination using the `base64`-encoded record as the cursor.
Note that Cassandra does not support backward pagination, so the only valid arguments to the pagination argument are `first` and `after`.
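To illustrate forward-only cursor pagination, here is a minimal sketch of how a base64-encoded record could serve as a cursor. The JSON serialization, the helper names, and the `token(...) > token(?)` continuation clause are assumptions for illustration; Zendro's generated code may differ in detail.

```python
import base64
import json


def encode_cursor(record):
    """Serialize a record to JSON and base64-encode it (assumed cursor format)."""
    raw = json.dumps(record, sort_keys=True).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Reverse of encode_cursor: recover the record from the cursor string."""
    return json.loads(base64.b64decode(cursor).decode("utf-8"))


def build_page_query(table, id_field, first, after=None):
    """Build a forward-only page query from the `first` and `after` arguments."""
    if after is None:
        return f"SELECT * FROM {table} LIMIT ?", [first]
    # Continue after the record stored in the cursor, following token order.
    last_id = decode_cursor(after)[id_field]
    return (
        f"SELECT * FROM {table} WHERE token({id_field}) > token(?) LIMIT ?",
        [last_id, first],
    )
```

Because the continuation only moves forward in token order, there is no counterpart for `last` and `before` in this sketch.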
Sorting
Cassandra only supports sorting query results when the order is defined through a compound primary key, together with a column by which the data is partitioned. Since Zendro, for now, doesn't implement this kind of primary key, sorting Cassandra results is not possible. The default order is defined by Cassandra's internal `token` function.
Associations
Associations with the `targetStorageType` set to cassandra have some restrictions on searching. The association is resolved by adding a search to the query: either an `eq` on the respective foreign key, or an `in` on the foreign key array if the association is of type `many_to_many`. Cassandra does not allow multiple Equal restrictions on the id field; the driver will throw an Error. To circumvent this, Zendro implements a workaround in which searches on the idField are merged with the search on the foreign key(s).
Be aware that the workaround only works because Cassandra does not support the OR operator. There are also the following pitfalls to consider:
- Cassandra does not allow SELECT queries on indexed columns with an IN clause for the PRIMARY KEY.
- If there are multiple Equal restrictions on the id field, Cassandra will throw an Error.
- Multiple searches on the idAttribute field will still throw the above Error, since search nodes are merged one-by-one with the foreign key(s).
- This workaround only applies to associations where the foreign key is stored on the side of the Cassandra model, since IN clauses are only allowed on the primary key column, not on any foreign key column.
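One possible reading of the merge workaround is an intersection: because every condition is combined with AND, a user search on the id field and the foreign key restriction can be collapsed into a single restriction. The sketch below is an assumption about how such a merge could look, with hypothetical names (`merge_id_restriction`, the search-node shape); it is not taken from Zendro's source.

```python
def merge_id_restriction(id_field, foreign_keys, id_search=None):
    """Collapse a foreign key restriction and one id search into one search node.

    foreign_keys: ids that the association allows (one value for `eq`,
                  several for a many_to_many foreign key array).
    id_search:    optional user search node on the id field, using `eq` or `in`.
    """
    # Remove duplicates while keeping the original order.
    allowed = list(dict.fromkeys(foreign_keys))
    if id_search is not None:
        operator, value = id_search["operator"], id_search["value"]
        if operator == "eq":
            wanted = {value}
        elif operator == "in":
            wanted = set(value)
        else:
            raise ValueError(f"cannot merge operator '{operator}' on {id_field}")
        # AND semantics: only ids present in both restrictions remain.
        allowed = [key for key in allowed if key in wanted]
    if len(allowed) == 1:
        return {"field": id_field, "operator": "eq", "value": allowed[0]}
    return {"field": id_field, "operator": "in", "value": allowed}
```

The intersection is only valid because no OR can appear in the search; with OR, a second id condition could widen the result instead of narrowing it.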
Access Control
CQL offers an optional `ALLOW FILTERING` clause that enables server-side filtering of the result set. Since such queries may exhibit unpredictable performance, depending on several aspects such as secondary indexes on columns, server-side filtering must be explicitly allowed in the query itself.
Zendro implements this via access control on the data models. Only users with the editor role have privileged access to send queries that require server-side filtering.
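The role check described above can be sketched as a small gate in front of the query. The function name `finalize_query`, the `needs_filtering` flag, and the representation of roles as a list of strings are assumptions for illustration only.

```python
def finalize_query(cql, user_roles, needs_filtering):
    """Append ALLOW FILTERING only when the caller holds the editor role."""
    if not needs_filtering:
        return cql
    if "editor" not in user_roles:
        raise PermissionError("server-side filtering requires the editor role")
    return f"{cql} ALLOW FILTERING"
```

A query that does not need server-side filtering passes through unchanged for every user, so the restriction only affects the potentially expensive queries.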
Collection types
Cassandra implements several collection data types, namely `sets`, `lists` and `maps`, each with different characteristics. Zendro makes use of `sets` and `lists`, depending on the use case.
Foreign Key arrays
Zendro implements `many_to_many` relations between models via paired-end foreign keys. In a Cassandra data model this is solved via a `set`, which represents a sorted list of unique values of a specific data type.
Array type attributes
Zendro supports Array type attributes, which are defined in the JSON data model definition within square brackets, e.g. `[String]`. In Cassandra models, Zendro internally solves this via `list` data types, which represent a (sorted) collection of non-unique values.
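The two collection use cases can be summarized as a type mapping: array attributes become a `list`, foreign key arrays become a `set`. The sketch below assumes a small scalar mapping (`String` to `text`, `Int` to `int`, `Float` to `double`, `Boolean` to `boolean`) that is illustrative rather than Zendro's confirmed mapping, and the function name `cql_type` is hypothetical.

```python
# Assumed scalar mapping, for illustration only.
SCALAR_TYPES = {
    "String": "text",
    "Int": "int",
    "Float": "double",
    "Boolean": "boolean",
}


def cql_type(zendro_type, foreign_key_array=False):
    """Map a data model type such as "[String]" to a CQL column type."""
    is_array = zendro_type.startswith("[") and zendro_type.endswith("]")
    scalar = zendro_type[1:-1].strip() if is_array else zendro_type
    if scalar not in SCALAR_TYPES:
        raise ValueError(f"no CQL mapping assumed for type '{scalar}'")
    base = SCALAR_TYPES[scalar]
    if not is_array:
        return base
    # Foreign key arrays hold unique values (set); plain arrays may repeat (list).
    collection = "set" if foreign_key_array else "list"
    return f"{collection}<{base}>"
```

Under these assumptions, `[String]` maps to `list<text>` for a regular attribute and to `set<text>` when it holds the foreign keys of a `many_to_many` association.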
Distributed data models
Due to the restrictions on the ordering of result sets in Cassandra (see above), it is not possible to define a distributed data model that is stored in both a relational and a Cassandra database, i.e. one that has both sql and cassandra adapters. It is, however, possible to define distributed data models that live only in Cassandra databases. These need to set a `cassandraRestrictions` flag to ensure the correct behaviour of the distributed setup:
"model": "dog",
"storageType" : "distributed-data-model",
"registry": ["dog_instance1", "dog_instance2"],
"cassandraRestrictions": true,
...