blog/ribw/nosql-databases-basic-operations-and-architecture/index.html (view raw)
1<!DOCTYPE html><html lang=en><head><meta charset=utf-8><meta name=description content="Official Lonami's website"><meta name=viewport content="width=device-width, initial-scale=1.0, user-scalable=yes"><title> Cassandra: Basic Operations and Architecture | Lonami's Blog </title><link rel=stylesheet href=/style.css><body><article><nav class=sections><ul><li><a href=/>lonami's site</a><li><a href=/blog class=selected>blog</a><li><a href=/golb>golb</a></ul></nav><main><h1 class=title>Cassandra: Basic Operations and Architecture</h1><div class=time><p>2020-03-05T02:00:36+00:00<p>last updated 2020-03-24T17:57:05+00:00</div><p>This is the second post in the NoSQL Databases series, with a brief description on the basic operations (such as insertion, retrieval, indexing…), and complete execution along with the data model / architecture.<p>Other posts in this series:<ul><li><a href=/blog/ribw/nosql-databases-an-introduction/>Cassandra: an Introduction</a><li><a href=/blog/ribw/nosql-databases-basic-operations-and-architecture/>Cassandra: Basic Operations and Architecture</a> (this post)</ul><hr><p>Cassandra uses it own Query Language for managing the databases, it is known as **CQL **(<strong>Cassandra Query Language</strong>). Cassandra stores data in <strong><em>tables</em></strong>, as in relational databases, and these tables are grouped in <strong><em>keyspaces</em></strong>. A keyspace defines a number of options that applies to all the tables it contains. The most used option is the **replication strategy. **It is recommended to have only one keyspace by application.<p>It is important to mention that <strong>tables and keyspaces</strong> are** case insensitive**, so myTable is equivalent to mytable, but it is possible to <strong>force case sensitivity</strong> using <strong>double-quotes</strong>.<p>To begin with the basic operations it is necessary to deploy Cassandra:<ol><li>Open a terminal in the root of the Apache Cassandra folder downloaded in the previous post.<li>Run the command:</ol><pre><code>$ bin/cassandra
2</code></pre><p>Once Cassandra is deployed, it is time to open a** CQL Shell**, in <strong>other terminal</strong>, with the command:<pre><code>$ bin/cqlsh
3</code></pre><p>It is possible to check if Cassandra is deployed if the SQL Shell prints the next message:<p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/uwqQgQte-cuYb_pePFOuY58re23kngrDKNgL1qz4yOfnBDZkqMIH3fFuCrye.png> <em>CQL Shell</em><h2 id=create-insert>Create/Insert</h2><h3 id=ddl-data-definition-language>DDL (Data Definition Language)</h3><h4 id=create-keyspace>Create keyspace</h4><p>A keyspace is created using a **CREATE KEYSPACE **statement:<pre><code>$ **CREATE** KEYSPACE [ **IF** **NOT** **EXISTS** ] keyspace_name **WITH** options;
4</code></pre><p>The supported “<strong>options</strong>” are:<ul><li>“<strong>replication</strong>”: this is **mandatory **and defines the <strong>replication strategy</strong> and the <strong>replication factor</strong> (the number of nodes that will have a copy of the data). Within this option there is a property called “<strong>class</strong>” in which the <strong>replication strategy</strong> is specified (“SimpleStrategy” or “NetworkTopologyStrategy”)<li>“<strong>durable_writes</strong>”: this is <strong>not mandatory</strong> and it is possible to use the <strong>commit logs for updates</strong>. Attempting to create an already existing keyspace will return an error unless the **IF NOT EXISTS **directive is used.</ul><p>The example associated to this statement is create a keyspace with name “test_keyspace” with “SimpleStrategy” as “class” of replication and a “replication_factor” of 3.<pre><code>**CREATE** KEYSPACE test_keyspace
5 **WITH** **replication** = {'class': 'SimpleStrategy',
6 'replication_factor' : 3};
7</code></pre><p>The **USE **statement allows to <strong>change</strong> the current <strong>keyspace</strong>. The syntax of this statement is very simple:<pre><code>**USE** keyspace_name;
8</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/RDWIG2RwvEevUFQv6TGFtGzRm4_9ERpxPf0feriflaj3alvWw3FEIAr_ZdF1.png> <em>USE statement</em><p>It is also possible to get the metadata from a keyspace with the **DESCRIBE **statement.<pre><code>**DESCRIBE** KEYSPACES | KEYSPACE keyspace_name;
9</code></pre><h4 id=create-table>Create table</h4><p>Creating a new table uses the **CREATE TABLE **statement:<pre><code>**CREATE** **TABLE** [ **IF** **NOT** **EXISTS** ] table_name
10 '('
11 column_definition
12 ( ',' column_definition )*
13 [ ',' **PRIMARY** **KEY** '(' primary_key ')' ]
14 ')' [ **WITH** table_options ];
15</code></pre><p>With “column_definition” as: column_name cql_type [ STATIC ] [ PRIMARY KEY]; “primary_key” as: partition_key [ ‘,’ clustering_columns ]; and “table_options” as: COMPACT STORAGE [ AND table_options ] or CLUSTERING ORDER BY ‘(‘ clustering_order ‘)’ [ AND table_options ] or “options”.<p>Attempting to create an already existing table will return an error unless the <strong>IF NOT EXISTS</strong> directive is used.<p>The <strong>CQL types</strong> are described in the References section.<p>For example, we are going to create a table called “species_table” in the keyspace “test_keyspace” in which we will have a “species” text (as PRIMARY KEY), a “common_name” text, a “population” varint, a “average_size” int and a “sex” text. Besides, we are going to add a comment to the table: “Some species records”;<pre><code>**CREATE** **TABLE** species_table (
16 species text **PRIMARY** **KEY**,
17 common_name text,
18 population varint,
19 average_size **int**,
20 sex text,
21) **WITH** **comment**='Some species records';
22</code></pre><p>It is also possible to get the metadata from a table with the **DESCRIBE **statement.<pre><code>**DESCRIBE** **TABLES** | **TABLE** [keyspace_name.]table_name;
23</code></pre><h3 id=dml-data-manipulation-language>DML (Data Manipulation Language)</h3><h4 id=insert-data>Insert data</h4><p>Inserting data for a row is done using an **INSERT **statement:<pre><code>**INSERT** **INTO** table_name ( names_values | json_clause )
24 [ **IF** **NOT** **EXISTS** ]
25 [ **USING** update_parameter ( **AND** update_parameter )* ];
26</code></pre><p>Where “names_values” is: names VALUES tuple_literal; “json_clause” is: JSON string [ DEFAULT ( NULL | UNSET ) ]; and “update_parameter” is usually: TTL.<p>For example we are going to use both VALUES and JSON clauses to insert data in the table “species_table”. In the VALUES clause it is necessary to supply the list of columns, not as in the JSON clause that is optional.<p>Note: TTL (Time To Live) and Timestamp are metrics for expiring data, so, when the time set is passed, the operation is expired.<p>In the VALUES clause we are going to insert a new specie called “White monkey”, with an average size of 3, its common name is “Copito de nieve”, population 0 and sex “male”.<pre><code>**INSERT** **INTO** species_table (species, common_name, population, average_size, sex)
27 **VALUES** ('White monkey', 'Copito de nieve', 0, 3, 'male');
28</code></pre><p>In the JSON clause we are going to insert a new specie called “Cloned sheep”, with an average size of 1, its common name is “Dolly the sheep”, population 0 and sex “female”.<pre><code>**INSERT** **INTO** species_table JSON '{"species": "Cloned Sheep",
29 "common_name": "Dolly the Sheep",
30 "average_size":1,
31 "population":0,
32 "sex": "female"}';
33</code></pre><p>Note: all updates for an **INSERT **are applied **atomically **and in <strong>isolation.</strong><h2 id=read>Read</h2><p>Querying data from data is done using a **SELECT **statement:<pre><code>**SELECT** [ JSON | **DISTINCT** ] ( select_clause | '*' )
34 **FROM** table_name
35 [ **WHERE** where_clause ]
36 [ **GROUP** **BY** group_by_clause ]
37 [ **ORDER** **BY** ordering_clause ]
38 [ PER **PARTITION** **LIMIT** (**integer** | bind_marker) ]
39 [ **LIMIT** (**integer** | bind_marker) ]
40 [ ALLOW FILTERING ];
41</code></pre><p>The **CQL SELECT **statement is very **similar **to the **SQL SELECT **statement due to the fact that both allows filtering (<strong>WHERE</strong>), grouping data (<strong>GROUP BY</strong>), ordering the data (<strong>ORDER BY</strong>) and limit the number of data (<strong>LIMIT</strong>). Besides, **CQL offers **a **limit per partition **and allow the **filtering **of <strong>data</strong>.<p>Note: as in SQL it it possible to set alias to the data with the statement <strong>AS.</strong><p>For example we are going to retrieve all the information about those values from the tables “species_table” which “sex” is “male”. Allow filtering is mandatory when there is a WHERE statement.<pre><code>**SELECT** * **FROM** species_table **WHERE** sex = 'male' ALLOW FILTERING;
42</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/s6GrKIGATvOSD7oGRNScUU5RnLN_-3X1JXvnVi_wDT_hrmPMZdnCdBI8DpIJ.png> <em>SELECT statement</em><p>Furthermore, we are going to test the SELECT JSON statement. For this, we are going to retrieve only the species name with a population of 0.<pre><code>**SELECT** JSON species **FROM** species_table **WHERE** population = 0 ALLOW FILTERING;
43</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/Up_eHlqKQp2RI5XIbgPOvj1B5J3gLxz7v7EI0NDRgezQTipecdfDT6AQoso0.png> <em>SELECT JSON statement</em><h2 id=update>Update</h2><h3 id=ddl-data-definition-language-1>DDL (Data Definition Language)</h3><h4 id=alter-keyspace>Alter keyspace</h4><p>The statement **ALTER KEYSPACE **allows to modify the options of a keyspace:<pre><code>**ALTER** KEYSPACE keyspace_name **WITH** options;
44</code></pre><p>Note: the supported **options **are the same than for creating a keyspace, “<strong>replication</strong>” and “<strong>durable_writes</strong>”.<p>The example associated to this statement is to modify the keyspace with name “test_keyspace” and set a “replication_factor” of 4.<pre><code>**ALTER** KEYSPACE test_keyspace
45 **WITH** **replication** = {'class': 'SimpleStrategy', 'replication_factor' : 4};
46</code></pre><h4 id=alter-table>Alter table</h4><p>Altering an existing table uses the **ALTER TABLE **statement:<pre><code>**ALTER** **TABLE** table_name alter_table_instruction;
47</code></pre><p>Where “alter_table_instruction” can be: ADD column_name cql_type ( ‘,’ column_name cql_type )<em>; or DROP column_name ( column_name )</em>; or WITH options<p>The example associated to this statement is to ADD a new column to the table “species_table”, called “extinct” with type “boolean”.<pre><code>**ALTER** **TABLE** species_table **ADD** extinct **boolean**;
48</code></pre><p>Another example is to DROP the column called “sex” from the table “species_table”.<pre><code>**ALTER** **TABLE** species_table **DROP** sex;
49</code></pre><p>Finally, alter the comment with the WITH clause and set the comment to “All species records”.<pre><code>**ALTER** **TABLE** species_table **WITH** **comment**='All species records';
50</code></pre><p>These changes can be checked with the **DESCRIBE **statement:<pre><code>**DESCRIBE** **TABLE** species_table;
51</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/xebKPqkWkn97YVHpRVXZYWvRUfeRUyCH-vPDs67aFaEeU53YTRbDOFscOlAr.png> <em>DESCRIBE table</em><h3 id=dml-data-manipulation-language-1>DML (Data Manipulation Language)</h3><h4 id=update-data>Update data</h4><p>Updating a row is done using an **UPDATE **statement:<pre><code>**UPDATE** table_name
52[ **USING** update_parameter ( **AND** update_parameter )* ]
53**SET** assignment ( ',' assignment )*
54**WHERE** where_clause
55[ **IF** ( **EXISTS** | condition ( **AND** condition )*) ];
56</code></pre><p>Where the update_parameter is: ( TIMESTAMP | TTL) (integer | bind_marker)<p>It is important to mention that the **WHERE **clause is used to select the row to update and **must <strong>include ** all columns</strong> composing the <strong>PRIMARY KEY.</strong><p>We are going to test this statement updating the column “extinct” to true to the column with name ‘White monkey’.<pre><code>**UPDATE** species_table **SET** extinct = **true** **WHERE** species='White monkey';
57</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/IcaCe6VEC5c0ZQIygz-CiclzFyt491u7xPMg2muJLR8grmqaiUzkoQsVCoHf.png> <em>SELECT statement</em><h2 id=delete>Delete</h2><h3 id=ddl-data-definition-language-2>DDL (Data Definition Language)</h3><h4 id=drop-keyspace>Drop keyspace</h4><p>Dropping a keyspace can be done using the **DROP KEYSPACE **statement:<pre><code>**DROP** KEYSPACE [ **IF** **EXISTS** ] keyspace_name;
58</code></pre><p>For example, drop the keyspace called “test_keyspace_2” if it exists:<pre><code>**DROP** KEYSPACE **IF** **EXISTS** test_keyspace_2;
59</code></pre><p>As this keyspace does not exists, this sentence will do nothing.<h4 id=drop-table>Drop table</h4><p>Dropping a table uses the **DROP TABLE **statement:<pre><code>**DROP** **TABLE** [ **IF** **EXISTS** ] table_name;
60</code></pre><p>For example, drop the table called “species_2” if it exists:<pre><code>**DROP** **TABLE** **IF** **EXISTS** species_2;
61</code></pre><p>As this table does not exists, this sentence will do nothing.<h4 id=truncate-table>Truncate (table)</h4><p>A table can be truncated using the **TRUNCATE **statement:<pre><code>**TRUNCATE** [ **TABLE** ] table_name;
62</code></pre><p>Do not execute this command now, because if you do it, you will need to insert the previous data again.<p>Note: as tables are the only object that can be truncated the keyword TABLE can be omitted.<p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/FOkhfpxlWFQCzcdfeWxLTy7wx5inDv0xwVeVhE79Pqtk3yYzWsZJnz_SBhUi.png> <em>TRUNCATE statement</em><h3 id=dml-data-manipulation-language-2>DML (Data Manipulation Language)</h3><h4 id=delete-data>Delete data</h4><p>Deleting rows or parts of rows uses the **DELETE **statement:<pre><code>**DELETE** [ simple_selection ( ',' simple_selection ) ]
63 **FROM** table_name
64 [ **USING** update_parameter ( **AND** update_parameter )* ]
65 **WHERE** where_clause
66 [ **IF** ( **EXISTS** | condition ( **AND** condition )*) ]
67</code></pre><p>Now we are going to delete the value of the column “average_size” from “Cloned Sheep”.<pre><code>**DELETE** average_size **FROM** species_table **WHERE** species = 'Cloned Sheep';
68</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/CyuQokVL5J9TAelq-WEWhNl6kFtbIYs0R1AeU5NX4EkG-YQI81mNHdnf2yWN.png> <em>DELETE value statement</em><p>And we are going to delete the same row as mentioned before.<pre><code>**DELETE** **FROM** species_table **WHERE** species = 'Cloned Sheep';
69</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/jvQ5cXJ5GTVQ6giVhBEpPJmrJw-zwKKyB9nsTm5PRcGSTzkmh-WO4kTeuLpB.png> <em>DELETE row statement</em><h2 id=batch>Batch</h2><p>Multiple <strong>INSERT</strong>, **UPDATE **and **DELETE **can be executed in a <strong>single statement</strong> by grouping them through a **BATCH **statement.<pre><code>**BEGIN** [ UNLOGGED | COUNTER ] BATCH
70 [ **USING** update_parameter ( **AND** update_parameter )* ]
71 modification_statement ( ';' modification_statement )*
72 APPLY BATCH;
73</code></pre><p>Where modification_statement can be a insert_statement or an update_statement or a delete_statement.<ul><li>**UNLOGGED **means that either all operations in a batch eventually complete or none will.<li><strong>COUNTER</strong> means that the updates are not idempotent, so each time we execute the updates in a batch, we will have different results. For example:</ul><pre><code>**BEGIN** BATCH
74 **INSERT** **INTO** species_table (species, common_name, population, average_size, extinct)
75 **VALUES** ('Blue Shark', 'Tiburillo', 30, 10, **false**);
76 **INSERT** **INTO** species_table (species, common_name, population, average_size, extinct)
77 **VALUES** ('Cloned sheep', 'Dolly the Sheep', 1, 1, **true**);
78 **UPDATE** species_table **SET** population = 2 **WHERE** species='Cloned sheep';
79 **DELETE** **FROM** species_table **WHERE** species = 'White monkey';
80APPLY BATCH;
81</code></pre><p><img src=https://lonami.dev/blog/ribw/nosql-databases-basic-operations-and-architecture/EL9Dac26o0FqkVoeAKmopEKQe0wWq-xYI14b9RzGxtUkFJA3i2eTiR6qkuuJ.png> <em>BATCH statement</em><h2 id=index>Index</h2><p>CQL support creating secondary indexes on tables, allowing queries on the table to use those indexes.<p>**Creating **a secondary index on a table uses the **CREATE INDEX **statement:<pre><code>**CREATE** [ CUSTOM ] **INDEX** [ **IF** **NOT** **EXISTS** ] [ index_name ]
82 **ON** table_name '(' index_identifier ')'
83 [ **USING** string [ **WITH** OPTIONS = map_literal ] ];
84</code></pre><p>For example we are going to create a index called “population_idx” that is related to the column “population” in the table “species_table”.<pre><code>**CREATE** **INDEX** population_idx **ON** species_table (population);
85</code></pre><p>**Dropping **a secondary index uses the <strong>DROP INDEX</strong> statement:<pre><code>**DROP** **INDEX** [ **IF** **EXISTS** ] index_name;
86</code></pre><p>Now, we are going to drop the previous index:<pre><code>**DROP** **INDEX** **IF** **EXISTS** population_idx;
87</code></pre><h2 id=references>References</h2><ul><li><a href=https://cassandra.apache.org/doc/latest/cql/ddl.html>Cassandra CQL</a><li><a href=https://techdifferences.com/difference-between-ddl-and-dml-in-dbms.html>Differences between DML and DDL</a><li><a href=https://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cqlReferenceTOC.html>Datastax CQL</a><li><a href=https://cassandra.apache.org/doc/latest/cql/types.html#grammar-token-cql-type>Cassandra CQL Types</a><li><a href=https://cassandra.apache.org/doc/latest/cql/indexes.html>Cassandra Index</a></ul></main><footer><div><p>Share your thoughts, or simply come hang with me <a href=https://t.me/LonamiWebs><img src=img/telegram.svg alt=Telegram></a> <a href=mailto:totufals@hotmail.com><img src=img/mail.svg alt=Mail></a></div></footer></article><p class=abyss>Glaze into the abyss… Oh hi there!