all repos — gemini-redirect @ bb1c3c6a132b8617271fc574b1805b283f943179

blog/ribw/nosql-databases-basic-operations-and-architecture/index.html (view raw)

  1<!DOCTYPE html>
  2<html>
  3<head>
  4<meta charset="utf-8" />
  5<meta name="viewport" content="width=device-width, initial-scale=1" />
  6<title>Cassandra: Basic Operations and Architecture</title>
  7<link rel="stylesheet" href="../css/style.css">
  8</head>
  9<body>
 10<main>
 11<p>This is the second post in the NoSQL Databases series, with a brief description on the basic operations (such as insertion, retrieval, indexing…), and complete execution along with the data model / architecture.</p>
 12<div class="date-created-modified">Created 2020-03-05<br>
 13Modified 2020-03-24</div>
 14<p>Other posts in this series:</p>
 15<ul>
 16<li><a href="/blog/ribw/nosql-databases-an-introduction/">Cassandra: an Introduction</a></li>
 17<li><a href="/blog/ribw/nosql-databases-basic-operations-and-architecture/">Cassandra: Basic Operations and Architecture</a> (this post)</li>
 18</ul>
 19<hr />
 20<p>Cassandra uses it own Query Language for managing the databases, it is known as **CQL **(<strong>Cassandra Query Language</strong>). Cassandra stores data in <strong><em>tables</em></strong>, as in relational databases, and these tables are grouped in <strong><em>keyspaces</em></strong>. A keyspace defines a number of options that applies to all the tables it contains. The most used option is the **replication strategy. **It is recommended to have only one keyspace by application.</p>
 21<p>It is important to mention that <strong>tables and keyspaces</strong> are** case insensitive**, so myTable is equivalent to mytable, but it is possible to <strong>force case sensitivity</strong> using <strong>double-quotes</strong>.</p>
 22<p>To begin with the basic operations it is necessary to deploy Cassandra:</p>
 23<ol>
 24<li>Open a terminal in the root of the Apache Cassandra folder downloaded in the previous post.</li>
 25<li>Run the command:</li>
 26</ol>
 27<pre><code>$ bin/cassandra
 28</code></pre>
 29<p>Once Cassandra is deployed, it is time to open a** CQL Shell**, in <strong>other terminal</strong>, with the command: </p>
 30<pre><code>$ bin/cqlsh
 31</code></pre>
 32<p>It is possible to check if Cassandra is deployed if the SQL Shell prints the next message:</p>
 33<p><img src="uwqQgQte-cuYb_pePFOuY58re23kngrDKNgL1qz4yOfnBDZkqMIH3fFuCrye.png" alt="" />
 34<em>CQL Shell</em></p>
 35<h2 class="title" id="create_insert"><a class="anchor" href="#create_insert">¶</a>Create/Insert</h2>
 36<h3 id="ddl_data_definition_language_"><a class="anchor" href="#ddl_data_definition_language_">¶</a>DDL (Data Definition Language)</h3>
 37<h4 id="create_keyspace"><a class="anchor" href="#create_keyspace">¶</a>Create keyspace</h4>
 38<p>A keyspace is created using a **CREATE KEYSPACE **statement:</p>
 39<pre><code>$ **CREATE** KEYSPACE [ **IF** **NOT** **EXISTS** ] keyspace_name **WITH** options;
 40</code></pre>
 41<p>The supported “<strong>options</strong>” are:</p>
 42<ul>
 43<li>“<strong>replication</strong>”: this is **mandatory **and defines the <strong>replication strategy</strong> and the <strong>replication factor</strong> (the number of nodes that will have a copy of the data). Within this option there is a property called “<strong>class</strong>” in which the <strong>replication strategy</strong> is specified (“SimpleStrategy” or “NetworkTopologyStrategy”)</li>
 44<li>“<strong>durable_writes</strong>”: this is <strong>not mandatory</strong> and it is possible to use the <strong>commit logs for updates</strong>.
 45Attempting to create an already existing keyspace will return an error unless the **IF NOT EXISTS **directive is used. </li>
 46</ul>
 47<p>The example associated to this statement is create a keyspace with name “test_keyspace” with “SimpleStrategy” as “class” of replication and a “replication_factor” of 3.</p>
 48<pre><code>**CREATE** KEYSPACE test_keyspace
 49    **WITH** **replication** = {'class': 'SimpleStrategy',
 50                        'replication_factor' : 3};
 51</code></pre>
 52<p>The **USE **statement allows to <strong>change</strong> the current <strong>keyspace</strong>. The syntax of this statement is very simple: </p>
 53<pre><code>**USE** keyspace_name;
 54</code></pre>
 55<p><img src="RDWIG2RwvEevUFQv6TGFtGzRm4_9ERpxPf0feriflaj3alvWw3FEIAr_ZdF1.png" alt="" />
 56<em>USE statement</em></p>
 57<p>It is also possible to get the metadata from a keyspace with the **DESCRIBE **statement.</p>
 58<pre><code>**DESCRIBE** KEYSPACES | KEYSPACE keyspace_name;
 59</code></pre>
 60<h4 id="create_table"><a class="anchor" href="#create_table">¶</a>Create table</h4>
 61<p>Creating a new table uses the **CREATE TABLE **statement:</p>
 62<pre><code>**CREATE** **TABLE** [ **IF** **NOT** **EXISTS** ] table_name
 63    '('
 64        column_definition
 65        ( ',' column_definition )*
 66        [ ',' **PRIMARY** **KEY** '(' primary_key ')' ]
 67    ')' [ **WITH** table_options ];
 68</code></pre>
 69<p>With “column_definition” as: column_name cql_type [ STATIC ] [ PRIMARY KEY]; “primary_key” as: partition_key [ ‘,’ clustering_columns ]; and “table_options” as: COMPACT STORAGE [ AND table_options ] or CLUSTERING ORDER BY ‘(‘ clustering_order ‘)’ [ AND table_options ] or “options”.</p>
 70<p>Attempting to create an already existing table will return an error unless the <strong>IF NOT EXISTS</strong> directive is used.</p>
 71<p>The <strong>CQL types</strong> are described in the References section.</p>
 72<p>For example, we are going to create a table called “species_table” in the keyspace “test_keyspace” in which we will have a “species” text (as PRIMARY KEY), a “common_name” text, a “population” varint, a “average_size” int and a “sex” text. Besides, we are going to add a comment to the table: “Some species records”;</p>
 73<pre><code>**CREATE** **TABLE** species_table (
 74    species text **PRIMARY** **KEY**,
 75    common_name text,
 76    population varint,
 77    average_size **int**,
 78    sex text,
 79) **WITH** **comment**='Some species records';
 80</code></pre>
 81<p>It is also possible to get the metadata from a table with the **DESCRIBE **statement.</p>
 82<pre><code>**DESCRIBE** **TABLES** | **TABLE** [keyspace_name.]table_name;
 83</code></pre>
 84<h3 id="dml_data_manipulation_language_"><a class="anchor" href="#dml_data_manipulation_language_">¶</a>DML (Data Manipulation Language)</h3>
 85<h4 id="insert_data"><a class="anchor" href="#insert_data">¶</a>Insert data</h4>
 86<p>Inserting data for a row is done using an **INSERT **statement:</p>
 87<pre><code>**INSERT** **INTO** table_name ( names_values | json_clause )
 88                      [ **IF** **NOT** **EXISTS** ]
 89                      [ **USING** update_parameter ( **AND** update_parameter )* ];
 90</code></pre>
 91<p>Where “names_values” is: names VALUES tuple_literal; “json_clause” is: JSON string [ DEFAULT ( NULL | UNSET ) ]; and “update_parameter” is usually: TTL.</p>
 92<p>For example we are going to use both VALUES and JSON clauses to insert data in the table “species_table”. In the VALUES clause it is necessary to supply the list of columns, not as in the JSON clause that is optional.</p>
 93<p>Note: TTL (Time To Live) and Timestamp are metrics for expiring data, so, when the time set is passed, the operation is expired.</p>
 94<p>In the VALUES clause we are going to insert a new specie called “White monkey”, with an average size of 3, its common name is “Copito de nieve”, population 0 and sex “male”.</p>
 95<pre><code>**INSERT** **INTO** species_table (species, common_name, population, average_size, sex)
 96                **VALUES** ('White monkey', 'Copito de nieve', 0, 3, 'male');
 97</code></pre>
 98<p>In the JSON clause we are going to insert a new specie called “Cloned sheep”, with an average size of 1, its common name is “Dolly the sheep”, population 0 and sex “female”.</p>
 99<pre><code>**INSERT** **INTO** species_table JSON '{&quot;species&quot;: &quot;Cloned Sheep&quot;,
100                              &quot;common_name&quot;: &quot;Dolly the Sheep&quot;,
101                              &quot;average_size&quot;:1,
102                              &quot;population&quot;:0,
103                              &quot;sex&quot;: &quot;female&quot;}';
104</code></pre>
105<p>Note: all updates for an **INSERT **are applied **atomically **and in <strong>isolation.</strong></p>
106<h2 id="read"><a class="anchor" href="#read">¶</a>Read</h2>
107<p>Querying data from data is done using a **SELECT **statement:</p>
108<pre><code>**SELECT** [ JSON | **DISTINCT** ] ( select_clause | '*' )
109                      **FROM** table_name
110                      [ **WHERE** where_clause ]
111                      [ **GROUP** **BY** group_by_clause ]
112                      [ **ORDER** **BY** ordering_clause ]
113                      [ PER **PARTITION** **LIMIT** (**integer** | bind_marker) ]
114                      [ **LIMIT** (**integer** | bind_marker) ]
115                      [ ALLOW FILTERING ];
116</code></pre>
117<p>The **CQL SELECT **statement is very **similar **to the **SQL SELECT **statement due to the fact that both allows filtering (<strong>WHERE</strong>), grouping data (<strong>GROUP BY</strong>), ordering the data (<strong>ORDER BY</strong>) and limit the number of data (<strong>LIMIT</strong>). Besides, **CQL offers **a **limit per partition **and allow the **filtering **of <strong>data</strong>.</p>
118<p>Note: as in SQL it it possible to set alias to the data with the statement <strong>AS.</strong></p>
119<p>For example we are going to retrieve all the information about those values from the tables “species_table” which “sex” is “male”. Allow filtering is mandatory when there is a WHERE statement.</p>
120<pre><code>**SELECT** * **FROM** species_table **WHERE** sex = 'male' ALLOW FILTERING;
121</code></pre>
122<p><img src="s6GrKIGATvOSD7oGRNScUU5RnLN_-3X1JXvnVi_wDT_hrmPMZdnCdBI8DpIJ.png" alt="" />
123<em>SELECT statement</em></p>
124<p>Furthermore, we are going to test the SELECT JSON statement. For this, we are going to retrieve only the species name with a population of 0. </p>
125<pre><code>**SELECT** JSON species **FROM** species_table **WHERE** population = 0 ALLOW FILTERING;
126</code></pre>
127<p><img src="Up_eHlqKQp2RI5XIbgPOvj1B5J3gLxz7v7EI0NDRgezQTipecdfDT6AQoso0.png" alt="" />
128<em>SELECT JSON statement</em></p>
129<h2 id="update"><a class="anchor" href="#update">¶</a>Update</h2>
130<h3 id="ddl_data_definition_language__2"><a class="anchor" href="#ddl_data_definition_language__2">¶</a>DDL (Data Definition Language)</h3>
131<h4 id="alter_keyspace"><a class="anchor" href="#alter_keyspace">¶</a>Alter keyspace</h4>
132<p>The statement **ALTER KEYSPACE **allows to modify the options of a keyspace:</p>
133<pre><code>**ALTER** KEYSPACE keyspace_name **WITH** options;
134</code></pre>
135<p>Note: the supported **options **are the same than for creating a keyspace, “<strong>replication</strong>” and “<strong>durable_writes</strong>”.</p>
136<p>The example associated to this statement is to modify the keyspace with name “test_keyspace” and set a “replication_factor” of 4.</p>
137<pre><code>**ALTER** KEYSPACE test_keyspace
138    **WITH** **replication** = {'class': 'SimpleStrategy', 'replication_factor' : 4};
139</code></pre>
140<h4 id="alter_table"><a class="anchor" href="#alter_table">¶</a>Alter table</h4>
141<p>Altering an existing table uses the **ALTER TABLE **statement:</p>
142<pre><code>**ALTER** **TABLE** table_name alter_table_instruction;
143</code></pre>
144<p>Where “alter_table_instruction” can be: ADD column_name cql_type ( ‘,’ column_name cql_type )<em>; or DROP column_name ( column_name )</em>; or WITH options</p>
145<p>The example associated to this statement is to ADD a new column to the table “species_table”, called “extinct” with type “boolean”.</p>
146<pre><code>**ALTER** **TABLE** species_table **ADD** extinct **boolean**;
147</code></pre>
148<p>Another example is to DROP the column called “sex” from the table “species_table”.</p>
149<pre><code>**ALTER** **TABLE** species_table **DROP** sex;
150</code></pre>
151<p>Finally, alter the comment with the WITH clause and set the comment to “All species records”. </p>
152<pre><code>**ALTER** **TABLE** species_table **WITH** **comment**='All species records';
153</code></pre>
154<p>These changes can be checked with the **DESCRIBE **statement:</p>
155<pre><code>**DESCRIBE** **TABLE** species_table;
156</code></pre>
157<p><img src="xebKPqkWkn97YVHpRVXZYWvRUfeRUyCH-vPDs67aFaEeU53YTRbDOFscOlAr.png" alt="" />
158<em>DESCRIBE table</em></p>
159<h3 id="dml_data_manipulation_language__2"><a class="anchor" href="#dml_data_manipulation_language__2">¶</a>DML (Data Manipulation Language)</h3>
160<h4 id="update_data"><a class="anchor" href="#update_data">¶</a>Update data</h4>
161<p>Updating a row is done using an **UPDATE **statement:</p>
162<pre><code>**UPDATE** table_name
163[ **USING** update_parameter ( **AND** update_parameter )* ]
164**SET** assignment ( ',' assignment )*
165**WHERE** where_clause
166[ **IF** ( **EXISTS** | condition ( **AND** condition )*) ];
167</code></pre>
168<p>Where the update_parameter is: ( TIMESTAMP | TTL) (integer | bind_marker)</p>
169<p>It is important to mention that the **WHERE **clause is used to select the row to update and **must <strong>include ** all columns</strong> composing the <strong>PRIMARY KEY.</strong></p>
170<p>We are going to test this statement updating the column “extinct” to true to the column with name ‘White monkey’.</p>
171<pre><code>**UPDATE** species_table **SET** extinct = **true** **WHERE** species='White monkey';
172</code></pre>
173<p><img src="IcaCe6VEC5c0ZQIygz-CiclzFyt491u7xPMg2muJLR8grmqaiUzkoQsVCoHf.png" alt="" />
174<em>SELECT statement</em></p>
175<h2 id="delete"><a class="anchor" href="#delete">¶</a>Delete</h2>
176<h3 id="ddl_data_definition_language__3"><a class="anchor" href="#ddl_data_definition_language__3">¶</a>DDL (Data Definition Language)</h3>
177<h4 id="drop_keyspace"><a class="anchor" href="#drop_keyspace">¶</a>Drop keyspace</h4>
178<p>Dropping a keyspace can be done using the **DROP KEYSPACE **statement:</p>
179<pre><code>**DROP** KEYSPACE [ **IF** **EXISTS** ] keyspace_name;
180</code></pre>
181<p>For example, drop the keyspace called “test_keyspace_2” if it exists:</p>
182<pre><code>**DROP** KEYSPACE **IF** **EXISTS** test_keyspace_2;
183</code></pre>
184<p>As this keyspace does not exists, this sentence will do nothing.</p>
185<h4 id="drop_table"><a class="anchor" href="#drop_table">¶</a>Drop table</h4>
186<p>Dropping a table uses the **DROP TABLE **statement:</p>
187<pre><code>**DROP** **TABLE** [ **IF** **EXISTS** ] table_name;
188</code></pre>
189<p>For example, drop the table called “species_2” if it exists: </p>
190<pre><code>**DROP** **TABLE** **IF** **EXISTS** species_2;
191</code></pre>
192<p>As this table does not exists, this sentence will do nothing.</p>
193<h4 id="truncate_table_"><a class="anchor" href="#truncate_table_">¶</a>Truncate (table)</h4>
194<p>A table can be truncated using the **TRUNCATE **statement:</p>
195<pre><code>**TRUNCATE** [ **TABLE** ] table_name;
196</code></pre>
197<p>Do not execute this command now, because if you do it, you will need to insert the previous data again.</p>
198<p>Note: as tables are the only object that can be truncated the keyword TABLE can be omitted.</p>
199<p><img src="FOkhfpxlWFQCzcdfeWxLTy7wx5inDv0xwVeVhE79Pqtk3yYzWsZJnz_SBhUi.png" alt="" />
200<em>TRUNCATE statement</em></p>
201<h3 id="dml_data_manipulation_language__3"><a class="anchor" href="#dml_data_manipulation_language__3">¶</a>DML (Data Manipulation Language)</h3>
202<h4 id="delete_data"><a class="anchor" href="#delete_data">¶</a>Delete data</h4>
203<p>Deleting rows or parts of rows uses the **DELETE **statement:</p>
204<pre><code>**DELETE** [ simple_selection ( ',' simple_selection ) ]
205                      **FROM** table_name
206                      [ **USING** update_parameter ( **AND** update_parameter )* ]
207                      **WHERE** where_clause
208                      [ **IF** ( **EXISTS** | condition ( **AND** condition )*) ]
209</code></pre>
210<p>Now we are going to delete the value of the column “average_size” from “Cloned Sheep”. </p>
211<pre><code>**DELETE** average_size **FROM** species_table **WHERE** species = 'Cloned Sheep';
212</code></pre>
213<p><img src="CyuQokVL5J9TAelq-WEWhNl6kFtbIYs0R1AeU5NX4EkG-YQI81mNHdnf2yWN.png" alt="" />
214<em>DELETE value statement</em></p>
215<p>And we are going to delete the same row as mentioned before. </p>
216<pre><code>**DELETE** **FROM** species_table **WHERE** species = 'Cloned Sheep';
217</code></pre>
218<p><img src="jvQ5cXJ5GTVQ6giVhBEpPJmrJw-zwKKyB9nsTm5PRcGSTzkmh-WO4kTeuLpB.png" alt="" />
219<em>DELETE row statement</em></p>
220<h2 id="batch"><a class="anchor" href="#batch">¶</a>Batch</h2>
221<p>Multiple <strong>INSERT</strong>, **UPDATE **and **DELETE **can be executed in a <strong>single statement</strong> by grouping them through a **BATCH **statement.</p>
222<pre><code>**BEGIN** [ UNLOGGED | COUNTER ] BATCH
223                            [ **USING** update_parameter ( **AND** update_parameter )* ]
224                            modification_statement ( ';' modification_statement )*
225                            APPLY BATCH;
226</code></pre>
227<p>Where modification_statement can be a insert_statement or an update_statement or a delete_statement.</p>
228<ul>
229<li>**UNLOGGED **means that either all operations in a batch eventually complete or none will.</li>
230<li><strong>COUNTER</strong> means that the updates are not idempotent, so each time we execute the updates in a batch, we will have different results.
231For example:</li>
232</ul>
233<pre><code>**BEGIN** BATCH
234   **INSERT** **INTO** species_table (species, common_name, population, average_size, extinct)
235                **VALUES** ('Blue Shark', 'Tiburillo', 30, 10, **false**);
236   **INSERT** **INTO** species_table (species, common_name, population, average_size, extinct)
237                **VALUES** ('Cloned sheep', 'Dolly the Sheep', 1, 1, **true**);
238   **UPDATE** species_table **SET** population = 2 **WHERE** species='Cloned sheep';
239   **DELETE** **FROM** species_table **WHERE** species =  'White monkey';
240APPLY BATCH;
241</code></pre>
242<p><img src="EL9Dac26o0FqkVoeAKmopEKQe0wWq-xYI14b9RzGxtUkFJA3i2eTiR6qkuuJ.png" alt="" />
243<em>BATCH statement</em></p>
244<h2 id="index"><a class="anchor" href="#index">¶</a>Index</h2>
245<p>CQL support creating secondary indexes on tables, allowing queries on the table to use those indexes. </p>
246<p>**Creating **a secondary index on a table uses the **CREATE INDEX **statement:</p>
247<pre><code>**CREATE** [ CUSTOM ] **INDEX** [ **IF** **NOT** **EXISTS** ] [ index_name ]
248                                **ON** table_name '(' index_identifier ')'
249                                [ **USING** string [ **WITH** OPTIONS = map_literal ] ];
250</code></pre>
251<p>For example we are going to create a index called “population_idx” that is related to the column “population” in the table “species_table”.</p>
252<pre><code>**CREATE** **INDEX** population_idx **ON** species_table (population);
253</code></pre>
254<p>**Dropping **a secondary index uses the <strong>DROP INDEX</strong> statement: </p>
255<pre><code>**DROP** **INDEX** [ **IF** **EXISTS** ] index_name;
256</code></pre>
257<p>Now, we are going to drop the previous index: </p>
258<pre><code>**DROP** **INDEX** **IF** **EXISTS** population_idx;
259</code></pre>
260<h2 id="references"><a class="anchor" href="#references">¶</a>References</h2>
261<ul>
262<li><a href="https://cassandra.apache.org/doc/latest/cql/ddl.html">Cassandra CQL</a></li>
263<li><a href="https://techdifferences.com/difference-between-ddl-and-dml-in-dbms.html">Differences between DML and DDL</a></li>
264<li><a href="https://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cqlReferenceTOC.html">Datastax CQL</a></li>
265<li><a href="https://cassandra.apache.org/doc/latest/cql/types.html#grammar-token-cql-type">Cassandra CQL Types</a></li>
266<li><a href="https://cassandra.apache.org/doc/latest/cql/indexes.html">Cassandra Index</a></li>
267</ul>
268</main>
269</body>
270</html>
271