git rekt — gemini-redirect.git (8248bf94b88af8ff84f75660dc540d9f284f2155): content/blog/ribw/nosql-databases-basic-operations-and-architecture/index.md

content/blog/ribw/nosql-databases-basic-operations-and-architecture/index.md (view raw)
  1+++
  2title = "Cassandra: Basic Operations and Architecture"
  3date = 2020-03-05T02:00:36+00:00
  4updated = 2020-03-24T17:57:05+00:00
  5+++
  6
  7This is the second post in the NoSQL Databases series, with a brief description on the basic operations (such as insertion, retrieval, indexing…), and complete execution along with the data model / architecture.
  8
  9Other posts in this series:
 10
 11* [Cassandra: an Introduction](/blog/ribw/nosql-databases-an-introduction/)
 12* [Cassandra: Basic Operations and Architecture](/blog/ribw/nosql-databases-basic-operations-and-architecture/) (this post)
 13
 14----------
 15
 16Cassandra uses it own Query Language for managing the databases, it is known as **CQL **(**Cassandra Query Language**). Cassandra stores data in **_tables_**, as in relational databases, and these tables are grouped in **_keyspaces_**. A keyspace defines a number of options that applies to all the tables it contains. The most used option is the **replication strategy. **It is recommended to have only one keyspace by application.
 17
 18It is important to mention that **tables and keyspaces** are** case insensitive**, so myTable is equivalent to mytable, but it is possible to **force case sensitivity** using **double-quotes**.
 19
 20To begin with the basic operations it is necessary to deploy Cassandra:
 21
 221. Open a terminal in the root of the Apache Cassandra folder downloaded in the previous post.
 232. Run the command:
 24
 25```
 26$ bin/cassandra
 27```
 28
 29Once Cassandra is deployed, it is time to open a** CQL Shell**, in **other terminal**, with the command:
 30
 31```
 32$ bin/cqlsh
 33```
 34
 35It is possible to check if Cassandra is deployed if the SQL Shell prints the next message:
 36
 37![](uwqQgQte-cuYb_pePFOuY58re23kngrDKNgL1qz4yOfnBDZkqMIH3fFuCrye.png)
 38_CQL Shell_
 39
 40## Create/Insert
 41
 42### DDL (Data Definition Language)
 43
 44#### Create keyspace
 45
 46A keyspace is created using a **CREATE KEYSPACE **statement:
 47
 48```
 49$ **CREATE** KEYSPACE [ **IF** **NOT** **EXISTS** ] keyspace_name **WITH** options;
 50```
 51
 52The supported “**options**” are:
 53
 54* “**replication**”: this is **mandatory **and defines the **replication strategy** and the **replication factor** (the number of nodes that will have a copy of the data). Within this option there is a property called “**class**” in which the **replication strategy** is specified (“SimpleStrategy” or “NetworkTopologyStrategy”)
 55* “**durable_writes**”: this is **not mandatory** and it is possible to use the **commit logs for updates**.
 56Attempting to create an already existing keyspace will return an error unless the **IF NOT EXISTS **directive is used.
 57
 58The example associated to this statement is create a keyspace with name “test_keyspace” with “SimpleStrategy” as “class” of replication and a “replication_factor” of 3.
 59
 60```
 61**CREATE** KEYSPACE test_keyspace
 62    **WITH** **replication** = {'class': 'SimpleStrategy',
 63                        'replication_factor' : 3};
 64```
 65
 66The **USE **statement allows to **change** the current **keyspace**. The syntax of this statement is very simple:
 67
 68```
 69**USE** keyspace_name;
 70```
 71
 72![](RDWIG2RwvEevUFQv6TGFtGzRm4_9ERpxPf0feriflaj3alvWw3FEIAr_ZdF1.png)
 73_USE statement_
 74
 75It is also possible to get the metadata from a keyspace with the **DESCRIBE **statement.
 76
 77```
 78**DESCRIBE** KEYSPACES | KEYSPACE keyspace_name;
 79```
 80
 81#### Create table
 82
 83Creating a new table uses the **CREATE TABLE **statement:
 84
 85```
 86**CREATE** **TABLE** [ **IF** **NOT** **EXISTS** ] table_name
 87    '('
 88        column_definition
 89        ( ',' column_definition )*
 90        [ ',' **PRIMARY** **KEY** '(' primary_key ')' ]
 91    ')' [ **WITH** table_options ];
 92```
 93
 94With “column_definition” as: column_name cql_type [ STATIC ] [ PRIMARY KEY]; “primary_key” as: partition_key [ ‘,’ clustering_columns ]; and “table_options” as: COMPACT STORAGE [ AND table_options ] or CLUSTERING ORDER BY ‘(‘ clustering_order ‘)’ [ AND table_options ] or “options”.
 95
 96Attempting to create an already existing table will return an error unless the **IF NOT EXISTS** directive is used.
 97
 98The **CQL types** are described in the References section.
 99
100For example, we are going to create a table called “species_table” in the keyspace “test_keyspace” in which we will have a “species” text (as PRIMARY KEY), a “common_name” text, a “population” varint, a “average_size” int and a “sex” text. Besides, we are going to add a comment to the table: “Some species records”;
101
102```
103**CREATE** **TABLE** species_table (
104    species text **PRIMARY** **KEY**,
105    common_name text,
106    population varint,
107    average_size **int**,
108    sex text,
109) **WITH** **comment**='Some species records';
110```
111
112It is also possible to get the metadata from a table with the **DESCRIBE **statement.
113
114```
115**DESCRIBE** **TABLES** | **TABLE** [keyspace_name.]table_name;
116```
117
118### DML (Data Manipulation Language)
119
120#### Insert data
121
122Inserting data for a row is done using an **INSERT **statement:
123
124```
125**INSERT** **INTO** table_name ( names_values | json_clause )
126                      [ **IF** **NOT** **EXISTS** ]
127                      [ **USING** update_parameter ( **AND** update_parameter )* ];
128```
129
130Where “names_values” is: names VALUES tuple_literal; “json_clause” is: JSON string [ DEFAULT ( NULL | UNSET ) ]; and “update_parameter” is usually: TTL.
131
132For example we are going to use both VALUES and JSON clauses to insert data in the table “species_table”. In the VALUES clause it is necessary to supply the list of columns, not as in the JSON clause that is optional.
133
134Note: TTL (Time To Live) and Timestamp are metrics for expiring data, so, when the time set is passed, the operation is expired.
135
136In the VALUES clause we are going to insert a new specie called “White monkey”, with an average size of 3, its common name is “Copito de nieve”, population 0 and sex “male”.
137
138```
139**INSERT** **INTO** species_table (species, common_name, population, average_size, sex)
140                **VALUES** ('White monkey', 'Copito de nieve', 0, 3, 'male');
141```
142
143In the JSON clause we are going to insert a new specie called “Cloned sheep”, with an average size of 1, its common name is “Dolly the sheep”, population 0 and sex “female”.
144
145```
146**INSERT** **INTO** species_table JSON '{"species": "Cloned Sheep",
147                              "common_name": "Dolly the Sheep",
148                              "average_size":1,
149                              "population":0,
150                              "sex": "female"}';
151```
152
153Note: all updates for an **INSERT **are applied **atomically **and in **isolation.**
154
155## Read
156
157Querying data from data is done using a **SELECT **statement:
158
159```
160**SELECT** [ JSON | **DISTINCT** ] ( select_clause | '*' )
161                      **FROM** table_name
162                      [ **WHERE** where_clause ]
163                      [ **GROUP** **BY** group_by_clause ]
164                      [ **ORDER** **BY** ordering_clause ]
165                      [ PER **PARTITION** **LIMIT** (**integer** | bind_marker) ]
166                      [ **LIMIT** (**integer** | bind_marker) ]
167                      [ ALLOW FILTERING ];
168```
169
170The **CQL SELECT **statement is very **similar **to the **SQL SELECT **statement due to the fact that both allows filtering (**WHERE**), grouping data (**GROUP BY**), ordering the data (**ORDER BY**) and limit the number of data (**LIMIT**). Besides, **CQL offers **a **limit per partition **and allow the **filtering **of **data**.
171
172Note: as in SQL it it possible to set alias to the data with the statement **AS.**
173
174For example we are going to retrieve all the information about those values from the tables “species_table” which “sex” is “male”. Allow filtering is mandatory when there is a WHERE statement.
175
176```
177**SELECT** * **FROM** species_table **WHERE** sex = 'male' ALLOW FILTERING;
178```
179
180![](s6GrKIGATvOSD7oGRNScUU5RnLN_-3X1JXvnVi_wDT_hrmPMZdnCdBI8DpIJ.png)
181_SELECT statement_
182
183Furthermore, we are going to test the SELECT JSON statement. For this, we are going to retrieve only the species name with a population of 0.
184
185```
186**SELECT** JSON species **FROM** species_table **WHERE** population = 0 ALLOW FILTERING;
187```
188
189![](Up_eHlqKQp2RI5XIbgPOvj1B5J3gLxz7v7EI0NDRgezQTipecdfDT6AQoso0.png)
190_SELECT JSON statement_
191
192## Update
193
194### DDL (Data Definition Language)
195
196#### Alter keyspace
197
198The statement **ALTER KEYSPACE **allows to modify the options of a keyspace:
199
200```
201**ALTER** KEYSPACE keyspace_name **WITH** options;
202```
203
204Note: the supported **options **are the same than for creating a keyspace, “**replication**” and “**durable_writes**”.
205
206The example associated to this statement is to modify the keyspace with name “test_keyspace” and set a “replication_factor” of 4.
207
208```
209**ALTER** KEYSPACE test_keyspace
210    **WITH** **replication** = {'class': 'SimpleStrategy', 'replication_factor' : 4};
211```
212
213#### Alter table
214
215Altering an existing table uses the **ALTER TABLE **statement:
216
217```
218**ALTER** **TABLE** table_name alter_table_instruction;
219```
220
221Where “alter_table_instruction” can be: ADD column_name cql_type ( ‘,’ column_name cql_type )*; or DROP column_name ( column_name )*; or WITH options
222
223The example associated to this statement is to ADD a new column to the table “species_table”, called “extinct” with type “boolean”.
224
225```
226**ALTER** **TABLE** species_table **ADD** extinct **boolean**;
227```
228
229Another example is to DROP the column called “sex” from the table “species_table”.
230
231```
232**ALTER** **TABLE** species_table **DROP** sex;
233```
234
235 Finally, alter the comment with the WITH clause and set the comment to “All species records”.
236
237```
238**ALTER** **TABLE** species_table **WITH** **comment**='All species records';
239```
240
241These changes can be checked with the **DESCRIBE **statement:
242
243```
244**DESCRIBE** **TABLE** species_table;
245```
246
247![](xebKPqkWkn97YVHpRVXZYWvRUfeRUyCH-vPDs67aFaEeU53YTRbDOFscOlAr.png)
248_DESCRIBE table_
249
250### DML (Data Manipulation Language)
251
252#### Update data
253
254Updating a row is done using an **UPDATE **statement:
255
256```
257**UPDATE** table_name
258[ **USING** update_parameter ( **AND** update_parameter )* ]
259**SET** assignment ( ',' assignment )*
260**WHERE** where_clause
261[ **IF** ( **EXISTS** | condition ( **AND** condition )*) ];
262```
263
264Where the update_parameter is: ( TIMESTAMP | TTL) (integer | bind_marker)
265
266It is important to mention that the **WHERE **clause is used to select the row to update and **must **include ** all columns** composing the **PRIMARY KEY.**
267
268We are going to test this statement updating the column “extinct” to true to the column with name ‘White monkey’.
269
270```
271**UPDATE** species_table **SET** extinct = **true** **WHERE** species='White monkey';
272```
273
274![](IcaCe6VEC5c0ZQIygz-CiclzFyt491u7xPMg2muJLR8grmqaiUzkoQsVCoHf.png)
275_SELECT statement_
276
277## Delete
278
279### DDL (Data Definition Language)
280
281#### Drop keyspace
282
283Dropping a keyspace can be done using the **DROP KEYSPACE **statement:
284
285```
286**DROP** KEYSPACE [ **IF** **EXISTS** ] keyspace_name;
287```
288
289For example, drop the keyspace called “test_keyspace_2” if it exists:
290
291```
292**DROP** KEYSPACE **IF** **EXISTS** test_keyspace_2;
293```
294
295As this keyspace does not exists, this sentence will do nothing.
296
297#### Drop table
298
299Dropping a table uses the **DROP TABLE **statement:
300
301```
302**DROP** **TABLE** [ **IF** **EXISTS** ] table_name;
303```
304
305 For example, drop the table called “species_2” if it exists:
306
307```
308**DROP** **TABLE** **IF** **EXISTS** species_2;
309```
310
311As this table does not exists, this sentence will do nothing.
312
313#### Truncate (table)
314
315A table can be truncated using the **TRUNCATE **statement:
316
317```
318**TRUNCATE** [ **TABLE** ] table_name;
319```
320
321Do not execute this command now, because if you do it, you will need to insert the previous data again.
322
323Note: as tables are the only object that can be truncated the keyword TABLE can be omitted.
324
325![](FOkhfpxlWFQCzcdfeWxLTy7wx5inDv0xwVeVhE79Pqtk3yYzWsZJnz_SBhUi.png)
326_TRUNCATE statement_
327
328### DML (Data Manipulation Language)
329
330#### Delete data
331
332Deleting rows or parts of rows uses the **DELETE **statement:
333
334```
335**DELETE** [ simple_selection ( ',' simple_selection ) ]
336                      **FROM** table_name
337                      [ **USING** update_parameter ( **AND** update_parameter )* ]
338                      **WHERE** where_clause
339                      [ **IF** ( **EXISTS** | condition ( **AND** condition )*) ]
340```
341
342Now we are going to delete the value of the column “average_size” from “Cloned Sheep”.
343
344```
345**DELETE** average_size **FROM** species_table **WHERE** species = 'Cloned Sheep';
346```
347
348![](CyuQokVL5J9TAelq-WEWhNl6kFtbIYs0R1AeU5NX4EkG-YQI81mNHdnf2yWN.png)
349_DELETE value statement_
350
351And we are going to delete the same row as mentioned before.
352
353```
354**DELETE** **FROM** species_table **WHERE** species = 'Cloned Sheep';
355```
356
357![](jvQ5cXJ5GTVQ6giVhBEpPJmrJw-zwKKyB9nsTm5PRcGSTzkmh-WO4kTeuLpB.png)
358_DELETE row statement_
359
360## Batch
361
362Multiple **INSERT**, **UPDATE **and **DELETE **can be executed in a **single statement** by grouping them through a **BATCH **statement.
363
364```
365**BEGIN** [ UNLOGGED | COUNTER ] BATCH
366                            [ **USING** update_parameter ( **AND** update_parameter )* ]
367                            modification_statement ( ';' modification_statement )*
368                            APPLY BATCH;
369```
370
371Where modification_statement can be a insert_statement or an update_statement or a delete_statement.
372
373* **UNLOGGED **means that either all operations in a batch eventually complete or none will.
374* **COUNTER** means that the updates are not idempotent, so each time we execute the updates in a batch, we will have different results.
375For example:
376
377```
378**BEGIN** BATCH
379   **INSERT** **INTO** species_table (species, common_name, population, average_size, extinct)
380                **VALUES** ('Blue Shark', 'Tiburillo', 30, 10, **false**);
381   **INSERT** **INTO** species_table (species, common_name, population, average_size, extinct)
382                **VALUES** ('Cloned sheep', 'Dolly the Sheep', 1, 1, **true**);
383   **UPDATE** species_table **SET** population = 2 **WHERE** species='Cloned sheep';
384   **DELETE** **FROM** species_table **WHERE** species =  'White monkey';
385APPLY BATCH;
386```
387
388![](EL9Dac26o0FqkVoeAKmopEKQe0wWq-xYI14b9RzGxtUkFJA3i2eTiR6qkuuJ.png)
389_BATCH statement_
390
391## Index
392
393CQL support creating secondary indexes on tables, allowing queries on the table to use those indexes.
394
395**Creating **a secondary index on a table uses the **CREATE INDEX **statement:
396
397```
398**CREATE** [ CUSTOM ] **INDEX** [ **IF** **NOT** **EXISTS** ] [ index_name ]
399                                **ON** table_name '(' index_identifier ')'
400                                [ **USING** string [ **WITH** OPTIONS = map_literal ] ];
401```
402
403For example we are going to create a index called “population_idx” that is related to the column “population” in the table “species_table”.
404
405```
406**CREATE** **INDEX** population_idx **ON** species_table (population);
407```
408
409**Dropping **a secondary index uses the **DROP INDEX** statement:
410
411```
412**DROP** **INDEX** [ **IF** **EXISTS** ] index_name;
413```
414
415Now, we are going to drop the previous index:
416
417```
418**DROP** **INDEX** **IF** **EXISTS** population_idx;
419```
420
421## References
422
423* [Cassandra CQL](https://cassandra.apache.org/doc/latest/cql/ddl.html)
424* [Differences between DML and DDL](https://techdifferences.com/difference-between-ddl-and-dml-in-dbms.html)
425* [Datastax CQL](https://docs.datastax.com/en/dse/5.1/cql/cql/cql_reference/cqlReferenceTOC.html)
426* [Cassandra CQL Types](https://cassandra.apache.org/doc/latest/cql/types.html#grammar-token-cql-type)
427* [Cassandra Index](https://cassandra.apache.org/doc/latest/cql/indexes.html)