+++
title = "Cassandra: an Introduction"
date = 2020-03-05T00:00:45+00:00
updated = 2020-03-18T09:47:05+00:00
+++

This is the first post in the Cassandra series, where we introduce the Cassandra database system and take a look at its features and installation methods.

Other posts in this series:

* [Cassandra: an Introduction](/blog/ribw/cassandra-an-introduction/) (this post)

This post is co-authored with Classmate.

----------

![NoSQL database – Apache Cassandra – First delivery](cassandra-database-e1584191543401.jpg)

## Purpose of technology

Apache Cassandra is a **NoSQL**, **open-source**, **distributed “key-value” database**. It is designed to handle **large volumes of distributed data**. Its main **goal** is to provide **linear scalability and availability without compromising performance**. In addition, Cassandra **supports replication** across multiple datacenters, providing low latency.

## How it works

Cassandra’s distributed **architecture** is based on a set of **equal nodes** that communicate over a **P2P protocol**, which **maximizes redundancy**. It offers robust support for multiple datacenters, with **asynchronous replication** and no need for a master server.
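As a rough illustration of how equal nodes can share data without a master, the sketch below places nodes on a hash ring and picks the replicas for a partition key by walking the ring clockwise. It is a simplified toy, not Cassandra’s implementation: the `Ring` class, the `node-a` to `node-d` names and the SHA-256 based `token` function are assumptions made for this example (Cassandra’s default partitioner is Murmur3, and real clusters use virtual nodes and configurable replication strategies).

```python
import hashlib
from bisect import bisect_left


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is a stand-in hash)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    """Toy ring: every node is equal and owns exactly one token."""

    def __init__(self, nodes):
        self.ring = sorted((token(name), name) for name in nodes)
        self.tokens = [tok for tok, _ in self.ring]

    def replicas(self, partition_key: str, replication_factor: int):
        """Return the nodes that would store the given partition key."""
        # First node whose token is >= the key's token, wrapping around.
        start = bisect_left(self.tokens, token(partition_key)) % len(self.ring)
        count = min(replication_factor, len(self.ring))
        # Walk clockwise to collect the remaining replicas.
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(count)]


ring = Ring(["node-a", "node-b", "node-c", "node-d"])
owners = ring.replicas("user:42", replication_factor=3)
print(owners)  # three distinct nodes
```

Because the placement depends only on the key and the list of nodes, any node can compute it, which is one way to see why no coordinator or master is required.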

In addition, Cassandra’s **data model partitions the rows**, which are organized into **different tables**. The first component of each table’s primary key is the **partition key**. Within a partition, the rows are clustered (sorted) by the remaining columns of the key. The other columns can be indexed separately from the primary key.
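The relationship between the partition key and the remaining key columns can be sketched with plain Python dictionaries. This is only a mental model, not how Cassandra stores data on disk, and the names `ToyTable`, `sensor_id`, `reading_time` and `temperature` are hypothetical.

```python
from collections import defaultdict


class ToyTable:
    """Mental model of a table with PRIMARY KEY ((sensor_id), reading_time)."""

    def __init__(self):
        # partition key -> {clustering column value -> remaining column}
        self.partitions = defaultdict(dict)

    def insert(self, sensor_id, reading_time, temperature):
        # Writing the same primary key again overwrites the row.
        self.partitions[sensor_id][reading_time] = temperature

    def select(self, sensor_id):
        # Rows of one partition come back ordered by the clustering column.
        return sorted(self.partitions.get(sensor_id, {}).items())


table = ToyTable()
table.insert("s1", 30, 21.5)
table.insert("s1", 10, 20.0)
table.insert("s2", 20, 18.2)
table.insert("s1", 20, 20.7)
table.insert("s1", 10, 19.9)  # same primary key: replaces the earlier row

print(table.select("s1"))  # [(10, 19.9), (20, 20.7), (30, 21.5)]
```

Reading one partition touches a single key of the outer dictionary, which mirrors why queries that specify the partition key are the cheap ones.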

These tables can be **created, deleted, updated and queried** **at runtime without blocking** each other. However, Cassandra does **not support joins or subqueries**; instead, it **emphasizes denormalization** through features like collections.

Nowadays, Cassandra uses its own query language called **CQL** (**Cassandra Query Language**), with a **syntax similar to SQL**. It can also be accessed through **JDBC**.

![](s0GHpggGZXOFcdhypRWV4trU-PkSI6lukEv54pLZnoirh0GlDVAc4LamB1Dy.png)
_Cassandra architecture_

## Features

* **Decentralized**: there are **no single points of failure**. Every **node** in the cluster has the **same role** and there is **no master node**, so each node **can service any request**. In addition, the data is distributed across the cluster.
* Supports **replication** across **multiple datacenters**: the replication strategies are **configurable**.
* **Scalability**: read and write performance increases linearly as new nodes are added, and **new nodes** can be **added without interrupting** application **execution**.
* **Fault tolerance**: **data replication** is done **automatically** across several nodes in order to recover from failures. It is possible to **replace failed nodes** **without downtime or interruptions** to the application.
* **Consistency**: a choice of consistency level is provided for **reads and writes**.
* **MapReduce support**: it is **integrated** with **Apache Hadoop** to support MapReduce.
* **Query language**: it has its own query language called **CQL (Cassandra Query Language)**.
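To make the consistency and fault tolerance points more concrete, the sketch below computes how many replicas must acknowledge a request for three common consistency levels, and how many replicas can be down while the request still succeeds. It assumes a single datacenter with replication factor `rf`; the function names are made up for this example.

```python
def required_acks(level: str, rf: int) -> int:
    """Replicas that must respond for a request at the given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # a majority of the replicas
    if level == "ALL":
        return rf
    raise ValueError(f"unsupported level in this sketch: {level}")


def tolerated_failures(level: str, rf: int) -> int:
    """Replicas that may be unavailable without failing the request."""
    return rf - required_acks(level, rf)


for level in ("ONE", "QUORUM", "ALL"):
    print(level, required_acks(level, 3), tolerated_failures(level, 3))
# ONE 1 2
# QUORUM 2 1
# ALL 3 0
```

The trade-off is visible in the numbers: the stricter the level, the fewer node failures a request can survive.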

## Corner in CAP theorem

**Apache Cassandra** is usually described as an “**AP**” system because it guarantees **availability** and **partition/fault tolerance**. It therefore errs on the side of keeping data available, even if this means **sacrificing consistency**. Despite this, Apache Cassandra **seeks to satisfy all three requirements** (consistency, availability and partition tolerance) simultaneously, and it can be **configured to behave** like a “**CP**” database, guaranteeing **consistency and partition/fault tolerance**.

![](rf3n9LTOKCQVbx4qrn7NPSVcRcwE1LxR_khi-9Qc51Hcbg6BHHPu-0GZjUwD.png)
_Cassandra in CAP Theorem_
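One common way to push Cassandra towards “CP” behaviour is to pick read and write consistency levels whose replica sets are guaranteed to overlap. The brute-force check below illustrates that idea for a single partition with `rf` replicas. `always_overlaps` is a hypothetical helper written for this post, and the model ignores mechanisms such as hinted handoff and read repair.

```python
from itertools import combinations


def always_overlaps(rf: int, write_acks: int, read_acks: int) -> bool:
    """True if every possible read set shares a replica with every write set."""
    replicas = range(rf)
    return all(
        set(written) & set(read)
        for written in combinations(replicas, write_acks)
        for read in combinations(replicas, read_acks)
    )


rf = 3
print(always_overlaps(rf, write_acks=2, read_acks=2))  # True: QUORUM + QUORUM
print(always_overlaps(rf, write_acks=1, read_acks=1))  # False: ONE + ONE
print(always_overlaps(rf, write_acks=3, read_acks=1))  # True: ALL + ONE
```

The combinations that return `True` match the usual rule of thumb `read_acks + write_acks > rf`: at least one replica contacted by a read has seen the latest acknowledged write.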

## Download

To download the `.tar.gz` file, visit the [download site](https://cassandra.apache.org/download/) and click on the file “[https://ftp.cixug.es/apache/cassandra/3.11.6/apache-cassandra-3.11.6-bin.tar.gz](https://ftp.cixug.es/apache/cassandra/3.11.6/apache-cassandra-3.11.6-bin.tar.gz)”. Note that this link corresponds to version 3.11.6.
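Since the link is tied to one version, it can help to see which parts of it change between releases. The small sketch below rebuilds the file name and URL from a version string. Both helper names are hypothetical, and it assumes that a mirror keeps the same path layout as the link above for other versions, which may not hold for every mirror or release.

```python
def tarball_name(version: str) -> str:
    """File name of the binary release for a given Cassandra version."""
    return f"apache-cassandra-{version}-bin.tar.gz"


def tarball_url(version: str, mirror: str = "https://ftp.cixug.es/apache") -> str:
    """Download URL, assuming the mirror keeps the layout of the link above."""
    return f"{mirror}/cassandra/{version}/{tarball_name(version)}"


print(tarball_url("3.11.6"))
# https://ftp.cixug.es/apache/cassandra/3.11.6/apache-cassandra-3.11.6-bin.tar.gz
```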

## Installation

Cassandra can only be installed on Linux distributions and Mac OS X systems, so it is not possible to install it on Microsoft Windows.

The first requirement is Java 8. We will use **Ubuntu**, so the Java 8 installation is explained for that OS. First, open a terminal and run the following commands:

```
sudo apt update
sudo apt install openjdk-8-jdk openjdk-8-jre
```

To set the Java environment variable, open the file “~/.bashrc”:

```
nano ~/.bashrc
```

Then add the path where Java is installed at the end of the file, as follows:

```
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
export PATH=$PATH:$JAVA_HOME/bin
```

Save the file and run the following command (re-opening the terminal has the same effect):

```
source ~/.bashrc
```

To check that the Java environment variable is set correctly, run the following command:

```
echo $JAVA_HOME
```

![](JUUmX5MIHynJR_K9EdCgKeJcpINeCGRRt2QRu4JLPtRhCVidOhcbWwVTQjyu.png)
_$JAVA_HOME variable_

Afterwards, you can check the installed Java version with the command:

```
java -version
```

![](z9v1-0hpZwjI4U5UZej9cRGN5-Y4AZl0WUPWyQ_-JlzTAIvZtTFPnKY2xMQ_.png)
_Java version_

The next requirement is the latest version of Python 2.7. You can check whether it is installed with the command:

```
python --version
```

If it is not installed, installing it is as simple as running the following command in the terminal:

```
sudo apt install python
```

Note: it is better to use “python2” instead of “python”, because that forces the use of Python 2.7. Modern distributions use Python 3 for the “python” command.

After installing it, you can check the installed Python version with the command:

```
python --version
```

![](Ger5Vw_e1HIK84QgRub-BwGmzIGKasgiYb4jHdfRNRrvG4d6Msp_3Vk62-9i.png)
_Python version_

Once both requirements are ready, the next step is to extract the previously downloaded file. Either right-click the file and select “Extract here”, or run the following command in the directory that contains the downloaded file:

```
tar -zxvf apache-cassandra-x.x.x-bin.tar.gz
```

To check that the installation is complete, run the following command from the root folder of the extracted archive. This starts Cassandra as a single node.

```
bin/cassandra
```

You can now retrieve some data from Cassandra with CQL (Cassandra Query Language). To try this, run the following command in another terminal, again from the root folder of the extracted archive:

```
bin/cqlsh localhost
```

Once the CQL shell is open, type the following statement and check the result:

```
SELECT cluster_name, listen_address FROM system.local;
```

The output should be:

![](miUO60A-RtyEAOOVFJqlkPRC18H4RKUhot6RWzhO9FmtzgTPOYHFtwxqgZEf.png)
_Statement output_

Finally, the project’s website provides an official [installation guide](https://cassandra.apache.org/doc/latest/getting_started/installing.html).

## References

* [Wikipedia](https://es.wikipedia.org/wiki/Apache_Cassandra)
* [Apache Cassandra](https://cassandra.apache.org/)
* [DataStax](https://www.datastax.com/blog/2019/05/how-apache-cassandratm-balances-consistency-availability-and-performance)
* [Yugabyte](https://blog.yugabyte.com/apache-cassandra-architecture-how-it-works-lightweight-transactions/)