git rekt — gemini-redirect.git (697f46b06bde25e3fbc5ea43480eedc1ac516769): blog/ribw/cassandra-an-introduction/index.html

<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Cassandra: an Introduction</title>
<link rel="stylesheet" href="../css/style.css">
</head>
<body>
<main>
<p>This is the first post in the Cassandra series, where we introduce the Cassandra database system and take a look at its features and installation methods.</p>
<div class="date-created-modified">Created 2020-03-05<br>
Modified 2020-03-18</div>
<p>Other posts in this series:</p>
<ul>
<li><a href="/blog/ribw/cassandra-an-introduction/">Cassandra: an Introduction</a> (this post)</li>
</ul>
<p>This post is co-authored with Classmate.</p>
<hr />
<div class="image-container">
<img src="cassandra-database-e1584191543401.jpg" alt="NoSQL database – Apache Cassandra – First delivery" />
<div class="image-caption"></div>
</div>
<h2 class="title" id="purpose_of_technology"><a class="anchor" href="#purpose_of_technology">¶</a>Purpose of technology</h2>
<p>Apache Cassandra is a <strong>NoSQL</strong>, <strong>open-source</strong>, <strong>distributed “key-value” database</strong>. It is designed to handle <strong>large volumes of distributed data</strong>. Its main <strong>goal</strong> is to provide <strong>linear scalability and availability without compromising performance</strong>. In addition, Cassandra <strong>supports replication</strong> across multiple datacenters, providing low latency.</p>
<h2 id="how_it_works"><a class="anchor" href="#how_it_works">¶</a>How it works</h2>
<p>Cassandra’s distributed <strong>architecture</strong> is based on a set of <strong>equal nodes</strong> that communicate over a <strong>P2P protocol</strong>, so that <strong>redundancy is maximized</strong>. It offers robust support for multiple datacenters, with <strong>asynchronous replication</strong> and no need for a master server.</p>
<p>Cassandra’s <strong>data model is based on partitioning rows</strong>, which are organized into <strong>tables</strong>. The first component of each table’s primary key is the <strong>partition key</strong>. Within a partition, rows are clustered by the remaining columns of the key. The other columns can be indexed separately from the primary key.</p>
<p>These tables can be <strong>created, deleted, updated and queried at runtime without blocking</strong> each other. However, Cassandra does <strong>not support joins or subqueries</strong>; instead, it <strong>emphasizes denormalization</strong> through features like collections.</p>
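<p>The partition key and clustering behaviour described above can be illustrated with a small Python sketch. It is not Cassandra code: the table layout, the <code>sensor_id</code> and <code>reading_time</code> column names and the <code>build_partitions</code> helper are all hypothetical, chosen only for illustration. Rows are first grouped by partition key and then ordered inside each partition by the clustering column.</p>

```python
from collections import defaultdict

# Hypothetical table, roughly PRIMARY KEY ((sensor_id), reading_time):
# sensor_id plays the role of the partition key,
# reading_time plays the role of the clustering column.
rows = [
    {"sensor_id": "s1", "reading_time": 3, "value": 21.5},
    {"sensor_id": "s2", "reading_time": 1, "value": 19.0},
    {"sensor_id": "s1", "reading_time": 1, "value": 20.9},
    {"sensor_id": "s1", "reading_time": 2, "value": 21.1},
]


def build_partitions(rows, partition_key, clustering_key):
    """Group rows by partition key, then sort each group by clustering key."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[partition_key]].append(row)
    for group in partitions.values():
        group.sort(key=lambda r: r[clustering_key])
    return dict(partitions)


partitions = build_partitions(rows, "sensor_id", "reading_time")
for key in sorted(partitions):
    times = [r["reading_time"] for r in partitions[key]]
    print(key, times)
```

<p>In this toy model, reading one sensor's data in time order only touches a single partition, which is the kind of access pattern the data model favours.</p>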
<p>Nowadays, Cassandra uses its own query language called <strong>CQL</strong> (<strong>Cassandra Query Language</strong>), whose <strong>syntax is similar to SQL</strong>. It can also be accessed through <strong>JDBC</strong>.</p>
<p><img src="s0GHpggGZXOFcdhypRWV4trU-PkSI6lukEv54pLZnoirh0GlDVAc4LamB1Dy.png" alt="" />
<em>Cassandra architecture</em></p>
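<p>As a rough illustration of how equal nodes can share data without a master, the following Python sketch places nodes on a hash ring and stores each key on the owning node plus the next ones clockwise. It is a simplification under stated assumptions: the node names and the <code>replicas_for</code> helper are hypothetical, MD5 is used here only as a convenient hash rather than as Cassandra's actual partitioner, and features such as virtual nodes and datacenter-aware replication strategies are left out.</p>

```python
import bisect
import hashlib


def token(value):
    """Map a string to an integer token (MD5 is used only for illustration)."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, replication_factor):
    """Return the nodes holding a key: its owner plus the next ones clockwise."""
    ring = sorted((token(name), name) for name in nodes)
    tokens = [t for t, _ in ring]
    # First node whose token is at or after the key's token, wrapping around.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


# Hypothetical four-node cluster with three copies of every key.
nodes = ["node-a", "node-b", "node-c", "node-d"]
owners = replicas_for("user:42", nodes, replication_factor=3)
print(owners)
```

<p>Because the placement depends only on the key and the list of nodes, every node can compute it on its own, which fits the idea that any node can service any request.</p>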
<h2 id="features"><a class="anchor" href="#features">¶</a>Features</h2>
<ul>
<li><strong>Decentralized</strong>: there is <strong>no single point of failure</strong>. Every <strong>node</strong> in the cluster has the <strong>same role</strong> and there is <strong>no master node</strong>, so each node <strong>can service any request</strong>. The data is also distributed across the cluster.</li>
<li><strong>Replication</strong>: data can be replicated within and across <strong>multiple data centers</strong>, and the replication strategies are <strong>configurable</strong>.</li>
<li><strong>Scalability</strong>: read and write performance increases linearly as new nodes are added, and <strong>new nodes</strong> can be <strong>added without interrupting</strong> application <strong>execution</strong>.</li>
<li><strong>Fault tolerance</strong>: <strong>data replication</strong> to several nodes is done <strong>automatically</strong> in order to recover from failures. <strong>Failed nodes</strong> can be <strong>replaced without downtime or interruptions</strong> to the application.</li>
<li><strong>Consistency</strong>: the consistency level can be chosen for both <strong>reads and writes</strong>.</li>
<li><strong>MapReduce support</strong>: it is <strong>integrated</strong> with <strong>Apache Hadoop</strong> to support MapReduce.</li>
<li><strong>Query language</strong>: it has its own query language called <strong>CQL (Cassandra Query Language)</strong>.</li>
</ul>
<h2 id="corner_in_cap_theorem"><a class="anchor" href="#corner_in_cap_theorem">¶</a>Corner in CAP theorem</h2>
<p><strong>Apache Cassandra</strong> is usually described as an “<strong>AP</strong>” system because it guarantees <strong>availability</strong> and <strong>partition/fault tolerance</strong>. It therefore errs on the side of keeping data available, even if this means <strong>sacrificing consistency</strong>. Even so, Apache Cassandra <strong>seeks to satisfy all three requirements</strong> (Consistency, Availability and Partition/fault tolerance) simultaneously and can be <strong>configured to behave</strong> like a “<strong>CP</strong>” database, guaranteeing <strong>consistency and partition/fault tolerance</strong>.</p>
<p><img src="rf3n9LTOKCQVbx4qrn7NPSVcRcwE1LxR_khi-9Qc51Hcbg6BHHPu-0GZjUwD.png" alt="" />
<em>Cassandra in CAP Theorem</em></p>
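<p>The trade-off between the “AP” and “CP” behaviours can be made concrete with a little arithmetic. The Python sketch below is only a model, not Cassandra code, and its helper names are hypothetical. It assumes the usual rule of thumb that reads see the latest write whenever the number of replicas consulted on a read plus the number acknowledged on a write exceeds the replication factor, and it computes the quorum size as a simple majority of replicas.</p>

```python
def quorum(replication_factor):
    """Size of a majority of replicas."""
    return replication_factor // 2 + 1


def is_strongly_consistent(read_replicas, write_replicas, replication_factor):
    """True when every read set must overlap every write set."""
    return read_replicas + write_replicas > replication_factor


rf = 3
q = quorum(rf)
print(q)                                  # 2
print(is_strongly_consistent(q, q, rf))   # True: quorum reads and writes overlap
print(is_strongly_consistent(1, 1, rf))   # False: one replica each may miss updates
```

<p>With three replicas, quorum reads and writes lean towards the “CP” corner, while touching a single replica for both keeps latency low and availability high at the cost of possibly stale reads.</p>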
<h2 id="download"><a class="anchor" href="#download">¶</a>Download</h2>
<p>To download the release as a .tar.gz file, visit the <a href="https://cassandra.apache.org/download/">download site</a> and click on the file “<a href="https://ftp.cixug.es/apache/cassandra/3.11.6/apache-cassandra-3.11.6-bin.tar.gz">https://ftp.cixug.es/apache/cassandra/3.11.6/apache-cassandra-3.11.6-bin.tar.gz</a>”. Note that this link corresponds to version 3.11.6.</p>
<h2 id="installation"><a class="anchor" href="#installation">¶</a>Installation</h2>
<p>This guide covers installation on Linux distributions and Mac OS X systems; installing Cassandra on Microsoft Windows is not covered here.</p>
<p>The first main requirement is having Java 8 installed on <strong>Ubuntu</strong>, the OS that we will use. The Java 8 installation is explained below. First, open a terminal and execute the following commands:</p>
<pre><code>sudo apt update
sudo apt install openjdk-8-jdk openjdk-8-jre
</code></pre>
<p>In order to set Java as an environment variable, open the file “~/.bashrc”:</p>
<pre><code>nano ~/.bashrc
</code></pre>
<p>Then add the path where Java is installed at the end of the file, as follows:</p>
<pre><code>export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64/jre/
export PATH=$PATH:$JAVA_HOME/bin
</code></pre>
<p>At this point, save the file and execute the following command (re-opening the terminal has the same effect):</p>
<pre><code>source ~/.bashrc
</code></pre>
<p>In order to check that the Java environment variable is set correctly, run the following command:</p>
<pre><code>echo $JAVA_HOME
</code></pre>
<p><img src="JUUmX5MIHynJR_K9EdCgKeJcpINeCGRRt2QRu4JLPtRhCVidOhcbWwVTQjyu.png" alt="" />
<em>$JAVA_HOME variable</em></p>
<p>Afterwards, it is possible to check the installed Java version with the command:</p>
<pre><code>java -version
</code></pre>
<p><img src="z9v1-0hpZwjI4U5UZej9cRGN5-Y4AZl0WUPWyQ_-JlzTAIvZtTFPnKY2xMQ_.png" alt="" />
<em>Java version</em></p>
<p>The next requirement is having the latest version of Python 2.7 installed. This can be checked with the command:</p>
<pre><code>python --version
</code></pre>
<p>If it is not installed, installing it is as simple as running the following command in the terminal:</p>
<pre><code>sudo apt install python
</code></pre>
<p>Note: it is better to use “python2” instead of “python”, because that way you force the use of Python 2.7. Modern distributions will use Python 3 for the “python” command.</p>
<p>Afterwards, it is possible to check the installed Python version with the command:</p>
<pre><code>python --version
</code></pre>
<p><img src="Ger5Vw_e1HIK84QgRub-BwGmzIGKasgiYb4jHdfRNRrvG4d6Msp_3Vk62-9i.png" alt="" />
<em>Python version</em></p>
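<p>The manual checks above can also be bundled into one small script. The following Python sketch is only a convenience under stated assumptions: the <code>missing_prerequisites</code> helper is hypothetical, and it only checks that <code>JAVA_HOME</code> is set and that the <code>java</code> and <code>python2</code> (or <code>python</code>) commands can be found on the <code>PATH</code>. It does not verify which versions are installed.</p>

```python
import os
import shutil


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("JAVA_HOME"):
        problems.append("JAVA_HOME is not set")
    if which("java") is None:
        problems.append("java is not on PATH")
    # Accept either command name; the note above explains why python2 is safer.
    if which("python2") is None and which("python") is None:
        problems.append("no python interpreter on PATH")
    return problems


# Prints one line per problem found on this machine, or nothing if all is well.
for problem in missing_prerequisites():
    print(problem)
```

<p>The <code>env</code> and <code>which</code> parameters exist only so that the check can be exercised without touching the real environment.</p>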
<p>Once both requirements are ready, the next step is to extract the file previously downloaded. Either right-click on the file and select “Extract here”, or run the following command in the directory where the downloaded file is located.</p>
<pre><code>tar -zxvf apache-cassandra-x.x.x-bin.tar.gz
</code></pre>
<p>In order to check that the installation is complete, you can execute the following command in the root folder of the project. This will start Cassandra as a single node.</p>
<pre><code>bin/cassandra
</code></pre>
<p>It is possible to query some data from Cassandra with CQL (Cassandra Query Language). To check this, execute the following command in another terminal, again from the root folder of the project.</p>
<pre><code>bin/cqlsh localhost
</code></pre>
<p>Once the CQL shell is open, type the following statement and check the result:</p>
<pre><code>SELECT cluster_name, listen_address FROM system.local;
</code></pre>
<p>The output should look like this:</p>
<p><img src="miUO60A-RtyEAOOVFJqlkPRC18H4RKUhot6RWzhO9FmtzgTPOYHFtwxqgZEf.png" alt="" />
<em>Statement output</em></p>
<p>Finally, the official website of the database provides its own <a href="https://cassandra.apache.org/doc/latest/getting_started/installing.html">installation guide</a>.</p>
<h2 id="references"><a class="anchor" href="#references">¶</a>References</h2>
<ul>
<li><a href="https://es.wikipedia.org/wiki/Apache_Cassandra">Wikipedia</a></li>
<li><a href="https://cassandra.apache.org/">Apache Cassandra</a></li>
<li><a href="https://www.datastax.com/blog/2019/05/how-apache-cassandratm-balances-consistency-availability-and-performance">DataStax</a></li>
<li><a href="https://blog.yugabyte.com/apache-cassandra-architecture-how-it-works-lightweight-transactions/">Yugabyte</a></li>
</ul>
</main>
</body>
</html>