<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
 <title>Lonami's Site</title>
 <link href="https://lonami.dev/atom.xml" rel="self" type="application/atom+xml"/>
 <link href="https://lonami.dev"/>
 <generator uri="https://www.getzola.org/">Zola</generator>
 <updated>2020-10-03T00:00:00+00:00</updated>
 <id>https://lonami.dev/atom.xml</id>
 <entry xml:lang="en">
 <title>Making a Difference</title>
 <published>2020-08-24T00:00:00+00:00</published>
 <updated>2020-08-24T00:00:00+00:00</updated>
 <link href="https://lonami.dev/golb/making-a-difference/" type="text/html"/>
 <id>https://lonami.dev/golb/making-a-difference/</id>
<content type="html"><p>When I've thought about what &quot;making a difference&quot; means, I've always seen it as having to do something at very large scales. Something that changes everyone's lives. But I've realized that it doesn't need to be the case.</p>
<p>I'm thinking about certain people. I'm thinking about middle school.</p>
<p>I'm thinking about my math teacher, who I remember saying that if he made a student fail with a grade very close to passing, then he would be a bad teacher because he could just &quot;let them pass&quot;. But if he just passed that one student, he would fail as a teacher, because it's his job to get people to actually <em>learn</em> his subject. He didn't want to be mean; he was just trying to have everybody properly learn the subject. That made a difference to me, but I never got the chance to thank him.</p>
<p>I'm thinking about my English teacher, who had to put up with a lot of stupidity from the rest of the students, which made the class unenjoyable. But I thought she was nice, and she thought I was nice. I remember a day when she was grading my assignment and debating what grade I should get. I thought to myself that she should just give whatever grade she considered fair. But she said something along the lines of &quot;what the heck, you deserve it&quot;, and graded in my favour. I think of her as an honest person who also just wants to make other people learn, despite the other students not liking her much. I never got a chance to thank her.</p>
<p>I'm thinking about my philosophy teacher, who was a nice chap. He tried to make the lectures fun and had some really interesting ways of thinking. He was nice to talk to overall, but I never got to thank him for what he taught us.</p>
<p>I'm thinking about one of my lecturers at university, who let me express my feelings to her and helped me make the last push I needed to finish my university degree (I was really dreading some subjects and considering dropping out, but those days are finally over).</p>
<p>I'm thinking about all the people who have been in a long-distance relationship with me. None of the three I've had have worked out in the long term so far, and I'm in need of a break from those. But they were invaluable in helping me grow and learn a lot about how things actually work. I'm okay with the first two people now; maybe the third one can be my friend once more in the future as well. I'm sure I've told them how important they have been to me and my life.</p>
<p>I'm thinking about all the people I've met online and have had an overall healthy relationship with, sharing interesting things with each other, playtime, thoughts, and other various lessons.</p>
<p>What I'm trying to get across is that you may be more impactful than you think you are. And even if people don't say it, some are extremely thankful for your interactions with them. You can see this post as a sort of &quot;call to action&quot; to be more thankful to the people who have affected you in important ways. If people take things for granted because they Just Work, the person who made those things should be proud of that achievement.</p>
<p>Thanks to all of them, to everyone who has shared good moments with me, and to all the people who enjoy the things I make.</p>
</content>
 </entry>
 <entry xml:lang="en">
 <title>Data Mining, Warehousing and Information Retrieval</title>
 <published>2020-07-03T00:00:00+00:00</published>
 <updated>2020-07-03T00:00:00+00:00</updated>
 <link href="https://lonami.dev/blog/university/" type="text/html"/>
 <id>https://lonami.dev/blog/university/</id>
<content type="html"><p>During university, there were a few subjects I had to write blog posts for (either as evaluable tasks or just for fun). I thought it was really fun, and I wanted to preserve that work here in the hope it's interesting to someone.</p>
<p>The post series were auto-generated from the original HTML files and manually anonymized later.</p>
<ul>
<li><a href="/blog/mdad">Data Mining and Data Warehousing</a></li>
<li><a href="/blog/ribw">Information Retrieval and Web Search</a></li>
</ul>
</content>
 </entry>
 <entry xml:lang="en">
 <title>My new computer</title>
 <published>2020-06-19T00:00:00+00:00</published>
 <updated>2020-07-03T00:00:00+00:00</updated>
 <link href="https://lonami.dev/blog/new-computer/" type="text/html"/>
 <id>https://lonami.dev/blog/new-computer/</id>
<content type="html"><p>This post will mostly be me ranting about setting up a new laptop, but I also just want to share my upgrade. If you're considering installing Arch Linux with dual-boot for Windows, maybe this post will help. Or perhaps you will learn something new to troubleshoot systems in the future. Let's begin!</p>
<p>Last Sunday, I ordered an Asus Rog Strix G531GT-BQ165 for 900€ (at a 20% discount) with the following specifications:</p>
<ul>
<li>Intel® Core i7-9750H (6 cores, 12MB cache, 2.6GHz up to 4.5GHz, 64-bit)</li>
<li>16GB RAM (8GB*2) DDR4 2666MHz</li>
<li>512GB SSD M.2 PCIe® NVMe</li>
<li>Display 15.6&quot; (1920x1080/16:9) 60Hz</li>
<li>Graphics NVIDIA® GeForce® GTX1650 4GB GDDR5 VRAM</li>
<li>LAN 10/100/1000</li>
<li>Wi-Fi 5 (802.11ac) 2x2 RangeBoost</li>
<li>Bluetooth 5.0</li>
<li>48Wh battery with 3 cells</li>
<li>3 x USB 3.1 (GEN1)</li>
</ul>
<p>I was mostly interested in a general upgrade (better processor, disk, more RAM), although the graphics card is a really nice addition which will let me take some time off with more games. After using it for a bit, I really love the feel of the keyboard, and I love the lack of a numpad! (No sarcasm, I really don't like numpads.)</p>
<p>This is an upgrade from my previous laptop (an Asus X554LA-XX822T), which I won in a programming competition before entering university. It has served me really well for the past five years, and had the following specifications:</p>
<ul>
<li>Intel® Core™ i5-5200U</li>
<li>4GB RAM DDR3L 1600MHz (which I upgraded to 8GB)</li>
<li>1TB HDD</li>
<li>Display 15.6&quot; (1366x768/16:9)</li>
<li>Intel® HD Graphics 4400</li>
<li>LAN 10/100/1000</li>
<li>Wifi 802.11 bgn</li>
<li>Bluetooth 4.0</li>
<li>Battery 2 cells</li>
<li>1 x USB 2.0</li>
<li>2 x USB 3.0</li>
</ul>
<p>Prior to this one, I had a Lenovo (also won in the previous year's edition of the same competition), and prior to that (just for the sake of history), an HP Pavilion with an AMD A4-3300M processor, which unfortunately ended up with heating problems. But that's very old now.</p>
<h2 id="laptop-arrival">Laptop arrival</h2>
<p>The laptop arrived 2 days ago at roughly 19:00, and I left it charging for 3 hours as the manual said. The day after, the nightmares began!</p>
<p>Trying to boot it the first two times was fun, as it comes with a somewhat loud sound on boot. I don't know why they would do this, and I immediately turned it off in the BIOS.</p>
<h2 id="installation-journey">Installation journey</h2>
<p>I spent all of yesterday trying to set up Windows and Arch Linux (and didn't even finish; it took me this morning too, and even now it's only half functional). I absolutely <em>hate</em> the amount of partitions the Windows installer creates on a clean disk. So instead, I first went with Arch Linux, and followed the <a href="https://wiki.archlinux.org/index.php/Installation_guide">installation guide on the Arch wiki</a>. Pre-installation, setting up the wireless network, creating the partitions and formatting them all went well. I decided to avoid GRUB at first and go with rEFInd, but alas, I missed a big warning on the wiki and after rebooting (I would later find out) it was not mounting root properly, so all I had was whatever was in the initramfs. Reboot didn't work, so I had to hold the power button.</p>
<p>Anyway, once the partitions were created, I went to install Windows (there was a lot of back and forth burning different <code>.iso</code> images to the USB, which was a bit annoying because it wasn't the fastest thing in the world). This was pretty painless, and the process was standard: select advanced to let me choose the right partition, pick the one, say &quot;no&quot; to everything in the services setup, and done. But this was only the first Windows <code>.iso</code> I tried. It was an old revision, and the drivers were causing issues when running (something weird about their <code>.dll</code>; manually installing the <code>.ini</code> driver files seemed to work?). The Nvidia drivers didn't want to be installed on such an old revision, even after updating everything I could via Windows updates. So back I went to burning a newer Windows <code>.iso</code> and going through the same process again…</p>
<p>Once Windows was ready and I verified that I could boot into it correctly, it was time to have a second go at Arch Linux. And I went through the setup at least three times, getting it wrong every single time, formatting root every single time, redownloading the packages every single time. If only I had known earlier what the issue was!</p>
<p>Why bother with Arch? I was pretty happy with Linux Mint, and I lowkey wanted to try NixOS, but I had used Arch before and it's a really nice distro overall (up-to-date, has the AUR, quite minimal, imperative), except for trying to install rEFInd while chrooted…</p>
<p>In the end I managed to get something half-working; I still need to properly configure WiFi and pulseaudio on my system, but hey, it works.</p>
<p>I like being able to dual-boot Windows and Linux because Linux is amazing for productivity, but unfortunately some games only work well on Windows. Might as well have both systems and use one for gaming while the other is my daily driver.</p>
<h2 id="setting-up-arch-linux">Setting up Arch Linux</h2>
<p>This is the process I followed to install Arch Linux in the end, along with a brief explanation of what I think each step does and why we are doing it. I think the wiki could do a better job at this, but I also know it's hard to get it right for everyone. Something I do dislike is the link colour: after opening a link it becomes gray, and it's a lot easier to miss the fact that it is a link in the first place, which was tough when re-reading because some links actually matter a lot. Furthermore, important information may be just a single line, also easy to skim over. Anyway, on to the installation process…</p>
<p>The first thing we want to do is configure our keyboard layout or else the keys won't correspond to what we expect:</p>
<pre><code class="language-sh" data-lang="sh">loadkeys es
</code></pre>
<p>Because we're on a recent system, we want to verify that UEFI works correctly. If we see files listed, then it works fine:</p>
<pre><code class="language-sh" data-lang="sh">ls /sys/firmware/efi/efivars
</code></pre>
<p>The next thing we want to do is configure the WiFi, because I don't have any ethernet cable nearby. To do this, we check what network interfaces our laptop has (we're looking for the one prefixed with &quot;w&quot;, presumably for wireless, such as &quot;wlan0&quot; or &quot;wlo1&quot;), bring it up, scan for available wireless networks, and finally connect. In my case, the network has WPA security, so we rely on <code>wpa_supplicant</code> to connect, passing the SSID (network name) and password:</p>
<pre><code class="language-sh" data-lang="sh">ip link
ip link set &lt;IFACE&gt; up
iw dev &lt;IFACE&gt; scan | less
wpa_supplicant -B -i &lt;IFACE&gt; -c &lt;(wpa_passphrase &lt;SSID&gt; &lt;PASS&gt;)
</code></pre>
<p>After that's done, pinging an IP address like &quot;1.1.1.1&quot; should Just Work™, but to be able to resolve hostnames, we also need to set up a nameserver. I'm using Cloudflare's, but you could use any other:</p>
<pre><code class="language-sh" data-lang="sh">echo nameserver 1.1.1.1 &gt; /etc/resolv.conf
ping archlinux.org
^C
</code></pre>
<p>If the ping works, then the network works! If you still have issues, you may need to <a href="https://wiki.archlinux.org/index.php/Network_configuration#Static_IP_address">manually configure a static IP address</a> and add a route with the address of your, well, router. This basically shows whether we have any address, adds a static address (so people know who we are), shows what routes we have, and adds a default one (so our packets know where to go):</p>
<pre><code class="language-sh" data-lang="sh">ip address show
ip address add &lt;YOUR ADDR&gt;/24 broadcast + dev &lt;IFACE&gt;
ip route show
ip route add default via &lt;ROUTER ADDR&gt; dev &lt;IFACE&gt;
</code></pre>
<p>Now that we have network available, we can enable NTP to synchronize our system time (this may be required for network operations where certificates have a validity period, not sure; in any case nobody wants a wrong system time):</p>
<pre><code class="language-sh" data-lang="sh">timedatectl set-ntp true
</code></pre>
<p>After that, we can manage our disks and partitions using <code>fdisk</code>. We want to define partitions to tell the system where it should live. To determine the disk name, we first list the disks, then edit the one we want. <code>fdisk</code> is really nice and reminds you at every step that help can be accessed with &quot;m&quot;, which you should constantly use to guide you through.</p>
<pre><code class="language-sh" data-lang="sh">fdisk -l
fdisk /dev/&lt;DISK&gt;
</code></pre>
<p>The partitions I made are the following:</p>
<ul>
<li>A 100MB one for the EFI system.</li>
<li>A 32GB one for Linux' root <code>/</code> partition.</li>
<li>A 200GB one for Linux' home <code>/home</code> partition.</li>
<li>The rest was unallocated for Windows because I did this first.</li>
</ul>
<p>I like to have <code>/home</code> and <code>/</code> separate because I can reinstall root without losing anything from home (projects, music, photos, screenshots, videos…).</p>
<p>After the partitions are made, we format them: FAT32 for the EFI partition and EXT4 for root and home, which are good defaults. They need to have a filesystem, or else they won't be usable:</p>
<pre><code class="language-sh" data-lang="sh">mkfs.fat -F32 /dev/&lt;DISK&gt;&lt;PART1&gt;
mkfs.ext4 /dev/&lt;DISK&gt;&lt;PART2&gt;
mkfs.ext4 /dev/&lt;DISK&gt;&lt;PART3&gt;
</code></pre>
<p>Because the laptop was new, there was no risk of losing anything, but if you're doing an install on a previous system, be very careful with the partition names. Make sure they match the ones in <code>fdisk -l</code>.</p>
<p>Now that we have usable partitions, we need to mount them or they won't be accessible. We can do this with <code>mount</code>:</p>
<pre><code class="language-sh" data-lang="sh">mount /dev/&lt;DISK&gt;&lt;PART2&gt; /mnt
mkdir /mnt/efi
mount /dev/&lt;DISK&gt;&lt;PART1&gt; /mnt/efi
mkdir /mnt/home
mount /dev/&lt;DISK&gt;&lt;PART3&gt; /mnt/home
</code></pre>
<p>Remember to use the correct partitions while mounting. We mount everything so that the system knows which partitions we care about, which we will tell it about later on (when generating <code>fstab</code>).</p>
<p>The next step is to set up the basic Arch Linux system on root, which can be done with <code>pacstrap</code>. What follows the directory is a list of packages, and you may choose any you wish (at least include <code>base</code>, <code>linux</code> and <code>linux-firmware</code>). The rest can be installed later, but I'd recommend having them from the beginning, just in case:</p>
<pre><code class="language-sh" data-lang="sh">pacstrap /mnt base linux linux-firmware sudo vim-minimal dhcpcd wpa_supplicant man-db man-pages intel-ucode grub efibootmgr os-prober ntfs-3g
</code></pre>
<p>Because my system has an Intel CPU, I also installed <code>intel-ucode</code>.</p>
<p>Next up is generating the <code>fstab</code> file, which we tell to use UUIDs (to be on the safe side) through <code>-U</code>. This file is important, because without it the system won't know what partitions exist and will happily boot with only the initramfs, without any of what we just installed at root. Not knowing this made me restart the entire installation process a few times.</p>
<pre><code class="language-sh" data-lang="sh">genfstab -U /mnt &gt;&gt; /mnt/etc/fstab
</code></pre>
<p>After that's done, we can change our root into the mount point and finish up the configuration. We set up our timezone (so DST can be handled correctly if needed), synchronize the hardware clock (to persist the current time to the BIOS), uncomment our locales (exit <code>vim</code> by pressing ESC, then typing <code>:wq</code> and pressing enter), generate the locale files (which some applications need), configure the language and keymap, update the hostname of our laptop and indicate what <code>localhost</code> means…</p>
<pre><code class="language-sh" data-lang="sh">arch-chroot /mnt
ln -sf /usr/share/zoneinfo/&lt;REGION&gt;/&lt;CITY&gt; /etc/localtime
hwclock --systohc
vim /etc/locale.gen
locale-gen
echo LANG=es_ES.UTF-8 &gt; /etc/locale.conf
echo KEYMAP=es &gt; /etc/vconsole.conf
echo &lt;HOST&gt; &gt; /etc/hostname
cat &lt;&lt;EOF &gt; /etc/hosts
127.0.0.1 localhost
::1 localhost
127.0.1.1 &lt;HOST&gt;.localdomain &lt;HOST&gt;
EOF
</code></pre>
<p>Really, we could've done all of this later, and the same goes for setting root's password with <code>passwd</code> or creating users (some of the groups you probably want are <code>power</code> and <code>wheel</code>).</p>
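<p>As a rough sketch of that last part (the user name is a placeholder of your choosing, and the <code>visudo</code> step assumes you want the <code>wheel</code> group to be able to use <code>sudo</code>):</p>
<pre><code class="language-sh" data-lang="sh">passwd                            # set root's password
useradd -m -G wheel,power &lt;USER&gt;  # create a user with a home directory
passwd &lt;USER&gt;                     # set that user's password
EDITOR=vim visudo                 # uncomment the &quot;%wheel ALL=(ALL) ALL&quot; line
</code></pre>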
<p>The important part here is installing GRUB (which also needed the <code>efibootmgr</code> package):</p>
<pre><code class="language-sh" data-lang="sh">grub-install --target=x86_64-efi --efi-directory=/efi --bootloader-id=GRUB
</code></pre>
<p>If we want GRUB to find our Windows install, we also need the <code>os-prober</code> and <code>ntfs-3g</code> packages that we installed earlier with <code>pacstrap</code>, and with those we need to mount the Windows partition somewhere. It doesn't matter where. With that done, we can generate the GRUB configuration file which lists all the boot options:</p>
<pre><code class="language-sh" data-lang="sh">mkdir /windows
mount /dev/&lt;DISK&gt;&lt;PART5&gt; /windows
grub-mkconfig -o /boot/grub/grub.cfg
</code></pre>
<p>(In my case, I installed Windows before completing the Arch install, which created an additional partition in between).</p>
<p>With GRUB ready, we can exit the chroot and reboot the system, and if all went well, you should be greeted with a choice of operating system to use:</p>
<pre><code class="language-sh" data-lang="sh">exit
reboot
</code></pre>
<p>If for some reason you need to find what mountpoints were active prior to rebooting (to <code>umount</code> them, for example), you can use <code>findmnt</code>.</p>
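<p>A quick sketch of what that looks like (the <code>/mnt</code> path matches the mounts made earlier; adapt as needed):</p>
<pre><code class="language-sh" data-lang="sh">findmnt --real    # list the real filesystems currently mounted
umount -R /mnt    # recursively unmount everything under /mnt
</code></pre>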
<p>Before GRUB I tried rEFInd, which, as I explained, had issues because I missed a warning. Then I tried systemd-boot, which did not pick up Arch at first. That's where the several reinstalls come from; I didn't want to work with a half-working system, so I mostly redid the entire process quite a few times.</p>
<h2 id="migrating-to-the-new-laptop">Migrating to the new laptop</h2>
<p>I had an external disk formatted with NTFS. Of course, moving every file I cared about from my previous Linux install onto it caused all the permissions to reset. All my <code>.git</code> repositories, dirty with file permission changes! This is going to take a while to fix, or maybe I should just <code>git config core.fileMode false</code>. Here is a <a href="https://stackoverflow.com/a/2083563">lovely command</a> to sort them out on a per-repository basis:</p>
<pre><code class="language-sh" data-lang="sh">git diff --summary | grep --color 'mode change 100644 =&gt; 100755' | cut -d' ' -f7- | xargs -d'\n' chmod -x
</code></pre>
<p>I never realized how much I had stored over the years, but it really was a lot. While moving things to the external disk, I tried to do some cleanup, such as removing build artifacts that needlessly occupy space, or completely skipping all the binary application files; if I need those I will install them anyway. The process was mostly focused on finding all the projects and program data that I did care about, and even some game saves. Nothing too difficult, but definitely time-consuming.</p>
<h2 id="tuning-arch">Tuning Arch</h2>
<p>Now that our system is ready, install <code>pacman-contrib</code> to grab a copy of the <code>rankmirrors</code> script. It should help speed up the download of whatever packages you want to install, since it will help us <a href="https://wiki.archlinux.org/index.php/Mirrors#List_by_speed">rank the mirrors by download speed</a>. Making a copy of the mirror list file is important; otherwise, whenever you try to install something it will fail, saying it can't find anything.</p>
<pre><code class="language-sh" data-lang="sh">cp /etc/pacman.d/mirrorlist /etc/pacman.d/mirrorlist.backup
sed -i 's/^#Server/Server/' /etc/pacman.d/mirrorlist.backup
rankmirrors -n 6 /etc/pacman.d/mirrorlist.backup | tee /etc/pacman.d/mirrorlist
</code></pre>
<p>This will take a while, but it should be well worth it. We're using <code>tee</code> to see the progress as it goes.</p>
<p>Some other packages I installed after I had a working system, in no particular order:</p>
<ul>
<li><code>xfce4</code> and <code>xorg-server</code>. I just love the simplicity of XFCE.</li>
<li><code>xfce4-whiskermenu-plugin</code>, a really nice start menu.</li>
<li><code>xfce4-pulseaudio-plugin</code> and <code>pavucontrol</code>, to quickly adjust the audio with my mouse.</li>
<li><code>xfce4-taskmanager</code>, a GUI alternative I generally prefer to <code>htop</code>.</li>
<li><code>pulseaudio</code> and <code>pulseaudio-alsa</code>, to get nice integration with XFCE4 and audio mixing.</li>
<li><code>firefox</code>, which comes with fonts too. A really good web browser.</li>
<li><code>git</code>, to commit <del>crimes</del> code.</li>
<li><code>code</code>, a wonderful editor which I used to write this blog entry.</li>
<li><code>nano</code>, so much nicer to write a simple commit message.</li>
<li><code>python</code> and <code>python-pip</code>, my favourite language to toy around with ideas or use as a calculator.</li>
<li><code>telegram-desktop</code>, for my needs on sharing memes.</li>
<li><code>cmus</code> and <code>mpv</code>, a simple terminal music player and a media player.</li>
<li><code>openssh</code>, to connect to any VPS I have access to.</li>
<li><code>base-devel</code>, necessary to build most projects I'll find myself working with (or even to compile some Rust projects, with Rust itself installed via <code>rustup</code>).</li>
<li><code>flac</code>, <code>libmad</code>, <code>opus</code>, and <code>libvorbis</code>, to be able to play more audio files.</li>
<li><code>inkscape</code>, to make random drawings.</li>
<li><code>ffmpeg</code>, to convert media or record the screen.</li>
<li><code>xclip</code>, to automatically copy screenshots to my clipboard.</li>
<li><code>gvfs</code>, needed by Thunar to handle mounting and having a trash (perma-deletion by default can be nasty sometimes).</li>
<li><code>noto-fonts</code>, <code>noto-fonts-cjk</code>, <code>noto-fonts-extra</code> and <code>noto-fonts-emoji</code>, if you don't want missing glyphs everywhere.</li>
<li><code>xfce4-notifyd</code> and <code>libnotify</code>, for notifications.</li>
<li><code>cronie</code>, to be able to <code>crontab -e</code>. Make sure to <code>systemctl enable cronie</code>.</li>
<li><code>xarchiver</code> (with <code>p7zip</code>, <code>zip</code>, <code>unzip</code> and <code>unrar</code>), to uncompress stuff.</li>
<li><code>xreader</code>, to read <code>.pdf</code> files.</li>
<li><code>sqlitebrowser</code>, always nice to tinker around with SQLite databases.</li>
<li><code>jre8-openjdk</code>, if you want to run Java applications.</li>
<li><code>smartmontools</code>, nice with an SSD to view your disk statistics.</li>
</ul>
<p>After that, I configured my Super L key to launch <code>xfce4-popup-whiskermenu</code> so that it opens the application menu, pretty much the same as it would on Windows, moved the panels around and configured them to my needs, and it feels like home once more.</p>
<p>I made some mistakes while <a href="https://wiki.archlinux.org/index.php/Systemd-networkd">configuring systemd-networkd</a> and accidentally added a service that was incorrect, which caused boot to wait for it to timeout before completing. My boot time was taking 90 seconds longer because of this! <a href="https://www.reddit.com/r/archlinux/comments/4nv9yi/my_arch_greets_me_now_with_a_start_job/">The solution was to remove said service</a>, so this is something to look out for.</p>
<p>In order to find what was taking long, I had to edit the <a href="https://wiki.archlinux.org/index.php/kernel_parameters">kernel parameters</a> to remove the <code>quiet</code> option. I prefer seeing the output on what my computer is doing anyway, because it gives me a sense of progress and most importantly is of great value when things go wrong. Another interesting option is <code>noauto,x-systemd.automount</code>, which makes a disk lazily-mounted. If you have a slow disk, this could help speed things up.</p>
<p>If you see a service taking long, you can also use <code>systemd-analyze blame</code> to see what takes the longest, and <code>systemctl list-dependencies</code> is also helpful to find what services are active.</p>
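<p>For reference, this is roughly the kind of boot-time triage involved (the output naturally varies per system):</p>
<pre><code class="language-sh" data-lang="sh">systemd-analyze              # total boot time, split into firmware/loader/kernel/userspace
systemd-analyze blame        # units sorted by how long they took to start
systemctl list-dependencies  # tree of the units the default target pulls in
</code></pre>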
<p>My <code>locale charmap</code> was spitting out a bunch of warnings:</p>
<pre><code class="language-sh" data-lang="sh">$ locale charmap
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_ALL to default locale: No such file or directory
ANSI_X3.4-1968
</code></pre>
<p>…ANSI encoding? I immediately added the following to <code>~/.bashrc</code> and <code>~/.profile</code>:</p>
<pre><code class="language-sh" data-lang="sh">export LC_ALL=en_US.UTF-8
export LANG=en_US.UTF-8
export LANGUAGE=en_US.UTF-8
</code></pre>
<p>For some reason, I also had to edit <code>xfce4-terminal</code>'s preferences (under advanced) to change the default character encoding to UTF-8. This also solved my issues with pasting things into the terminal, and with proper rendering! I guess pastes were not working because they had some characters that could not be encoded.</p>
<p>To have working notifications, I added the following to <code>~/.bash_profile</code> after <code>exec startx</code>:</p>
<pre><code class="language-sh" data-lang="sh">systemctl --user start xfce4-notifyd.service
</code></pre>
<p>I'm pretty sure there's a better way to do this, or maybe it's not even necessary, but this works for me.</p>
<p>One of the other things I had left to do was setting up <code>sccache</code> to speed up Rust builds:</p>
<pre><code class="language-sh" data-lang="sh">cargo install sccache
echo export RUSTC_WRAPPER=sccache &gt;&gt; ~/.bashrc
</code></pre>
<p>Once I had <code>cargo</code> ready, I installed <code>hacksaw</code> and <code>shotgun</code> with it to take screenshots.</p>
<p>I also disabled the security delay when downloading files in Firefox, because it's just annoying, by setting <code>security.dialog_enable_delay</code> to <code>0</code> in <code>about:config</code>, and added the <a href="https://alisdair.mcdiarmid.org/kill-sticky-headers/">Kill sticky headers</a> bookmarklet to my bookmarks (you may prefer <a href="https://github.com/t-mart/kill-sticky">the updated version</a>).</p>
<p>The <code>util-linux</code> package comes with a <code>fstrim</code> utility to <a href="https://wiki.archlinux.org/index.php/Solid_state_drive#Periodic_TRIM">trim the SSD weekly</a>, which I want enabled via <code>systemctl enable fstrim.timer</code> (you may also want to <code>start</code> it if you don't reboot often). For more SSD tips, check <a href="https://easylinuxtipsproject.blogspot.com/p/ssd.html">How to optimize your Solid State Drive</a>.</p>
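<p>Spelled out, with a final check thrown in to confirm the timer got scheduled:</p>
<pre><code class="language-sh" data-lang="sh">systemctl enable fstrim.timer  # trim weekly from now on
systemctl start fstrim.timer   # also start it right away, without rebooting
systemctl list-timers          # verify fstrim.timer shows up in the list
</code></pre>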
<p>If the sound is funky prior to rebooting, try <code>pulseaudio --kill</code> and <code>pulseaudio --start</code>, or delete <code>~/.config/pulse</code>.</p>
<p>I haven't been able to get the brightness keys to work yet, but it's not a big deal, because scrolling on the power manager plugin of Xfce does work (and so does <code>xbacklight</code>, or writing directly to <code>/sys/class/backlight/*</code>).</p>
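<p>Writing to <code>/sys</code> directly looks something like this (the device directory name varies per machine, so it's a placeholder here; this needs root):</p>
<pre><code class="language-sh" data-lang="sh">cat /sys/class/backlight/&lt;DEVICE&gt;/max_brightness    # check the valid range first
echo 300 &gt; /sys/class/backlight/&lt;DEVICE&gt;/brightness # then set an absolute value within it
</code></pre>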
<h2 id="tuning-windows">Tuning Windows</h2>
<p>On the Windows side, I disabled the annoying Windows Defender by running (<kbd>Win+R</kbd>) <code>gpedit.msc</code> and editing:</p>
<ul>
<li><em>Computer Configuration &gt; Administrative Templates &gt; Windows Components &gt; Windows Defender » Turn off Windows Defender » Enable</em></li>
<li><em>User Configuration &gt; Administrative Templates &gt; Start Menu and Taskbar » Remove Notifications and Action Center » Enable</em></li>
</ul>
<p>I also updated the <a href="https://github.com/WindowsLies/BlockWindows/raw/master/hosts"><code>hosts</code> file</a> (located at <code>%windir%\system32\Drivers\etc\hosts</code>) in the hope that it will stop some of the telemetry.</p>
<p>Last, to have consistent time on Windows and Linux, I changed the following registry key to a <code>qword</code> with value <code>1</code>:</p>
<pre><code>HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\TimeZoneInformation\RealTimeIsUniversal
</code></pre>
<p>(The key might not exist, but you can create it if that's the case.)</p>
<p>All this time, my laptop had the keyboard lights on, which had been quite annoying. Apparently, they can also cause <a href="https://www.reddit.com/r/ValveIndex/comments/cm6pos/psa_uninstalldisable_aura_sync_lighting_if_you/">massive FPS drops</a>. I headed over to <a href="https://rog.asus.com/downloads/">Asus Rog downloads</a>, selected Aura Sync…</p>
<pre><code class="language-md" data-lang="md"># Not Found

The requested URL /campaign/aura/us/Sync.html was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.
</code></pre>
<p>…great! I'll just find the <a href="https://www.asus.com/campaign/aura/global/">Aura site</a> somewhere else…</p>
<pre><code class="language-md" data-lang="md"># ASUS

# We'll be back.

Hi, our website is temporarily closed for service enhancements.

We'll be back shortly.Thank you for your patience!
</code></pre>
<p>Oh, come on. After waiting for the next day, I headed over, downloaded their software and tried to install it, and it was an awful experience. It felt like I was deliberately installing malware. It spammed and flashed a lot of <code>cmd</code> windows on screen as if it were a virus. It got stuck at 100% doing that, and then Windows blue-screened with <code>KERNEL_MODE_HEAP_CORRUPTION</code>. Amazing. How do you screw up this badly?</p>
278<p>Well, at least rebooting worked. I tried to <a href="https://answers.microsoft.com/en-us/windows/forum/all/unable-to-uninstall-asus-aura-sync-utility/e9bec36c-e62f-4773-80be-88fb68dace16">uninstall Aura, but of course that failed</a>. Using the <a href="https://support.microsoft.com/en-us/help/17588/windows-fix-problems-that-block-programs-being-installed-or-removed">troubleshooter to uninstall programs</a> helped me remove most of the crap that was installed.</p>
279<p>After searching around how to disable the lights (because <a href="https://rog.asus.com/forum/showthread.php?112786-Option-to-Disable-Aura-Lights-on-Strix-G-series-(G531GT)-irrespective-of-OSes">my BIOS did not have this setting</a>), I stumbled upon <a href="https://rog.asus.com/us/innovation/armoury_crate/">&quot;Armoury Crate&quot;</a>. Okay, fine, I will install that.</p>
<p>The experience wasn't much better. It did the same thing, with a lot of consoles flashing on screen. And of course, it resulted in another blue-screen, this time <code>KERNEL_SECURITY_CHECK_FAILURE</code>. To finish up, the BSOD kept happening as I rebooted the system. <del>Time to reinstall Windows once more.</del> After booting and crashing a few more times I could get into safe mode and perform the reinstall from there, which saved me from burning the <code>.iso</code> again.</p>
<p>Asus hardware might be good, but their software is utter crap.</p>
<p>Then I tried <a href="https://github.com/wroberts/rogauracore">rogauracore</a>, and even though it didn't list my model, it worked! I could disable the stupid lights from Linux. <a href="https://gitlab.com/CalcProgrammer1/OpenRGB/-/wikis/home">OpenRGB</a> also works on Windows, so it may be worth checking out too.</p>
<p>Because <code>rogauracore</code> helped me and they linked to <a href="https://github.com/linuxhw/hw-probe/blob/master/README.md#appimage">hw-probe</a>, I decided to <a href="https://linux-hardware.org/?probe=0e3e48c501">run it on my system</a>, in the hope that it is useful for other people.</p>
284<h2 id="closing-words">Closing words</h2>
285<p>I hope the installation journey is at least useful to someone, or that you enjoyed reading about it all. If not, sorry!</p>
286</content>
287 </entry>
288 <entry xml:lang="en">
289 <title>Privado: Final NoSQL evaluation</title>
290 <published>2020-05-13T00:00:00+00:00</published>
291 <updated>2020-05-14T08:31:06+00:00</updated>
292 <link href="https://lonami.dev/blog/mdad/final-nosql-evaluation/" type="text/html"/>
293 <id>https://lonami.dev/blog/mdad/final-nosql-evaluation/</id>
<content type="html"><p>This evaluation is a bit different from my <a href="/blog/mdad/nosql-evaluation/">previous one</a> because this time I have been tasked to evaluate student <code>a(i - 2)</code>, and because I am <code>i = 11</code> that happens to be <code>a(9) =</code> a classmate.</p>
295<h2 id="classmate-s-evaluation">Classmate’s Evaluation</h2>
296<p><strong>Grading: A.</strong></p>
297<p>The post I have evaluated is Trabajo en grupo – Bases de datos NoSQL, 3ª entrada: Aplicación con una Base de datos NoSQL seleccionada.</p>
298<p>It starts with a very brief introduction with who has written the post, what data they will be using, and what database they have chosen.</p>
299<p>They properly describe their objective, how they will do it and what library will be used.</p>
300<p>They also explain where they obtain the data from, and what other things the site can do, which is a nice bonus.</p>
301<p>The post continues listing and briefly explaining all the tools used and what they are for, including commands to execute.</p>
302<p>At last, they list what files their project uses, what they do, and contains a showcase of images which lets the reader know what the application does.</p>
303<p>All in all, in my opinion, it’s clear they have put work into this entry and I have not noticed any major flaws, so they deserve the highest grade.</p>
304</content>
305 </entry>
306 <entry xml:lang="en">
307 <title>Privado: Final NoSQL evaluation</title>
308 <published>2020-05-13T00:00:00+00:00</published>
309 <updated>2020-05-14T07:30:08+00:00</updated>
310 <link href="https://lonami.dev/blog/ribw/final-nosql-evaluation/" type="text/html"/>
311 <id>https://lonami.dev/blog/ribw/final-nosql-evaluation/</id>
<content type="html"><p>This evaluation is a bit different from my <a href="/blog/ribw/16/nosql-evaluation/">previous one</a> because this time I have been tasked to evaluate the student <code>a(i - 2)</code>, and because I am <code>a = 9</code> that happens to be <code>a(7) =</code> Classmate.</p>
313<p>Unfortunately for Classmate, the only entry related to NoSQL I have found in their blog is Prima y segunda Actividad: Base de datos NoSQL which does not develop an application as requested for the third entry (as of 14th of May).</p>
314<p>This means that, instead, I will evaluate <code>a(i - 3)</code> which happens to be <code>a(6) =</code> Classmate and they do have an entry.</p>
315<h2 id="classmate-s-evaluation">Classmate’s Evaluation</h2>
316<p><strong>Grading: B.</strong></p>
317<p>The post I have evaluated is BB.DD. NoSQL RethinkDB 3ª Fase. Aplicación.</p>
318<p>It starts with an introduction, properly explaining what database they have chosen and why, but not what application they will be making.</p>
319<p>This is detailed just below in the next section, although it’s a bit vague.</p>
320<p>The next section talks about the Python dependencies that are required, but they never said they would be making a Python application or that we need to install Python!</p>
<p>The next section talks about the file structure of the project, and they detail what every part does, although I missed having some code snippets.</p>
<p>The final result is pretty cool and contains many interesting graphs; they also provide a download of the source code and list all the relevant references used.</p>
323<p>Except for a weird «necesario falta» in the text, it’s otherwise well-written, although given the issues above I cannot grade it with the highest score.</p>
324</content>
325 </entry>
326 <entry xml:lang="en">
327 <title>Tips for Outpost</title>
328 <published>2020-05-10T00:00:00+00:00</published>
329 <updated>2020-05-22T00:00:00+00:00</updated>
330 <link href="https://lonami.dev/blog/tips-outpost/" type="text/html"/>
331 <id>https://lonami.dev/blog/tips-outpost/</id>
332 <content type="html"><p><a href="https://store.steampowered.com/app/1127110/Outpost/">Outpost</a> is a fun little game by Open Mid Interactive that has popped in recently in my recommended section of Steam, and I decided to give it a try.</p>
<p>It's a tower-defense game with progression, varied graphics and random world generation, which makes it quite fun for a few hours. In this post I want to share some tips I found useful to get past night 50.</p>
334<h2 id="build-pattern">Build Pattern</h2>
335<p>At first, you may be inclined to design a checkerboard pattern like the following, where &quot;C&quot; is the Crystal shrine, &quot;S&quot; is a stone launcher and &quot;B&quot; is a booster:</p>
336<p><img src="https://lonami.dev/blog/tips-outpost/outpost-bad-pattern.svg" alt="Bad Outpost build pattern" /></p>
337<p>Indeed, this pattern will apply <strong>4</strong> boosts to every turret, but unfortunately, the other 4 slots of the booster are wasted! This is because boosters are able to power 8 different towers, and you really want to maximize that. Here's a better design:</p>
338<p><img src="https://lonami.dev/blog/tips-outpost/outpost-good-pattern.svg" alt="Good Outpost build pattern" /></p>
<p>The shrine's own tower does get boosted, even though boosting it isn't really worth it. This pattern works well, and it's really easy to tile: just repeat the same 3x3 pattern.</p>
340<p>Nonetheless, we can do better. What if we applied multiple boosters to the same tower while still applying all 8 boosts?</p>
341<p><img src="https://lonami.dev/blog/tips-outpost/outpost-best-pattern.svg" alt="Best Outpost build pattern" /></p>
342<p>That's what peak performance looks like. You can actually apply multiple boosters to the same tower, and it works great.</p>
<p>Now, is it really worth building anywhere except around the shrine? Not really. You never know where a boss will come from, so all sides need a lot of defense if you want to stand a chance.</p>
344<p>The addition of traps in 1.6 is amazing. You want to build these outside your strong &quot;core&quot;, mostly to slow the enemies down so your turrets have more time to finish them off. Don't waste boosters on the traps, and build them at a reasonable distance from the center (the sixth tile is a good spot):</p>
345<p><img src="https://lonami.dev/blog/tips-outpost/outpost-trap-pattern.svg" alt="Trap Outpost build pattern" /></p>
<p>If you gather enough materials, you can build more trap and cannon layers further out, spaced so that enemies stay slowed for most of the walk between one trap layer and the next. A single gap of &quot;cannon, booster, cannon&quot; between trap layers is probably enough, just not in the center, where you need a lot of fire power.</p>
347<h2 id="talents">Talents</h2>
348<p>Talents are the way progression works in the game. Generally, after a run, you will have enough experience to upgrade nearly all talents of roughly the same tier. However, some are worth upgrading more than others (which provide basically no value).</p>
349<p>The best ones to upgrade are:</p>
350<ul>
351<li>Starting supplies. Amazing to get good tools early.</li>
352<li>Shrine shield. Very useful to hold against tough bosses.</li>
353<li>Better buildings (cannon, boosters, bed and traps). They're a must to deal the most damage.</li>
354<li>Better pickaxe. Stone is limited, so better make good use of it.</li>
355<li>Better chests. They provide an insane amount of resources early.</li>
356<li>Winter slow. Turrets will have more time to deal damage, it's perfect.</li>
357<li>More time. Useful if you're running out, although generally you enter nights early after having a good core anyway.</li>
358<li>More rocks. Similar to a better pickaxe, more stone is always better.</li>
359</ul>
360<p>Some decent ones:</p>
361<ul>
362<li>In-shrine turret. It's okay to get past the first night without building but not much beyond that.</li>
363<li>Better axe and greaves. Great to save some energy and really nice quality of life to move around.</li>
364<li>Tree growth. Normally there's enough trees for this not to be an issue but it can save some time gathering wood.</li>
<li>Wisps. They're half-decent, since they can provide materials once your gear is maxed out or while you save up for expensive gear.</li>
366</ul>
367<p>Some okay ones:</p>
368<ul>
369<li>Extra XP while playing. Generally not needed due to the way XP scales per night, but can be a good boost.</li>
370<li>Runestones. Not as reliable as chests but some can grant more energy per day.</li>
371</ul>
372<p>Some crap ones:</p>
373<ul>
374<li>Boosts for other seasons. I mean, winter is already the best, no use there.</li>
375<li>Bow. The bow is very useless at the moment, it's not worth your experience.</li>
376<li>More energy per bush. Not really worth hunting for bushes since you will have enough energy to do well.</li>
377</ul>
378<h2 id="turrets">Turrets</h2>
<p>Always build the highest tier; there's no point in anything lower than that. You will need to deal a lot of damage in a small area, which means space is at a premium.</p>
380<h2 id="boosters">Boosters</h2>
381<p>If you're very early in the game, I recommend alternating both the flag and torch in a checkerboard pattern where the boosters should go in the pattern above. This way your towers will get extra speed and extra range, which works great.</p>
<p>When you're in mid-game (stone launchers, gears and campfires), I do not recommend using campfires. The issue is that their range boost reaches too far, and the turrets will miss quite a few shots. It's better to put all your power into fire speed for increased DPS, at least near the center. If you manage to build far out and some of the turrets hardly ever shoot, you may put campfires there.</p>
383<p>In end-game, of course alternate both of the highest tier upgrades. They are really good, and provide the best benefit / cost ratio.</p>
384<h2 id="gathering-materials">Gathering Materials</h2>
385<p>It is <strong>very</strong> important to use all your energy every day! Otherwise it will go to waste, and you will need a lot of materials.</p>
386<p>As of 1.6, you can mine two things at once if they're close enough! I don't know if this is intended or a bug, but it sure is great.</p>
<p>Once you're in mid-game, your stone-based fort should hold up pretty well against the nights on its own. After playing for a while you will notice that if your base can defend against a boss, it will have no issue carrying you through the nights until the next boss. You can (and should!) spend the nights gathering materials, but only when you're confident your base will hold until the night runs out.</p>
<p>Before the boss hits (every fifth night), come back to your base and use all of your materials. This next fort upgrade is what will carry it through the next five nights.</p>
<p>You may also speed up time during the night, but make sure you use all your energy beforehand. Also take care: in the current version of the game, speeding up time only speeds up monster movement, not the fire rate or projectile speed of your turrets! This means they will miss more shots, which can be pretty dangerous. If you're speeding up time, consider speeding it up for a little bit, then going back to normal until things calm down, and repeating.</p>
390<p>If you're in the end-game, try to rush for chests. They provide a huge amount of materials which is really helpful to upgrade all your tools early so you can make sure to get the most out of every rock left in the map.</p>
<p>In the end-game, after all stone has been collected, you don't really need to use all of your energy anymore; just enough to have wood to build with the remaining stone. This also pairs nicely with the bow upgrades, which admittedly can get quite powerful, but it's best to have a strong fort first.</p>
392<h2 id="season">Season</h2>
393<p>In my opinion, winter is just the best of the seasons. You don't <em>really</em> need that much energy (it gets tiresome), or extra tree drops, or luck. Slower movement means your turrets will be able to shoot enemies for longer, dealing more damage over time, giving them more chance to take enemies out before they reach the shrine.</p>
394<p>Feel free to re-roll the map a few times (play and exit, or even restart the game) until you get winter if you want to go for The Play.</p>
395<h2 id="gear">Gear</h2>
396<p>In my opinion, you really should rush for the best pickaxe you can afford. Stone is a limited resource that doesn't regrow like trees, so once you run out, it's over. Better to make the best use out of it with a good pickaxe!</p>
<p>You may also upgrade your greaves; we all know faster movement is a <em>really</em> nice quality of life improvement.</p>
398<p>Of course, you will eventually upgrade your axe to chop wood (otherwise it's wasted energy, really), but it's not as much of a priority as the pickaxe.</p>
399<p>Now, the bow is completely useless. Don't bother with it. Your energy is better spent gathering materials to build permanent turrets that deal constant damage while you're away, and the damage adds up with every extra turret you build.</p>
400<p>With regards to items you carry (like sword, or helmet), look for these (from best to worst):</p>
401<ul>
402<li>Less minion life.</li>
403<li>Chance to not consume energy.</li>
404<li>+1 turret damage.</li>
405<li>Extra energy.</li>
406<li>+1 drop from trees or stones.</li>
407<li>+1 free wood or stone per day.</li>
408</ul>
409<p>Less minion life, nothing to say. You will need it near end-game.</p>
410<p>The chance to not consume energy is better the more energy you have. With a 25% chance not to consume energy, you can think of it as 1 extra energy for every 4 energy you have on average.</p>
411<p>Turret damage is a tough one, it's <em>amazing</em> mid-game (it basically doubles your damage) but falls short once you unlock the cannon where you may prefer other items. Definitely recommended if you're getting started. You may even try to roll it on low tiers by dying on the second night, because it's that good.</p>
<p>Extra energy is really good, because it means you can get more materials before things get too rough. Make sure you have built at least two beds in the first night! This extra energy will pay off over the many nights to come.</p>
<p>The problem with +1 free wood or stone per day is that you often have five times as much energy per day. By this I mean that with +1 drop per rock you can easily get 5 extra stone every day, whereas the free resource provides just 1 per night. On a good run, that's around 50 free stone versus 250 extra stone; the +1 drop is a clear winner.</p>
<p>In the end-game, revealing chests is more of a quality-of-life bonus that lets you rush them early; unless you like hunting for them, you can probably make better use of the slot.</p>
415<h2 id="closing-words">Closing words</h2>
<p>I hope you enjoy the game as much as I do! Movement is sometimes janky and there are occasional lag spikes, but despite this it should provide at least a few good hours of gameplay. Beware, however: a good run can take up to an hour!</p>
417</content>
418 </entry>
419 <entry xml:lang="en">
420 <title>A practical example with Hadoop</title>
421 <published>2020-04-01T02:00:00+00:00</published>
422 <updated>2020-04-03T08:43:41+00:00</updated>
423 <link href="https://lonami.dev/blog/ribw/a-practical-example-with-hadoop/" type="text/html"/>
424 <id>https://lonami.dev/blog/ribw/a-practical-example-with-hadoop/</id>
425 <content type="html"><p>In our <a href="/blog/ribw/introduction-to-hadoop-and-its-mapreduce/">previous Hadoop post</a>, we learnt what it is, how it originated, and how it works, from a theoretical standpoint. Here we will instead focus on a more practical example with Hadoop.</p>
<p>This post will showcase my own implementation of a word counter for any plain text document that you want to analyze.</p>
427<h2 id="installation">Installation</h2>
428<p>Before running any piece of software, its executable code must first be downloaded into our computers so that we can run it. Head over to <a href="http://hadoop.apache.org/releases.html">Apache Hadoop’s releases</a> and download the <a href="https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz">latest binary version</a> at the time of writing (3.2.1).</p>
429<p>We will be using the <a href="https://linuxmint.com/">Linux Mint</a> distribution because I love its simplicity, although the process shown here should work just fine on any similar Linux distribution such as <a href="https://ubuntu.com/">Ubuntu</a>.</p>
<p>Once the archive download is complete, extract it with any tool of your choice (graphical or from the terminal). Make sure you also have a version of Java installed, such as <a href="https://openjdk.java.net/">OpenJDK</a>.</p>
<p>Here are all three steps on the command line:</p>
432<pre><code>wget https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
433tar xf hadoop-3.2.1.tar.gz
434hadoop-3.2.1/bin/hadoop version
435</code></pre>
436<h2 id="processing-data">Processing data</h2>
437<p>To take advantage of Hadoop, we have to design our code to work in the MapReduce model. Both the map and reduce phase work on key-value pairs as input and output, and both have a programmer-defined function.</p>
438<p>We will use Java, because it’s a dependency that we already have anyway, so might as well.</p>
439<p>Our map function needs to split each of the lines we receive as input into words, and we will also convert them to lowercase, thus preparing the data for later use (counting words). There won’t be bad records, so we don’t have to worry about that.</p>
440<p>Copy or reproduce the following code in a file called <code>WordCountMapper.java</code>, using any text editor of your choice:</p>
441<pre><code>import java.io.IOException;
442
443import org.apache.hadoop.io.IntWritable;
444import org.apache.hadoop.io.LongWritable;
445import org.apache.hadoop.io.Text;
446import org.apache.hadoop.mapreduce.Mapper;
447
448public class WordCountMapper extends Mapper&lt;LongWritable, Text, Text, IntWritable&gt; {
449 @Override
450 public void map(LongWritable key, Text value, Context context)
451 throws IOException, InterruptedException {
452 for (String word : value.toString().split(&quot;\\W&quot;)) {
453 context.write(new Text(word.toLowerCase()), new IntWritable(1));
454 }
455 }
456}
457</code></pre>
458<p>Now, let’s create the <code>WordCountReducer.java</code> file. Its job is to reduce the data from multiple values into just one. We do that by summing all the values (our word count so far):</p>
459<pre><code>import java.io.IOException;
460import java.util.Iterator;
461
462import org.apache.hadoop.io.IntWritable;
463import org.apache.hadoop.io.Text;
464import org.apache.hadoop.mapreduce.Reducer;
465
466public class WordCountReducer extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {
467 @Override
468 public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context)
469 throws IOException, InterruptedException {
470 int count = 0;
471 for (IntWritable value : values) {
472 count += value.get();
473 }
474 context.write(key, new IntWritable(count));
475 }
476}
477</code></pre>
478<p>Let’s just take a moment to appreciate how absolutely tiny this code is, and it’s Java! Hadoop’s API is really awesome and lets us write such concise code to achieve what we need.</p>
479<p>Last, let’s write the <code>main</code> method, or else we won’t be able to run it. In our new file <code>WordCount.java</code>:</p>
480<pre><code>import org.apache.hadoop.fs.Path;
481import org.apache.hadoop.io.IntWritable;
482import org.apache.hadoop.io.Text;
483import org.apache.hadoop.mapreduce.Job;
484import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
485import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
486
487public class WordCount {
488 public static void main(String[] args) throws Exception {
489 if (args.length != 2) {
490 System.err.println(&quot;usage: java WordCount &lt;input path&gt; &lt;output path&gt;&quot;);
491 System.exit(-1);
492 }
493
494 Job job = Job.getInstance();
495
496 job.setJobName(&quot;Word count&quot;);
497 job.setJarByClass(WordCount.class);
498 job.setMapperClass(WordCountMapper.class);
499 job.setReducerClass(WordCountReducer.class);
500 job.setOutputKeyClass(Text.class);
501 job.setOutputValueClass(IntWritable.class);
502
503 FileInputFormat.addInputPath(job, new Path(args[0]));
504 FileOutputFormat.setOutputPath(job, new Path(args[1]));
505
506 boolean result = job.waitForCompletion(true);
507
508 System.exit(result ? 0 : 1);
509 }
510}
511</code></pre>
512<p>And compile by including the required <code>.jar</code> dependencies in Java’s classpath with the <code>-cp</code> switch:</p>
513<pre><code>javac -cp &quot;hadoop-3.2.1/share/hadoop/common/*:hadoop-3.2.1/share/hadoop/mapreduce/*&quot; *.java
514</code></pre>
515<p>At last, we can run it (also specifying the dependencies in the classpath, this one’s a mouthful). Let’s run it on the same <code>WordCount.java</code> source file we wrote:</p>
516<pre><code>java -cp &quot;.:hadoop-3.2.1/share/hadoop/common/*:hadoop-3.2.1/share/hadoop/common/lib/*:hadoop-3.2.1/share/hadoop/mapreduce/*:hadoop-3.2.1/share/hadoop/mapreduce/lib/*:hadoop-3.2.1/share/hadoop/yarn/*:hadoop-3.2.1/share/hadoop/yarn/lib/*:hadoop-3.2.1/share/hadoop/hdfs/*:hadoop-3.2.1/share/hadoop/hdfs/lib/*&quot; WordCount WordCount.java results
517</code></pre>
518<p>Hooray! We should have a new <code>results/</code> folder along with the following files:</p>
519<pre><code>$ ls results
520part-r-00000 _SUCCESS
521$ cat results/part-r-00000
522 154
5230 2
5241 3
5252 1
526addinputpath 1
527apache 6
528args 4
529boolean 1
530class 6
531count 1
532err 1
533exception 1
534-snip- (output cut for clarity)
535</code></pre>
536<p>It worked! Now this example was obviously tiny, but hopefully enough to demonstrate how to get the basics running on real world data.</p>
537</content>
538 </entry>
539 <entry xml:lang="en">
540 <title>Introduction to Hadoop and its MapReduce</title>
541 <published>2020-04-01T01:00:00+00:00</published>
542 <updated>2020-04-03T08:43:44+00:00</updated>
543 <link href="https://lonami.dev/blog/ribw/introduction-to-hadoop-and-its-mapreduce/" type="text/html"/>
544 <id>https://lonami.dev/blog/ribw/introduction-to-hadoop-and-its-mapreduce/</id>
<content type="html"><p>Hadoop is a free, open-source, Java-based programming framework that helps process large datasets in a distributed environment, tackling the problems that arise when trying to harness the knowledge in Big Data. It is capable of running on thousands of nodes and dealing with petabytes of data. It is based on the Google File System (GFS) and originated from the work on Nutch, an open-source search engine project.</p>
<p>Hadoop also offers a distributed filesystem (HDFS) enabling fast transfer among nodes, and a way to program with MapReduce.</p>
<p>It aims to address the 4 V’s: Volume, Variety, Veracity and Velocity. As for veracity, it is a secure environment that can be trusted.</p>
548<h2 id="milestones">Milestones</h2>
549<p>The creators of Hadoop are Doug Cutting and Mike Cafarella, who just wanted to design a search engine, Nutch, and quickly found the problems of dealing with large amounts of data. They found their solution with the papers Google published.</p>
<p>The name comes from a plush toy of Cutting’s child, a yellow elephant.</p>
551<ul>
552<li>In July 2005, Nutch used GFS to perform MapReduce operations.</li>
553<li>In February 2006, Nutch started a Lucene subproject which led to Hadoop.</li>
554<li>In April 2007, Yahoo used Hadoop in a 1 000-node cluster.</li>
555<li>In January 2008, Apache took over and made Hadoop a top-level project.</li>
<li>In July 2008, Apache tested a 4000-node cluster. Its performance was the fastest compared to other technologies that year.</li>
557<li>In May 2009, Hadoop sorted a petabyte of data in 17 hours.</li>
558<li>In December 2011, Hadoop reached 1.0.</li>
<li>In May 2012, Hadoop 2.0 was released with the addition of YARN (Yet Another Resource Negotiator) on top of HDFS, splitting MapReduce and other processes into separate components and greatly improving the fault tolerance.</li>
560</ul>
<p>From here onwards, many other alternatives have been born around the Hadoop ecosystem, like Spark, Hive &amp; Drill, Kafka and HBase.</p>
<p>As of 2017, Amazon has clusters between 1 and 100 nodes, Yahoo has over 100 000 CPUs running Hadoop, AOL has clusters of 50 machines, and Facebook has a 320-machine cluster (2 560 cores) with 1.3 PB of raw storage.</p>
563<h2 id="why-not-use-rdbms">Why not use RDBMS?</h2>
564<p>Relational database management systems simply cannot scale horizontally, and vertical scaling will require very expensive servers. Similar to RDBMS, Hadoop has a notion of jobs (analogous to transactions), but without ACID or concurrency control. Hadoop supports any form of data (unstructured or semi-structured) in read-only mode, and failures are common but there’s a simple yet efficient fault tolerance.</p>
565<p>So what problems does Hadoop solve? It solves the way we should think about problems, and distributing them, which is key to do anything related with BigData nowadays. We start working with clusters of nodes, and coordinating the jobs between them. Hadoop’s API makes this really easy.</p>
<p>Hadoop also takes the loss of data very seriously and uses replication; if a node goes down, its tasks are moved to a different node.</p>
567<h2 id="major-components">Major components</h2>
<p>The previously-mentioned HDFS runs on commodity machines, which are cost-friendly. It is very fault-tolerant and efficient enough to process huge amounts of data, because it splits large files into smaller chunks (or blocks) that can be more easily handled. Multiple nodes can work on multiple chunks at the same time.</p>
569<p>NameNode stores the metadata of the various datablocks (map of blocks) along with their location. It is the brain and the master in Hadoop’s master-slave architecture, also known as the namespace, and makes use of the DataNode.</p>
570<p>A secondary NameNode is a replica that can be used if the first NameNode dies, so that Hadoop doesn’t shutdown and can restart.</p>
<p>DataNodes store the blocks of data and are the slaves in the architecture. This data is split into one or more files, and their only job is to manage access to the data. They are often distributed among racks to avoid data loss.</p>
572<p>JobTracker creates and schedules jobs from the clients for either map or reduce operations.</p>
573<p>TaskTracker runs MapReduce tasks assigned to the current data node.</p>
<p>When clients need data, they first interact with the NameNode, which replies with the location of the data in the correct DataNode. The client then proceeds to interact with that DataNode directly.</p>
575<h2 id="mapreduce">MapReduce</h2>
576<p>MapReduce, as the name implies, is split into two steps: the map and the reduce. The map stage is the «divide and conquer» strategy, while the reduce part is about combining and reducing the results.</p>
577<p>The mapper has to process the input data (normally a file or directory), commonly line-by-line, and produce one or more outputs. The reducer uses all the results from the mapper as its input to produce a new output file itself.</p>
578<p><img src="https://lonami.dev/blog/ribw/introduction-to-hadoop-and-its-mapreduce/bitmap.png" alt="" /></p>
<p>When reading the data, some of it may be junk that we can choose to ignore. If it is valid data, however, we label it with a particular type that can be useful for the upcoming process. Hadoop is responsible for splitting the data across the many nodes available to execute this process in parallel.</p>
<p>There is another part to MapReduce, known as Shuffle-and-Sort. In this part, types or categories from one node get moved to a different node. This happens with all nodes, so that every node can work on a complete category. These categories are known as «keys», and they allow Hadoop to scale linearly.</p>
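<p>To make the three phases concrete, here is a minimal, self-contained sketch of a word count in plain Java, without Hadoop (the class and variable names are mine, for illustration only; Hadoop would run the map and reduce calls distributed across nodes and perform the shuffle over the network):</p>
<pre><code>import java.util.*;

public class MiniMapReduce {
    public static void main(String[] args) {
        List&lt;String&gt; lines = List.of(&quot;the quick fox&quot;, &quot;the lazy dog&quot;);

        // Map phase: emit a (word, 1) pair for every word of every line.
        List&lt;Map.Entry&lt;String, Integer&gt;&gt; mapped = new ArrayList&lt;&gt;();
        for (String line : lines) {
            for (String word : line.split(&quot;\\s+&quot;)) {
                mapped.add(Map.entry(word, 1));
            }
        }

        // Shuffle-and-sort: group all values by key, so each key
        // (category) can be handed complete to one reducer.
        Map&lt;String, List&lt;Integer&gt;&gt; grouped = new TreeMap&lt;&gt;();
        for (Map.Entry&lt;String, Integer&gt; pair : mapped) {
            grouped.computeIfAbsent(pair.getKey(), k -&gt; new ArrayList&lt;&gt;()).add(pair.getValue());
        }

        // Reduce phase: combine the values of every key into one result.
        Map&lt;String, Integer&gt; counts = new TreeMap&lt;&gt;();
        for (Map.Entry&lt;String, List&lt;Integer&gt;&gt; entry : grouped.entrySet()) {
            int sum = 0;
            for (int value : entry.getValue()) sum += value;
            counts.put(entry.getKey(), sum);
        }

        System.out.println(counts); // {dog=1, fox=1, lazy=1, quick=1, the=2}
    }
}
</code></pre>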
581<h2 id="references">References</h2>
582<ul>
583<li><a href="https://youtu.be/oT7kczq5A-0">YouTube – Hadoop Tutorial For Beginners | What Is Hadoop? | Hadoop Tutorial | Hadoop Training | Simplilearn</a></li>
584<li><a href="https://youtu.be/bcjSe0xCHbE">YouTube – Learn MapReduce with Playing Cards</a></li>
585<li><a href="https://youtu.be/j8ehT1_G5AY?list=PLi4tp-TF_qjM_ed4lIzn03w7OnEh0D8Xi">YouTube – Video Post #2: Hadoop para torpes (I)-¿Qué es y para qué sirve?</a></li>
586<li><a href="https://youtu.be/NQ8mjVPCDvk?list=PLi4tp-TF_qjM_ed4lIzn03w7OnEh0D8Xi">Video Post #3: Hadoop para torpes (II)-¿Cómo funciona? HDFS y MapReduce</a></li>
587<li><a href="https://hadoop.apache.org/old/releases.html">Apache Hadoop Releases</a></li>
588<li><a href="https://youtu.be/20qWx2KYqYg?list=PLi4tp-TF_qjM_ed4lIzn03w7OnEh0D8Xi">Video Post #4: Hadoop para torpes (III y fin)- Ecosistema y distribuciones</a></li>
589<li><a href="http://www.hadoopbook.com/">Chapter 2 – Hadoop: The Definitive Guide, Fourth Edition</a> (<a href="http://grut-computing.com/HadoopBook.pdf">pdf,</a><a href="http://www.hadoopbook.com/code.html">code</a>)</li>
590</ul>
591</content>
592 </entry>
593 <entry xml:lang="en">
594 <title>Google’s BigTable</title>
595 <published>2020-04-01T00:00:00+00:00</published>
596 <updated>2020-04-03T09:30:05+00:00</updated>
597 <link href="https://lonami.dev/blog/ribw/googles-bigtable/" type="text/html"/>
598 <id>https://lonami.dev/blog/ribw/googles-bigtable/</id>
599 <content type="html"><p>Let’s talk about BigTable, and why it is what it is. But before we get into that, let’s see some important aspects anybody should consider when dealing with a lot of data (something BigTable does!).</p>
600<h2 id="the-basics">The basics</h2>
<p>Converting a text document into a different format is often a great way to speed up future scans over it, because the new representation allows for efficient searches.</p>
<p>In addition, you generally want to store everything in a single, giant file. This saves a lot of time that would otherwise be spent opening and closing many files, because everything is in the same one! One proposal to make this happen is <a href="https://trec.nist.gov/file_help.html">Web TREC</a> (see also the <a href="https://en.wikipedia.org/wiki/Text_Retrieval_Conference">Wikipedia page on TREC</a>), which is basically HTML where every document is properly delimited from the others.</p>
603<p>Because we will have a lot of data, it’s often a good idea to compress it. Most text consists of the same words, over and over again. Classic compression techniques such as <code>DEFLATE</code> or <code>LZW</code> do an excellent job here.</p>
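<p>To get a feel for how well repetitive text compresses, here is a tiny sketch using Java’s built-in <code>Deflater</code> (the sample string is invented):</p>
<pre><code>import java.util.zip.Deflater;

public class CompressDemo {
    public static void main(String[] args) {
        // Repetitive text, as commonly found in web documents
        byte[] input = &quot;the quick brown fox &quot;.repeat(100).getBytes();

        // DEFLATE the whole buffer in one go
        Deflater deflater = new Deflater();
        deflater.setInput(input);
        deflater.finish();

        byte[] buffer = new byte[input.length];
        int compressedSize = deflater.deflate(buffer);
        deflater.end();

        // The repeated words collapse to a small fraction of the original size
        System.out.println(input.length + &quot; bytes -&gt; &quot; + compressedSize + &quot; bytes&quot;);
    }
}
</code></pre>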
604<h2 id="so-what-s-bigtable">So what’s BigTable?</h2>
<p>Okay, enough of an introduction to the basics of storing data. BigTable is what Google uses to store documents, and it’s a customized approach to saving, searching and updating web pages.</p>
<p>BigTable is a distributed storage system for managing structured data, able to scale to petabytes of data across thousands of commodity servers, with wide applicability, scalability, high performance and high availability.</p>
<p>In a way, it’s kind of like a database: it shares many implementation strategies with parallel databases and main-memory databases, but of course with a different schema.</p>
608<p>It consists of a big table known as the «Root tablet», with pointers to many other «tablets» (or metadata in between). These are stored in a replicated filesystem accessible by all BigTable servers. Any change to a tablet gets logged (said log also gets stored in a replicated filesystem).</p>
<p>If any of the tablet servers gets locked, a different one can take its place, read the log and deal with the problem.</p>
<p>There’s no query language, and transactions occur at the row level only. Every read or write in a row is atomic. Each row stores a single web page, and by combining the row and column keys along with a timestamp, it is possible to retrieve a single cell in the row. More formally, it’s a map that looks like this:</p>
611<pre><code>fetch(row: string, column: string, time: int64) -&gt; string
612</code></pre>
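<p>This map can be modelled with nested sorted maps; here is a minimal sketch in Java (the class name and the sample row and column names are made up for illustration):</p>
<pre><code>import java.util.TreeMap;

public class TinyTable {
    // row -&gt; column -&gt; timestamp -&gt; value, all kept in sorted order
    private final TreeMap&lt;String, TreeMap&lt;String, TreeMap&lt;Long, String&gt;&gt;&gt; cells = new TreeMap&lt;&gt;();

    public void put(String row, String column, long time, String value) {
        cells.computeIfAbsent(row, r -&gt; new TreeMap&lt;&gt;())
             .computeIfAbsent(column, c -&gt; new TreeMap&lt;&gt;())
             .put(time, value);
    }

    public String fetch(String row, String column, long time) {
        return cells.get(row).get(column).get(time);
    }

    public static void main(String[] args) {
        TinyTable table = new TinyTable();
        table.put(&quot;com.example/index&quot;, &quot;contents:&quot;, 1L, &quot;&lt;html&gt;...&lt;/html&gt;&quot;);
        System.out.println(table.fetch(&quot;com.example/index&quot;, &quot;contents:&quot;, 1L));
    }
}
</code></pre>
<p>The real system is of course distributed and persistent, but the lookup shape matches the <code>fetch</code> signature above.</p>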
<p>A row may have as many columns as it needs, and the column groups are the same for every row (but the columns themselves may vary), which is important to reduce disk read time.</p>
<p>Rows are split into different tablets based on their row keys, which simplifies determining an appropriate server for them. The keys can be up to 64KB in size, although most commonly they range from 10 to 100 bytes.</p>
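<p>Since row keys are kept sorted, finding which tablet is responsible for a given key boils down to a range lookup over the tablets’ start keys. A minimal sketch with a sorted map (the tablet names and key ranges here are invented):</p>
<pre><code>import java.util.TreeMap;

public class TabletRouter {
    public static void main(String[] args) {
        // Each entry maps the first row key of a tablet to that tablet
        TreeMap&lt;String, String&gt; tablets = new TreeMap&lt;&gt;();
        tablets.put(&quot;&quot;, &quot;tablet-1&quot;);
        tablets.put(&quot;com.example&quot;, &quot;tablet-2&quot;);
        tablets.put(&quot;com.nnn&quot;, &quot;tablet-3&quot;);

        // floorEntry finds the tablet whose start key is closest below the row key
        System.out.println(tablets.floorEntry(&quot;com.google/index&quot;).getValue()); // tablet-2
    }
}
</code></pre>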
615<h2 id="conclusions">Conclusions</h2>
616<p>BigTable is Google’s way to deal with large amounts of data on many of their services, and the ideas behind it are not too complex to understand.</p>
617</content>
618 </entry>
619 <entry xml:lang="en">
620 <title>A practical example with Hadoop</title>
621 <published>2020-03-30T01:00:00+00:00</published>
622 <updated>2020-04-18T13:25:43+00:00</updated>
623 <link href="https://lonami.dev/blog/mdad/a-practical-example-with-hadoop/" type="text/html"/>
624 <id>https://lonami.dev/blog/mdad/a-practical-example-with-hadoop/</id>
625 <content type="html"><p>In our <a href="/blog/mdad/introduction-to-hadoop-and-its-mapreduce/">previous Hadoop post</a>, we learnt what it is, how it originated, and how it works, from a theoretical standpoint. Here we will instead focus on a more practical example with Hadoop.</p>
<p>This post will reproduce the example on Chapter 2 of the book <a href="http://www.hadoopbook.com/">Hadoop: The Definitive Guide, Fourth Edition</a> (<a href="http://grut-computing.com/HadoopBook.pdf">pdf</a>, <a href="http://www.hadoopbook.com/code.html">code</a>), that is, finding the maximum global-wide temperature for a given year.</p>
627<h2 id="installation">Installation</h2>
<p>Before running any piece of software, its executable code must first be downloaded to our computers. Head over to <a href="http://hadoop.apache.org/releases.html">Apache Hadoop’s releases</a> and download the <a href="https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz">latest binary version</a> (3.2.1 at the time of writing).</p>
629<p>We will be using the <a href="https://linuxmint.com/">Linux Mint</a> distribution because I love its simplicity, although the process shown here should work just fine on any similar Linux distribution such as <a href="https://ubuntu.com/">Ubuntu</a>.</p>
<p>Once the archive download is complete, extract it with any tool of your choice (graphical or from the terminal). Make sure you have a version of Java installed, such as <a href="https://openjdk.java.net/">OpenJDK</a>, and run the extracted <code>hadoop</code> binary to verify that everything works.</p>
<p>Here are all three steps in the command line:</p>
632<pre><code>wget https://apache.brunneis.com/hadoop/common/hadoop-3.2.1/hadoop-3.2.1.tar.gz
633tar xf hadoop-3.2.1.tar.gz
634hadoop-3.2.1/bin/hadoop version
635</code></pre>
636<p>We will be using the two example data files that they provide in <a href="https://github.com/tomwhite/hadoop-book/tree/master/input/ncdc/all">their GitHub repository</a>, although the full dataset is offered by the <a href="https://www.ncdc.noaa.gov/">National Climatic Data Center</a> (NCDC).</p>
637<p>We will also unzip and concatenate both files into a single text file, to make it easier to work with. As a single command pipeline:</p>
638<pre><code>curl https://raw.githubusercontent.com/tomwhite/hadoop-book/master/input/ncdc/all/190{1,2}.gz | gunzip &gt; 190x
639</code></pre>
640<p>This should create a <code>190x</code> text file in the current directory, which will be our input data.</p>
641<h2 id="processing-data">Processing data</h2>
642<p>To take advantage of Hadoop, we have to design our code to work in the MapReduce model. Both the map and reduce phase work on key-value pairs as input and output, and both have a programmer-defined function.</p>
643<p>We will use Java, because it’s a dependency that we already have anyway, so might as well.</p>
644<p>Our map function needs to extract the year and air temperature, which will prepare the data for later use (finding the maximum temperature for each year). We will also drop bad records here (if the temperature is missing, suspect or erroneous).</p>
645<p>Copy or reproduce the following code in a file called <code>MaxTempMapper.java</code>, using any text editor of your choice:</p>
646<pre><code>import java.io.IOException;
647
648import org.apache.hadoop.io.IntWritable;
649import org.apache.hadoop.io.LongWritable;
650import org.apache.hadoop.io.Text;
651import org.apache.hadoop.mapreduce.Mapper;
652
653public class MaxTempMapper extends Mapper&lt;LongWritable, Text, Text, IntWritable&gt; {
654 private static final int TEMP_MISSING = 9999;
655 private static final String GOOD_QUALITY_RE = &quot;[01459]&quot;;
656
657 @Override
658 public void map(LongWritable key, Text value, Context context)
659 throws IOException, InterruptedException {
660 String line = value.toString();
661 String year = line.substring(15, 19);
662 String temp = line.substring(87, 92).replaceAll(&quot;^\\+&quot;, &quot;&quot;);
663 String quality = line.substring(92, 93);
664
665 int airTemperature = Integer.parseInt(temp);
666 if (airTemperature != TEMP_MISSING &amp;&amp; quality.matches(GOOD_QUALITY_RE)) {
667 context.write(new Text(year), new IntWritable(airTemperature));
668 }
669 }
670}
671</code></pre>
672<p>Now, let’s create the <code>MaxTempReducer.java</code> file. Its job is to reduce the data from multiple values into just one. We do that by keeping the maximum out of all the values we receive:</p>
673<pre><code>import java.io.IOException;
674import java.util.Iterator;
675
676import org.apache.hadoop.io.IntWritable;
677import org.apache.hadoop.io.Text;
678import org.apache.hadoop.mapreduce.Reducer;
679
680public class MaxTempReducer extends Reducer&lt;Text, IntWritable, Text, IntWritable&gt; {
681 @Override
682 public void reduce(Text key, Iterable&lt;IntWritable&gt; values, Context context)
683 throws IOException, InterruptedException {
684 Iterator&lt;IntWritable&gt; iter = values.iterator();
685 if (iter.hasNext()) {
686 int maxValue = iter.next().get();
687 while (iter.hasNext()) {
688 maxValue = Math.max(maxValue, iter.next().get());
689 }
690 context.write(key, new IntWritable(maxValue));
691 }
692 }
693}
694</code></pre>
695<p>Except for some Java weirdness (…why can’t we just iterate over an <code>Iterator</code>? Or why can’t we just manually call <code>next()</code> on an <code>Iterable</code>?), our code is correct. There can’t be a maximum if there are no elements, and we want to avoid dummy values such as <code>Integer.MIN_VALUE</code>.</p>
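<p>For the curious: <code>Iterable</code> only declares a single method, <code>iterator()</code>, so a lambda can adapt an <code>Iterator</code> into something the for-each loop accepts. A small standalone sketch, unrelated to Hadoop’s API:</p>
<pre><code>import java.util.Iterator;
import java.util.List;

public class IterableTrick {
    public static void main(String[] args) {
        Iterator&lt;String&gt; iter = List.of(&quot;a&quot;, &quot;b&quot;, &quot;c&quot;).iterator();

        // Iterable is a functional interface, so a lambda returning
        // our Iterator is enough for the for-each loop to accept it
        for (String s : (Iterable&lt;String&gt;) () -&gt; iter) {
            System.out.print(s);
        }
        System.out.println(); // prints &quot;abc&quot;
    }
}
</code></pre>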
696<p>We can also take a moment to appreciate how absolutely tiny this code is, and it’s Java! Hadoop’s API is really awesome and lets us write such concise code to achieve what we need.</p>
697<p>Last, let’s write the <code>main</code> method, or else we won’t be able to run it. In our new file <code>MaxTemp.java</code>:</p>
698<pre><code>import org.apache.hadoop.fs.Path;
699import org.apache.hadoop.io.IntWritable;
700import org.apache.hadoop.io.Text;
701import org.apache.hadoop.mapreduce.Job;
702import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
703import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
704
705public class MaxTemp {
706 public static void main(String[] args) throws Exception {
707 if (args.length != 2) {
708 System.err.println(&quot;usage: java MaxTemp &lt;input path&gt; &lt;output path&gt;&quot;);
709 System.exit(-1);
710 }
711
712 Job job = Job.getInstance();
713
714 job.setJobName(&quot;Max temperature&quot;);
715 job.setJarByClass(MaxTemp.class);
716 job.setMapperClass(MaxTempMapper.class);
717 job.setReducerClass(MaxTempReducer.class);
718 job.setOutputKeyClass(Text.class);
719 job.setOutputValueClass(IntWritable.class);
720
721 FileInputFormat.addInputPath(job, new Path(args[0]));
722 FileOutputFormat.setOutputPath(job, new Path(args[1]));
723
724 boolean result = job.waitForCompletion(true);
725
726 System.exit(result ? 0 : 1);
727 }
728}
729</code></pre>
730<p>And compile by including the required <code>.jar</code> dependencies in Java’s classpath with the <code>-cp</code> switch:</p>
731<pre><code>javac -cp &quot;hadoop-3.2.1/share/hadoop/common/*:hadoop-3.2.1/share/hadoop/mapreduce/*&quot; *.java
732</code></pre>
733<p>At last, we can run it (also specifying the dependencies in the classpath, this one’s a mouthful):</p>
734<pre><code>java -cp &quot;.:hadoop-3.2.1/share/hadoop/common/*:hadoop-3.2.1/share/hadoop/common/lib/*:hadoop-3.2.1/share/hadoop/mapreduce/*:hadoop-3.2.1/share/hadoop/mapreduce/lib/*:hadoop-3.2.1/share/hadoop/yarn/*:hadoop-3.2.1/share/hadoop/yarn/lib/*:hadoop-3.2.1/share/hadoop/hdfs/*:hadoop-3.2.1/share/hadoop/hdfs/lib/*&quot; MaxTemp 190x results
735</code></pre>
736<p>Hooray! We should have a new <code>results/</code> folder along with the following files:</p>
737<pre><code>$ ls results
738part-r-00000 _SUCCESS
739$ cat results/part-r-00000
7401901 317
7411902 244
742</code></pre>
<p>It worked! Now, this example was obviously tiny, but hopefully it’s enough to demonstrate how to get the basics running on real-world data.</p>
744</content>
745 </entry>
746</feed>