+++
title = "Self-hosting Extravaganza"
date = 2023-07-16
template = "blog-page.html"
[taxonomies]
tags = [ "advice", "foss", "privacy" ]
+++

Lately, more and more companies are putting their services behind paywalls, usage limits and closed APIs. Some examples are [Twitter](https://nitter.it/elonmusk/status/1675187969420828672) limiting the number of tweets a non-paying user can read, [Reddit](https://www.redditinc.com/blog/2023apiupdates) raising its API prices to a level no ordinary individual can afford, and [YouTube](https://libreddit.kavin.rocks/r/youtube/comments/14kmd07/youtube_cracking_down_on_if_youre_not_paying_them/) starting to block anyone who uses an ad-blocking extension.

## There must be a better way
Luckily, I've been interested in [alternative front-ends](https://github.com/mendel5/alternative-front-ends) for a while. These services give you the same (or better) functionality as their corporate counterparts without asking for any of your information in return. Some of them even offer their own free APIs.

Here are my favorite instances and apps for each service:

| Service | PC                                         | Mobile                                                                           |
|---------|--------------------------------------------|----------------------------------------------------------------------------------|
| YouTube | [Invidious](https://y.com.sb/)             | [NewPipe](https://apt.izzysoft.de/fdroid/index/apk/org.polymorphicshade.newpipe) |
| Twitter | [Nitter](https://nitter.it)                | [Squawker](https://apt.izzysoft.de/fdroid/index/apk/org.ca.squawker)             |
| Reddit  | [LibReddit](https://libreddit.kavin.rocks) | [LibReddit](https://libreddit.kavin.rocks)                                       |
| Medium  | [Scribe](https://scribe.rip)               | [Scribe](https://scribe.rip/)                                                    |

## Drawbacks
Of course, this is not a perfect solution. There are several problems worth discussing.

### Privacy
First and foremost, these instances do not make any profit. This doesn't sound like a problem until you really think about it. Can you really trust a random developer offering a (paid) service for thousands of users out of their own kindness?
The answer is "probably yes", but are you willing to take this risk?

Instance admins could easily edit the upstream source code to track their users indefinitely and sell usage data without anyone realizing.
Tracking is a given if you use any "normal" (not self-hosted) service, but the difference is that big companies are _required_ by GDPR to protect the user data they collect in a certain way and to keep it only for a limited amount of time.

The same cannot be guaranteed for individuals who apparently don't even make a profit from what they're doing.

### Scaling
This buzzword has become a meme in the programming world, but experience has shown how important it is when dealing with large userbases that can grow exponentially without any warning.

Think about the number of users who migrated to Mastodon immediately after Elon Musk acquired Twitter. Instance admins were used to having a couple hundred users, so hundreds of thousands of new signups made a lot of popular instances slow down or even temporarily shut down while they migrated to new (and more expensive) hardware.

Any public instance you use can be subject to this phenomenon, leading to a poor user experience: you'll be one of the many people wondering why your feed takes a minute to load.

## Fine, I'll do it myself

Since joining the world of minimalism, I had always considered Docker a bloated way to run multiple virtual machines. I kept reading complaints that even simple Python scripts shipped with `Dockerfile` and `docker-compose.yml` files, and I started seeing Docker as a bloated way to achieve the same result.

Whenever I wanted to host anything by myself, I used to SSH into my VPS with password authentication (!!!) and expose a public port for each service (!!!).
I reached my services through the server's public IP address, so I had to resort to sending cleartext passwords over HTTP (!!!) since TLS was not an option.

Of course, this is possibly the most insecure way to host services on a public server, but I felt it was "secure enough" and that nobody would ever be interested in hacking me (!!! × ∞).

Nonetheless, I used to `cat /var/log/auth.log` to see all the failed login attempts and pray that nobody had actually guessed my password.
Nowadays, I look back and laugh at my previous config; at least I'm (almost) sure that nobody actually managed to get in.

## The right way
Since I started my new job, I've also begun experimenting with Docker and found out it's not as bad as I thought it'd be. I will now let my previous config serve as the perfect example of how NOT to secure a VPS for self-hosting.

### Ditch password authentication
First of all, password authentication. You'll be a lot safer as soon as you disable it.

Having it enabled means you're vulnerable to dictionary and brute-force attacks. Also, if some new vulnerability is published, the password field is one more way an attacker could send a malicious string to get inside (see [the log4j incident](https://scribe.rip/geekculture/the-log4j-incident-explained-ed0ce6d36df2)).

A better way of logging into your VPS is through public key authentication.

First, generate a key on your own PC:
```
ssh-keygen -t ed25519 -a 100
```

This will create two files: `~/.ssh/id_ed25519.pub` (the public key) and `~/.ssh/id_ed25519` (the private key).

Now, use the following command to copy your public key over to the VPS:

```
ssh-copy-id -i ~/.ssh/id_ed25519.pub <user>@<host>
```

If everything went correctly and you can log in with your key, add or change the following line in `/etc/ssh/sshd_config`, then restart the SSH daemon so the change takes effect:
```
PasswordAuthentication no
```
At this point, you should be able to log into your VPS without typing your password, which is more secure as well as more convenient.
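
Before relying on the new setting, it can help to confirm what the SSH daemon actually ended up with. The sketch below assumes OpenSSH, whose `sshd -T` prints the effective configuration as lowercase `keyword value` lines; `password_auth_disabled` is a hypothetical helper name of mine, not part of OpenSSH.

```shell
# Hypothetical helper: reads `sshd -T`-style output on stdin and succeeds
# only if password authentication is reported as disabled.
password_auth_disabled() {
    # -i: ignore case, -x: match the whole line, -q: no output, just the exit status
    grep -qix 'passwordauthentication no'
}

# Assumed usage on the VPS (needs root, so it is left commented out here):
#   sudo sshd -T | password_auth_disabled && echo "password login is off"
```

It's also worth keeping your current SSH session open while you test a fresh key-based login from a second terminal, so a typo in `sshd_config` can't lock you out.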

I keep the contents of my public and private SSH key files saved as secure notes in my Bitwarden account, so I can take them to any PC I want to access my VPS from.
People say this is bad practice (you should have a separate key for each host), but I personally feel like it's not that big of a deal compared to the security mess I had going on before.

### Containerize your applications
Now that you have a safe way to SSH into your machine, you can start hosting your own services.

First, some terminology:
* `Dockerfile` files are like a list of ingredients. They contain every dependency needed to build a minimal operating system dedicated to running a program. They're used to build images.
* `Images` are like recipes. You can create some yourself from a Dockerfile or download them from an external repository. They can be instantiated as containers.
* `Containers` are like courses. You can instantiate multiple identical courses from the same image and you can actually eat (use) them! They can be managed through `docker-compose`.
* `docker-compose.yml` files are like menus. They're a convenient way to start and tear down multiple containers in a specific and reproducible configuration. If you're not a developer, you'll be mainly working on these files.

To get started with Docker, install `docker` and `docker-compose` via your package manager of choice. If you want an easy start, you can follow [this guide](https://docs.invidious.io/installation/#docker-compose-method-production) to host your own Invidious instance.

It's not that hard, but you might need to read the official [Docker Compose documentation](https://docs.docker.com/compose/) if something doesn't go as planned.

My advice is to generate an `hmac_key` using `pwgen 20 1` or `openssl rand -hex 20` and insert it in the correct spot inside `docker-compose.yml`.
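
As a rough sketch of that step, the snippet below wraps the `openssl` command from above in a small function and prints the result in the shape I'd expect to paste into the compose file. `generate_hmac_key` is a name I made up, the `/dev/urandom` fallback is my own assumption for machines without `openssl`, and the exact spot for the key is whatever the Invidious guide's `docker-compose.yml` shows.

```shell
# Hypothetical helper: print a random 40-character hex string to use as hmac_key.
generate_hmac_key() {
    if command -v openssl >/dev/null 2>&1; then
        # Same command as in the text: 20 random bytes, hex-encoded
        openssl rand -hex 20
    else
        # Fallback (assumption): read 20 bytes from /dev/urandom and hex-encode them
        head -c 20 /dev/urandom | od -An -tx1 | tr -d ' \n'
        echo
    fi
}

HMAC_KEY="$(generate_hmac_key)"
# Print the line to copy into docker-compose.yml by hand
echo "hmac_key: \"$HMAC_KEY\""
```

Generate the key once and keep it: as far as I understand, Invidious uses it to sign things, so swapping it later is not something you want to do casually.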

Also, remove the `127.0.0.1:` part in the `ports` section so the port is reachable from outside the VPS, since we don't have a reverse proxy set up (yet).

After you're done configuring, you can type `docker-compose up -d` to pull all required images and start your containers, and `docker-compose down` if you want to stop and remove everything.

### Use a reverse proxy
If you've followed that guide correctly, you should now have two containers that communicate through a network. You can find out their names by running `docker ps -a`. Take note of the name of your main Invidious container, which will be referred to as `invidious` for the rest of this guide.

Problem is, you're still using an IP address and communicating in cleartext over HTTP! This means your ISP can read every single detail in every single request you make.

Luckily, there is a way to get a cool domain name for free that also happens to include free and auto-generated TLS certificates.

First, create an account on [DuckDNS](https://www.duckdns.org/) and set up a free domain.

Then make a new directory next to the one you used for Invidious and create a new `docker-compose.yml`:
```
mkdir swag
cd swag
nano docker-compose.yml
```
You can paste the lines from [this guide](https://docs.linuxserver.io/general/swag#creating-a-swag-container) and edit them accordingly.

For example, instead of `DNSPLUGIN=cloudflare` you should have `DNSPLUGIN=duckdns`.

When you're done, start your container with `docker-compose up -d`. This will create the config folder in `/etc/config/swag` (assuming that's the path you mapped in the `volumes` section) as well as a new network called `swag_default`.

Now we need to create a custom subdomain for Invidious. You can do it by creating the following file: `/etc/config/swag/nginx/proxy-confs/invidious.subdomain.conf` with this content:

```
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    server_name yt.*;

    include /config/nginx/ssl.conf;

    client_max_body_size 0;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set $upstream_app invidious;
        set $upstream_port 3000;
        set $upstream_proto http;
        proxy_pass $upstream_proto://$upstream_app:$upstream_port;
    }
}
```

Where:
* `server_name yt.*;`: `yt` is the subdomain of choice;
* `set $upstream_app invidious;`: `invidious` is the name of the main Invidious container;
* `set $upstream_port 3000;`: `3000` is the Invidious port.
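
Since only those three values change from one service to the next, the file can be generated instead of copy-pasted. This is just a sketch: `make_proxy_conf` is a hypothetical helper of mine (not something Swag ships), and it assumes every service is fine with the same template as the Invidious one.

```shell
# Hypothetical helper: print a Swag subdomain config for the given
# subdomain, container name and container port.
make_proxy_conf() {
    subdomain="$1"
    app="$2"
    port="$3"
    # Unquoted heredoc: ${...} is filled in by the shell, while \$ stays a
    # literal dollar sign so nginx sees its own variables.
    cat <<EOF
server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;

    server_name ${subdomain}.*;

    include /config/nginx/ssl.conf;

    client_max_body_size 0;

    location / {
        include /config/nginx/proxy.conf;
        include /config/nginx/resolver.conf;
        set \$upstream_app ${app};
        set \$upstream_port ${port};
        set \$upstream_proto http;
        proxy_pass \$upstream_proto://\$upstream_app:\$upstream_port;
    }
}
EOF
}

# Assumed usage, writing the file discussed in the text:
#   make_proxy_conf yt invidious 3000 > /etc/config/swag/nginx/proxy-confs/invidious.subdomain.conf
```

The same function should work for the next service you put behind the proxy, as long as you pass that container's real name and internal port.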

There's one last step remaining. Invidious and Swag are separate containers on separate networks, so they cannot communicate unless they're connected to the same one. You can connect Invidious to Swag's network with the following command, where `invidious` is the name of your main Invidious container.

```
docker network connect swag_default invidious
```
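
To double-check that the connection took effect, you can list the containers attached to the network. A minimal sketch, assuming the network and container names used above; `has_container` is a hypothetical helper that only looks for a name in a space-separated list.

```shell
# Hypothetical helper: succeeds if the container name given as $1 appears
# in the space-separated list of names read from stdin.
has_container() {
    # One name per line, then an exact, literal whole-line match
    tr ' ' '\n' | grep -qxF -- "$1"
}

# Assumed usage on the VPS (requires Docker, so it is left commented out here):
#   docker network inspect swag_default \
#       --format '{{range .Containers}}{{.Name}} {{end}}' | has_container invidious \
#       && echo "invidious is on swag_default"
```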

Finally, you can visit `https://yt.<yourdomain>.duckdns.org/` and check if you can access Invidious through HTTPS.

Note: now that you have a reverse proxy set up, you can remove the `ports:` section entirely from Invidious' `docker-compose.yml`.
You can do this because the containers communicate internally over the `swag_default` network, without the need to expose any ports to the outside.
After you're done, apply the change by running `docker-compose up -d` in your Invidious folder; a plain `docker-compose restart` does not pick up edits to `docker-compose.yml`. Since this recreates the container, you will likely have to run the `docker network connect` command again.

Ideally, the only container with exposed ports on your VPS should be Swag, exposing ports 80 (HTTP) and 443 (HTTPS).
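
A quick way to audit this is to look at the port column of `docker ps`: published ports show up as a `host->container` mapping, while ports that are only visible inside Docker networks don't. The sketch below relies on that; `published_containers` is a hypothetical helper, and the `|` separator is just my choice for the `--format` template.

```shell
# Hypothetical helper: reads "name|ports" lines on stdin and prints the names
# of containers that publish at least one port on the host.
published_containers() {
    # A "->" in the ports column means a host port is mapped to the container
    awk -F'|' '$2 ~ /->/ { print $1 }'
}

# Assumed usage on the VPS (requires Docker, so it is left commented out here):
#   docker ps --format '{{.Names}}|{{.Ports}}' | published_containers
```

If everything is set up as described, I'd expect this to print only the Swag container.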

## Conclusion
Self-hosting is not easy. It's been my [Camino de Santiago](https://wiki.froth.zone/wiki/Camino_de_Santiago): a long path of redemption for the sins I committed in my youth.
Even though I made a lot of mistakes, in the end I learned a lot about DevOps and cybersecurity, as well as precious skills that have proved useful in my engineering job.

You can find a full list of self-hostable services [here](https://github.com/awesome-selfhosted/awesome-selfhosted)!