Protohackers: Tips for getting started

Using Python
Hosting
Debugging
Bits and pieces
Useful things

Protohackers is a new "casual programming challenge in which you create servers for network protocols" created by (fellow Brit) James Stanley.

Before starting the first challenge (0: Smoke Test) I had no clue what I was doing, but it sounded like fun. I figured it wouldn't be too difficult if I wrote the server in Python, which largely proved to be true. But I also hit a few walls and needed some "new to me" debugging techniques.

With that in mind, I decided to do a small write-up for people who are feeling overwhelmed or stuck. I'll avoid substantial spoilers or code, but will talk a little about problems I encountered.

Using Python

Python has (at least) three modules for working with sockets: socket, socketserver and parts of asyncio. That means there's a lot of documentation, and it can be tricky to look things up because different tutorials do things in very different ways.

asyncio is built for concurrency (handling many connections at once), which is good for web servers. But asyncio streams introduce a lot of abstraction and new concepts, such as event loops and coroutines, which makes them less useful if you want to learn about networking.

socketserver "simplifies the task of writing network servers", to quote the docs. Useful in production code but, again, less useful if you want to learn about networking.

socket is what socketserver is built on. It's much closer to the C socket API that Linux provides, which means you have to do more work to get a working server. But this is Python, so "more work" means a basic server takes 10-15 lines of code.
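To give a sense of scale, here is a minimal sketch of a blocking server that uses only the socket module. It is close to the echo example in Python's own socket documentation. The port number 50000 and the function names are arbitrary choices for illustration. It serves one client at a time.

```python
import socket

HOST = "0.0.0.0"  # listen on all interfaces so the server is reachable from outside
PORT = 50000      # arbitrary example port; use whichever port you've opened


def handle_client(conn: socket.socket) -> None:
    """Send every chunk of bytes straight back until the client closes."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:  # b"" means the client has closed its side
                break
            conn.sendall(data)


def serve(host: str = HOST, port: int = PORT) -> None:
    # create_server() (Python 3.8+) creates, binds and listens in one call.
    with socket.create_server((host, port)) as server:
        while True:
            conn, addr = server.accept()
            print("connection from", addr)
            handle_client(conn)  # blocks: other clients wait until this one finishes
```

To try it, add a call to serve() at the bottom of the script and run it with python. Binding to "0.0.0.0" rather than "127.0.0.1" matters here. A server bound to 127.0.0.1 only accepts connections from the same machine, so the Protohackers tests couldn't reach it.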

Hosting

Your server has to be publicly accessible so the Protohackers testing system can send requests and see if the server passes the tests.

I ran my server as a normal Python script, e.g. python 0_smoke_test.py and stopped the program when the tests were finished. It doesn't need a fancy setup.

For some people, the sticking point will be the hosting itself.

You can host your server at home. My knowledge of networking is almost 0, so I didn't feel like opening a port on my home router!

For cloud hosting, the cheapest option is probably Google Cloud Platform, which provides an "always free" e2-micro virtual machine. Smaller companies like Hetzner charge pennies per hour.

If you've never run a virtual machine in the cloud before, Protohackers is a good way to start. You don't have to get a complex production environment running; all you need is an operating system and an appropriate version of Python.

Cloud providers generally have standard "images" of operating systems like Ubuntu and Fedora and let you pick which OS runs on your virtual machine. Even better, recent versions of Ubuntu, Fedora and so on usually have recent versions of Python.

Firewalls

Making your server accessible from the outside world usually means configuring a firewall.

I was using ufw as a firewall and added a rule to unblock the port where the "smoke test" server was running. Bizarrely, Protohackers still couldn't access it. I'd forgotten that Hetzner has a separate firewall controlled through the account page, so connections were still being refused on the port.
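When a firewall might be in the way, it helps to test the port from a different machine than the server. Below is a small sketch that uses only the socket module to report what happened when it tried to open a TCP connection. The function name check_port is my own. The interpretation in the comments is a rule of thumb, not a guarantee. A refused connection usually means something actively rejected it: either no program is listening, or a firewall rule rejects the connection. A timeout often means a firewall silently dropped the packets.

```python
import socket


def check_port(host: str, port: int, timeout: float = 3.0) -> str:
    """Try a TCP connection and describe what happened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Something replied with a reset: no listener, or a firewall rule that rejects.
        return "refused"
    except socket.timeout:
        # No reply at all: often a firewall silently dropping the packets.
        return "timed out"
```

For example, check_port("203.0.113.10", 50000) run from a laptop, where 203.0.113.10 is a placeholder for your server's public IP address.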

Debugging

At a few points I had no idea how to start debugging a test failure. The Protohackers tests provide little information about what went wrong and it became obvious that I needed a lot more insight into what was happening.

It's crucial to be able to send your own requests to your server and see how it responds. I mostly used telnet to send requests. If you aren't familiar with something like telnet, you might find it easier to write a little Python client to send requests.
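A hand-rolled client can be very short. This sketch uses a hypothetical send_request helper. It connects and sends some bytes, then calls shutdown(SHUT_WR) to tell the server it has finished sending. After that it collects everything until the server closes the connection. That last step assumes the server closes the connection once the client is done, as an echo-style server does. If a server keeps the connection open, you would instead read until a delimiter or a fixed number of bytes.

```python
import socket


def send_request(host: str, port: int, payload: bytes, timeout: float = 5.0) -> bytes:
    """Send payload, then return everything the server sends back before closing."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(payload)
        # Signal "no more data from me" while keeping the read side open.
        sock.shutdown(socket.SHUT_WR)
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed its side
                break
            chunks.append(chunk)
    return b"".join(chunks)
```

For example, print(send_request("127.0.0.1", 50000, b"hello\n")) while a server is running locally on the (arbitrary) port 50000.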

To see what messages are getting passed around, the Protohackers "help" page recommended tcpdump. Julia Evans's free Let's learn tcpdump! zine turned out to be exactly what I needed. It explains the basics of the tcpdump output and common flags. I found I just wanted to "listen on all interfaces, listen to a specific port" and to control whether tcpdump printed packet contents or not. (Seeing the packet content is sometimes useful, sometimes overwhelming.)

It's also useful to have logging (or prints) in the server itself. One "Aha!" moment came when I realised that a particular request appeared in the tcpdump output but not the server logs.
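One way to get that visibility is to log the raw bytes returned by every recv() call. This sketch wraps recv() in a hypothetical recv_logged helper that uses the standard logging module. The %r format shows the bytes' repr, so newlines and half-received messages are visible. Timestamps make it easier to match log lines against tcpdump output.

```python
import logging
import socket

logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("server")


def recv_logged(conn: socket.socket, addr, bufsize: int = 4096) -> bytes:
    """Call conn.recv() and log exactly what arrived (or that the peer closed)."""
    data = conn.recv(bufsize)
    if data:
        # %r shows bytes like b'{"a": 1}\n{"b"' so split or merged messages stand out.
        log.debug("%s sent %d bytes: %r", addr, len(data), data)
    else:
        log.info("%s closed the connection", addr)
    return data
```

Swapping conn.recv(4096) for recv_logged(conn, addr) inside a server's read loop is enough to see every chunk as it arrives.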

Bits and pieces

Don't worry too much about async or multi-threaded code. You can definitely complete 0: Smoke Test with ordinary, synchronous code. (I know because that's how I wrote my first version.) For 1: Prime Time you might find that a server which handles one connection at a time struggles with the multi-client tests, but you can always switch to a concurrent implementation after the core code is written.
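One small change that makes a server concurrent is to start a thread for each connection, so one slow client doesn't block everyone else. Below is a minimal sketch using the standard threading module. serve_threaded and the demo echo handler are hypothetical names. Any function that takes (conn, addr) would work as the handler. asyncio and socketserver.ThreadingTCPServer are alternatives to this approach.

```python
import socket
import threading
from typing import Any, Callable

Handler = Callable[[socket.socket, Any], None]


def echo(conn: socket.socket, addr: Any) -> None:
    """Demo handler: send data back until the client closes."""
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)


def serve_threaded(server: socket.socket, handler: Handler) -> None:
    """Accept connections forever, handling each one in its own thread."""
    while True:
        conn, addr = server.accept()
        # daemon=True so Ctrl-C on the main thread isn't held up by open connections.
        threading.Thread(target=handler, args=(conn, addr), daemon=True).start()
```

For example, serve_threaded(socket.create_server(("0.0.0.0", 50000)), echo), where the port is again an arbitrary choice. The core handler code doesn't change. Only the accept loop does.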

TCP is a "stream" protocol: it delivers a continuous sequence of bytes, not separate messages. This becomes more relevant in 1: Prime Time, where the specification says that each request and each response is "a single line containing a JSON object, terminated by a newline character ('\n', or ASCII 10)". But TCP doesn't guarantee that each read from the socket returns exactly one of those JSON messages. You might receive only a few bytes (part of a message). Or, because a client can send several requests over the same connection, you might receive several messages at once. Your server needs to be able to handle both cases.
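A common way to handle this is to keep a buffer for each connection. Append whatever recv() returns, cut off every complete newline-terminated line, and keep the incomplete tail for next time. This is a sketch with hypothetical helper names. Parsing each line as JSON is left out.

```python
from __future__ import annotations

import socket
from collections.abc import Iterator


def split_complete_lines(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Return (complete lines without the trailing newline, leftover partial line)."""
    *lines, rest = buffer.split(b"\n")
    return lines, rest


def iter_lines(conn: socket.socket, bufsize: int = 4096) -> Iterator[bytes]:
    """Yield one complete line at a time, however TCP happened to chunk the data."""
    buffer = b""
    while True:
        chunk = conn.recv(bufsize)
        if not chunk:
            # Client closed the connection; an unterminated final fragment is discarded.
            return
        buffer += chunk
        lines, buffer = split_complete_lines(buffer)
        yield from lines
```

A handler can then loop with "for line in iter_lines(conn):" and treat each line as one request. Another option is conn.makefile("rb"). It returns a file-like object whose readline() method does this buffering for you.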

If you're undecided about trying Protohackers: practically everything in this post is something I learned while doing Protohackers challenges 0: Smoke Test and 1: Prime Time. (Admittedly I already knew most of the content in the Hosting section.) Protohackers was a really fun way to start learning about networking. Perhaps you'll enjoy it too.

Useful things

DigitalOcean has a good UFW essentials article
DigitalOcean also has a good article about setting up an Ubuntu server
The Let's learn tcpdump! zine by Julia Evans quickly teaches everything you're likely to need to know about tcpdump to play Protohackers
OverTheWire (specifically the Bandit game) is a fun way to learn Linux commands like ssh and telnet. Other games are about cybersecurity. Here in the UK your ISP might block it as a "hacking website", in which case you'll need to fiddle with the parental controls that some ISPs turn on by default.