David Hanny

PhD Researcher | Spatial Data Science

Create your own budget High Performance Computing cluster with a Raspberry Pi and Google Colab | David Hanny

Create your own budget High Performance Computing cluster with a Raspberry Pi and Google Colab

January 10, 2025, 3 minute read

In my dayjob as a researcher, I frequently run large-scale data analysis workflows that involve all kinds of processing steps including the training and application of Large Language Models (LLMs) and Application Programming Interface (API) calls. While there are some compute resources available for researchers, including University-wide and national computing clusters, access to them is not always trivial. Frequently, I find myself running a workflow on my personal computer or Google Colab instead because it requires much less overhead and can be used instantaneously. In addition, I am able to use both for private projects as well.

However, there are several drawbacks to this:

Both issues can easily be resolved by using platforms like Lightning AI (which I personally highly recommend). However, Lightning AI is 2-3 times more expensive compared to Google Colab and easy solutions are boring. So why not come up with a hacky solution for setting up your own budget HPC cluster.

Prerequistes

For the proposed setup you need:

The Raspberry Pi can be used for low effort jobs like repeated API calls for scraping or translation. Google Colab, on the other hand, can be utilised for more resource-intensive jobs like model training or inference. The development of these scripts should still happen on your main computer. You can synchronise your projects with Git (see here how to use Git in Google Colab).

Component 1: Raspberry Pi

The Raspberry Pi is the basis of our HPC setup. It acts as a low-cost and energy-efficient computer for low-effort but long-running tasks. Assuming that you have installed a proper OS, I recommend to also install an application for remote administration like Raspberry Pi Connect or TightVNC. We need a desktop environment with a webbrowser. SSH is not enough.

To run basic Python notebooks, I use Miniconda with Visual Studio Code. Files can be synced via Git and data can copied via SCP or Google Drive (which is what I use).

Component 2: Google Colab

The more interesting part of setup is Google Colab. The main issue with longer Colab sessions for me always is that they disconnect if I close or lock my laptop. This can be circumvented with the Raspberry Pi. If you have a longer session to run, use the browser on your Raspberri Pi (via Raspberry Pi Connect or VNC). With this setup, you can use any automation library (e.g. pyautogui) to simulate user input which will keep the Colab session running.

While I generally do recommend Python in combination with pyautogui for automation, this solution unfortunately does not work on the newest Raspberry Pi OS due to Wayland display protocol. Programmatically moving the mouse on Wayland is really cumbersome. One alternative I found is to install wtype sudo apt install wtype and then create a bash script with the following content:

while true; do
    wtype w -d 2000 a -d 1000 i -d 1500 t
    sleep 30
done

Store this script as simulate-keystrokes.sh somewhere on your Raspberry Pi and modify it to be executable using sudo chmod +x simulate-keystrokes.sh. It will type the word “wait” every 30 seconds with a pause of 1-2 seconds between every letter. If you now need to keep a Google Colab session running, execute the script using ./simulate-keystrokes.sh, go to an empty cell in the Colab notebook and let the computer fill the cell.

To make things even easier, you might also append the line alias simulate-keystrokes="bash /path/to/script.sh" to your ~/.bashrc file. This way, you can trigger the simulated behaviour by simply entering simulate-keystrokes.

As mentioned, you can easily sync your Git repository with Colab as described here.

Compute!

That is it! With a few hacky tricks, we can easily set a private budget HPC cluster comprised of (1) a Raspberry Pi and (2) Google Colab. Git and Google Drive allow for easy syncing.

Now go out, train your models and have fun!

Cheers, David