← back

Project Overlord

After seeing Poke by Interaction release and how it was able to control its own price autonomously, I wondered what other tools Poke could be given access to. A few months later while trying to lock in, I was left dissatisfied with productivity tools’ “blanket” coverage. They couldn’t discriminate between Youtube for Karpathy’s videos and brainrot videos.

It then occurred to me that I could tie these two ideas together to engineer an agentic system that was able to discern between task-relevant and irrelevant screen content. Months went by however, with the idea collecting dust. And of course, I decided to bring Overlord to life while studying for finals What follows is a technical write-up of two weeks of learning Electron, MCP, server-client relationships, and writing most logic code by-hand for learning's sake.

System architecture diagram showing server-client relationship
Fig 1. Architecture Diagram

At a high level, Overlord consists of four components: a local Electron client, a coordinating Python server, an external agent (Poke), and OpenAI API.The architecture is intentionally minimal and can be best demonstrated with a demo walk-through.

After a user opens the application, they are greeted with the main Overlord screen, which asks them to enter an objective (task) and duration.

Overlord main screen with task input Overlord timer overlay
Fig 2. Main screen and timer overlay

Once initiated, the screen minimizes into a timer in the corner and overlays all applications, including full-screen ones.

Every 30 seconds, a screenshot is taken. Electron passes the screenshot to the server (hosted on Render), which then submits the user’s declared task plus the current screenshot to OpenAI’s 4o vision model and requests a strict binary verdict (YES/NO) as to if the user is on task.

If “YES”, the screenshot loop will continue until the timer runs out.

In case that “NO” is passed to Electron, the application goes into “Kiosk Mode”, which restricts standard exit paths (Command-Q, window switching) but isn’t a kernel-level lock.

Shortly after, Electron does a fetch to the server, which uses the POST method /punish to tell Poke that the user is misbehaving and needs to send proof of completing X task (10 pushups in this case) before the screen will be unlocked. At this point, all that’s left is for Poke to determine that the user has completed X task, before it exercises the unlock_user() tool via MCP (Model Context Protocol), which updates the server’s locked state to false.

All this while, Electron has been async polling the server every 500ms (via GET requests), waiting for this. Once this happens, Electron unlocks the user’s screen and returns them to the homescreen.

Locked screen showing violation detection
Fig 3. Lockout screen when user goes off-task
Poke messages verifying pushup completion
Fig 4. Poke verifying task completion via messages

Decoupling Client from the Infrastructure:

Halfway through making the github public, it occurred to me that I would need to allow users to use their own Poke and OpenAI API keys! To enable this and make Overlord a distributable product and not just a personal script, I implemented a dynamic configuration store (electron-store) that lets users plug in their own python server without recompiling the electron app.

Settings page for configuring server URL
Fig 5. Dynamic configuration settings

Technical Learnings

Electron:

This was my first time building with Electron and learning all about IPC (Inter-Process Communication) between the Main Process and Renderer Process was pretty cool.

ipcMain.on('start-focus-mode', (event, duration, task) => {
    console.log('start focus mode signal activated yuhh') 
})

Architecture:

Though I had a vision for the product before I started, drawing out some diagrams and deep diving into the differences between MCP and REST really helped iron out my thought process. Overlord uses a standard REST API for the “punishment” flow and Model Context Protocol interface for the “redemption” flow. This lets Poke mutate state directly on my server.

Distributed State Consistency

Most of my previous understanding of state was purely theoretical. This project was extremely useful in having my work with stateless and stateful components hands-on.

# Judge class maintains the authoritative state
class Judge: 
    def __init__(self):
        self.locked = False
        self.offenses = 0
    
    def status(self):
        return {"locked": self.locked, "offenses": self.offenses}

    def punish(self):
        self.offenses += 1
        self.locked = True
        # ... trigger external consequences (Poke/SMS) ...
        return self.status()

    def forgive(self):
        self.locked = False
        return self.status()

judge = Judge()

After running into a minefield of race conditions, I realized I had to establish a single source of truth for state across the distributed system (Renderer, Main Process, Python Server, Poke). The python server maintains the authoritative lock state while the Electron client treats its local state as speculative and uses polling to reconcile. Poke is the only component that can mutate lock state via MCP. The React front-end is a UI that is limited to rendering what the main process dictates via IPC events.

const scheduleNextRun = (): void => {
    if (isFocusModeActive) {
        console.log("Scheduling next check in 30s...");
        setTimeout(() => runFocusCheck(window, task), 30000);
    }
};

One specific problem I ran into was using setInterval instead of setTimeout. During testing, I discovered that even when Electron went into kiosk mode and locked out the user, screenshots continued to be captured. Analysis revealed that setInterval is “fire-and-forget”, meaning that screenshots can’t be stopped! Additionally, if the network is really poor, multiple requests could be running in parallel and fighting for bandwidth leading to race conditions. Refactoring the code to use the recursive setTimeout guaranteed sequential execution and proper stoppage of screenshots.

Next Steps

Migrating to Local LLMs

Yes, I’m aware that having possibly sensitive data leave the computer and into OpenAI’s hands is far from ideal. Ollama model support will be introduced ASAP.

Real-Time Locking

After I finished coding up everything, I realized that I could have used websockets for real-time locking instead of polling. At this point I had already spent a lot of time fixing bugs + refactoring. Decided to prioritize shipping for now, but will come back to this hopefully soon.

Data Persistence

Since this app doesn’t have many data points that need to persist, I have chosen to keep it as lightweight as possible and not add a database. It would be trivial to add a database that would allow Poke to increase punishments based off of repeat offences.

I had a lot of fun building this out! Would love for you to try it out and give me feedback! Code is live on Github.