Daemon

TAGS // Python · Local AI · Ollama · Desktop App

An always-on local AI assistant that knows what's on my screen and in my notes. Llama 3.1 8B running on my own hardware. No API bills, no subscriptions, no telemetry.

Source ↗

Overview

Every AI conversation starts the same way. Re-explain my project. Re-paste the file. Re-state the context I told it yesterday. The model is brilliant for 5 minutes and then forgets everything.

I had 1,800+ notes in Obsidian about every project, course, and person in my life, and no AI assistant could see any of it. Every chat was a fresh introduction.

So I built one that actually knows me. I called the project Daemon and named the AI inside it Francis. Press Alt+Space anywhere on my laptop and a small dark window pops up. Type a question, hit Enter, and Francis answers using two things at the same time: a fresh OCR of whatever is on my screen at that moment, and semantic search over my entire Obsidian vault. The model is Llama 3.1 8B running locally on an RTX 3070 in the next room. No API keys. No subscriptions. No telemetry. Everything stays on my hardware.

The setup runs across two machines. My desktop PC is a headless server, running Ollama, a FastAPI backend, an embedding worker that watches the vault folder, and a sqlite-vec database for vector search. My laptop is the only thing I actually touch, running the screenshot and OCR worker, the global hotkey overlay, and the canonical Obsidian vault. Tailscale links them with a peer-to-peer encrypted tunnel. MSU OneDrive syncs the vault both ways. All four processes auto-start at user logon as Windows Scheduled Tasks, so I never have to manually launch anything.

When I press Alt+Space, the overlay briefly moves itself off-screen, captures the primary monitor with mss, runs OCR locally via the built-in Windows.Media.Ocr API, and ships only the extracted text to the PC. The API runs semantic retrieval against the vault, builds a prompt with the current screen plus the most relevant notes, and streams the answer back token by token. About a second to first token. Three to six seconds for a full short answer.

Two debugging moments stand out from the build. After I added per-question OCR, the chat window itself kept getting screenshotted by the background worker. Francis would see his own previous answer in the OCR, riff on it, and repeat himself in the next response. I fixed it with a visibility lock the overlay touches every 30 seconds while open. The background worker checks the lock before each capture and skips its cycle if the file is fresh.

The second was a one-line config fix. Ollama defaults num_ctx to 2048 tokens. My prompts (system + retrieved chunks + screen OCR) were silently getting truncated, so Francis kept asking me to “provide more context” because the actual question had been cut off the front of the prompt. Bumping num_ctx to 8192 fixed it instantly.

Building this was a fun way to learn how to wire local models, embeddings, vector search, OCR, system-level hotkeys, and a polished GUI into one continuously running thing across two machines. The 30+ seconds of typing context into ChatGPT, cut down to a single hotkey and a question.

Key Features

Global Alt+Space hotkey from anywhere on the laptop
Per-submit OCR with a fresh capture on every question
Semantic search over 1,800+ Obsidian notes via local nomic-embed-text embeddings
Token-by-token streaming responses with markdown rendering and source citations
Two-pass retrieval that prioritizes hand-curated notes over auto-generated graph nodes
Silent screen-summary hotkey (Alt+Shift+Space) writes a session recap directly into the vault
Manual capture hotkey (Alt+Shift+C) pairs a typed annotation with fresh OCR into one vault file
Auto-starts at logon, fully recovers from sleep, wake, and reboot

Tech Stack

Python (FastAPI, PyQt6, watchdog, mss, winocr, requests)
Ollama running Llama 3.1 8B (chat) and nomic-embed-text (embeddings)
SQLite + sqlite-vec for vector search
Obsidian as the knowledge substrate
Tailscale for the laptop and PC tunnel
MSU OneDrive for vault sync
Windows Scheduled Tasks for autostart