# Daemon
An always-on local AI assistant that knows what's on my screen and in my notes. Llama 3.1 8B running on my own hardware. No API bills, no subscriptions, no telemetry.
# tags: [Python, Local AI, Ollama, Desktop App]
Overview
Every AI conversation starts the same way. Re-explain my project. Re-paste the file. Re-state the context I told it yesterday. The model is brilliant for 5 minutes and then forgets everything.
I had 1,800+ notes in Obsidian about every project, course, and person in my life, and no AI assistant could see any of it. Every chat was a fresh introduction.
So I built one that actually knows me. I called the project Daemon and named the AI inside it Francis. Press Alt+Space anywhere on my laptop and a small dark window pops up. Type a question, hit Enter, and Francis answers using two things at the same time: a fresh OCR of whatever is on my screen at that moment, and semantic search over my entire Obsidian vault. The model is Llama 3.1 8B running locally on an RTX 3070 in the next room. No API keys. No subscriptions. No telemetry. Everything stays on my hardware.
The setup runs across two machines. My desktop PC is a headless server, running Ollama, a FastAPI backend, an embedding worker that watches the vault folder, and a sqlite-vec database for vector search. My laptop is the only thing I actually touch, running the screenshot and OCR worker, the global hotkey overlay, and the canonical Obsidian vault. Tailscale links them with a peer-to-peer encrypted tunnel. MSU OneDrive syncs the vault both ways. All four processes auto-start at user logon as Windows Scheduled Tasks, so I never have to manually launch anything.
When I press Alt+Space, the overlay briefly moves itself off-screen, captures the primary monitor with mss, runs OCR locally via the built-in Windows.Media.Ocr API, and ships only the extracted text to the PC. The API runs semantic retrieval against the vault, builds a prompt with the current screen plus the most relevant notes, and streams the answer back token by token. About a second to first token. Three to six seconds for a full short answer.
Two debugging moments stand out from the build. After I added per-question OCR, the chat window itself kept getting screenshotted by the background worker. Francis would see his own previous answer in the OCR, riff on it, and repeat himself in the next response. I fixed it with a visibility lock the overlay touches every 30 seconds while open. The background worker checks the lock before each capture and skips its cycle if the file is fresh.
The second was a one-line config fix. Ollama defaults num_ctx to 2048 tokens. My prompts (system + retrieved chunks + screen OCR) were silently getting truncated, so Francis kept asking me to “provide more context” because the actual question had been cut off the front of the prompt. Bumping num_ctx to 8192 fixed it instantly.
Building this was a fun way to learn how to wire local models, embeddings, vector search, OCR, system-level hotkeys, and a polished GUI into one continuously running thing across two machines. The 30+ seconds of typing context into ChatGPT, cut down to a single hotkey and a question.
Key Features
- Global Alt+Space hotkey from anywhere on the laptop
- Per-submit OCR with a fresh capture on every question
- Semantic search over 1,800+ Obsidian notes via local nomic-embed-text embeddings
- Token-by-token streaming responses with markdown rendering and source citations
- Two-pass retrieval that prioritizes hand-curated notes over auto-generated graph nodes
- Silent screen-summary hotkey (Alt+Shift+Space) writes a session recap directly into the vault
- Manual capture hotkey (Alt+Shift+C) pairs a typed annotation with fresh OCR into one vault file
- Auto-starts at logon, fully recovers from sleep, wake, and reboot
Tech Stack
- Python (FastAPI, PyQt6, watchdog, mss, winocr, requests)
- Ollama running Llama 3.1 8B (chat) and nomic-embed-text (embeddings)
- SQLite + sqlite-vec for vector search
- Obsidian as the knowledge substrate
- Tailscale for the laptop and PC tunnel
- MSU OneDrive for vault sync
- Windows Scheduled Tasks for autostart