Push-to-talk voice typing for Windows. Hold a key, speak, release: your words land at the cursor. State-of-the-art speech recognition, running entirely on your device, on the NPU where there is one.
Get it from the Microsoft Store (one click, auto-updates, both architectures), or download directly. MIT licensed.
OpenWritr isn't an app you switch to — it's a tray utility that types for you, into whatever window already has focus. Email, chat, your editor, a browser field. Hold the hotkey, talk, let go.
Speech recognition runs on your device. Your audio never leaves the machine. No account, no analytics, nothing phoned home.
On Snapdragon X the model runs on the Hexagon NPU — a 5-second sentence transcribes in well under half a second, sipping power.
English, German, French, Spanish, Italian, Dutch, Polish, Ukrainian and more — auto-detected, with punctuation and capitalization.
A global low-level keyboard hook means recording survives popups, UAC prompts and system shortcuts stealing focus mid-sentence.
Hold Shift too and an LLM tidies punctuation and grammar, using your own GitHub Copilot account or an OpenAI-compatible API key. Off by default.
MIT-licensed, native Rust, no Electron. A single ~7 MB executable. Every release built reproducibly in public CI.
OpenWritr is built on NVIDIA Parakeet — one of the most accurate open speech-recognition models in the world — running on a chip that, on most laptops, sits completely idle.
Parakeet TDT 0.6B v3 is a 600-million-parameter speech model from NVIDIA. On the open ASR leaderboards it goes toe-to-toe with — and often beats — OpenAI's Whisper, while being smaller and dramatically faster. It transcribes 25 European languages, adds its own punctuation and capitalization, and is genuinely state of the art.
The catch: a model that good usually means the cloud. You speak, your audio is uploaded to someone's server, transcribed there, and sent back. That costs money, adds latency, needs a connection — and means your voice leaves your machine.
OpenWritr runs the whole thing locally, and not just on the CPU. On Snapdragon X Copilot+ PCs it offloads the encoder, the heavy part of the model, to the Hexagon NPU, a dedicated neural-processing unit Qualcomm builds into the silicon precisely for this kind of on-device AI. The NPU does the work in a fraction of the time and a fraction of the power a CPU would need.
Nobody had wired Parakeet to the Hexagon NPU before. Doing it took rewriting part of the model's compute graph, requantizing it the way the NPU wants, and compiling it specifically for the chip. The full story is below. The result is published as an open model anyone can reuse.
A subtle overlay with a live level meter appears only while you record. Everything else is one tray icon and a settings dialog.
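A level meter like this usually just turns each block of microphone samples into a loudness figure. The sketch below uses a hypothetical `meter_level` helper, not OpenWritr's actual meter code: it computes the RMS of a block, converts it to dBFS, and maps an assumed display range of -60 to 0 dBFS onto 0 to 1.

```rust
/// Map a block of mono samples in [-1.0, 1.0] to a meter position in [0.0, 1.0].
/// Hypothetical helper; the -60 dBFS floor is an assumed display range.
fn meter_level(samples: &[f32]) -> f32 {
    const FLOOR_DB: f32 = -60.0; // anything quieter shows as an empty meter
    if samples.is_empty() {
        return 0.0;
    }
    // Root-mean-square amplitude of the block.
    let mean_sq = samples.iter().map(|s| s * s).sum::<f32>() / samples.len() as f32;
    let rms = mean_sq.sqrt();
    if rms <= 0.0 {
        return 0.0; // digital silence
    }
    // Level relative to full scale: an RMS of 1.0 is 0 dBFS.
    let db = 20.0 * rms.log10();
    // Linear map from [FLOOR_DB, 0] dB onto [0, 1], clamped at both ends.
    ((db - FLOOR_DB) / -FLOOR_DB).clamp(0.0, 1.0)
}
```

Calling it once per captured audio buffer is enough to drive the bar; any smoothing or peak-hold is a display choice on top.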
Four stages. Everything except the optional LLM polish runs on your machine.
Hold the hotkey (default Ctrl+Win). A global hook captures the keystroke no matter which window has focus, and the mic opens.
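The push-to-talk behaviour, including the Shift modifier for LLM polish, reduces to a small state machine fed by key-down and key-up events. This is a sketch under assumptions: the hook has already translated raw virtual-key codes into the hypothetical `Key` enum, and releasing either chord key ends the recording.

```rust
/// Keys this sketch cares about; a real hook would map virtual-key codes onto these.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Key { Ctrl, Win, Shift }

/// What the recorder should do in response to one key event.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Action { Ignore, StartRecording, StopRecording { polish: bool } }

/// Hypothetical tracker for a Ctrl+Win chord with an optional Shift modifier.
#[derive(Default)]
struct PushToTalk { ctrl: bool, win: bool, shift: bool, recording: bool, polish: bool }

impl PushToTalk {
    fn on_key(&mut self, key: Key, down: bool) -> Action {
        match key {
            Key::Ctrl => self.ctrl = down,
            Key::Win => self.win = down,
            Key::Shift => self.shift = down,
        }
        let chord_held = self.ctrl && self.win;
        if !self.recording && chord_held {
            self.recording = true;
            self.polish = self.shift; // Shift already held when the chord completes
            return Action::StartRecording;
        }
        if self.recording {
            if self.shift {
                self.polish = true; // Shift pressed at any point while recording
            }
            if !chord_held {
                self.recording = false;
                return Action::StopRecording { polish: self.polish };
            }
        }
        // Auto-repeated key-downs while recording also land here and are ignored.
        Action::Ignore
    }
}
```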
Audio is downmixed to mono, resampled to 16 kHz and turned into mel features, locally and in real time.
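Downmixing and resampling are simple enough to sketch. The hypothetical helpers below average interleaved channels into mono and resample with linear interpolation. Real code should low-pass filter before downsampling (for example from 48 kHz to 16 kHz) to avoid aliasing, so treat this as an illustration of the data flow rather than a production resampler. The mel-feature step is not shown.

```rust
/// Average interleaved multi-channel frames down to a single channel.
fn downmix(interleaved: &[f32], channels: usize) -> Vec<f32> {
    assert!(channels > 0, "need at least one channel");
    interleaved
        .chunks_exact(channels)
        .map(|frame| frame.iter().sum::<f32>() / channels as f32)
        .collect()
}

/// Naive linear-interpolation resampler (no anti-aliasing filter).
fn resample_linear(input: &[f32], from_hz: u32, to_hz: u32) -> Vec<f32> {
    if input.is_empty() || from_hz == to_hz {
        return input.to_vec();
    }
    let out_len = (input.len() as u64 * to_hz as u64 / from_hz as u64) as usize;
    let step = from_hz as f64 / to_hz as f64; // input samples per output sample
    (0..out_len)
        .map(|i| {
            let pos = i as f64 * step;
            let idx = pos.floor() as usize;
            let frac = (pos - idx as f64) as f32;
            let a = input[idx];
            // Past the last sample, hold the final value.
            let b = *input.get(idx + 1).unwrap_or(&a);
            a + (b - a) * frac
        })
        .collect()
}
```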
The Parakeet encoder runs on the NPU (or the CPU); a transducer decoder turns its output into text.
The text is pasted at your cursor, optionally polished by an LLM first.
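Parakeet TDT is a Token-and-Duration Transducer: at each step the joint network predicts both a token (or blank) and how many encoder frames to advance. A greedy decode loop under that scheme might look like the sketch below. The `joint` closure is a hypothetical stand-in for the prediction and joint networks, and passing only the last emitted token in place of the real predictor state is a simplification.

```rust
/// Best token and best duration (in encoder frames) from one joint-network evaluation.
#[derive(Clone, Copy, Debug)]
struct JointOut { token: usize, duration: usize }

/// Greedy TDT decoding over `num_frames` encoder frames.
/// `joint(t, last_token)` stands in for the prediction + joint networks.
fn tdt_greedy_decode<F>(num_frames: usize, blank: usize, max_symbols_per_frame: usize, mut joint: F) -> Vec<usize>
where
    F: FnMut(usize, Option<usize>) -> JointOut,
{
    let mut tokens = Vec::new();
    let mut last = None;
    let mut t = 0;
    let mut emitted_at_t = 0;
    while t < num_frames {
        let JointOut { token, mut duration } = joint(t, last);
        if token == blank {
            // A blank never stays on the same frame.
            duration = duration.max(1);
        } else {
            tokens.push(token);
            last = Some(token);
            emitted_at_t += 1;
            // Cap emissions per frame so zero-duration steps cannot loop forever.
            if duration == 0 && emitted_at_t >= max_symbols_per_frame {
                duration = 1;
            }
        }
        if duration > 0 {
            emitted_at_t = 0; // moving to a new frame resets the per-frame count
        }
        t += duration;
    }
    tokens
}
```

The duration head is what lets TDT skip several frames at once instead of stepping through every frame, which keeps the decode loop short.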
The interesting part. NVIDIA's model was never built to run on a Qualcomm NPU — here's what it took to put it there.
Parakeet's encoder builds its attention mask dynamically at runtime (Shape → Gather → Range → Expand). The NPU's ahead-of-time compiler cannot trace those data-dependent shapes and fails every time. We constant-folded that subgraph against a fixed 8-second window, freezing every shape so the whole graph becomes static and compilable.
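The masking pattern can be illustrated in a few lines. Assuming a 10 ms mel hop and 8× encoder subsampling (assumptions, not figures from this page), an 8-second window always yields 100 encoder frames, so the Range-and-compare mask can be computed once and stored as a constant instead of being rebuilt at runtime. The helper names are hypothetical.

```rust
/// Encoder frames in a fixed window, assuming a 10 ms mel hop and 8x subsampling.
fn encoder_frames(window_secs: u32) -> usize {
    const MEL_FRAMES_PER_SEC: u32 = 100; // assumed 10 ms hop
    const SUBSAMPLING: u32 = 8; // assumed encoder downsampling factor
    (window_secs * MEL_FRAMES_PER_SEC / SUBSAMPLING) as usize
}

/// What the dynamic subgraph computes at runtime: Range(0..t) compared against the
/// valid length, then expanded to a [t x t] mask (query row i, key column j).
fn padding_mask(t: usize, valid_len: usize) -> Vec<Vec<bool>> {
    let valid: Vec<bool> = (0..t).map(|j| j < valid_len).collect(); // Range + compare
    (0..t)
        .map(|i| valid.iter().map(|&v| v && i < valid_len).collect::<Vec<bool>>()) // Expand
        .collect()
}

/// Constant folding: with the window fixed, t is known ahead of time, so the mask
/// can be evaluated once offline and baked into the graph as a constant tensor.
fn frozen_mask() -> Vec<Vec<bool>> {
    let t = encoder_frames(8);
    padding_mask(t, t)
}
```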
The encoder, which holds most of the model's 600M parameters, was quantized to INT8 weights and INT16 activations, the recipe the Hexagon backend wants for transformer networks, and calibrated on real multilingual speech (FLEURS). Pure INT8 made LayerNorm and attention activations clip; the 16-bit activations keep accuracy where it matters.
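Why 16-bit activations help can be shown with plain affine (min/max) quantization. The sketch below is a simplified per-tensor scheme, not the calibration AI Hub actually performs: it quantizes a small activation vector containing one large outlier at 8 and 16 bits and measures the worst-case round-trip error. With the outlier stretching the range, 8 bits leaves a step of roughly 0.24, which erases the small values; 16 bits shrinks the step by a factor of 257.

```rust
/// Affine quantization parameters for an unsigned `bits`-bit integer range.
struct QParams { scale: f32, zero_point: i64, qmax: i64 }

/// Derive scale and zero point from an observed [min, max] range.
fn calibrate(min: f32, max: f32, bits: u32) -> QParams {
    let (min, max) = (min.min(0.0), max.max(0.0)); // range must contain 0
    let qmax = (1i64 << bits) - 1;
    let scale = ((max - min) / qmax as f32).max(f32::EPSILON);
    let zero_point = (-min / scale).round() as i64;
    QParams { scale, zero_point, qmax }
}

/// Quantize then dequantize one value, clamping to the integer range.
fn fake_quant(x: f32, p: &QParams) -> f32 {
    let q = ((x / p.scale).round() as i64 + p.zero_point).clamp(0, p.qmax);
    (q - p.zero_point) as f32 * p.scale
}

/// Worst absolute round-trip error over `xs` when calibrated on `xs` itself.
fn max_error(xs: &[f32], bits: u32) -> f32 {
    let lo = xs.iter().cloned().fold(f32::INFINITY, f32::min);
    let hi = xs.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let p = calibrate(lo, hi, bits);
    xs.iter().map(|&x| (x - fake_quant(x, &p)).abs()).fold(0.0, f32::max)
}
```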
Qualcomm AI Hub compiled the quantized model to a QNN context binary targeting the Snapdragon X Elite. The NPU has no int64, so the integer I/O is truncated to int32 at compile time. The result is device-gated to Hexagon V73.
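On the host side, int32 I/O means the wrapper has to hand the compiled graph int32 tensors where the original ONNX model took int64. A minimal sketch with hypothetical helper names narrows values with a range check rather than a silent cast, so an out-of-range value surfaces as an error instead of a wrapped number.

```rust
use std::convert::TryFrom;

/// Narrow int64 values (for example lengths or token IDs) to the int32 the compiled
/// graph expects, refusing values that do not fit instead of silently wrapping.
fn to_i32_inputs(values: &[i64]) -> Result<Vec<i32>, String> {
    values
        .iter()
        .map(|&v| i32::try_from(v).map_err(|_| format!("value {v} does not fit in int32")))
        .collect()
}

/// Widen int32 outputs back to int64 for code that still expects the original dtype.
fn to_i64_outputs(values: &[i32]) -> Vec<i64> {
    values.iter().map(|&v| i64::from(v)).collect()
}
```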
The high-level Rust bindings for ONNX Runtime crashed on this binary, so OpenWritr calls the ONNX Runtime C API directly through a thin FFI layer, reproducing in native Rust the exact call sequence that works from Python. A missing pair of Hexagon skeleton files turned out to be the silent culprit behind a stack overflow on load.
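Failing fast on that kind of missing dependency is cheap. The sketch below checks for the required files before any native code is loaded. The two file names are an assumption based on the QNN HTP V73 backend layout and should be checked against the SDK you ship; `missing_runtime_files` is a hypothetical helper.

```rust
use std::path::{Path, PathBuf};

/// Files assumed to be required alongside the QNN HTP backend for Hexagon V73.
/// These names are an assumption for illustration; verify them against your QNN SDK.
const REQUIRED_HTP_FILES: [&str; 2] = ["libQnnHtpV73Skel.so", "libqnnhtpv73.cat"];

/// Return the required files missing from `dir`, so the caller can report a clear
/// error before handing control to native code that may crash instead.
fn missing_runtime_files(dir: &Path) -> Vec<PathBuf> {
    REQUIRED_HTP_FILES
        .iter()
        .map(|name| dir.join(name))
        .filter(|path| !path.is_file())
        .collect()
}
```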
The compiled encoder takes a fixed 8-second window. Longer dictation is run in overlapping chunks and the encoder features are stitched back together at the seams, so the decoder runs once over the whole stream — no doubled or dropped words at the boundaries.
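The chunking and stitching can be sketched in encoder-frame units. The numbers are assumptions: 100 frames per 8-second window (80 ms per frame) and a hypothetical 25-frame (2 s) overlap. Each seam is cut at the middle of the overlap so every frame comes from exactly one chunk, and the frames nearest each chunk edge, which see the least context, are the ones discarded. OpenWritr's real overlap and seam policy may differ.

```rust
/// Start offsets (in encoder frames) of fixed-size windows that overlap by `overlap`
/// frames and together cover `total` frames. The last window may run past the end
/// (its input would be zero-padded).
fn plan_chunks(total: usize, window: usize, overlap: usize) -> Vec<usize> {
    assert!(overlap < window, "overlap must be smaller than the window");
    let step = window - overlap;
    let mut starts = vec![0];
    while starts[starts.len() - 1] + window < total {
        let next = starts[starts.len() - 1] + step;
        starts.push(next);
    }
    starts
}

/// Stitch per-chunk encoder outputs (each `window` frames long) into one sequence of
/// `total` frames, cutting each seam at the middle of the overlap.
fn stitch<T: Clone>(chunks: &[Vec<T>], starts: &[usize], total: usize, overlap: usize) -> Vec<T> {
    let half = overlap / 2;
    let mut out = Vec::with_capacity(total);
    for (i, (chunk, &start)) in chunks.iter().zip(starts).enumerate() {
        let from = if i == 0 { start } else { start + half };
        let to = if i + 1 < starts.len() { starts[i + 1] + half } else { total };
        out.extend_from_slice(&chunk[from - start..to - start]);
    }
    out
}
```

With these assumed values, 5.8 s, 16.4 s and 23.0 s of audio (73, 205 and 288 frames) plan to 1, 3 and 4 chunks, in line with the benchmark table, though that alone does not confirm the real overlap.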
The whole toolchain (graph surgery, AI Hub submission, the wrapper, the validator) ships in the repo, and the compiled model is published on Hugging Face. The full deep-dive is on the model card.
Measured on a Snapdragon X Elite (X1E80100). Decode is the full pipeline: preprocess + encode + transducer decode.
| Audio length | Decode time | Speed (× realtime) | NPU chunks |
|---|---|---|---|
| 3 s | 128 ms | 23× | 1 |
| 5.8 s | 221 ms | 26× | 1 |
| 16.4 s | 375 ms | 44× | 3 |
| 23.0 s | 626 ms | 37× | 4 |
The encoder itself runs in ~67 ms per 8-second window on the NPU. The x64 build runs the same pipeline on the CPU at roughly 25× realtime.
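The realtime figures are audio duration divided by decode time. Applying the same arithmetic to the encoder figure gives about 119× realtime for a single 8-second window, and four windows account for roughly 268 ms of the 626 ms measured for 23 s of audio. That split is an estimate that assumes every chunk costs the same ~67 ms; the helper names are hypothetical.

```rust
/// Seconds of audio processed per second of compute.
fn realtime_factor(audio_secs: f64, compute_ms: f64) -> f64 {
    audio_secs / (compute_ms / 1000.0)
}

/// Rough share of end-to-end time spent in the NPU encoder, assuming a fixed cost per chunk.
fn encoder_share(chunks: u32, per_chunk_ms: f64, total_ms: f64) -> f64 {
    chunks as f64 * per_chunk_ms / total_ms
}
```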
The Microsoft Store is the easiest way — it picks the right build for your CPU automatically, keeps itself updated, and there's no SmartScreen warning.
Prefer a direct download? Pick the build for your CPU (ARM64 or x64). Both are per-user installs, no admin needed. (Check Settings → System → About → "System type" if you're unsure.)
Direct-download binaries aren't code-signed, so Windows SmartScreen shows a warning on first launch — click More info → Run anyway, or just use the Store. Checksums are attached to every release.