OpenWritr — local voice-to-text for Windows, on the NPU

What it does

Dictate into anything

OpenWritr isn't an app you switch to — it's a tray utility that types for you, into whatever window already has focus. Email, chat, your editor, a browser field. Hold the hotkey, talk, let go.

🔒

Private by design

Speech recognition runs on your device. Your audio never leaves the machine. No account, no analytics, nothing phoned home.

⚡

NPU-fast

On Snapdragon X the model runs on the Hexagon NPU — a 5-second sentence transcribes in well under half a second, sipping power.

🌍

25 languages

English, German, French, Spanish, Italian, Dutch, Polish, Ukrainian and more — auto-detected, with punctuation and capitalization.

⌨️

Focus-proof hotkey

A global low-level keyboard hook means recording survives popups, UAC prompts and system shortcuts stealing focus mid-sentence.

✨

Optional AI cleanup

Hold Shift too and an LLM tidies punctuation and grammar — via your own GitHub Copilot or OpenAI-compatible key. Off by default.

🧩

Open source

MIT-licensed, native Rust, no Electron. A single ~7 MB executable. Every release built reproducibly in public CI.

The model

Why this is a big deal

OpenWritr is built on NVIDIA Parakeet — one of the most accurate open speech-recognition models in the world — running on a chip that, on most laptops, sits completely idle.

Parakeet TDT 0.6B v3 is a 600-million-parameter speech model from NVIDIA. On the open ASR leaderboards it goes toe-to-toe with — and often beats — OpenAI's Whisper, while being smaller and dramatically faster. It transcribes 25 European languages, adds its own punctuation and capitalization, and is genuinely state of the art.

The catch: a model that good usually means the cloud. You speak, your audio is uploaded to someone's server, transcribed there, and sent back. That costs money, adds latency, needs a connection — and means your voice leaves your machine.

OpenWritr runs the whole thing locally, and not just on the CPU. On Copilot+ PCs built on Snapdragon X, it offloads the heavy part of the model to the Hexagon NPU, a dedicated neural-processing block Qualcomm builds into the silicon precisely for this kind of on-device AI. The NPU does the work in a fraction of the time and a fraction of the power a CPU would need.

Nobody had wired Parakeet to the Hexagon NPU before. Doing it took rewriting part of the model's compute graph, requantizing it the way the NPU wants, and compiling it specifically for the chip. The full story is below. The result is published as an open model anyone can reuse.

600M parameters: FastConformer encoder + transducer decoder

~67 ms to encode 8 seconds of audio on the NPU

25 languages, auto-detected, with punctuation

0 bytes of your audio ever leave the device

NVIDIA Parakeet TDT v3 · Qualcomm Hexagon HTP · INT8 / INT16 quantized · ONNX Runtime · QNN EP

☁️ Cloud dictation

Your voice is uploaded to a server
Needs an internet connection
Per-minute or subscription cost
Round-trip latency on every phrase
You trust someone else with your audio

💻 OpenWritr — on-device

Audio is processed on your machine
Works fully offline after first run
Free, forever
Near-instant — the NPU is fast
Nothing to trust: it can't leave

How it works

Press → speak → release

Four stages, all on your machine.

Hold the hotkey

Default Ctrl+Win. A global hook captures the keystroke no matter what's focused; the mic opens.

Speak

Audio is resampled to 16 kHz, downmixed to mono, and turned into mel features, locally and in real time.

Release

The Parakeet encoder runs on the NPU (or CPU); a transducer decoder turns features into text.

Text appears

The result is pasted at your cursor. Optionally polished by an LLM first.
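The press, speak, release flow can be pictured as a small state machine. The sketch below is illustrative only: `Key`, `Action` and `Hotkey` are hypothetical names, not OpenWritr's actual types, and it assumes the global keyboard hook delivers key-down and key-up events to `on_key`. It models the default Ctrl+Win chord and the optional Shift modifier for AI cleanup.

```rust
/// Hypothetical key identifiers. A real build would map these from the
/// virtual-key codes delivered by the keyboard hook.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Key {
    Ctrl,
    Win,
    Shift,
}

/// What the caller should do in response to a key event.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum Action {
    None,
    StartRecording,
    /// `polish` is true if Shift was held at any point during the recording.
    StopAndTranscribe { polish: bool },
}

#[derive(Default)]
struct Hotkey {
    ctrl: bool,
    win: bool,
    shift: bool,
    recording: bool,
    polish: bool,
}

impl Hotkey {
    /// Feed one key event (`down` = pressed) and get the resulting action.
    fn on_key(&mut self, key: Key, down: bool) -> Action {
        match key {
            Key::Ctrl => self.ctrl = down,
            Key::Win => self.win = down,
            Key::Shift => {
                self.shift = down;
                if down && self.recording {
                    self.polish = true;
                }
            }
        }
        let chord = self.ctrl && self.win;
        if chord && !self.recording {
            // Both chord keys are now held: open the mic.
            self.recording = true;
            self.polish = self.shift;
            Action::StartRecording
        } else if !chord && self.recording {
            // Either chord key was released: stop and hand off the audio.
            self.recording = false;
            Action::StopAndTranscribe { polish: self.polish }
        } else {
            // Auto-repeat events and unrelated changes are ignored.
            Action::None
        }
    }
}
```

Keeping the logic as a pure function of key events means it can be exercised without a real hook: feeding Ctrl down, Win down, Win up yields `None`, `StartRecording`, then `StopAndTranscribe { polish: false }`.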

The engineering

Getting Parakeet onto the Hexagon NPU

The interesting part. NVIDIA's model was never built to run on a Qualcomm NPU — here's what it took to put it there.

mel preprocessor → encoder · Hexagon NPU → TDT decoder → text

01 The shape problem

Parakeet's encoder builds its attention mask dynamically at runtime (Shape → Gather → Range → Expand). The NPU's ahead-of-time compiler can't trace that and bails out every time. We constant-folded that subgraph against a fixed 8-second window, freezing every shape so the whole graph becomes static and compilable.
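One practical consequence of freezing every shape is that the caller must always hand over exactly one window's worth of audio. The sketch below shows that idea only: the 16 kHz rate and 8-second window come from the text above, while `fit_to_window` and padding short clips with silence are assumptions made for illustration, not a description of OpenWritr's code.

```rust
const SAMPLE_RATE_HZ: usize = 16_000; // from the text: 16 kHz mono
const WINDOW_SECONDS: usize = 8; // from the text: fixed 8-second window
const WINDOW_SAMPLES: usize = SAMPLE_RATE_HZ * WINDOW_SECONDS; // 128,000

/// Pads with silence or truncates so the buffer always has exactly
/// WINDOW_SAMPLES samples. Also returns how many samples are real audio,
/// so later stages can ignore the padded tail.
fn fit_to_window(samples: &[f32]) -> (Vec<f32>, usize) {
    let valid = samples.len().min(WINDOW_SAMPLES);
    let mut window = vec![0.0_f32; WINDOW_SAMPLES];
    window[..valid].copy_from_slice(&samples[..valid]);
    (window, valid)
}
```

With a fixed input length, every downstream tensor size is known ahead of time, which is what lets a runtime-computed mask be replaced by a constant.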

02 Quantization

The 600M-param encoder was quantized to INT8 weights / INT16 activations — the recipe the Hexagon backend wants for transformer nets — calibrated on real multilingual speech (FLEURS). Pure INT8 made LayerNorm and attention activations clip; the 16-bit activations keep accuracy where it matters.
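To see why activation bit width matters, here is a minimal sketch of symmetric linear quantization. It is a generic illustration under stated assumptions: the actual scheme, ranges and calibration used by the Hexagon toolchain are not given here, and `quantize_roundtrip` and `max_error` are hypothetical helper names.

```rust
/// Quantizes `x` to a signed grid of `bits` bits covering [-max_abs, max_abs],
/// then maps it back to a float. Values outside the range clip to the edge.
fn quantize_roundtrip(x: f32, max_abs: f32, bits: u32) -> f32 {
    let qmax = ((1_i64 << (bits - 1)) - 1) as f32; // 127 for 8 bits, 32767 for 16
    let scale = max_abs / qmax; // size of one quantization step
    let q = (x / scale).round().clamp(-qmax, qmax);
    q * scale
}

/// Largest absolute round-trip error over a set of activation values.
fn max_error(values: &[f32], max_abs: f32, bits: u32) -> f32 {
    values
        .iter()
        .map(|&x| (x - quantize_roundtrip(x, max_abs, bits)).abs())
        .fold(0.0_f32, f32::max)
}
```

For a range of ±60, an 8-bit grid has steps of about 0.47, so small activations sitting next to a large outlier lose most of their precision, and shrinking the range to protect them makes the outlier clip instead. A 16-bit grid over the same range has steps of about 0.002, which avoids that trade-off.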

03 Compilation

Qualcomm AI Hub compiled the quantized model to a QNN context binary targeting the Snapdragon X Elite. The NPU has no int64, so the integer I/O is truncated to int32 at compile time. The result is device-gated to Hexagon V73.
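On the calling side, an int32-only interface means any 64-bit integer values have to be narrowed before they reach the model. A minimal sketch, assuming the caller holds its integer inputs as `i64`; `narrow_to_i32` is a hypothetical helper, and which tensors are affected is not specified here.

```rust
use std::convert::TryFrom;
use std::num::TryFromIntError;

/// Converts 64-bit integer inputs to the 32-bit form, failing loudly
/// instead of silently wrapping if a value does not fit.
fn narrow_to_i32(values: &[i64]) -> Result<Vec<i32>, TryFromIntError> {
    values.iter().map(|&v| i32::try_from(v)).collect()
}
```

A checked conversion is used rather than an `as` cast so that an out-of-range value surfaces as an error rather than a wrong number.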

04 Loading it from Rust

The high-level Rust ONNX bindings crashed on this binary. OpenWritr instead calls the ONNX Runtime C API directly through a thin FFI layer: the exact sequence that works from Python, reproduced in native Rust. A missing pair of Hexagon skeleton files turned out to be the silent culprit behind a stack overflow on load.

05 Long-form audio

The compiled encoder takes a fixed 8-second window. Longer dictation is run in overlapping chunks and the encoder features are stitched back together at the seams, so the decoder runs once over the whole stream — no doubled or dropped words at the boundaries.
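A rough sketch of how overlapping windows can be planned and stitched. The 8-second window is from the text; the 1-second overlap and the rule of cutting each seam at the midpoint of the overlap are assumptions for illustration, since the actual overlap and stitching rule are not stated. `chunk_starts` and `kept_spans` are hypothetical names.

```rust
const WINDOW_MS: u32 = 8_000; // fixed encoder window, from the text
const OVERLAP_MS: u32 = 1_000; // assumption: the real overlap is not stated

/// Start time (ms) of every encoder window needed to cover the audio.
fn chunk_starts(audio_ms: u32) -> Vec<u32> {
    let stride = WINDOW_MS - OVERLAP_MS;
    let mut starts = vec![0];
    let mut start = 0;
    // Add windows until the latest one reaches the end of the audio.
    while start + WINDOW_MS < audio_ms {
        start += stride;
        starts.push(start);
    }
    starts
}

/// For each window, the half-open [from, to) span in ms whose encoder
/// features are kept. Seams sit at the midpoint of each overlap, so the
/// spans tile the audio with no gap and no double coverage.
fn kept_spans(audio_ms: u32) -> Vec<(u32, u32)> {
    let starts = chunk_starts(audio_ms);
    let half = OVERLAP_MS / 2;
    let mut spans = Vec::with_capacity(starts.len());
    for (i, &start) in starts.iter().enumerate() {
        let from = if i == 0 { 0 } else { start + half };
        let to = match starts.get(i + 1) {
            Some(&next) => next + half,
            None => audio_ms,
        };
        spans.push((from, to));
    }
    spans
}
```

Because every kept span starts exactly where the previous one ends, concatenating the kept features gives the decoder one continuous stream. Under the assumed 1-second overlap, 16.4 s of audio needs 3 windows and 23.0 s needs 4, though other overlap values would produce the same counts.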

06 Open for reuse

The whole toolchain — graph surgery, AI Hub submission, the wrapper, the validator — ships in the repo, and the compiled model is published on Hugging Face. Read the full deep-dive on the model card →

Performance

Faster than realtime, by a lot

Measured on a Snapdragon X Elite (X1E80100). Decode is the full pipeline: preprocess + encode + transducer decode.

Audio length	Decode time	× Realtime	NPU chunks
3 s	128 ms	23×	1
5.8 s	221 ms	26×	1
16.4 s	375 ms	44×	3
23.0 s	626 ms	37×	4

The encoder itself runs in ~67 ms per 8-second window on the NPU. The x64 build runs the same pipeline on the CPU at roughly 25× realtime.
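The "× Realtime" column is simply audio duration divided by decode time. A small sketch of that arithmetic, with `realtime_factor` as a hypothetical helper name:

```rust
/// How many seconds of audio are processed per second of compute.
fn realtime_factor(audio_s: f64, decode_ms: f64) -> f64 {
    audio_s / (decode_ms / 1000.0)
}
```

For the first row, `realtime_factor(3.0, 128.0)` is about 23.4, which rounds to the 23× shown. By the same arithmetic, the encoder alone at ~67 ms per 8-second window works out to roughly 119× realtime.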

Get it

Install in one click

The Microsoft Store is the easiest way — it picks the right build for your CPU automatically, keeps itself updated, and there's no SmartScreen warning.

⊞ Get it from the Microsoft Store

Prefer a direct download? Pick your CPU — both are per-user installs, no admin needed. (Settings → System → About → "System type" if you're unsure.)

Snapdragon X · ARM64

Surface Pro 11, Surface Laptop 7, and other Copilot+ PCs — runs on the Hexagon NPU

↓ arm64 installer

Intel / AMD · x64

Most other Windows laptops and desktops — runs on the CPU

↓ x64 installer

Direct-download binaries aren't code-signed, so Windows SmartScreen shows a warning on first launch — click More info → Run anyway, or just use the Store. Checksums are attached to every release.