Case study · Personal project

Meeting Notes

Private meeting transcription that stays editable while the conversation is still happening.

A native macOS workflow for capturing microphone audio, the other side of a call, or both, then transcribing everything locally with Core ML.

View source

Role

Product, design, and engineering

Platform

macOS 14.4+

Core stack

React, Tauri, Rust, Swift, Core ML

The problem

The transcript should not become another meeting participant

Most transcription tools ask for a quiet trade: send the conversation to a server, accept a rigid transcript, and clean it up later in a separate document.

Meeting Notes began with a stricter brief: capture both sides of a call, keep recognition on-device, and let the user keep writing in the same note while transcription continues around them.

That made privacy, editability, and native audio capture architectural constraints rather than features to bolt on at the end.

Product decisions

Three principles shaped the entire build

Local by default

Audio and speech recognition stay on the Mac. Cloud-based enhancement is optional, user-triggered, and separate from the recording path.

A note, not a locked transcript

The document remains editable during recording. Manual changes are preserved while new transcript segments continue after them.
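One way to picture this behavior is an append-only merge: new transcript text is only ever added at the end, so an edit to an earlier block is never overwritten. The following is a minimal sketch under that assumption. The block-based note model and the names `NoteBlock`, `appendSegment`, and `editBlock` are hypothetical and are not taken from the project.

```typescript
// Hypothetical note model: names and shapes are illustrative, not from the project.
type NoteBlock = {
  id: number;
  text: string;
  origin: "transcript" | "user"; // who last wrote this block
};

type Note = { blocks: NoteBlock[]; nextId: number };

function emptyNote(): Note {
  return { blocks: [], nextId: 1 };
}

// New transcript text is only appended; existing blocks are never rewritten.
function appendSegment(note: Note, text: string): Note {
  const block: NoteBlock = { id: note.nextId, text, origin: "transcript" };
  return { blocks: [...note.blocks, block], nextId: note.nextId + 1 };
}

// A manual edit replaces one block's text and marks the block as user-owned.
function editBlock(note: Note, id: number, text: string): Note {
  return {
    ...note,
    blocks: note.blocks.map((b) =>
      b.id === id ? { ...b, text, origin: "user" as const } : b
    ),
  };
}

// Usage: an edit made mid-recording survives the next incoming segment.
let note = emptyNote();
note = appendSegment(note, "lets meet on thursday");
note = editBlock(note, 1, "Let's meet on Thursday.");
note = appendSegment(note, "i will send the agenda");
const noteText = note.blocks.map((b) => b.text).join("\n");
```

Because the recorder only appends, the two writers never compete for the same block, which is what lets editing and transcription run at the same time in this sketch.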

Capture the whole conversation

Microphone and system audio can be recorded independently or together, with clear Me and Them labels in dual-source mode.
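As an illustration of the labeling, the sketch below interleaves segments from the two sources by start time and prefixes each with its speaker label. It assumes that microphone audio is always the local speaker and system audio is everyone else. The `TranscriptSegment` shape, the `startMs` field, and `renderDualSource` are hypothetical names introduced here.

```typescript
// Hypothetical types: the project's real segment shape may differ.
type CaptureSource = "microphone" | "system";
type TranscriptSegment = { source: CaptureSource; startMs: number; text: string };

// Assumed mapping: microphone is the local speaker, system audio is the other side.
const SPEAKER_LABEL: Record<CaptureSource, string> = {
  microphone: "Me",
  system: "Them",
};

// Interleave both sources by start time and prefix each line with its label.
function renderDualSource(
  mic: TranscriptSegment[],
  system: TranscriptSegment[]
): string[] {
  return [...mic, ...system]
    .sort((a, b) => a.startMs - b.startMs)
    .map((s) => `${SPEAKER_LABEL[s.source]}: ${s.text}`);
}

const labeledLines = renderDualSource(
  [
    { source: "microphone", startMs: 0, text: "Can you hear me?" },
    { source: "microphone", startMs: 4000, text: "Great, let's start." },
  ],
  [{ source: "system", startMs: 2000, text: "Yes, loud and clear." }]
);
// labeledLines:
//   "Me: Can you hear me?"
//   "Them: Yes, loud and clear."
//   "Me: Great, let's start."
```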

Architecture

Each language handles the layer it is best at

The application uses a narrow event boundary between the web interface, the desktop shell, and native audio processing. That keeps the editor productive without asking the browser layer to pretend it is macOS.

01. React interface: owns the note editor, recording controls, and visible recording state.
02. Tauri and Rust shell: coordinates audio capture, process lifecycle, and typed events between the interface and native sidecar.
03. Swift sidecar: handles streaming speech recognition and the Core Audio process tap for system sound.
04. Core ML models: run Parakeet v3 locally for live captions, system audio, and file transcription across 25 languages.
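A narrow, typed event boundary of this kind can be sketched on the interface side as a discriminated union plus a validator that rejects anything it does not recognize. This is an assumption-heavy illustration: the event names, their fields, and the idea that messages arrive as JSON strings are all hypothetical and may not match the project's actual protocol.

```typescript
// Hypothetical event shapes: the project's real event names and fields may differ.
type SidecarEvent =
  | { kind: "recording-started"; source: "microphone" | "system" | "both" }
  | { kind: "segment"; speaker: "Me" | "Them"; text: string }
  | { kind: "recording-stopped" };

// Validate an untyped message (assumed here to be a JSON string) before it
// reaches the interface. Unknown or malformed messages yield null.
function parseSidecarEvent(raw: string): SidecarEvent | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof value !== "object" || value === null) return null;
  const fields = value as Record<string, unknown>;
  const kind = fields["kind"];

  if (kind === "recording-started") {
    const source = fields["source"];
    if (source === "microphone" || source === "system" || source === "both") {
      return { kind, source };
    }
    return null;
  }
  if (kind === "segment") {
    const speaker = fields["speaker"];
    const text = fields["text"];
    if ((speaker === "Me" || speaker === "Them") && typeof text === "string") {
      return { kind, speaker, text };
    }
    return null;
  }
  if (kind === "recording-stopped") return { kind };
  return null;
}

// The interface can then switch exhaustively on `kind`.
function describeEvent(ev: SidecarEvent): string {
  switch (ev.kind) {
    case "recording-started":
      return `Recording (${ev.source})`;
    case "segment":
      return `${ev.speaker}: ${ev.text}`;
    case "recording-stopped":
      return "Stopped";
  }
}

const parsed = parseSidecarEvent('{"kind":"segment","speaker":"Them","text":"Hello"}');
const statusLine = parsed ? describeEvent(parsed) : "ignored";
```

Keeping the union small is the point of the design: the web layer only needs to know what happened, not how the audio was captured.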

What shipped

A local-first recording loop with fewer compromises

01. Live captions stream into the note with roughly three seconds of latency.
02. Dual-source mode captures both sides of a call without requiring screen-recording permission.
03. Every decoded window is retained instead of silently dropping transcript segments.
04. One local speech model supports live recording, system audio, and imported audio files.
05. The raw transcript remains the source of truth until the user explicitly chooses to enhance it.
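The last point, that the raw transcript stays authoritative until the user asks for enhancement, can be modeled by keeping the two versions in separate fields so that recording never writes to the enhanced one. The sketch below assumes a hypothetical `TranscriptDoc` shape and a synchronous `enhance` callback standing in for whatever optional service the user opts into. A real enhancement call would likely be asynchronous.

```typescript
// Hypothetical document model: field and function names are illustrative.
type TranscriptDoc = {
  raw: readonly string[]; // segments exactly as recognized on-device
  enhanced: string | null; // set only after an explicit user request
};

// Recording appends to the raw transcript and never touches `enhanced`.
function recordSegment(doc: TranscriptDoc, segment: string): TranscriptDoc {
  return { ...doc, raw: [...doc.raw, segment] };
}

// Enhancement is a separate, user-triggered step that leaves `raw` unchanged.
function enhanceOnRequest(
  doc: TranscriptDoc,
  enhance: (text: string) => string
): TranscriptDoc {
  return { ...doc, enhanced: enhance(doc.raw.join(" ")) };
}

let transcript: TranscriptDoc = { raw: [], enhanced: null };
transcript = recordSegment(transcript, "um so the launch is friday");
const enhancedBeforeRequest = transcript.enhanced; // still null: nothing was requested

// Stand-in for an enhancement service, applied only when the user asks.
transcript = enhanceOnRequest(transcript, (t) =>
  t.replace(/^um so the/, "The").replace("friday", "Friday.")
);
const rawAfterRequest = transcript.raw.join(" ");
```

After the request, `transcript.enhanced` holds the cleaned-up text while `rawAfterRequest` is still the original recognition output, so the enhancement can be discarded or rerun without losing anything.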

Reflection

Privacy works best when it simplifies the product

Keeping speech recognition local did more than protect audio. It created a clearer product contract: recording produces a useful raw note, editing never stops the stream, and AI enhancement only happens after an explicit choice.

Explore the repository
