Overview

Purpose and Scope

Omi is an open-source AI wearable platform designed to capture, transcribe, and analyze conversations in real-time. The ecosystem integrates custom hardware, cross-platform mobile and desktop applications, and a robust cloud backend to transform raw audio into structured memories, action items, and AI-driven insights README.md3-7

The platform is organized into several key technical domains:

Component	Location	Technology	Primary Entry Point
Backend API	`backend/`	Python (FastAPI)	`backend/main.py` README.md114
Mobile App	`app/`	Flutter (Dart)	`app/lib/main.dart` README.md113
macOS App	`desktop/`	Swift / Rust	`desktop/run.sh` README.md112
Firmware (Omi)	`omi/`	Zephyr RTOS (C)	`omi/firmware/readme.md:11`
Omi Glass	`omiGlass/`	ESP32-S3 (C)	`omiGlass/` README.md116
AI Personas	`web/personas-open-source/`	Next.js	`web/personas-open-source/` README.md118
SDKs	`sdks/`	Python, Swift, React Native	`sdks/` README.md117

The core value proposition lies in its 100% open-source nature, allowing developers to customize everything from the PCB layouts and firmware to the LLM processing pipelines and third-party integrations via a modular plugin system README.md7-14 docs/getstartedwithomi.mdx41-43

System Architecture

The Omi ecosystem follows a distributed architecture where audio is captured at the edge (wearables, mobile, or desktop), streamed via Bluetooth Low Energy (BLE) or WebSockets to a gateway, and processed by a specialized FastAPI backend.

High-Level Ecosystem Flow

Diagram: Data Flow from Capture to AI Processing

Explanation of Components:

Edge Capture (Natural Language Space):
- Omi Wearable: Custom low-power hardware with dual microphones that captures audio, encodes it with OPUS, and transmits it over BLE.
- Omi Glass: ESP32-S3-based smart glasses that capture audio and video streams.
- Desktop Mic/System Audio: Native desktop audio capture on macOS/Windows.

Gateway Layer (Code Entity Space):
- Mobile App (`app/lib/main.dart`): Flutter app that manages device connections, audio capture, UI, and streaming.
- Desktop App (`desktop/`): Native app that receives desktop audio, manages user interaction, and streams data.
- CaptureProvider: Flutter provider that manages audio recording state and WAV/OPUS streaming.
- DeviceService: BLE interface that handles device discovery, connection, and audio transmission.

Cloud Backend (Processing Space):
- FastAPI backend: Handles control, the API, and orchestration.
- `/v4/listen`: WebSocket endpoint that receives audio streams.
- Pusher Service: Asynchronously processes transcripts and communicates with AI and storage services.
- Diarizer: GPU service that processes speaker embeddings for diarization.
- VAD: GPU service for voice activity detection and speaker identification.

Storage Layer:
- Firestore: Stores user conversations, memories, and app data in structured form.
- Redis: Caches metadata and state to improve performance.
- Google Cloud Storage: Stores raw audio, binary blobs, and speech profile data.
- Pinecone: Vector database for semantic search over conversation and memory embeddings.
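
The division of labor among the four stores can be illustrated with a small routing sketch. Everything in it is hypothetical: the `ProcessedConversation` type, the field names, and the paths and keys are assumptions made for illustration, not the backend's actual schema. It makes no network calls and only shows which kind of data would go to which store, as described in the list above.

```python
from dataclasses import dataclass, field


@dataclass
class ProcessedConversation:
    """Hypothetical container for one processed conversation."""
    uid: str
    conversation_id: str
    transcript_segments: list[dict]
    raw_audio: bytes
    embedding: list[float] = field(default_factory=list)


def route_to_stores(conv: ProcessedConversation) -> dict[str, dict]:
    """Return a plan mapping each store to the payload it would receive.

    Paths and keys are illustrative assumptions; nothing is written anywhere.
    """
    ident = f"{conv.uid}/{conv.conversation_id}"
    return {
        # Structured, queryable data
        "firestore": {
            "path": f"users/{conv.uid}/conversations/{conv.conversation_id}",
            "segments": conv.transcript_segments,
        },
        # Cached metadata and state
        "redis": {"key": f"conversation:{ident}:status", "value": "processed"},
        # Large binary payloads such as raw audio
        "gcs": {"blob": f"audio/{ident}.bin", "size_bytes": len(conv.raw_audio)},
        # Embeddings used for semantic search
        "pinecone": {"id": ident, "values": conv.embedding},
    }
```

Keeping the routing decision in one pure function like this makes it easy to test without any of the four services running.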

Repository Structure

The repository is a monorepo containing all components of the Omi platform:

Directory	Description	Technology Stack	Details
`app/`	Cross-platform mobile app	Flutter (Dart)	Manages audio capture, BLE device connection, UI for recordings and chat, state management through Providers app/lib/pages/apps/app_home_web_page.dart11-14
`backend/`	FastAPI backend and specialized services	Python, FastAPI, Firestore, Redis	Provides REST API and WebSocket endpoints; contains async services such as `pusher` for event processing, `diarizer` for speaker embedding, and `vad` for voice activity detection AGENTS.md39-63
`desktop/`	Native desktop application	Swift, Rust	Native macOS (and Windows) app with UI, local SQLite storage, ACP bridge to AI agent VM, integrates with system audio capture desktop/run.sh49-118
`omi/`	Hardware designs and firmware	Zephyr RTOS (C), nRF5340	Device firmware for audio sampling, OPUS encoding, BLE streaming, power management omi/firmware/readme.md60-68
`omiGlass/`	Smart glasses firmware and app	ESP32-S3 (C), Arduino, React Native	Open-source glasses project capturing audio/video with AI integration support README.md116-164
`sdks/`	Official client SDKs	Python, Swift, React Native	BLE connectivity libraries, audio decoding and transcription clients README.md117
`web/`	Next.js based web frontend	React, Next.js	Public frontend portals, personas hosting, and admin dashboards README.md118

Core Value Proposition

Omi differentiates itself through four main pillars:

- Open-Source Transparency: All hardware designs, firmware, and source code are fully open source, enabling developer control and trust README.md7-14
- Continuous Capture: The Omi wearable supports 24h+ continuous conversation capture using energy-efficient dual microphones, the OPUS codec, and BLE streaming README.md154-158
- Cross-Platform Memory: Integrates screen capture and conversations from desktop, mobile, and wearable devices into a unified, AI-augmented memory system README.md5-7
- Extensible Intelligence: A modular plugin architecture with a community app marketplace enables customization, structured data extraction (action items, events, memories), and AI-driven chat personas docs/getstartedwithomi.mdx53-55

Implementation Details

Audio Capture & Transcription Pipeline

Audio is captured at the edge by wearable devices (`omi/firmware/`), by phone microphones managed by the Flutter mobile app (`app/`), or by the native desktop app (`desktop/`). The mobile and desktop apps stream OPUS-encoded audio over binary WebSocket channels to the `/v4/listen` endpoint of the backend (entry point `backend/main.py`). The backend uses Deepgram to transcribe speech in real time, and auxiliary services perform voice activity detection (VAD) and speaker diarization to identify who is speaking AGENTS.md77-83
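
As a rough illustration of the client side of this pipeline, the sketch below forwards already-encoded audio packets as individual binary messages. It is a minimal sketch under stated assumptions: `stream_packets` and `SendFn` are hypothetical names, the `send` callable stands in for a WebSocket client's binary send coroutine, and the one-packet-per-message framing is an assumption, since the actual framing and connection parameters used by the apps are not described here.

```python
import asyncio
from typing import Awaitable, Callable, Iterable

# Hypothetical alias: any coroutine function that sends one binary message.
SendFn = Callable[[bytes], Awaitable[None]]


async def stream_packets(packets: Iterable[bytes], send: SendFn) -> int:
    """Forward each non-empty encoded packet as one binary message.

    Returns the number of packets forwarded.
    """
    sent = 0
    for packet in packets:
        if not packet:  # skip empty packets rather than sending empty messages
            continue
        await send(packet)
        sent += 1
    return sent


async def demo() -> list[bytes]:
    """Run the sender against an in-memory stand-in for the network."""
    received: list[bytes] = []

    async def fake_send(message: bytes) -> None:
        # Record what would have been sent over the socket.
        received.append(message)

    await stream_packets([b"\x01\x02", b"", b"\x03"], fake_send)
    return received


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Injecting `send` keeps the sketch independent of any particular WebSocket library; a real client would pass its connection's send method instead of `fake_send`.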

Diagram: Audio Processing Service Map

Key backend modules include:

- `routers/transcribe.py`: WebSocket endpoint `/v4/listen`, which handles incoming audio streams and dispatches them to Deepgram and the VAD service.
- `pusher/main.py`: Async service for background processing; it calls the diarizer and handles storage and embedding generation.
- `diarizer/main.py`: GPU-accelerated speaker embedding extraction.
- `modal/main.py`: Voice activity detection APIs.

Audio and transcription data are persisted across three stores: transcript segments in Firestore, raw audio in Google Cloud Storage, and vector embeddings in Pinecone for semantic search.
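
To make the idea of semantic search over embeddings concrete, here is a small in-memory stand-in. It does not use the Pinecone client and is not the backend's code; it assumes a cosine similarity metric (vector databases typically let you choose the metric) and uses hypothetical names (`cosine_similarity`, `top_k`) to show how stored vectors can be ranked against a query vector.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[str]:
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item_id: cosine_similarity(query, index[item_id]),
        reverse=True,
    )
    return ranked[:k]
```

A dedicated vector database performs the same kind of ranking with approximate nearest-neighbor indexes, so a full scan like the one above is only practical for very small collections.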

Authentication and Configuration

Authentication is managed through Firebase, which provides user identity and access control. The Flutter mobile app configures Firebase for multiple build flavors, including development and production docs/doc/developer/AppSetup.mdx211-233. Backend API requests are authorized with Firebase ID tokens, which are validated by middleware AGENTS.md75
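
The token check described above can be sketched as follows. This is a minimal sketch, not the backend's middleware: `AuthError`, `extract_bearer_token`, and `authenticate` are hypothetical names, and carrying the ID token in an `Authorization: Bearer <token>` header is an assumption. The `verify` callable is injected so the example runs without Firebase; in a real service it could be `firebase_admin.auth.verify_id_token`, which returns the decoded claims as a dictionary that includes `uid`.

```python
from typing import Any, Callable, Mapping, Optional


class AuthError(Exception):
    """Raised when a request carries no usable credentials."""


def extract_bearer_token(authorization: Optional[str]) -> str:
    """Pull the token out of an 'Authorization: Bearer <token>' header value."""
    if not authorization:
        raise AuthError("missing Authorization header")
    scheme, _, token = authorization.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        raise AuthError("expected 'Bearer <token>'")
    return token


def authenticate(
    authorization: Optional[str],
    verify: Callable[[str], Mapping[str, Any]],
) -> str:
    """Verify the ID token and return the user id from its claims.

    `verify` is assumed to raise on an invalid token and to return the
    decoded claims otherwise.
    """
    token = extract_bearer_token(authorization)
    claims = verify(token)
    uid = claims.get("uid")
    if not uid:
        raise AuthError("token has no uid claim")
    return str(uid)
```

In a FastAPI application, a function like `authenticate` would typically be wrapped in a dependency so that each protected route receives the verified user id.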

This overview provides a technical introduction to the Omi platform for developer onboarding, covering the core subsystems, data flows, and repository organization. For more detailed architecture and subsystem documentation, refer to the subsequent wiki pages.
