Omi is an open-source AI wearable platform designed to capture, transcribe, and analyze conversations in real time. The ecosystem integrates custom hardware, cross-platform mobile and desktop applications, and a cloud backend to transform raw audio into structured memories, action items, and AI-driven insights README.md3-7.
The platform is organized into several key technical domains:
| Component | Location | Technology | Primary Entry Point |
|---|---|---|---|
| Backend API | backend/ | Python (FastAPI) | backend/main.py README.md114 |
| Mobile App | app/ | Flutter (Dart) | app/lib/main.dart README.md113 |
| macOS App | desktop/ | Swift / Rust | desktop/run.sh README.md112 |
| Firmware (Omi) | omi/ | Zephyr RTOS (C) | omi/firmware/readme.md11 |
| Omi Glass | omiGlass/ | ESP32-S3 (C) | omiGlass/ README.md116 |
| AI Personas | web/personas-open-source/ | Next.js | web/personas-open-source/ README.md118 |
| SDKs | sdks/ | Python, Swift, React Native | sdks/ README.md117 |
The core value proposition is that the platform is 100% open source: developers can customize everything from the PCB layouts and firmware to the LLM processing pipelines and third-party integrations via a modular plugin system README.md7-14 docs/getstartedwithomi.mdx41-43.
The Omi ecosystem follows a distributed architecture where audio is captured at the edge (wearables, mobile, or desktop), streamed via Bluetooth Low Energy (BLE) or WebSockets to a gateway, and processed by a specialized FastAPI backend.
Diagram: Data Flow from Capture to AI Processing
Explanation of Components:
Edge Capture (Natural Language Space):
Gateway Layer (Code Entity Space):
- Mobile App (app/lib/main.dart): Flutter app managing device connection, audio capture, UI, and streaming.
- macOS App (desktop/): Native app receiving desktop audio, managing user interaction, and streaming data.
Cloud Backend (Processing Space):
- /v4/listen WebSocket endpoint receives audio streams.
Storage Layer:
The repository is a monorepo containing all components of the Omi platform:
| Directory | Description | Technology Stack | Details |
|---|---|---|---|
| app/ | Cross-platform mobile app | Flutter (Dart) | Manages audio capture, BLE device connection, UI for recordings and chat, state management through Providers app/lib/pages/apps/app_home_web_page.dart11-14 |
| backend/ | FastAPI backend and specialized services | Python, FastAPI, Firestore, Redis | Provides REST API and WebSocket endpoints; contains async services such as pusher for event processing, diarizer for speaker embedding, and vad for voice activity detection AGENTS.md39-63 |
| desktop/ | Native desktop application | Swift, Rust | Native macOS (and Windows) app with UI, local SQLite storage, ACP bridge to AI agent VM, integrates with system audio capture desktop/run.sh49-118 |
| omi/ | Hardware designs and firmware | Zephyr RTOS (C), nRF5340 | Device firmware for audio sampling, OPUS encoding, BLE streaming, power management omi/firmware/readme.md60-68 |
| omiGlass/ | Smart glasses firmware and app | ESP32-S3 (C), Arduino, React Native | Open-source glasses project capturing audio/video with AI integration support README.md116-164 |
| sdks/ | Official client SDKs | Python, Swift, React Native | BLE connectivity libraries, audio decoding and transcription clients README.md117 |
| web/ | Next.js based web frontend | React, Next.js | Public frontend portals, personas hosting, and admin dashboards README.md118 |
Omi differentiates itself through four main pillars:
Open-Source Transparency:
All hardware designs, firmware, and source code are fully open source, enabling developer control and trust README.md7-14
Continuous Capture:
The Omi wearable supports 24h+ continuous conversation capture using energy-efficient dual microphones, OPUS codec, and BLE streaming README.md154-158
Cross-Platform Memory:
Integrates screen capture and conversations from desktop, mobile, and wearable devices into a unified, AI-augmented memory system README.md5-7
Extensible Intelligence:
Modular plugin architecture with a community app marketplace enables customization, structured data extraction (action items, events, memories), and AI-driven chat personas docs/getstartedwithomi.mdx53-55
Audio is captured at the edge by wearable devices (omi/firmware/), by phone microphones managed by the Flutter mobile app (app/), or by the native desktop app (desktop/). The mobile and desktop apps stream OPUS-encoded audio over binary WebSocket channels to the backend `/v4/listen` endpoint in `backend/main.py`. The backend uses Deepgram to transcribe speech in real time, while auxiliary services perform voice activity detection (VAD) and speaker diarization to identify who is speaking AGENTS.md77-83.
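The stages described above can be sketched as a small generator pipeline. This is an illustrative sketch only, not the code in the Omi backend: `Segment`, `process_stream`, and the three callables are hypothetical names, and the ordering (VAD gate first, then transcription, then speaker labelling) is an assumption made for the example. The real Deepgram, VAD, and diarizer integrations are replaced by plain functions so the flow can be run locally.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Segment:
    """One transcribed span of speech (hypothetical shape, not Omi's schema)."""
    speaker: str
    text: str


def process_stream(
    frames: Iterable[bytes],
    has_speech: Callable[[bytes], bool],  # stand-in for the VAD service
    transcribe: Callable[[bytes], str],   # stand-in for the Deepgram call
    identify: Callable[[bytes], str],     # stand-in for the diarizer
) -> Iterator[Segment]:
    """Yield a Segment for each audio frame that contains speech."""
    for frame in frames:
        if not has_speech(frame):
            continue  # skip silence before spending effort on transcription
        text = transcribe(frame)
        if text:
            yield Segment(speaker=identify(frame), text=text)


if __name__ == "__main__":
    # Toy stand-ins: an empty frame counts as silence, and the payload
    # bytes double as the transcript text.
    frames = [b"hello", b"", b"world"]
    for segment in process_stream(
        frames,
        has_speech=lambda f: len(f) > 0,
        transcribe=lambda f: f.decode(),
        identify=lambda f: "SPEAKER_0",
    ):
        print(segment.speaker, segment.text)
```

Passing the services in as callables keeps the sketch independent of any particular provider, which also makes the flow easy to exercise in tests.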
Diagram: Audio Processing Service Map
Key functions in the backend include:
- routers/transcribe.py: WebSocket endpoint /v4/listen handling incoming audio streams and dispatching to Deepgram and VAD services.
- pusher/main.py: Async service for background processing, calling diarizer and handling storage and embedding generation.
- diarizer/main.py: GPU-accelerated speaker embedding extraction.
- modal/main.py: Voice activity detection APIs.

All audio data and transcriptions are stored persistently with transcript segments in Firestore, raw audio in Google Cloud Storage, and vector embeddings in Pinecone for efficient semantic search.
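To make the three-way storage split concrete, here is a minimal sketch that routes each artifact of a conversation to a different store. It assumes nothing about the real client code: `InMemoryStores` and `persist_conversation` are hypothetical names, the key layout (including the `.opus` suffix) is invented for illustration, and plain dictionaries stand in for Firestore, Google Cloud Storage, and Pinecone.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class InMemoryStores:
    """Dict-backed stand-ins for the three stores named in the text."""
    documents: Dict[str, List[str]] = field(default_factory=dict)  # Firestore role
    blobs: Dict[str, bytes] = field(default_factory=dict)          # Cloud Storage role
    vectors: Dict[str, List[float]] = field(default_factory=dict)  # Pinecone role


def persist_conversation(
    stores: InMemoryStores,
    conversation_id: str,
    segments: Sequence[str],
    audio: bytes,
    embedding: Sequence[float],
) -> None:
    """Write each artifact to the store suited to its access pattern."""
    stores.documents[conversation_id] = list(segments)  # structured, queryable text
    stores.blobs[f"{conversation_id}.opus"] = audio     # large opaque payload
    stores.vectors[conversation_id] = list(embedding)   # used for similarity search


if __name__ == "__main__":
    stores = InMemoryStores()
    persist_conversation(stores, "conv-1", ["hello", "world"], b"\x00\x01", [0.1, 0.2])
    print(sorted(stores.blobs))
```

Using the conversation identifier as the shared key in all three stores is one simple way to join a semantic-search hit back to its transcript and audio; the actual backend may link them differently.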
Authentication is managed through Firebase, which provides user identity and access control. The Flutter mobile app configures Firebase for multiple build flavors, including development and production docs/doc/developer/AppSetup.mdx211-233. Backend API authorization is enforced by validating Firebase ID tokens in middleware AGENTS.md75.
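The token check can be sketched as follows. This is a hedged illustration rather than the backend's actual middleware: `authenticate` and `AuthError` are hypothetical names, and the use of an `Authorization: Bearer <token>` header is an assumption not stated in the source. The verifier is injected as a callable so the example runs without credentials; in a real deployment it could be the Firebase Admin SDK's `firebase_admin.auth.verify_id_token`, which returns the decoded claims including a `uid` entry.

```python
from typing import Callable, Mapping


class AuthError(Exception):
    """Raised when a request carries no usable credentials."""


def authenticate(
    headers: Mapping[str, str],
    verify_token: Callable[[str], Mapping[str, object]],
) -> str:
    """Return the caller's uid, or raise AuthError if the token is unusable."""
    value = headers.get("Authorization", "")
    scheme, _, token = value.partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        raise AuthError("expected 'Authorization: Bearer <token>'")
    try:
        claims = verify_token(token)
    except Exception as exc:  # the verifier rejects expired or forged tokens
        raise AuthError("token verification failed") from exc
    uid = claims.get("uid")
    if not uid:
        raise AuthError("token has no uid claim")
    return str(uid)


if __name__ == "__main__":
    # Fake verifier for demonstration only: accepts any token.
    print(authenticate({"Authorization": "Bearer abc"}, lambda t: {"uid": "user-1"}))
```

Wrapping every failure in a single exception type lets a caller map all of them to one HTTP 401 response without leaking why verification failed.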
This overview is intended to provide a comprehensive technical introduction to the Omi platform, facilitating developer onboarding and deep understanding of the core subsystems, data flows, and repository organization. For more detailed architecture and subsystem documentation, refer to subsequent wiki pages aligned with the documentation Table of Contents.