feat(parakeet-cpp): add NVIDIA NeMo Parakeet ASR backend (parakeet.cpp) by localai-bot · Pull Request #10084 · mudler/LocalAI

localai-bot · 2026-05-29T22:17:09Z

What

Adds a new parakeet-cpp audio-transcription backend wrapping parakeet.cpp, a C++/ggml port of NVIDIA NeMo Parakeet (FastConformer TDT/CTC/RNNT/hybrid) that matches the upstream PyTorch models on CPU. It's a cgo-less Go gRPC backend (purego over the flat parakeet_capi.h), mirroring the whisper / vibevoice-cpp backends.

Layers

L0, scaffold + text transcription: main.go dlopens libparakeet.so, LoadModel, AudioTranscription (text) via parakeet_capi_transcribe_path, serialized through base.SingleThread.
L1, word/segment timestamps: AudioTranscription uses parakeet_capi_transcribe_path_json; per-word timings are attached when timestamp_granularities[]=word (OpenAI-compatible), and token ids populate Segment.Tokens.
L2, cache-aware streaming: AudioTranscriptionStream feeds 16 kHz mono PCM in chunks to stream_{begin,feed,finalize}, emitting deltas; <EOU>/<EOB> events close per-utterance segments, and a closing FinalResult carries the full transcript.
L3, registration (full whisper parity): 10 build variants in .github/backend-matrix.yml (cpu, cuda-12/13, ROCm/hipblas, Intel-SYCL f16/f32, Vulkan, L4T, metal-darwin), scripts/changed-backends.js, bump_deps.yaml (tracks PARAKEET_VERSION), backend/index.yaml metas + images, and root Makefile wiring.
L4, gallery importer: ParakeetCppImporter auto-detects parakeet GGUFs (narrow <arch>-<size>-<quant>.gguf match, won't claim arbitrary llama GGUFs or upstream .nemo repos), routes to this backend, surfaces in /backends/known.
L5, docs: audio-to-text.md usage section (import, YAML, word timestamps, streaming).
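
As an illustration of the word-timestamp usage mentioned in L1 and L5, here is a minimal client-side sketch in Go. It is not taken from the PR: the base URL `http://localhost:8080`, the model name `parakeet-tdt_ctc-110m`, and the file name are placeholders, and pairing `timestamp_granularities[]` with `response_format=verbose_json` follows the OpenAI convention, which is assumed (not confirmed here) to apply to this backend as well.

```go
package main

import (
	"bytes"
	"fmt"
	"mime/multipart"
	"net/http"
)

// newTranscriptionRequest builds a multipart POST for an OpenAI-style
// /v1/audio/transcriptions endpoint and asks for word-level timestamps.
// baseURL and model are placeholders supplied by the caller.
func newTranscriptionRequest(baseURL, model, filename string, audio []byte) (*http.Request, error) {
	var body bytes.Buffer
	w := multipart.NewWriter(&body)

	// The audio payload goes in the "file" form part.
	part, err := w.CreateFormFile("file", filename)
	if err != nil {
		return nil, err
	}
	if _, err := part.Write(audio); err != nil {
		return nil, err
	}

	// Plain form fields; the granularity field name includes the "[]" suffix.
	fields := [][2]string{
		{"model", model},
		{"response_format", "verbose_json"}, // assumed, per the OpenAI convention
		{"timestamp_granularities[]", "word"},
	}
	for _, f := range fields {
		if err := w.WriteField(f[0], f[1]); err != nil {
			return nil, err
		}
	}
	if err := w.Close(); err != nil {
		return nil, err
	}

	req, err := http.NewRequest(http.MethodPost, baseURL+"/v1/audio/transcriptions", &body)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", w.FormDataContentType())
	return req, nil
}

func main() {
	// Placeholder values; send the request with http.DefaultClient.Do(req).
	req, err := newTranscriptionRequest("http://localhost:8080", "parakeet-tdt_ctc-110m", "jfk.wav", []byte("RIFF"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
}
```

Omitting the `timestamp_granularities[]` field should yield the segment-level default described in the commit messages.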

Models

GGUFs for all 10 Parakeet models x 5 quants are published in a single collection repo: mudler/parakeet-cpp-gguf.

Validation

Backend builds with CGO_ENABLED=0, is gofmt clean, and is go vet clean except for one documented unsafeptr note shared with the whisper backend.
Backend specs (text, word-granularity, streaming) pass against real tdt_ctc-110m and realtime_eou_120m-v1 models; the streamed transcript matches the offline reference word-for-word.
Importer: 9 specs (positive matches, narrowness negatives, import/quant-pick) pass.
scripts/changed-backends.js resolution + backend-matrix.yml/index.yaml YAML validated; matrix ↔ index tag-suffixes cross-checked.

🤖 Generated with Claude Code

…on (text)
Add a Go gRPC backend that bridges LocalAI to parakeet.cpp via the flat C-API (parakeet_capi.h), loaded with purego (cgo-less, mirrors the whisper / vibevoice-cpp backends).
L0 scope:
- main.go: dlopen libparakeet.so (override via PARAKEET_LIBRARY), register the C-API entry points, start the gRPC server.
- goparakeetcpp.go: Load (parakeet_capi_load), AudioTranscription (parakeet_capi_transcribe_path, decoder=0 = per-arch default head), Free, serialized through base.SingleThread since the C engine is a thread-unsafe singleton. char* returns are bound as uintptr so the malloc'd buffer is freed via parakeet_capi_free_string after copy.
- AudioTranscriptionStream returns a clear "not implemented in L0" error (closes the channel so the server doesn't hang), wired in L2.
- Makefile: clone-at-pin + cmake (PARAKEET_VERSION for bump_deps.sh), with a local-symlink dev shortcut; run.sh / package.sh mirror whisper.
- Test auto-skips without PARAKEET_BACKEND_TEST_MODEL/_WAV fixtures.
Builds clean (CGO_ENABLED=0), gofmt clean, test passes. The single unsafeptr vet note in goStringFromCPtr is documented and matches the whisper backend's tolerated pattern. Word/segment timestamps (L1) and cache-aware streaming (L2) follow.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
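
The copy-then-free handling of char* returns described in this commit can be sketched as follows. This is an illustrative reconstruction, not the PR's code: only the name goStringFromCPtr comes from the commit message, while takeCString, the free callback (standing in for parakeet_capi_free_string), and the static test buffer are hypothetical. The uintptr to unsafe.Pointer conversion is the pattern that go vet's unsafeptr check reports.

```go
package main

import (
	"fmt"
	"unsafe"
)

// goStringFromCPtr copies a NUL-terminated C string at address p into a Go
// string. p is expected to point at memory owned by the C side.
func goStringFromCPtr(p uintptr) string {
	if p == 0 {
		return ""
	}
	base := unsafe.Pointer(p) // go vet (unsafeptr) flags this conversion
	n := 0
	for *(*byte)(unsafe.Add(base, n)) != 0 {
		n++
	}
	// string(...) copies the bytes, so the result outlives the C buffer.
	return string(unsafe.Slice((*byte)(base), n))
}

// takeCString (hypothetical helper) copies the string first and only then
// hands the pointer to free, which stands in for the C-API's free function.
func takeCString(p uintptr, free func(uintptr)) string {
	if p == 0 {
		return ""
	}
	s := goStringFromCPtr(p)
	free(p)
	return s
}

// fakeCBuffer simulates a C-owned buffer with static Go data for the demo.
var fakeCBuffer = [6]byte{'h', 'e', 'l', 'l', 'o', 0}

// demo runs the copy-then-free sequence and counts how often free was called.
func demo() (string, int) {
	freed := 0
	s := takeCString(uintptr(unsafe.Pointer(&fakeCBuffer[0])), func(uintptr) { freed++ })
	return s, freed
}

func main() {
	s, freed := demo()
	fmt.Println(s, freed)
}
```

Binding the return value as uintptr instead of string keeps the original address available, which is what makes the explicit free possible after the copy.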

AudioTranscription now calls parakeet_capi_transcribe_path_json and shapes the per-word / per-token timestamps into the TranscriptResult:
- Bind parakeet_capi_transcribe_path_json (purego, char* as uintptr like the other returns) and register it in main.go + the test loader.
- Parse the JSON document ({"text","words":[{w,start,end,conf}], "tokens":[{id,t,conf}]}) into typed structs.
- Synthesise a single whole-clip segment (parakeet emits no native segment boundaries) spanning the first word start to the last word end; token ids populate Segment.Tokens.
- Attach word-level timings only when timestamp_granularities=["word"], matching the OpenAI API (segment-level default).
secondsToNanos mirrors the whisper backend's nanosecond convention. Verified end-to-end against tdt_ctc-110m (f16): both the default and word-granularity specs pass; builds clean, gofmt clean, vet shows only the one documented unsafeptr note shared with the whisper backend. Cache-aware streaming (L2) follows.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
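
A minimal sketch of the parse-and-synthesise step described above, using only the standard library. The JSON keys come from the commit message; the Go field types (float seconds, integer token ids), the segment struct, and the rounding in secondsToNanos are assumptions made for illustration and may differ from the backend's actual types.

```go
package main

import (
	"encoding/json"
	"fmt"
	"math"
)

// Typed mirror of the JSON document; value types are assumed.
type word struct {
	W     string  `json:"w"`
	Start float64 `json:"start"`
	End   float64 `json:"end"`
	Conf  float64 `json:"conf"`
}

type token struct {
	ID   int     `json:"id"`
	T    float64 `json:"t"`
	Conf float64 `json:"conf"`
}

type transcriptDoc struct {
	Text   string  `json:"text"`
	Words  []word  `json:"words"`
	Tokens []token `json:"tokens"`
}

// segment is a simplified stand-in for the backend's segment type.
type segment struct {
	Text       string
	StartNanos int64
	EndNanos   int64
	Tokens     []int
}

// secondsToNanos converts float seconds to integer nanoseconds, rounding to
// avoid off-by-one results from floating-point error.
func secondsToNanos(s float64) int64 {
	return int64(math.Round(s * 1e9))
}

// wholeClipSegment builds one segment spanning the first word's start to the
// last word's end, and collects the token ids.
func wholeClipSegment(raw []byte) (segment, error) {
	var doc transcriptDoc
	if err := json.Unmarshal(raw, &doc); err != nil {
		return segment{}, err
	}
	seg := segment{Text: doc.Text}
	if n := len(doc.Words); n > 0 {
		seg.StartNanos = secondsToNanos(doc.Words[0].Start)
		seg.EndNanos = secondsToNanos(doc.Words[n-1].End)
	}
	for _, t := range doc.Tokens {
		seg.Tokens = append(seg.Tokens, t.ID)
	}
	return seg, nil
}

func main() {
	raw := []byte(`{"text":"ask not",
		"words":[{"w":"ask","start":0.5,"end":0.75,"conf":0.9},
		         {"w":"not","start":1.0,"end":1.25,"conf":0.8}],
		"tokens":[{"id":12,"t":0.5,"conf":0.9},{"id":34,"t":1.0,"conf":0.8}]}`)
	seg, err := wholeClipSegment(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", seg)
}
```

With no words in the document the segment keeps zero start and end times, which is one reasonable choice for empty audio; the backend may handle that case differently.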

Wire AudioTranscriptionStream to the streaming RNN-T C-API:
- Bind parakeet_capi_stream_{begin,feed,finalize,free}; feed takes 16 kHz mono float PCM ([]float32 via purego) and writes *eou_out on <EOU>/<EOB>.
- Decode opts.Dst to 16 kHz mono PCM (utils.AudioToWav + go-audio, same as the whisper backend), feed it in 1 s chunks, and emit each newly-finalized text run as a TranscriptStreamResponse delta.
- <EOU>/<EOB> events close the current segment; a closing FinalResult carries the full transcript plus the per-utterance segments (with a whole-clip fallback segment when no EOU fired).
- stream_begin returns 0 for non-streaming models, surfaced as a clear error instead of an empty stream.
Honours context cancellation between chunks. Frees every malloc'd delta and the session.
Verified end-to-end against realtime_eou_120m-v1 (f16): the streamed transcript matches the offline 110m reference word-for-word, deltas reconstruct the final text, and the spec passes alongside the offline specs. Builds clean, gofmt clean, vet shows only the shared documented unsafeptr note.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
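
The chunked feed loop described in this commit can be sketched roughly as below. The feeder type is a hypothetical stand-in for the C streaming session (the real binding writes an end-of-utterance flag through a pointer and has a separate finalize call, both omitted here), and streamResult is a simplified substitute for the backend's response types. The 1 s chunk size at 16 kHz and the cancellation check between chunks follow the commit text.

```go
package main

import (
	"context"
	"fmt"
	"strings"
)

// chunkSamples is one second of audio at 16 kHz mono.
const chunkSamples = 16000

// feeder (hypothetical) consumes one PCM chunk and returns newly finalized
// text plus whether an end-of-utterance event fired during that chunk.
type feeder func(chunk []float32) (delta string, eou bool)

type streamResult struct {
	Deltas   []string // text runs in emission order
	Segments []string // one entry per closed utterance
	Text     string   // full transcript for the closing result
}

func streamPCM(ctx context.Context, pcm []float32, feed feeder) (streamResult, error) {
	var res streamResult
	var current, full strings.Builder
	closeSegment := func() {
		if s := strings.TrimSpace(current.String()); s != "" {
			res.Segments = append(res.Segments, s)
		}
		current.Reset()
	}
	for off := 0; off < len(pcm); off += chunkSamples {
		// Check for cancellation between chunks, never inside one.
		if err := ctx.Err(); err != nil {
			return res, err
		}
		end := off + chunkSamples
		if end > len(pcm) {
			end = len(pcm)
		}
		delta, eou := feed(pcm[off:end])
		if delta != "" {
			res.Deltas = append(res.Deltas, delta)
			current.WriteString(delta)
			full.WriteString(delta)
		}
		if eou {
			closeSegment()
		}
	}
	// Text still open at the end becomes the last segment; if no
	// end-of-utterance fired at all, that segment is the whole clip.
	closeSegment()
	res.Text = strings.TrimSpace(full.String())
	return res, nil
}

// demoStream feeds 2.5 s of silence through a scripted feeder.
func demoStream() (streamResult, error) {
	script := []struct {
		delta string
		eou   bool
	}{{"ask not ", false}, {"what your country", true}, {" can do", false}}
	i := 0
	feed := func(chunk []float32) (string, bool) {
		step := script[i]
		i++
		return step.delta, step.eou
	}
	return streamPCM(context.Background(), make([]float32, 40000), feed)
}

func main() {
	res, err := demoStream()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n%q\n", res.Segments, res.Text)
}
```

Closing any still-open text as a final segment covers both trailing speech after the last utterance boundary and the whole-clip fallback with a single code path; treating trailing text this way is a choice made for this sketch.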

…parity)
Wire the new Go gRPC parakeet-cpp backend (parakeet.cpp ggml port of NVIDIA NeMo Parakeet ASR) into LocalAI's build/CI/gallery surfaces, matching the existing ggml whisper Go backend 1:1.
- .github/backend-matrix.yml: add 11 linux entries + 1 darwin entry mirroring every whisper build (cpu amd64/arm64, intel sycl f32/f16, vulkan amd64/arm64, nvidia cuda-12, nvidia cuda-13, nvidia-l4t-arm64, nvidia-l4t-cuda-13-arm64, rocm hipblas, metal-darwin-arm64), all on ./backend/Dockerfile.golang with backend: "parakeet-cpp" and -*-parakeet-cpp tag-suffixes.
- scripts/changed-backends.js: explicit inferBackendPath branch resolving parakeet-cpp to backend/go/parakeet-cpp/ before the generic golang branch.
- .github/workflows/bump_deps.yaml: track the PARAKEET_VERSION pin in backend/go/parakeet-cpp/Makefile (repo mudler/parakeet.cpp, branch master).
- backend/index.yaml: add &parakeetcpp meta + latest/development image entries for every matrix tag-suffix.
- Makefile: add backends/parakeet-cpp to .NOTPARALLEL, BACKEND_PARAKEET_CPP definition, docker-build target eval, and test-extra-backend-parakeet-cpp-transcription target (mirrors test-extra-backend-whisper-transcription).
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Add ParakeetCppImporter so parakeet.cpp GGUFs auto-detect on /import-model and route to the parakeet-cpp backend (it also surfaces in /backends/known, which drives the import dropdown).
- Match is narrow: a .gguf whose name carries a parakeet architecture token (<arch>-<size>-<quant>.gguf, e.g. tdt_ctc-110m-f16.gguf, rnnt-0.6b-q4_k.gguf, realtime_eou_120m-v1-q8_0.gguf), a direct URL to one, or preferences.backend="parakeet-cpp". It deliberately does NOT claim arbitrary llama-style GGUFs, nor the upstream nvidia/parakeet-* NeMo repos (.nemo, not runnable here).
- Registered in the ASR batch BEFORE LlamaCPPImporter so its GGUFs aren't swallowed by the generic .gguf importer.
- Import nests files under parakeet-cpp/models/<name>/, defaults to the smallest quant (q4_k, near-lossless on parakeet) with a size-ladder fallback, and honours preferences.quantizations / name / description.
Tested with synthetic HF details (no network): metadata, positive matches (HF repo, direct URL, preference), narrowness negatives (llama GGUF, NeMo repo), and import (default quant, override, direct URL), 9 specs pass, build/vet/gofmt clean.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
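
To make the narrow-match idea concrete, here is a hedged sketch of a filename matcher and quant picker. The regular expression, the architecture token list, and the q4_k, q8_0, f16 ladder are guesses derived from the example filenames in this commit message; the actual importer (written against LocalAI's importer interfaces) may use different rules, and the example URL path is illustrative only.

```go
package main

import (
	"fmt"
	"path"
	"regexp"
	"strings"
)

// parakeetGGUF is a hypothetical pattern: an architecture token, a size that
// starts with a digit, an optional version, and a quant suffix.
var parakeetGGUF = regexp.MustCompile(
	`^(tdt_ctc|tdt|ctc|rnnt|realtime_eou)[-_][0-9][0-9a-z.]*(-v[0-9]+)?-(f16|q[0-9]_[0-9k])\.gguf$`)

// isParakeetGGUF reports whether a file name or URL ends in a parakeet-style
// GGUF name. Only the last path element is inspected.
func isParakeetGGUF(nameOrURL string) bool {
	return parakeetGGUF.MatchString(strings.ToLower(path.Base(nameOrURL)))
}

// quantLadder lists quants from smallest to largest (assumed ordering).
var quantLadder = []string{"q4_k", "q8_0", "f16"}

// pickQuant returns the file for the preferred quant if present, otherwise
// the first ladder entry that has a matching file.
func pickQuant(files []string, preferred string) (string, bool) {
	order := quantLadder
	if preferred != "" {
		order = append([]string{preferred}, quantLadder...)
	}
	for _, q := range order {
		for _, f := range files {
			if isParakeetGGUF(f) && strings.HasSuffix(strings.ToLower(f), "-"+q+".gguf") {
				return f, true
			}
		}
	}
	return "", false
}

func main() {
	for _, name := range []string{
		"tdt_ctc-110m-f16.gguf",
		"realtime_eou_120m-v1-q8_0.gguf",
		"https://huggingface.co/mudler/parakeet-cpp-gguf/resolve/main/rnnt-0.6b-q4_k.gguf",
		"llama-3-8b-instruct-q4_k.gguf",
		"parakeet-tdt-0.6b-v2.nemo",
	} {
		fmt.Println(name, isParakeetGGUF(name))
	}
}
```

Anchoring the pattern on a known architecture token is what keeps a generic llama-style GGUF from matching, even when it shares the same quant suffix.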

Add parakeet-cpp to the audio-to-text backend list and a dedicated usage section: direct GGUF import (auto-detects to the backend), model YAML, word-level timestamps via timestamp_granularities[]=word, and cache-aware streaming with the realtime_eou model. Points at the mudler/parakeet-cpp-gguf collection repo.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The L3 commit added the test-extra-backend-parakeet-cpp-transcription Makefile target but never invoked it in CI. Mirror the whisper job:
- Add a parakeet-cpp output to detect-changes (emitted by changed-backends.js from the matrix entry).
- Add tests-parakeet-cpp-grpc-transcription, gated on the parakeet-cpp path filter / run-all, building the backend image and running the transcription e2e against tdt_ctc-110m + the JFK clip.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Replace em dashes with plain punctuation in the backend comments, the importer, package.sh, and the audio-to-text docs section (and use "and" instead of the multiplication sign). No behaviour change.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

Add the 10 NVIDIA Parakeet models (f16, the recommended quality/speed default) as gallery entries that install on the parakeet-cpp backend from mudler/parakeet-cpp-gguf: tdt_ctc-110m/1.1b, tdt-0.6b-v2/v3, tdt-1.1b, ctc-0.6b/1.1b, rnnt-0.6b/1.1b, and the cache-aware streaming realtime_eou_120m-v1. Each pins the file sha256 and routes transcript usecases to the backend.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

- goparakeetcpp.go: //nolint:govet on the C-owned-pointer unsafe.Pointer conversion (golangci-lint reports new-only issues, so unlike the whisper backend's identical line this one is flagged).
- Makefile: bump PARAKEET_VERSION to the current parakeet.cpp master commit (the previous pin's commit no longer exists after upstream history was squashed), so the backend image clone/build resolves again.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The previous SHA pin was orphaned when parakeet.cpp's single-commit master was amended/force-pushed, so the backend image clone (git fetch <sha>) failed across every build variant. Repoint to 845c29e, which upstream now keeps permanently fetchable via the `localai-backend-pin` tag, so future upstream amends no longer break the backend build.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The backend Dockerfile clones parakeet.cpp at PARAKEET_VERSION with a shallow fetch + checkout but never initialised submodules, so third_party/ggml was empty and the parakeet.cpp cmake build failed at `add_subdirectory(third_party/ggml)` (CMakeLists.txt:53) on every build variant. Add `git submodule update --init --recursive --depth 1 --single-branch` after checkout, mirroring the whisper backend. Verified locally: clone + submodule + cmake configure now succeeds.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The shared libparakeet.so linked ggml's shared libs (libggml*.so), but the package only ships libparakeet.so, so at runtime dlopen failed with "libggml.so.0: cannot open shared object file" (the e2e transcription test panicked on load). Build ggml static + PIC (BUILD_SHARED_LIBS=OFF, CMAKE_POSITION_INDEPENDENT_CODE=ON) so libparakeet.so embeds ggml and depends only on system libs already present in the runtime image. Verified locally: ldd shows no libggml dependency.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>

The e2e streaming test ran AudioTranscriptionStream against tdt_ctc-110m (not a cache-aware streaming model), so stream_begin returned 0 and the call errored. Per LocalAI's streaming contract (and the whisper backend), a non-streaming model should fall back to a single offline transcription emitted as one delta plus a closing FinalResult. Do that instead of erroring, so the streaming endpoint works for every parakeet model. Verified locally: the streaming spec passes against the non-streaming 110m model via fallback.
Assisted-by: Claude:claude-opus-4-8 [Claude Code]
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>
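
The fallback described in this commit might look like the following sketch. The engine struct and its three function fields are hypothetical hooks standing in for the C-API calls, and event is a simplified substitute for the streamed response type; only the behaviour (a zero session handle triggers one offline transcription, emitted as a single delta followed by a closing final result) comes from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a simplified stand-in for a streamed transcript response.
type event struct {
	Delta string // incremental text
	Final bool   // true on the closing result
	Text  string // full transcript, set only when Final is true
}

// engine bundles hypothetical hooks in place of the real C-API bindings.
type engine struct {
	begin   func() uintptr                          // 0 means not a streaming model
	stream  func(session uintptr) ([]string, error) // deltas from a live session
	offline func(path string) (string, error)       // one-shot transcription
}

// transcribeStream streams when the model supports it and otherwise falls
// back to a single offline pass, keeping the same event shape for callers.
func transcribeStream(e engine, path string, emit func(event)) error {
	session := e.begin()
	if session == 0 {
		// Non-streaming model: one delta with everything, then the final.
		text, err := e.offline(path)
		if err != nil {
			return err
		}
		emit(event{Delta: text})
		emit(event{Final: true, Text: text})
		return nil
	}
	deltas, err := e.stream(session)
	if err != nil {
		return err
	}
	for _, d := range deltas {
		emit(event{Delta: d})
	}
	emit(event{Final: true, Text: strings.Join(deltas, "")})
	return nil
}

func main() {
	offlineOnly := engine{
		begin:   func() uintptr { return 0 },
		offline: func(string) (string, error) { return "ask not what your country can do for you", nil },
	}
	err := transcribeStream(offlineOnly, "jfk.wav", func(ev event) {
		fmt.Printf("%+v\n", ev)
	})
	if err != nil {
		panic(err)
	}
}
```

Because both paths end with a final event carrying the full text, a client consuming the stream does not need to know which kind of model served the request.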

…10106)
parakeet-cpp was added in #10084 but not registered in BackendCapabilities, so GuessUsecases only allowed "whisper" for FLAG_TRANSCRIPT and the UI could not classify parakeet-cpp models as speech-to-text. The result was that parakeet models appeared only in the LLM selector in the speech-to-speech pipeline, making them unusable for transcription through the UI.
Closes #9718
Assisted-by: Claude Sonnet 4.6 <noreply@anthropic.com>

mudler force-pushed the feat/parakeet-backend branch 2 times, most recently from a7b7f46 to 4b1249c Compare May 30, 2026 07:35

mudler added 14 commits May 30, 2026 09:39

mudler force-pushed the feat/parakeet-backend branch from a044302 to cd2fc3b Compare May 30, 2026 09:39

mudler merged commit 4912c9b into master May 30, 2026
68 checks passed

mudler deleted the feat/parakeet-backend branch May 30, 2026 12:46

BrewTestBot mentioned this pull request May 30, 2026: localai 4.3.6 Homebrew/homebrew-core#285503 (Merged)

Dennisadira mentioned this pull request May 31, 2026: fix(config): register parakeet-cpp as a transcript backend (#9718) #10106 (Merged, 2 tasks)

localai-bot added the enhancement New feature or request label Jun 10, 2026
