Reminder service
Edit: Improve Hermes STT for Hebrew voice notes with local Codex
Local-first task tracking, scheduling, and history.
Notes
Prompt for local Codex:

Task: Improve Luna/Hermes speech-to-text for Hebrew and mixed Hebrew-English Telegram voice notes.

Problem:
- Luna/Hermes voice notes are currently transcribed by Hermes local STT, not cloud STT.
- Current config is local faster-whisper with model `small` and language auto-detect (`stt.provider=local`, `stt.local.model=small`, `stt.local.language=""`). This was upgraded from `base` because Hebrew/mixed-note quality was weak.
- OpenCLAW/Richard used a more robust transcription chain: local faster-whisper `small` plus a wrapper/audit path and optional OpenAI `gpt-4o-transcribe` via Codex OAuth.
- The old OpenCLAW Codex/OpenAI STT leg is not usable as-is: it failed with HTTP 401 token expired.
- Luna has an `openai-codex` OAuth credential in Hermes auth, but direct probes to OpenAI audio transcription with that token returned HTTP 500 for `whisper-1`, `gpt-4o-mini-transcribe`, and `gpt-4o-transcribe`. Do not assume Codex OAuth works for audio STT without re-verifying a supported path.
- Standard cloud STT keys were not configured at the time checked: `VOICE_TOOLS_OPENAI_KEY`, `OPENAI_API_KEY`, `GROQ_API_KEY`, `MISTRAL_API_KEY`, and `XAI_API_KEY` were all unset.

What we are trying to solve:
- Hebrew and mixed Hebrew-English voice notes are mis-transcribed often enough to cause wrong reminders/project updates.
- We need a clean Luna-owned solution, not a copy of OpenCLAW secrets or a dependency on `/Users/agent/.openclaw`.

Suggested approach:
1. Keep local faster-whisper as the first pass and fallback.
2. Capture language/quality metadata from the local pass if Hermes exposes it; otherwise add a small wrapper around faster-whisper that returns detected language/probability plus transcript.
3. If detected language is Hebrew (`he`), transcript contains Hebrew characters, or quality/confidence heuristics are poor, route to a stronger cloud STT provider.
4. Prefer a normal configured cloud credential, e.g. `VOICE_TOOLS_OPENAI_KEY` for OpenAI `gpt-4o-transcribe`, or Groq/Mistral/xAI if that is the credential Amit chooses. Treat Codex OAuth as experimental until a supported audio endpoint path is proven.
5. Add a non-secret audit log comparing local vs cloud transcript and selection decision, so we can tune false positives/false negatives.
6. Avoid printing or logging secrets/tokens. Redact credential values.
7. Keep changes under Luna/Hermes-owned paths (`/Users/agent/Assistant` or proper Hermes config/code), not `.openclaw` compatibility paths.

Relevant paths:
- Hermes config: `/Users/agent/.hermes/config.yaml`
- Hermes STT implementation: `/Users/agent/.hermes/hermes-agent/tools/transcription_tools.py`
- Audio cache examples: `/Users/agent/.hermes/audio_cache/`
- OpenCLAW reference only, do not depend on it:
  - `/Users/agent/.openclaw/workspace/scripts/robust_transcribe_voice_note.py`
  - `/Users/agent/.openclaw/workspace/scripts/local_transcribe_faster_whisper.py`
  - `/Users/agent/.openclaw/workspace/scripts/openai_transcribe_via_codex_oauth.py`
- Hermes auth exists but secrets must remain redacted: `/Users/agent/.hermes/auth.json`

Acceptance criteria:
- A Hebrew/mixed note either transcribes better than current local-only path or falls back safely to local.
- English notes are not made worse by globally forcing Hebrew.
- No OpenCLAW live secrets are reused.
- No token values are printed in logs or task output.
- There is a simple verification command or test using cached audio files.
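
Illustrative sketch for approach step 3 (routing decision), not the existing Hermes code. It assumes the local pass can hand back a transcript plus detected language and language probability; faster-whisper's `transcribe()` is understood to expose these as `info.language` and `info.language_probability`, but verify that against the installed version. `LocalResult`, `should_escalate`, and the `0.80` threshold are hypothetical names and values chosen for illustration.

```python
import re
from dataclasses import dataclass

# Hebrew Unicode block (U+0590 to U+05FF): letters, vowel points, punctuation.
HEBREW_RE = re.compile(r"[\u0590-\u05FF]")


@dataclass
class LocalResult:
    """Hypothetical shape of what the local faster-whisper pass returns."""
    text: str
    language: str                # detected language code, e.g. "he" or "en"
    language_probability: float  # 0.0 to 1.0


def should_escalate(result: LocalResult, min_prob: float = 0.80) -> tuple[bool, str]:
    """Return (escalate, reason) for routing a note to a cloud STT provider.

    min_prob is an assumed starting threshold; tune it from the audit log.
    """
    if result.language == "he":
        return True, "detected_language_he"
    if HEBREW_RE.search(result.text):
        return True, "hebrew_characters_in_transcript"
    if result.language_probability < min_prob:
        return True, "low_language_probability"
    if not result.text.strip():
        return True, "empty_transcript"
    return False, "local_ok"
```

Returning the reason string alongside the boolean keeps the decision explainable, so the same value can be written to the audit log. A confident English note takes the `local_ok` branch and never leaves the machine, which matches the criterion that English notes are not made worse.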
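
Illustrative sketch for approach steps 5 and 6 (audit log with redaction), under stated assumptions. The record fields, the JSON Lines layout, and the `SECRET_RE` patterns are hypothetical; the patterns only catch strings shaped like `sk-...` keys and `Bearer ...` headers and should be extended for whichever provider is actually configured.

```python
import json
import os
import re
import time

# Assumed patterns for credential-looking strings; not an exhaustive list.
SECRET_RE = re.compile(r"(sk-[A-Za-z0-9_\-]{8,}|Bearer\s+[A-Za-z0-9._\-]+)")


def redact(text):
    """Replace credential-looking substrings; pass None through unchanged."""
    if text is None:
        return None
    return SECRET_RE.sub("[REDACTED]", text)


def audit_record(audio_path, local_text, cloud_text, chosen, reason, error=None):
    """Build one non-secret audit entry comparing local and cloud transcripts."""
    return {
        "ts": int(time.time()),
        "audio_file": os.path.basename(audio_path),  # file name only, no full path
        "local_text": redact(local_text),
        "cloud_text": redact(cloud_text),            # None when cloud was skipped or failed
        "chosen": chosen,                            # "local" or "cloud"
        "reason": reason,
        "error": redact(error),                      # provider errors may echo auth headers
    }


def append_audit(log_path, record):
    """Append the record as one JSON line; keep Hebrew readable in the file."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Passing `cloud_text=None` with `chosen="local"` records the safe-fallback case, so failed cloud calls still show up when tuning false positives and false negatives.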