HUGO

AI/MLOngoingSolo project2026
GotRPC-agent-goClaude Sonnet 4Silero VADMoonshine STTKokoro TTSWebSocketReachy MiniONNX Runtime
Streaming ReAct agent with concurrent voice pipeline in Go
All-local voice processing: Silero VAD, Moonshine STT, Kokoro TTS
Overlapped execution — agent thinks while TTS synthesizes
WebSocket server with structured message protocol
Barge-in support via context cancellation propagation

HUGO - Helpful Universal Guide & Organizer

Voice-First Agent Platform for Reachy Mini Robot


Project Overview

HUGO is a voice-first agent platform written in Go that orchestrates tools, controls a Reachy Mini robot, and communicates primarily through speech. The agent talks to you while it works — not after it's done.

The Idea: A physical desktop companion that listens, acts on your behalf, and responds with voice — all through natural conversation in a browser-based interface.

The Architecture: Browser Mic → WebSocket → VAD → STT → LLM Agent (Claude) → Tool Execution → TTS → Audio Playback

Technical Architecture

Core (tRPC-agent-go)

Single agent powered by Claude Sonnet 4 via tRPC-agent-go, a production-grade orchestration framework supporting:

  • Streaming ReAct loop — Observe, think, act, with real-time event emission
  • Tool callingcurrent_time, calculator, look_at (robot control)
  • Concurrent event consumer — Agent runner and voice pipeline operate independently via goroutine channels

Voice Pipeline (All Local)

Full speech processing running on-device with no cloud dependency:

  • Voice Activity Detection: Silero VAD (ONNX Runtime, <20ms)
  • Speech-to-Text: Moonshine Tiny (sherpa-onnx, quantized, local)
  • Text-to-Speech: Kokoro-82M (MLX) with OpenAI API fallback
  • Speech Queue: Non-blocking buffered channel with barge-in support

WebSocket Server

Real-time browser communication with structured message protocol:

  • Streaming response chunks, tool call events, and tool results
  • Goroutine-isolated per-connection handlers
  • JSON message protocol for extensibility

Concurrent Design

The core UX innovation is overlapped execution:

  • Agent keeps thinking while TTS synthesizes previous sentences
  • User can barge-in (interrupt) — context cancellation propagates through runner, consumer, and speech queue
  • Non-blocking speech queue means the agent never waits for audio playback

Technology Stack

  • Language: Go 1.25
  • Agent Framework: tRPC-agent-go v1.6.0 (streaming ReAct loop)
  • LLM: Claude Sonnet 4 (via OpenAI-compatible endpoint)
  • VAD: Silero ONNX
  • STT: Moonshine Tiny (sherpa-onnx, quantized)
  • TTS: Kokoro-82M (MLX), OpenAI TTS fallback
  • Audio: gen2brain/malgo (miniaudio, cross-platform)
  • Server: gorilla/websocket + net/http
  • Robot: Reachy Mini (integration in progress)

What's Been Built

  • Text Agent — Full agentic conversation with tool calling and streaming responses
  • Tool System — Extensible tool registration with current_time, calculator, look_at
  • WebSocket Server — Browser-based real-time communication with structured protocol
  • Voice Pipeline — VAD, STT, TTS interfaces and implementations with speech queue
  • Concurrent Architecture — Event consumer pattern enabling overlapped agent/voice execution

Links: