HUGO - Helpful Universal Guide & Organizer

Voice-First Agent Platform for Reachy Mini Robot

Project Overview

HUGO is a voice-first agent platform written in Go that orchestrates tools, controls a Reachy Mini robot, and communicates primarily through speech. The agent talks to you while it works — not after it's done.

The Idea: A physical desktop companion that listens, acts on your behalf, and responds with voice — all through natural conversation in a browser-based interface.

The Architecture: Browser Mic → WebSocket → VAD → STT → LLM Agent (Claude) → Tool Execution → TTS → Audio Playback

Technical Architecture

Core (tRPC-agent-go)

Single agent powered by Claude Sonnet 4 via tRPC-agent-go, a production-grade orchestration framework supporting:

Streaming ReAct loop — Observe, think, act, with real-time event emission
Tool calling — current_time, calculator, look_at (robot control)
Concurrent event consumer — Agent runner and voice pipeline operate independently via goroutine channels

Voice Pipeline (All Local)

Full speech processing running on-device with no cloud dependency:

Voice Activity Detection: Silero VAD (ONNX Runtime, <20ms)
Speech-to-Text: Moonshine Tiny (sherpa-onnx, quantized, local)
Text-to-Speech: Kokoro-82M (MLX) with OpenAI API fallback
Speech Queue: Non-blocking buffered channel with barge-in support

WebSocket Server

Real-time browser communication with structured message protocol:

Streaming response chunks, tool call events, and tool results
Goroutine-isolated per-connection handlers
JSON message protocol for extensibility

Concurrent Design

The core UX innovation is overlapped execution:

Agent keeps thinking while TTS synthesizes previous sentences
User can barge-in (interrupt) — context cancellation propagates through runner, consumer, and speech queue
Non-blocking speech queue means the agent never waits for audio playback

Technology Stack

Language: Go 1.25
Agent Framework: tRPC-agent-go v1.6.0 (streaming ReAct loop)
LLM: Claude Sonnet 4 (via OpenAI-compatible endpoint)
VAD: Silero ONNX
STT: Moonshine Tiny (sherpa-onnx, quantized)
TTS: Kokoro-82M (MLX), OpenAI TTS fallback
Audio: gen2brain/malgo (miniaudio, cross-platform)
Server: gorilla/websocket + net/http
Robot: Reachy Mini (integration in progress)

What's Been Built

Text Agent — Full agentic conversation with tool calling and streaming responses
Tool System — Extensible tool registration with current_time, calculator, look_at
WebSocket Server — Browser-based real-time communication with structured protocol
Voice Pipeline — VAD, STT, TTS interfaces and implementations with speech queue
Concurrent Architecture — Event consumer pattern enabling overlapped agent/voice execution

Links:

GitHub Repository