
Clutch
Clutch is an AI how-to agent that sees real-world tasks through Ray-Ban Meta smart glasses or a phone camera and guides you step by step — in your language, with voice narration, AI-generated images, and contextual YouTube tutorials. Built on the Gemini Live API (bidirectional audio streaming), Google ADK, and Cloud Run, the core challenge wasn’t UI — it was designing for a probabilistic agent: silent tool failures, skipped steps, and leaky internal state. The UX was built to absorb uncertainty without exposing it to the user. Competed in the Gemini Live Agent Challenge (up to $80,000 prize pool).
Role — Solo Designer & Engineer · Gemini Live API · Google ADK · Cloud Run · SwiftUI · Python · Ray-Ban Meta Wayfarer · Bidirectional Audio Streaming · Agentic UI Design · iOS

The Concept
A task assistant that lives at the edge of the physical world. You're under the hood of a car, your hands are dirty, and you need to know what to do next. Clutch is the only tool in your box that tells you.

The Agent Architecture
Three core tools: generate_steps (Gemini 2.5 Flash with parallel image generation), search_youtube (YouTube Data API v3), and advance_step (a frontend wizard signal). Image generation is baked directly into generate_steps via asyncio.gather — four images run in parallel rather than relying on the agent to call a separate tool it would frequently skip.
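The parallel-image pattern can be sketched as follows. This is a minimal illustration, not the production tool: `render_image` is a hypothetical stand-in for the real image-generation call, and the step list stands in for Gemini 2.5 Flash output.

```python
import asyncio

async def render_image(step_text: str) -> str:
    """Hypothetical placeholder for an image-generation API call."""
    await asyncio.sleep(0)  # simulate network latency
    return f"https://images.example/{hash(step_text) & 0xFFFF}.png"

async def generate_steps(task: str) -> list[dict]:
    # In the real tool, Gemini 2.5 Flash produces these steps.
    steps = [f"{task}: step {i + 1}" for i in range(4)]
    # Fire all four image requests concurrently inside the same tool call,
    # so the agent never has to remember to invoke a separate image tool.
    images = await asyncio.gather(*(render_image(s) for s in steps))
    return [{"text": s, "image": img} for s, img in zip(steps, images)]

result = asyncio.run(generate_steps("Change a tire"))
```

Baking the images into the tool trades a little latency flexibility for reliability: the agent can skip an optional tool call, but it cannot skip a step that happens inside the one call it must make.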
The Experience
Step-by-step wizard with AI-generated reference images for each stage. Voice-first navigation with manual tap fallback. Cross-lingual switching across five languages mid-session. Camera preview streaming from glasses or phone. A floating liquid glass toolbar on iOS. All surfaces designed as refractive panels against a Gemini gradient background — the visual language of a product positioned between Google and Apple.

Designing for Probabilistic Behavior
The agent doesn't always do what the prompt says. It sometimes reads its own tool output aloud. It sometimes advances steps before the user is ready. It sometimes asks "how's that going?" three times in a row. These aren't bugs — they're the nature of LLM agents under real latency and instruction-following constraints. The design solution was a manual wizard layer that stays correct regardless of agent behavior: the user always has ground truth on screen even when the voice experience drifts. Graceful degradation as a first-class design requirement.
Live
clutch-vyt2xlbryq-uc.a.run.app · github.com/Brandi-Kinard/clutch