He listens. He looks. He feels.
A real WALL·E you can talk to — who talks back, moves, and shows how he feels. A multilingual AI robot with an animated touchscreen face, seven servos, and live Spotify.
WALL·E is a hand-built, AI-powered robot inspired by the Pixar character. You speak to him in English or Vietnamese; he understands, replies in a warm little voice, and physically acts.
At his core is a real-time voice agent running on a PC: it transcribes your speech, reasons with a large language model, speaks back, and, crucially, decides when to do something. When the conversation calls for it, the agent calls tools that turn intent into motion: raise the right hand, wave both arms, look up, look at what's in front of him, or shift his expression to match the mood of the chat. He can even see: ask "what are you looking at?" and he captures a camera frame and describes it.
Those decisions travel over an MQTT message bus to two embedded boards. An Arduino R4 drives seven servos through a smooth, non-blocking animation engine, so arms, neck, and eyes move naturally. An ESP32-S3 touchscreen renders his face in a custom UI that faithfully matches the WALL·E aesthetic — amber highlights, a warm dark palette, a subtle CRT texture, and a pair of expressive binocular eyes that blink, wander, and change shape and color for seven distinct emotions.
Most robot replicas are static props. WALL·E is built to feel alive — across conversation, motion, vision, and a polished interface.
Conversational AI
Natural two-way voice, bilingual (EN + VI), low-latency with automatic turn-taking.
It acts, not just talks
Tool-calling lets the AI decide when to move, look, or change expression.
Computer vision
Ask "What do you see?" and he captures a camera frame and reasons over it in real time.
Expressive face
Binocular eyes with blinking, gaze wander, and 7 emotions — each with its own geometry and color.
Real movement
7 servos: arms raise/lower/wave, neck pan + tilt, eye tilt — smooth concurrent motion.
6-tab touchscreen
Face · Talk · Drive · Music · Power · System, plus an AUTO mode that follows the robot's live state.
Live Spotify
Now-playing, on-device album-art decode, equalizer, and full transport + volume control.
Real-time sync
Three processors stay in lock-step over an MQTT bus, reflecting state within a fraction of a second.
You → Brain (PC) → MQTT → Body (Arduino) + Head (ESP32) → back up via telemetry.
- 01 You speak: The PC Brain transcribes your speech to text (multilingual STT).
- 02 The model reasons: An LLM decides how to respond and whether to act, then speaks the reply and emits commands.
- 03 Commands publish: Motion (hand, head, eyes, gesture), mood, and live talk state flow onto the MQTT bus.
- 04 Body animates: The Arduino receives motion commands and drives the seven servos with a non-blocking engine.
- 05 Head reacts: The ESP32 updates the face, transcript, and indicators; its Music tab shows live now-playing data from Spotify.
- 06 Telemetry returns: Battery and body status flow back up the bus so the screen always reflects reality.
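
The publish and telemetry steps above can be sketched as a pair of small helpers. This is a minimal illustration, not the project's actual wire format: the topic names (`walle/cmd/...`), the JSON field names, and both function names are assumptions chosen for the example. Only the command kinds (hand, head, eyes, gesture, mood, talk) come from the description above.

```python
import json

# Hypothetical topic prefix; the real topic layout is not documented here.
CMD_PREFIX = "walle/cmd"
# Command kinds named in the pipeline description above.
COMMAND_KINDS = {"hand", "head", "eyes", "gesture", "mood", "talk"}


def build_command(kind, **params):
    """Return (topic, payload) for one command. Field names are illustrative."""
    if kind not in COMMAND_KINDS:
        raise ValueError(f"unknown command kind: {kind}")
    topic = f"{CMD_PREFIX}/{kind}"
    # Compact, key-sorted JSON keeps payloads small and reproducible.
    payload = json.dumps(params, separators=(",", ":"), sort_keys=True)
    return topic, payload


def parse_telemetry(payload):
    """Decode a power telemetry payload; return None if it is malformed."""
    try:
        data = json.loads(payload)
        return {
            "battery_pct": float(data["battery_pct"]),
            "volts": float(data["volts"]),
        }
    except (ValueError, TypeError, KeyError):
        return None


topic, payload = build_command("hand", side="right", action="raise")
print(topic, payload)  # walle/cmd/hand {"action":"raise","side":"right"}
```

Returning `None` for bad telemetry, rather than raising, is one way to keep a display loop running when a device sends a partial message.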
A faithful digital twin of WALL·E's 4.3" 800×480 touchscreen. It runs live below — tap the tabs to drive it, or leave AUTO on and watch the screens follow his state.
WALL·E · 4.3" LCD · 800×480 — autonomous mode active, screens shift with the robot's state
| Tab | What it shows |
|---|---|
| FACE | Animated WALL·E eyes that mirror his current emotion, plus an emotion picker. |
| TALK | Live speaking / listening state, the sentence he's saying, and an audio visualizer. |
| DRIVE | A dark-themed map with a drift trail and speed / heading / track telemetry. |
| MUSIC | Live Spotify: cover art, title/artist/album, equalizer, and full playback control. |
| POWER | Battery gauge, voltage, core temperature, and a cell array. |
| SYSTEM | Diagnostics for every servo and module, lighting up as commands fire. |
| AUTO | Toggle: the screen auto-follows the robot's live state (jumps to Talk while speaking). |
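
The AUTO toggle in the table can be thought of as a small priority rule over the robot's live state. The sketch below is an assumption-heavy illustration: only "jumps to Talk while speaking" comes from the table, while the other rules, the state keys, and the `pick_tab` name are hypothetical.

```python
def pick_tab(state, current_tab, auto_enabled):
    """Choose which tab to show. Only the TALK rule is taken from the table."""
    if not auto_enabled:
        # Manual mode: the user's last tap wins.
        return current_tab
    if state.get("speaking") or state.get("listening"):
        return "TALK"
    # The rules below are illustrative guesses, not documented behavior.
    if state.get("low_battery"):
        return "POWER"
    if state.get("music_playing"):
        return "MUSIC"
    return "FACE"


print(pick_tab({"speaking": True}, "MUSIC", auto_enabled=True))   # TALK
print(pick_tab({"speaking": True}, "MUSIC", auto_enabled=False))  # MUSIC
```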
WALL·E is a three-tier distributed system connected by a local MQTT broker, engineered so that conversation, motion, and a polished interface all run at once.
Brain
PC · Python
A real-time voice agent on LiveKit Agents chains Deepgram (STT), Anthropic Claude (reasoning + tool-calling), and Cartesia (TTS), with Silero VAD. Function tools turn conversation into robot actions over MQTT.
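
To make "function tools turn conversation into robot actions" concrete, here is a library-free sketch of a tool registry and dispatcher. It does not use the LiveKit Agents or Anthropic APIs. The decorator, the tool names, the topics, and the `publish` callback are all hypothetical stand-ins for whatever the real agent uses.

```python
import json

TOOLS = {}  # tool name -> handler function


def tool(name):
    """Register a handler under a tool name (hypothetical decorator)."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register


@tool("raise_hand")
def raise_hand(side="right"):
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    return [("walle/cmd/hand", json.dumps({"side": side, "action": "raise"}))]


@tool("set_emotion")
def set_emotion(emotion):
    return [("walle/cmd/mood", json.dumps({"emotion": emotion}))]


def dispatch(name, arguments, publish):
    """Run one model-requested tool call and publish its messages."""
    handler = TOOLS.get(name)
    if handler is None:
        return f"error: unknown tool {name!r}"
    try:
        messages = handler(**arguments)
    except (TypeError, ValueError) as exc:
        # Bad arguments are reported as text instead of crashing the agent.
        return f"error: {exc}"
    for topic, payload in messages:
        publish(topic, payload)
    return "ok"


sent = []
print(dispatch("raise_hand", {"side": "right"}, lambda t, p: sent.append((t, p))))  # ok
```

Returning an error string instead of raising gives the model a chance to correct a malformed call on its next turn.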
Body
Arduino R4 WiFi · C++
Subscribes to command topics and drives 7 MG90S servos through a PCA9685 PWM controller using a non-blocking easing/wave animation engine. Publishes power + status telemetry.
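
The idea behind a non-blocking easing engine is that each servo stores its start angle, target, and timing, and the main loop asks "where should you be right now?" instead of sleeping until a move finishes. The firmware is C++; the Python model below only illustrates the technique. The class name, the smoothstep curve, and the timings are assumptions, not the project's actual code.

```python
class ServoTween:
    """One servo's in-flight move, advanced by polling with a clock value."""

    def __init__(self, angle=90.0):
        self.start = self.target = self.current = float(angle)
        self.start_ms = 0
        self.duration_ms = 1

    def move_to(self, target, now_ms, duration_ms):
        # Begin from wherever the servo currently is, so moves can be interrupted.
        self.start = self.current
        self.target = float(target)
        self.start_ms = now_ms
        self.duration_ms = max(1, duration_ms)

    def update(self, now_ms):
        # Fraction of the move completed, clamped to [0, 1].
        t = (now_ms - self.start_ms) / self.duration_ms
        t = min(1.0, max(0.0, t))
        eased = t * t * (3.0 - 2.0 * t)  # smoothstep: slow start, slow stop
        self.current = self.start + (self.target - self.start) * eased
        return self.current


# Two servos animate concurrently because update() never waits.
arm, neck = ServoTween(90), ServoTween(90)
arm.move_to(150, now_ms=0, duration_ms=1000)
neck.move_to(60, now_ms=0, duration_ms=500)
for now in (0, 250, 500, 1000):
    print(now, round(arm.update(now), 1), round(neck.update(now), 1))
```

On a microcontroller the same pattern would typically read the clock from `millis()` inside `loop()`, leaving time between updates for MQTT handling.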
Head
ESP32-S3 · LVGL
An 800×480 RGB touchscreen UI on a dual-core design: LVGL rendering on one core, MQTT + Spotify TLS on the other. Album art is fetched over HTTPS and JPEG-decoded into a PSRAM framebuffer.
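
A quick piece of arithmetic shows why the framebuffer lives in PSRAM. The sketch assumes a 16-bit RGB565 pixel format, which is common for LVGL displays but is not stated above, and a hypothetical 300×300 album-art size. The helper names are illustrative.

```python
def rgb888_to_rgb565(r, g, b):
    """Pack 8-bit R, G, B into one 16-bit RGB565 pixel (5, 6, and 5 bits)."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def buffer_bytes(width, height, bytes_per_pixel=2):
    """Memory needed for a width x height buffer at the given pixel size."""
    return width * height * bytes_per_pixel


print(hex(rgb888_to_rgb565(255, 0, 0)))  # 0xf800
print(buffer_bytes(800, 480))            # 768000 bytes for one full screen
print(buffer_bytes(300, 300))            # 180000 bytes for a 300x300 cover
```

Under these assumptions a single full-screen buffer needs 768,000 bytes, which is more than the ESP32-S3's 512 KB of internal SRAM, so large pixel buffers have to go in the 8 MB of external PSRAM.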
| Component | Detail |
|---|---|
| Head display | Waveshare ESP32-S3-Touch-LCD-4.3B — 800×480 RGB IPS, capacitive touch, 8 MB PSRAM, 16 MB flash |
| Body controller | Arduino R4 WiFi |
| Servo driver | PCA9685 16-channel PWM (I²C) |
| Actuators | 7× MG90S servos — 2 arms, neck pan, neck tilt (×2 mirrored), 2 eye-tilt |
| Camera | USB webcam (WALL·E's "eye"); ESP32-CAM planned |
| Messaging | Eclipse Mosquitto MQTT broker (Docker) |
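
For the servo driver row, converting an angle into a PCA9685 setting is simple arithmetic: the chip splits each PWM period into 4096 ticks, and hobby servos are usually driven at 50 Hz. The pulse range of 500 to 2500 µs below is an assumption (real MG90S units vary and need calibration), and the mirroring rule is one plausible reading of "×2 mirrored" for the neck tilt pair.

```python
PWM_FREQ_HZ = 50          # typical hobby-servo refresh rate
TICKS_PER_PERIOD = 4096   # PCA9685 is a 12-bit PWM controller
MIN_PULSE_US = 500        # assumed pulse width at 0 degrees
MAX_PULSE_US = 2500       # assumed pulse width at 180 degrees


def angle_to_ticks(angle_deg):
    """Map 0..180 degrees to the PCA9685 'off' tick count, clamping the input."""
    angle = min(180.0, max(0.0, float(angle_deg)))
    pulse_us = MIN_PULSE_US + (MAX_PULSE_US - MIN_PULSE_US) * angle / 180.0
    period_us = 1_000_000 / PWM_FREQ_HZ  # 20000 us at 50 Hz
    return round(pulse_us * TICKS_PER_PERIOD / period_us)


def mirrored(angle_deg):
    """Angle for a servo mounted facing the opposite way from its partner."""
    return 180.0 - angle_deg


print(angle_to_ticks(0), angle_to_ticks(90), angle_to_ticks(180))  # 102 307 512
print(angle_to_ticks(mirrored(30)))  # the partner servo gets the 150 degree setting
```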
A study in distributed real-time systems on tiny hardware.
Highlights
- Real-time on a microcontroller: a dual-core FreeRTOS split (rendering vs networking) stopped the animated UI from starving the network stack.
- Spotify on an ESP32: OAuth2 refresh flow, streaming filtered JSON parsing, HTTPS/TLS, and on-device album-art JPEG decode into a PSRAM canvas.
- Memory engineering: routing LVGL's ~300 UI widgets into PSRAM to free scarce internal RAM for the WiFi stack.
- Faithful UI in C: hand-built WALL·E eyes and a CRT aesthetic, ported from a React reference to embedded LVGL.
- Distributed debugging: tracing a 'robot won't react' bug across PC → broker → device; the cause turned out to be a shell-quoting issue, not firmware.
Roadmap
- On-robot microphone + speaker for a fully untethered WALL·E.
- Drive motors + real odometry/GPS for the Drive tab.
- Battery + power sensing for true Power telemetry.
- An on-board ESP32-CAM so vision lives on the robot.