Pixar to the workbench

He listens. He looks. He feels.

A real WALL·E you can talk to — who talks back, moves, and shows how he feels. A multilingual AI robot with an animated touchscreen face, seven servos, and live Spotify.

3 processors · Bilingual (EN · VI) · Live Spotify

Introduction

WALL·E is a hand-built, AI-powered robot inspired by the Pixar character. You speak to him in English or Vietnamese; he understands, replies in a warm little voice, and physically acts.

At his core is a real-time voice agent running on a PC: it transcribes your speech, reasons with a large language model, speaks back, and, crucially, decides when to do something. When the conversation calls for it, the agent calls tools that turn intent into motion: raise the right hand, wave both arms, look up, look at what's in front of him, or shift his expression to match the mood of the chat. He can even see: ask "what are you looking at?" and he captures a camera frame and describes it.

Those decisions travel over an MQTT message bus to two embedded boards. An Arduino R4 drives seven servos through a smooth, non-blocking animation engine, so arms, neck, and eyes move naturally. An ESP32-S3 touchscreen renders his face in a custom UI that faithfully matches the WALL·E aesthetic — amber highlights, a warm dark palette, a subtle CRT texture, and a pair of expressive binocular eyes that blink, wander, and change shape and color for seven distinct emotions.

Capabilities

Most robot replicas are static props. WALL·E is built to feel alive — across conversation, motion, vision, and a polished interface.

Conversational AI

Natural two-way voice, bilingual (EN + VI), low-latency with automatic turn-taking.

It acts, not just talks

Tool-calling lets the AI decide when to move, look, or change expression.

Computer vision

Ask "What do you see?" and he captures a camera frame and reasons over it in real time.

Expressive face

Binocular eyes with blinking, gaze wander, and 7 emotions — each with its own geometry and color.

Real movement

7 servos: arms raise/lower/wave, neck pan + tilt, eye tilt — smooth concurrent motion.

6-tab touchscreen

Face · Talk · Drive · Music · Power · System, plus an AUTO mode that follows the robot.

Live Spotify

Now-playing, on-device album-art decode, equalizer, and full transport + volume control.

Real-time sync

Three processors stay in lock-step over an MQTT bus, reflecting state within a fraction of a second.

Data flow

You → Brain (PC) → MQTT → Body (Arduino) + Head (ESP32) → back up via telemetry.

  1. You speak

    The PC Brain transcribes your speech to text (multilingual STT).

  2. The model reasons

    An LLM decides how to respond and whether to act, then speaks the reply and emits commands.

  3. Commands publish

    Motion (hand, head, eyes, gesture), mood, and live talk state flow onto the MQTT bus.

  4. Body animates

    The Arduino receives motion commands and drives the seven servos with a non-blocking engine.

  5. Head reacts

    The ESP32 updates the face, transcript, and indicators; its Music tab streams from Spotify.

  6. Telemetry returns

    Battery and body status flow back up the bus so the screen always reflects reality.
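
The fan-out in the steps above can be sketched with MQTT's standard topic-filter wildcards (`+` for one level, `#` for the rest of the tree). This is only an illustration: the topic names and the subscription table are hypothetical and not taken from the project, and the matcher skips the special handling of `$`-prefixed topics.

```python
from typing import Dict, List


def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic.

    '+' matches exactly one level; '#' (last level only) matches the
    parent level and everything below it.
    """
    f_parts = topic_filter.split("/")
    t_parts = topic.split("/")
    for i, level in enumerate(f_parts):
        if level == "#":
            # '#' is only valid as the final level of a filter.
            return i == len(f_parts) - 1
        if i >= len(t_parts):
            return False
        if level != "+" and level != t_parts[i]:
            return False
    return len(f_parts) == len(t_parts)


# Hypothetical subscriptions: the Body only cares about motion commands,
# while the Head mirrors commands, talk state, and telemetry on screen.
SUBSCRIPTIONS: Dict[str, List[str]] = {
    "body": ["walle/cmd/motion/+"],
    "head": ["walle/cmd/#", "walle/state/#", "walle/telemetry/#"],
}


def receivers(topic: str) -> List[str]:
    """List which boards would receive a message published on `topic`."""
    return sorted(
        name
        for name, filters in SUBSCRIPTIONS.items()
        if any(topic_matches(f, topic) for f in filters)
    )
```

With this layout, `receivers("walle/cmd/motion/hand")` names both boards, while a telemetry topic such as `walle/telemetry/battery` reaches only the Head.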

The interface

A faithful digital twin of WALL·E's 4.3" 800×480 touchscreen. It runs live below — tap the tabs to drive it, or leave AUTO on and watch the screens follow his state.

WALL·E · Full assembly · 3D print

WALL·E · 4.3" LCD · 800×480 — autonomous mode active, screens shift with the robot's state

Tab      What it shows
FACE     Animated WALL·E eyes that mirror his current emotion, plus an emotion picker.
TALK     Live speaking / listening state, the sentence he's saying, and an audio visualizer.
DRIVE    A dark-themed map with a drift trail and speed / heading / track telemetry.
MUSIC    Live Spotify: cover art, title/artist/album, equalizer, and full playback control.
POWER    Battery gauge, voltage, core temperature, and a cell array.
SYSTEM   Diagnostics for every servo and module, lighting up as commands fire.
AUTO     Toggle: the screen auto-follows the robot's live state (jumps to Talk while speaking).

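
One way to picture the AUTO toggle is as a small priority function over the robot's live state. Only the "jump to Talk while speaking" rule comes from the description above; the `RobotState` fields, the music rule, and the FACE fallback are assumptions made for the sake of the sketch.

```python
from dataclasses import dataclass


@dataclass
class RobotState:
    """Hypothetical snapshot of live state received over the bus."""
    speaking: bool = False
    music_playing: bool = False


def auto_tab(state: RobotState, current_tab: str, auto_enabled: bool) -> str:
    """Pick the tab to show; with AUTO off the user's choice always wins."""
    if not auto_enabled:
        return current_tab
    if state.speaking:
        return "TALK"   # documented behaviour: jump to Talk while speaking
    if state.music_playing:
        return "MUSIC"  # assumption: follow playback when nothing is said
    return "FACE"       # assumption: idle fallback
```

Keeping the decision in one pure function means the same state always produces the same tab, which makes the behaviour easy to check without a screen attached.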
Technical architecture

WALL·E is a three-tier distributed system connected by a local MQTT broker, engineered so that conversation, motion, and a polished interface all happen at once.

Brain

PC · Python

A real-time voice agent on LiveKit Agents chains Deepgram (STT), Anthropic Claude (reasoning + tool-calling), and Cartesia (TTS), with Silero VAD. Function tools turn conversation into robot actions over MQTT.
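
As a rough illustration of how function tools can turn a model's tool call into a bus message, here is a framework-free sketch. It deliberately avoids the LiveKit Agents API; the tool names, topic names, payload fields, and the shape of the `call` dictionary are all hypothetical.

```python
import json
from typing import Callable, Dict

# Anything that can send (topic, payload); injected so the sketch runs offline.
Publish = Callable[[str, str], None]


def make_tools(publish: Publish) -> Dict[str, Callable[..., str]]:
    """Build a registry of robot tools that publish JSON commands."""

    def raise_hand(side: str) -> str:
        if side not in ("left", "right"):
            return f"error: unknown side {side!r}"
        publish("walle/cmd/motion/hand", json.dumps({"side": side, "action": "raise"}))
        return f"raised {side} hand"

    def set_mood(mood: str) -> str:
        publish("walle/cmd/mood", json.dumps({"mood": mood}))
        return f"mood set to {mood}"

    return {"raise_hand": raise_hand, "set_mood": set_mood}


def dispatch(tools: Dict[str, Callable[..., str]], call: dict) -> str:
    """Run one tool call and return a short result string for the model."""
    name = call.get("name")
    tool = tools.get(name)
    if tool is None:
        return f"error: unknown tool {name!r}"
    try:
        return tool(**call.get("arguments", {}))
    except TypeError as exc:
        # Missing or unexpected arguments from the model.
        return f"error: bad arguments for {name}: {exc}"
```

In a real agent the `publish` callable would wrap an MQTT client's publish method; here a plain list append is enough to see which commands a conversation would emit. Returning error strings rather than raising lets the model see what went wrong and try again.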

Body

Arduino R4 WiFi · C++

Subscribes to command topics and drives 7 MG90S servos through a PCA9685 PWM controller using a non-blocking easing/wave animation engine. Publishes power + status telemetry.
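
The firmware itself is C++, but the idea behind a non-blocking easing engine is compact enough to sketch in Python: each servo stores a start angle, a target, and a start time, and a shared loop calls `update(now)` on every servo instead of sleeping inside any one move. The smoothstep curve, the class and field names, and the 500 to 2500 µs pulse range are assumptions; the 4096-step resolution is the PCA9685's 12-bit counter.

```python
from dataclasses import dataclass


def ease_in_out(t: float) -> float:
    """Smoothstep easing: maps 0..1 to 0..1 with zero slope at both ends."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


@dataclass
class ServoTween:
    angle: float = 90.0     # current angle in degrees
    start: float = 90.0     # angle when the current move began
    target: float = 90.0    # angle the move ends at
    t0: float = 0.0         # time the move began, in seconds
    duration: float = 0.0   # length of the move, in seconds

    def move_to(self, target: float, duration_s: float, now: float) -> None:
        """Begin a new eased move from wherever the servo currently is."""
        self.start, self.target = self.angle, target
        self.t0, self.duration = now, duration_s

    def update(self, now: float) -> float:
        """Advance to time `now` without blocking; return the new angle."""
        if self.duration <= 0.0 or now >= self.t0 + self.duration:
            self.angle = self.target
        else:
            progress = (now - self.t0) / self.duration
            self.angle = self.start + (self.target - self.start) * ease_in_out(progress)
        return self.angle


def angle_to_ticks(angle: float, min_us: float = 500.0, max_us: float = 2500.0,
                   freq_hz: float = 50.0) -> int:
    """Convert an angle (0..180) to a 12-bit PCA9685 'off' count.

    The pulse range is an assumed calibration; real servos vary.
    """
    pulse_us = min_us + (max_us - min_us) * angle / 180.0
    period_us = 1_000_000.0 / freq_hz
    return round(pulse_us / period_us * 4096)
```

Because `update` only reads the clock and does arithmetic, an arm and the neck can be mid-move at the same time: the main loop steps every tween once per pass and writes the resulting tick counts to the PWM driver.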

Head

ESP32-S3 · LVGL

An 800×480 RGB touchscreen UI on a dual-core design: LVGL rendering on one core, MQTT + Spotify TLS on the other. Album art is fetched over HTTPS and JPEG-decoded into a PSRAM framebuffer.
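
A little arithmetic shows why pixel buffers end up in PSRAM. The sketch below assumes 16-bit RGB565 color, which is common for LVGL on this class of panel but is not stated above; the 300×300 cover size is likewise only an example. The 512 KB figure is the ESP32-S3's on-chip SRAM.

```python
def rgb888_to_rgb565(r: int, g: int, b: int) -> int:
    """Pack 8-bit R, G, B into one 16-bit RGB565 value (5-6-5 bits)."""
    return ((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3)


def buffer_bytes(width: int, height: int, bytes_per_pixel: int = 2) -> int:
    """Size of a raw pixel buffer at the given color depth."""
    return width * height * bytes_per_pixel


INTERNAL_SRAM_BYTES = 512 * 1024           # ESP32-S3 on-chip SRAM
screen = buffer_bytes(800, 480)            # one full-screen frame at 16 bpp
cover = buffer_bytes(300, 300)             # example album-art canvas (assumed size)
fits_internally = screen <= INTERNAL_SRAM_BYTES
```

Under these assumptions a single full-screen frame is 768,000 bytes, already more than the chip's entire internal SRAM, so the framebuffer and the decoded cover art have to live in the 8 MB external PSRAM.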

Tech stack

Languages

C / C++ · Python · TypeScript / React

Embedded UI

LVGL 8 · ESP32 Arduino core · RGB direct-mode

AI / Voice

LiveKit Agents · Anthropic Claude · Deepgram · Cartesia · Silero VAD · OpenCV

Robotics

PCA9685 PWM · Servo animation engine

Connectivity

MQTT (Mosquitto) · PubSubClient · ArduinoJson · Spotify Web API · TJpg_Decoder

Infra

Docker Compose · LiveKit

Hardware

Component        Detail
Head display     Waveshare ESP32-S3-Touch-LCD-4.3B: 800×480 RGB IPS, capacitive touch, 8 MB PSRAM, 16 MB flash
Body controller  Arduino R4 WiFi
Servo driver     PCA9685 16-channel PWM (I²C)
Actuators        7× MG90S servos: 2 arms, neck pan, neck tilt (×2 mirrored), 2 eye-tilt
Camera           USB webcam (WALL·E's "eye"); ESP32-CAM planned
Messaging        Eclipse Mosquitto MQTT broker (Docker)

Engineering deep-dive

A study in distributed real-time systems on tiny hardware.

Highlights

  • Real-time on a microcontroller: a dual-core FreeRTOS split (rendering vs networking) stopped the animated UI from starving the network stack.
  • Spotify on an ESP32: OAuth2 refresh flow, streaming filtered JSON parsing, HTTPS/TLS, and on-device album-art JPEG decode into a PSRAM canvas.
  • Memory engineering: routing LVGL's ~300 UI widgets into PSRAM to free scarce internal RAM for the WiFi stack.
  • Faithful UI in C: hand-built WALL·E eyes and a CRT aesthetic recreated from a React reference into embedded LVGL.
  • Distributed debugging: tracing a 'robot won't react' bug across PC → broker → device — a shell-quoting issue, not firmware.
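
To show how a shell-quoting issue can masquerade as a firmware bug, the sketch below builds a `mosquitto_pub` command line with and without quoting and uses `shlex.split` to approximate how a POSIX shell would split it. The topic and payload are hypothetical, and this is not a reconstruction of the actual bug, only an example of the failure class.

```python
import json
import shlex

payload = json.dumps({"hand": "right", "action": "raise"})
topic = "walle/cmd/motion/hand"  # hypothetical topic name

# Unquoted: the shell strips the double quotes and splits on spaces.
broken = f"mosquitto_pub -t {topic} -m {payload}"
# Quoted: shlex.quote wraps the JSON so it survives as one argument.
safe = f"mosquitto_pub -t {topic} -m {shlex.quote(payload)}"


def message_arg(command_line: str) -> str:
    """Return what the program would actually receive after -m."""
    argv = shlex.split(command_line)
    return argv[argv.index("-m") + 1]


def is_valid_json(text: str) -> bool:
    """True if the device-side JSON parser would accept this message."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False
```

Here `message_arg(broken)` is a fragment that is no longer JSON, so a device that parses its commands would silently ignore it, while `message_arg(safe)` round-trips the original payload. Every tier looks healthy in isolation, which is why tracing the message hop by hop is what finds it.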

Roadmap

  • On-robot microphone + speaker for a fully untethered WALL·E.
  • Drive motors + real odometry/GPS for the Drive tab.
  • Battery + power sensing for true Power telemetry.
  • An on-board ESP32-CAM so vision lives on the robot.