WALL·E

From Pixar to the workbench

He listens. He looks. He feels.

A real WALL·E you can talk to, who talks back, moves, and shows how he feels. A bilingual (English and Vietnamese) AI robot with an animated touchscreen face, seven servos, and live Spotify.

3 processors

Bilingual EN · VI

Live Spotify

Introduction

WALL·E is a hand-built, AI-powered robot inspired by the Pixar character. You speak to him in English or Vietnamese; he understands, replies in a warm little voice, and physically acts.

At his core is a real-time voice agent running on a PC: it transcribes your speech, reasons with a large language model, speaks back, and, crucially, decides when to do something. When the conversation calls for it, the agent calls tools that turn intent into motion: raise the right hand, wave both arms, look up, look at what's in front of him, or shift his expression to match the mood of the chat. He can even see: ask "what are you looking at?" and he captures a camera frame and describes it.

Those decisions travel over an MQTT message bus to two embedded boards. An Arduino R4 drives seven servos through a smooth, non-blocking animation engine, so arms, neck, and eyes move naturally. An ESP32-S3 touchscreen renders his face in a custom UI that faithfully matches the WALL·E aesthetic — amber highlights, a warm dark palette, a subtle CRT texture, and a pair of expressive binocular eyes that blink, wander, and change shape and color for seven distinct emotions.

Capabilities

Most robot replicas are static props. WALL·E is built to feel alive — across conversation, motion, vision, and a polished interface.

Conversational AI

Natural two-way voice, bilingual (EN + VI), low-latency with automatic turn-taking.

It acts, not just talks

Tool-calling lets the AI decide when to move, look, or change expression.

Computer vision

"What do you see?" captures a camera frame and reasons over it in real time.

Expressive face

Binocular eyes with blinking, gaze wander, and 7 emotions — each with its own geometry and color.

Real movement

7 servos: arms raise/lower/wave, neck pan + tilt, eye tilt — smooth concurrent motion.

6-tab touchscreen

Face · Talk · Drive · Music · Power · System, plus an AUTO mode that follows the robot.

Live Spotify

Now-playing, on-device album-art decode, equalizer, and full transport + volume control.

Real-time sync

Three processors stay in lock-step over an MQTT bus, reflecting state within a fraction of a second.

Data flow

You → Brain (PC) → MQTT → Body (Arduino) + Head (ESP32) → back up via telemetry.

01 · You speak. The PC Brain transcribes your speech to text (multilingual STT).
02 · The model reasons. An LLM decides how to respond and whether to act, then speaks the reply and emits commands.
03 · Commands publish. Motion (hand, head, eyes, gesture), mood, and live talk state flow onto the MQTT bus.
04 · Body animates. The Arduino receives motion commands and drives the seven servos with a non-blocking engine.
05 · Head reacts. The ESP32 updates the face, transcript, and indicators; its Music tab streams from Spotify.
06 · Telemetry returns. Battery and body status flow back up the bus so the screen always reflects reality.
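
To make the routing in the steps above concrete, here is a minimal sketch of how MQTT topic filters could decide which processor receives which message. The matching rules for the + and # wildcards follow the MQTT specification; the topic names and the subscription table are assumptions made up for illustration, since the page does not list the robot's real topics.

```python
def topic_matches(topic_filter: str, topic: str) -> bool:
    """Return True if an MQTT topic filter matches a concrete topic name.

    '+' matches exactly one level; '#' (last level only) matches the parent
    level and everything below it. '$'-prefixed system topics are ignored here.
    """
    filter_levels = topic_filter.split("/")
    topic_levels = topic.split("/")
    for i, level in enumerate(filter_levels):
        if level == "#":
            # Multi-level wildcard is only valid as the final filter level.
            return i == len(filter_levels) - 1
        if i >= len(topic_levels):
            return False
        if level != "+" and level != topic_levels[i]:
            return False
    return len(filter_levels) == len(topic_levels)


# Hypothetical subscriptions: these topic names are NOT from the project.
SUBSCRIPTIONS = {
    "body": ["walle/cmd/motion/#"],
    "head": ["walle/cmd/mood", "walle/state/#", "walle/telemetry/#"],
    "brain": ["walle/telemetry/#"],
}


def receivers(topic: str) -> list[str]:
    """List the nodes whose subscriptions match a published topic."""
    return sorted(
        node
        for node, filters in SUBSCRIPTIONS.items()
        if any(topic_matches(f, topic) for f in filters)
    )
```

Under these assumed topics, a hand command published by the Brain reaches only the Body, while a battery reading published by the Body reaches both the Head and the Brain. Splitting commands and telemetry into separate topic branches is one simple way to keep each board from parsing traffic it does not need.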

The interface

A faithful digital twin of WALL·E's 4.3" 800×480 touchscreen. It runs live below — tap the tabs to drive it, or leave AUTO on and watch the screens follow his state.

WALL·E · 4.3" LCD · 800×480 — autonomous mode active, screens shift with the robot's state

Tab	What it shows
FACE	Animated WALL·E eyes that mirror his current emotion, plus an emotion picker.
TALK	Live speaking / listening state, the sentence he's saying, and an audio visualizer.
DRIVE	A dark-themed map with a drift trail and speed / heading / track telemetry.
MUSIC	Live Spotify: cover art, title/artist/album, equalizer, and full playback control.
POWER	Battery gauge, voltage, core temperature, and a cell array.
SYSTEM	Diagnostics for every servo and module, lighting up as commands fire.
AUTO	Toggle: the screen auto-follows the robot's live state (jumps to Talk while speaking).

Technical architecture

WALL·E is a three-tier distributed system connected by a local MQTT broker — careful engineering to make conversation, motion, and a polished interface all happen at once.

Brain

PC · Python

A real-time voice agent on LiveKit Agents chains Deepgram (STT), Anthropic Claude (reasoning + tool-calling), and Cartesia (TTS), with Silero VAD. Function tools turn conversation into robot actions over MQTT.
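
As a rough illustration of how a function tool might turn a model's tool call into a bus message, the sketch below uses plain Python rather than the LiveKit Agents API, so nothing here should be read as the project's actual code. The tool names, topic names, and payload fields are all hypothetical; the publish callable stands in for whatever MQTT client the Brain uses.

```python
import json
from typing import Callable

# (topic, payload) -> None; a stand-in for a real MQTT client's publish call.
Publish = Callable[[str, str], None]


def make_tools(publish: Publish) -> dict[str, Callable[..., str]]:
    """Build a registry of hypothetical robot tools bound to a publisher."""

    def raise_hand(side: str) -> str:
        # Validate model-supplied arguments before anything reaches hardware.
        if side not in ("left", "right"):
            return f"unknown side: {side}"
        publish("walle/cmd/motion/hand", json.dumps({"side": side, "action": "raise"}))
        return f"raised {side} hand"

    def set_mood(mood: str) -> str:
        publish("walle/cmd/mood", json.dumps({"mood": mood}))
        return f"mood set to {mood}"

    return {"raise_hand": raise_hand, "set_mood": set_mood}


def dispatch(tools: dict[str, Callable[..., str]], name: str, arguments: dict) -> str:
    """Run the tool the model asked for and return a short result string."""
    tool = tools.get(name)
    if tool is None:
        return f"unknown tool: {name}"
    return tool(**arguments)


# Usage: capture published messages in a list instead of sending them.
sent: list[tuple[str, str]] = []
tools = make_tools(lambda topic, payload: sent.append((topic, payload)))
result = dispatch(tools, "raise_hand", {"side": "right"})
```

Returning a short result string gives the model something to reason over on its next turn, and rejecting bad arguments in the tool keeps a confused model from sending malformed commands to the servos.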

Body

Arduino R4 WiFi · C++

Subscribes to command topics and drives 7 MG90S servos through a PCA9685 PWM controller using a non-blocking easing/wave animation engine. Publishes power + status telemetry.
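
The idea behind a non-blocking easing engine can be sketched in a few lines. The firmware itself is C++ and is not shown on this page, so the Python below only illustrates the technique: each servo stores where it started, where it is going, and when, and the main loop asks for the current angle on every pass instead of sleeping until a move finishes. The class name, the smoothstep curve, and the channel numbers are assumptions.

```python
from dataclasses import dataclass


def ease_in_out(t: float) -> float:
    """Smoothstep easing: maps 0..1 to 0..1 with zero velocity at both ends."""
    t = min(max(t, 0.0), 1.0)
    return t * t * (3.0 - 2.0 * t)


@dataclass
class ServoTween:
    """One in-flight move for a single servo (hypothetical structure)."""
    start_deg: float
    target_deg: float
    start_ms: int
    duration_ms: int

    def angle_at(self, now_ms: int) -> float:
        # A pure function of time, so calling it never blocks the loop.
        if self.duration_ms <= 0:
            return self.target_deg
        progress = (now_ms - self.start_ms) / self.duration_ms
        return self.start_deg + (self.target_deg - self.start_deg) * ease_in_out(progress)

    def done(self, now_ms: int) -> bool:
        return now_ms - self.start_ms >= self.duration_ms


def tick(tweens: dict[int, ServoTween], now_ms: int) -> dict[int, float]:
    """One pass of the main loop: compute every channel's angle, never sleep."""
    return {channel: tween.angle_at(now_ms) for channel, tween in tweens.items()}


# Usage: an arm and the neck move at the same time with different durations.
moves = {0: ServoTween(0.0, 90.0, 0, 1000), 2: ServoTween(90.0, 60.0, 0, 500)}
halfway = tick(moves, 500)
```

Because every servo is advanced from the same clock on each pass, moves with different durations overlap naturally, and the loop stays free to service MQTT traffic between passes.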

Head

ESP32-S3 · LVGL

An 800×480 RGB touchscreen UI on a dual-core design: LVGL rendering on one core, MQTT + Spotify TLS on the other. Album art is fetched over HTTPS and JPEG-decoded into a PSRAM framebuffer.
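
A quick calculation shows why buffers of this size end up in PSRAM. The sketch assumes 16-bit RGB565 pixels, which the page does not state; the 512 KB on-chip SRAM figure is the ESP32-S3's published total, and the 8 MB PSRAM figure comes from the hardware table.

```python
WIDTH, HEIGHT = 800, 480
BYTES_PER_PIXEL = 2                # assumption: RGB565, 16 bits per pixel
INTERNAL_SRAM_BYTES = 512 * 1024   # ESP32-S3 on-chip SRAM (total, before any use)
PSRAM_BYTES = 8 * 1024 * 1024      # from the hardware table


def frame_buffer_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size in bytes of one full-screen frame buffer."""
    return width * height * bytes_per_pixel


frame = frame_buffer_bytes(WIDTH, HEIGHT, BYTES_PER_PIXEL)
fits_in_sram = frame <= INTERNAL_SRAM_BYTES
psram_share = frame / PSRAM_BYTES

print(f"one frame: {frame} bytes ({frame / 1024:.0f} KiB)")
print(f"fits in internal SRAM: {fits_in_sram}")
print(f"share of PSRAM: {psram_share:.1%}")
```

Under that assumption a single frame is 768,000 bytes (750 KiB), more than the chip's entire internal SRAM, yet only about 9% of the external PSRAM.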

Tech stack

Languages

C / C++ · Python · TypeScript / React

Embedded UI

LVGL 8 · ESP32 Arduino core · RGB direct-mode

AI / Voice

LiveKit Agents · Anthropic Claude · Deepgram · Cartesia · Silero VAD · OpenCV

Robotics

PCA9685 PWM · Servo animation engine

Connectivity

MQTT (Mosquitto) · PubSubClient · ArduinoJson · Spotify Web API · TJpg_Decoder

Infra

Docker Compose · LiveKit

Hardware

Component	Detail
Head display	Waveshare ESP32-S3-Touch-LCD-4.3B — 800×480 RGB IPS, capacitive touch, 8 MB PSRAM, 16 MB flash
Body controller	Arduino R4 WiFi
Servo driver	PCA9685 16-channel PWM (I²C)
Actuators	7× MG90S servos — 2 arms, neck pan, neck tilt (×2 mirrored), 2 eye-tilt
Camera	USB webcam (WALL·E’s "eye"); ESP32-CAM planned
Messaging	Eclipse Mosquitto MQTT broker (Docker)
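
For the servo driver row, the arithmetic that maps an angle to a PCA9685 register value can be sketched as follows. The PCA9685's 12-bit resolution (4096 ticks per PWM period) is a property of the chip; the 50 Hz frame rate and the 500 to 2500 µs pulse range are common hobby-servo values assumed here, not figures from the project, and real MG90S units usually need per-servo calibration.

```python
PWM_FREQ_HZ = 50          # assumption: typical hobby-servo frame rate
TICKS_PER_PERIOD = 4096   # PCA9685 is a 12-bit PWM controller
MIN_PULSE_US = 500        # assumption: pulse width at 0 degrees
MAX_PULSE_US = 2500       # assumption: pulse width at 180 degrees


def angle_to_ticks(angle_deg: float) -> int:
    """Convert a servo angle to the PCA9685 'off' tick count for one channel."""
    angle = min(max(angle_deg, 0.0), 180.0)  # clamp to the servo's range
    pulse_us = MIN_PULSE_US + (MAX_PULSE_US - MIN_PULSE_US) * angle / 180.0
    period_us = 1_000_000 / PWM_FREQ_HZ      # 20,000 µs at 50 Hz
    return round(pulse_us * TICKS_PER_PERIOD / period_us)
```

With these assumed limits, each tick is roughly 4.9 µs, so the full 180 degree sweep spans about 410 ticks, or a little under half a degree per tick.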

Engineering deep-dive

A study in distributed real-time systems on tiny hardware.

Highlights

Real-time on a microcontroller: a dual-core FreeRTOS split (rendering vs networking) stopped the animated UI from starving the network stack.
Spotify on an ESP32: OAuth2 refresh flow, streaming filtered JSON parsing, HTTPS/TLS, and on-device album-art JPEG decode into a PSRAM canvas.
Memory engineering: routing LVGL's ~300 UI widgets into PSRAM to free scarce internal RAM for the WiFi stack.
Faithful UI in C: hand-built WALL·E eyes and a CRT aesthetic recreated from a React reference into embedded LVGL.
Distributed debugging: tracing a 'robot won't react' bug across PC → broker → device — a shell-quoting issue, not firmware.
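
The OAuth2 refresh step mentioned in the highlights can be sketched as the HTTP request it boils down to. The endpoint, form fields, and Basic authorization header follow Spotify's documented authorization code flow; the function name and the decision to only build the request, rather than send it, are choices made for this illustration, and the firmware performs the equivalent in C++ on the ESP32.

```python
import base64
from urllib.parse import urlencode

TOKEN_URL = "https://accounts.spotify.com/api/token"


def build_refresh_request(
    client_id: str, client_secret: str, refresh_token: str
) -> tuple[str, dict[str, str], str]:
    """Build the POST that trades a refresh token for a new access token.

    Returns (url, headers, form-encoded body); sending it is left to the caller.
    """
    credentials = f"{client_id}:{client_secret}".encode("ascii")
    headers = {
        "Authorization": "Basic " + base64.b64encode(credentials).decode("ascii"),
        "Content-Type": "application/x-www-form-urlencoded",
    }
    body = urlencode({"grant_type": "refresh_token", "refresh_token": refresh_token})
    return TOKEN_URL, headers, body


# Usage with placeholder credentials (not real values).
url, headers, body = build_refresh_request("id", "secret", "tok")
```

The JSON response carries a new access_token and an expires_in lifetime in seconds, so a device only needs to keep the long-lived refresh token in storage and can request a fresh access token whenever the current one is about to expire.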

Roadmap

On-robot microphone + speaker for a fully untethered WALL·E.
Drive motors + real odometry/GPS for the Drive tab.
Battery + power sensing for true Power telemetry.
An on-board ESP32-CAM so vision lives on the robot.
