https://mcp.vision.espressif.com{
"mcpServers": {
"esp-vision-mcp": {
"url": "https://mcp.vision.espressif.com"
}
}
}Capture, processing, inference, and display — all on the device.

Camera, image processing, on-device inference, and hardware peripherals are all accessible through Python APIs.

import espdl
import sensor
import time
sensor.reset()
sensor.set_pixformat(sensor.RGB565)
sensor.set_framesize(sensor.QVGA)
sensor.skip_frames(time=1000)
det = espdl.ESPDet("/sdcard/hand_det.espdl", score=0.5, nms=0.7)
while True:
img = sensor.snapshot()
for x, y, w, h, score, category in det.detect(img):
img.draw_rectangle(x, y, w, h, color=(255, 0, 0), thickness=2)
img.draw_string(x, max(0, y - 12), "%.2f:%d" % (score, category))
img.flush()
time.sleep_ms(20)Capture, processing, inference, and control all run on the device.
Unified sensor · image · display · espdl APIs — real-time results in a few lines of Python. Flash online, connect the Web IDE, no toolchain to set up.
Object detection, pose estimation, and image classification; load a quantized .espdl model in one line — real-time, offline, on device — or bring your own PyTorch / TensorFlow model.
Stream straight to OpenAI-compatible vision APIs and tap multimodal models like GPT-4o — complex scene understanding with no local compute.
Drawing, filtering, color tracking, feature detection, QR codes, barcodes, and AprilTags.
H.264 / MJPEG / RTSP and USB CDC live preview, saturating the on-chip multimedia accelerators.
Cameras, displays, SPI, I2C, UART, SD cards and more work out of the box, with a MicroPython machine-compatible API.
No toolchain installation is required; flashing, writing, and running all happen in the browser.
Connect your board straight from the browser and flash the official firmware.
Loading firmware manifest…
Write scripts in the Web IDE or VS Code and watch capture and inference results in real time.
Connect the ESP-VISION MCP to an AI assistant to build edge vision applications through conversation.
https://mcp.vision.espressif.com{
"mcpServers": {
"esp-vision-mcp": {
"url": "https://mcp.vision.espressif.com"
}
}
}ESP-DL and TFLite Micro models are loaded from device storage in a single line of code.
| Model | Task | Input | Dataset | Size |
|---|---|---|---|---|
ESPDet Pico Cat ESPDet Pico · espdl.ESPDetDetects cats in camera images. cat | Object Detection | 224×224 RGB565 | Cat | 487 KB |
ESPDet Pico Cat & Dog ESPDet Pico · espdl.ESPDetDetects cats and dogs in camera images. catdog | Object Detection | 224×224 RGB565 | Cat & Dog↗ | 561 KB |
ESPDet Pico Dog ESPDet Pico · espdl.ESPDetDetects dogs in camera images. dog | Object Detection | 224×224 RGB565 | Dog | 486 KB |
ESPDet Pico Face ESPDet Pico · espdl.ESPDetDetects human faces in camera images. face | Object Detection | 224×224 RGB565 | Face | 484 KB |
ESPDet Pico Hand ESPDet Pico · espdl.ESPDetDetects human hands in camera images. hand | Object Detection | 224×224 RGB565 | Hand | 486 KB |
YOLO11n COCO YOLO11n · espdl.YOLO11Detects the 80 COCO object classes in camera images. | Object Detection | 160×160 RGB565 | COCO | 2.7 MB |
YOLO11n-Pose COCO YOLO11n-Pose · espdl.YOLO11nPoseDetects people and estimates 17 COCO body keypoints (human pose) in camera images. person | Pose Estimation | 160×160 RGB565 | COCO | 3.0 MB |
| Model | Task | Input | Dataset | Size |
|---|---|---|---|---|
Person Detection MobileNet · tflite.ModelTensorFlow Lite Micro person-detection model: classifies whether a person is present in a 96x96 grayscale camera frame. no personperson | Image Classification | 96×96 GRAYSCALE | Visual Wake Words | 294 KB |
Sine MLP · tflite.ModelTensorFlow Lite Micro "hello world" model: approximates sin(x) for x in [0, 2*pi] from a single scalar input. | Regression | 1 FLOAT32 | Synthetic | 2 KB |
Camera, display, and storage work out of the box across the ESP32-S31, ESP32-P4, and ESP32-S3 series.
| Image | Board | Chip | ESP-VISION Support |
|---|---|---|---|
![]() | ESP32-P4X-EYE↗ | ESP32-P4 | Supported sensor · image · display · espdl · tflite · imageio · h264 · rtsp · barcode |
![]() | ESP32-P4X-Function-EV-Board↗ | ESP32-P4 | Supported sensor · image · display · espdl · tflite · imageio · h264 · rtsp · barcode |
![]() | ESP32-S3-EYE↗ | ESP32-S3 | Supported sensor · image · display · espdl · tflite · imageio |
![]() | ESP32-S31-Korvo↗ | ESP32-S31 | SupportedESP-IDF master only sensor · image · display · espdl · tflite · imageio |