On-Device Generative AI for GDPR-Compliant Visual Monitoring: Natural Language Alerts from Local Object Detection
Quick Take
This paper presents a proof-of-concept visual monitoring pipeline in which a YOLOv5n-seg model, compiled for a Hailo-8L AI accelerator, runs on a Raspberry Pi 5 so that all inference stays on-device. A stateful trigger engine passes minimal JSON event payloads to a locally hosted Phi-3 Mini model, which writes short natural-language alerts. Only the generated alert text leaves the device; no image data crosses the network, which aligns the design with the GDPR data minimization principle.
Key Points
- A YOLOv5n-seg model compiled for a Hailo-8L AI accelerator performs real-time object detection on a Raspberry Pi 5.
- Raw pixel buffers are discarded immediately after inference to ensure privacy.
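- A locally hosted Phi-3 Mini (3.8B parameters, Q4_0 quantization) synthesizes one-to-two sentence natural-language alerts from minimal JSON event payloads.
- No image data crosses the network boundary; only the generated text alert is transmitted, in line with GDPR Art. 5(1)(c) (data minimization).
- Measured inference latency and resource utilization on the target hardware indicate that the approach is feasible and produces practically deployable, human-readable output.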
Article Content
arXiv:2605.30544v1. Abstract: Visual monitoring systems that rely on cloud-based AI inference expose raw image data to external services, creating fundamental tensions with the data-minimisation principle of the General Data Protection Regulation (GDPR). This paper presents a proof-of-concept privacy-by-design pipeline that resolves this tension by confining all inference entirely to the edge device.
A YOLOv5n-seg model compiled for a Hailo-8L AI accelerator delivers real-time object detection on a Raspberry Pi 5, from which raw pixel buffers are immediately discarded after inference. A stateful trigger engine forwards minimal JSON event payloads to a locally hosted instance of Phi-3 Mini (3.8B parameters, Q4_0 quantisation), which synthesises one-to-two sentence natural-language alerts for a human operator. No image data crosses the network boundary at any point; only the generated text alert is transmitted.
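The abstract does not specify the trigger logic or the payload schema, so the following is only a minimal sketch of how a stateful trigger might turn per-frame detections into a small JSON event and a prompt for a local language model. The class `TriggerEngine`, the function `build_prompt`, the payload fields, and the 0.5 confidence threshold are all hypothetical names and values chosen for illustration; none of them come from the paper. The sketch works only on labels and confidences, so no pixel data is ever held by this stage.

```python
import json
from dataclasses import dataclass, field


@dataclass
class TriggerEngine:
    """Hypothetical stateful trigger: emits an event only when the set of
    confidently detected objects changes between frames."""
    min_confidence: float = 0.5  # assumed threshold, not taken from the paper
    last_counts: dict = field(default_factory=dict)

    def update(self, detections, timestamp):
        """detections: list of (label, confidence) tuples for one frame.

        Returns a JSON string when the per-label counts changed since the
        previous frame, otherwise None.
        """
        counts = {}
        for label, confidence in detections:
            if confidence >= self.min_confidence:
                counts[label] = counts.get(label, 0) + 1
        if counts == self.last_counts:
            return None  # scene state unchanged, nothing to forward
        previous, self.last_counts = self.last_counts, counts
        payload = {
            "timestamp": timestamp,
            "previous": previous,
            "current": counts,
        }
        return json.dumps(payload, sort_keys=True)


def build_prompt(payload_json):
    """Wrap the JSON event in an instruction for a locally hosted model.

    The wording of the instruction is an assumption for this sketch.
    """
    return (
        "You are a monitoring assistant. Write a one-to-two sentence alert "
        "for a human operator describing this event.\n"
        f"Event: {payload_json}"
    )
```

For example, calling `TriggerEngine().update([("person", 0.91), ("dog", 0.30)], "t0")` yields a payload whose `current` field counts one person (the low-confidence dog is dropped), and a second call with the same confident detections returns `None`. The string returned by `build_prompt` would then be passed to whatever local runtime hosts the model; that call is deliberately left out because the source does not name the runtime.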
We describe the full system architecture and implementation, report measured inference latency and resource utilisation on the target hardware, and present representative generated alerts. The results demonstrate that combining a dedicated neural-network accelerator with an on-device large language model on a single-board computer is not only feasible but produces practically deployable, human-readable monitoring output while aligning with GDPR Art. 5(1)(c) by design.
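The abstract does not say how latency was measured. As a rough sketch of one common approach, the helper below times repeated calls to a stage of the pipeline and summarizes the samples. The function name `measure_latency_ms`, the warm-up and run counts, and the choice of statistics are assumptions for illustration, not the authors' methodology.

```python
import statistics
import time


def measure_latency_ms(fn, warmup=5, runs=50):
    """Time fn() over several runs and return summary statistics in ms.

    Warm-up calls are executed first and excluded from the samples.
    """
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the list.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "mean_ms": statistics.fmean(samples),
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "runs": len(samples),
    }
```

In use, `fn` would be a zero-argument callable wrapping one detection pass or one alert generation, for instance `measure_latency_ms(lambda: run_once(frame))`, where `run_once` is a placeholder for the stage being timed.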