In so many movies, there’s a tired security guard sitting in a dark room, watching dozens of monitors. But what if an AI could watch those streams instead? An AI that could alert the guard when someone appears, warn if that person is unknown, and—if it’s a real person (not just a photo on a smartphone)—automatically capture high-quality pictures to build a profile.
What if this AI could track individuals as they move within a single camera's view, ensuring they are recognized consistently over time?
That’s the kind of system we wanted to build—not for production, but to see what’s possible using modern tools like YOLO and InsightFace.
👀 Why We Built It
Because we believe learning by doing is the best way to grow.
We weren’t aiming to ship a product or solve a client's problem. We simply wanted to explore: What can modern face recognition models truly do in real-time? How far can we push the boundaries by combining powerful open-source tools?
This project grew out of that curiosity. What began as a playful experiment quickly turned into a complete processing pipeline—one that detects people, recognizes faces, verifies liveness through blink detection, tracks individuals within a stream, and even prunes its own memory to stay sharp.
We built this as tech enthusiasts, eager to discover what's possible. The answer is: a lot.
Seriously—give it a try. Just keep in mind: this is a proof of concept, not production-ready software.
And yes, to save time, we made extensive use of AI-assisted coding, so don’t blame only us for any mistakes or things we could’ve done better. 😉
🔧 How It Works: The Pipeline
At a high level, here's what happens under the hood (a simplified sketch of the loop follows the list):
- Video Input: The system connects to RTSP streams or local video files using OpenCV.
- Person Detection with YOLO: Each frame is analyzed with a YOLOv8 model to find people.
- Face Analysis with InsightFace: For each person detected, faces are located, and key features (embeddings and landmarks) are extracted.
- Face Recognition: The facial embedding is compared to a database of known profiles using cosine similarity.
- Liveness Detection: A custom blink detection mechanism, based on the Eye Aspect Ratio (EAR), is used to prevent simple spoofing attacks (like holding up a photo).
- Automatic Profile Management: New profiles are created for unknown individuals who pass the liveness check. Existing profiles are continuously refined with new, high-quality face samples.
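To make that flow concrete, here is a heavily simplified sketch of the main loop. The model choices, stream URL, and variable names are our illustrative assumptions, not necessarily the project's actual code:

```python
import cv2
from ultralytics import YOLO
from insightface.app import FaceAnalysis

# Illustrative setup: model files and names are assumptions for this sketch.
person_detector = YOLO("yolov8n.pt")        # lightweight person detector
face_app = FaceAnalysis(name="buffalo_l")   # detection + landmarks + embeddings
face_app.prepare(ctx_id=0)                  # ctx_id=0 -> first GPU, -1 -> CPU

cap = cv2.VideoCapture("rtsp://camera/stream")  # or a local video file path
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # 1) Find people in the frame (COCO class 0 == "person")
    persons = person_detector(frame, classes=[0], verbose=False)[0].boxes
    if len(persons) == 0:
        continue
    # 2) Locate faces and extract 512-dim embeddings plus landmarks
    for face in face_app.get(frame):
        embedding = face.normed_embedding  # unit-length vector, ready for cosine
        # 3) ...match against known profiles, run the blink check, update DB...
cap.release()
```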
⚙️ Deep Dive: Real-Time Pipeline, Profile Pruning, and Decision Logic
This system aims to be more than just "detect and recognize." It's built for multi-stream, multi-person, and dynamic environments, where people and conditions constantly change.
🎥 Multi-Stream Processing
The pipeline supports multiple video sources (like RTSP cameras or local files). Each stream is handled using concurrent threads. A global frame_queue and a save_queue decouple processing steps:
- Frame acquisition (e.g., from multiple RTSP streams)
- Detection and recognition (YOLO + InsightFace)
- I/O operations (like saving cropped faces or updating profile data)
This way, one slow stream won’t block others — it’s designed for scalability and real-time performance.
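As a rough sketch of that layout (only frame_queue and save_queue come from the actual design; the worker functions, queue sizes, and URLs are our own illustration):

```python
import queue
import threading
import cv2

frame_queue = queue.Queue(maxsize=64)  # bounded, so a slow consumer applies backpressure
save_queue = queue.Queue()

def capture_worker(stream_id: int, url: str) -> None:
    """Read frames from one source and enqueue them without blocking other streams."""
    cap = cv2.VideoCapture(url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        try:
            frame_queue.put((stream_id, frame), timeout=1)
        except queue.Full:
            pass  # drop this frame rather than stall the camera thread
    cap.release()

def detection_worker() -> None:
    """Consume frames, run YOLO + InsightFace, and hand results to the I/O queue."""
    while True:
        stream_id, frame = frame_queue.get()
        # ...detection and recognition happen here...
        save_queue.put((stream_id, frame))  # e.g., face crops to persist on disk

for i, url in enumerate(["rtsp://cam-a/stream", "rtsp://cam-b/stream"]):
    threading.Thread(target=capture_worker, args=(i, url), daemon=True).start()
threading.Thread(target=detection_worker, daemon=True).start()
```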
📊 Status and State History
To keep track of what the system sees, it generates a status JSON containing:
```json
{
  "timestamp": "2025-07-14T22:31:55.211615",
  "streams": {
    "0": {
      "persons": [
        {
          "id": "user_1752443324",
          "type": "known",
          "name": "user_1752443324",
          "confidence": 0.8560171127319336
        }
      ]
    }
  }
}
```
This status is:
- Updated every time a change is detected in the streams
- Saved atomically using thread-safe locks
- Appended to a history file whenever a major change occurs
This enables audit logs, history visualization, and stream debugging — and could be extended to emit real-time alerts or update a dashboard.
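To illustrate the atomic, lock-guarded save, here's one common pattern (write to a temp file, then swap it in with os.replace()); the function name and paths are ours, not necessarily the project's:

```python
import json
import os
import threading

status_lock = threading.Lock()

def write_status(status: dict, path: str = "status.json") -> None:
    """Serialize under a lock, write to a temp file, then atomically swap it in.

    os.replace() is an atomic rename, so readers never see a half-written file.
    """
    with status_lock:
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(status, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp_path, path)
```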
🧬 Face Recognition and Similarity
Once YOLO has detected a person and InsightFace has located their face, the face is:
- Aligned to a canonical view (eyes horizontal, normalized scale)
- Encoded into a 512-dim vector (embedding) by InsightFace
- Compared with saved profiles using cosine similarity
If the similarity exceeds a threshold (e.g., 0.6), the person is considered known. Otherwise, if the face passed the liveness check (blink detection), a new profile is created.
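In code, that decision boils down to a few lines. This sketch uses the 0.6 example threshold from above; the function names are ours:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.6  # example value from the text; tune per deployment

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; higher means more alike."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_profile(embedding: np.ndarray, profiles: dict) -> tuple:
    """Return (profile_id, score) for the best match, or (None, score) if unknown."""
    best_id, best_score = None, -1.0
    for profile_id, stored_embedding in profiles.items():
        score = cosine_similarity(embedding, stored_embedding)
        if score > best_score:
            best_id, best_score = profile_id, score
    if best_score >= SIMILARITY_THRESHOLD:
        return best_id, best_score
    return None, best_score  # unknown: create a profile if liveness passed
```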
🌱 Profile Promotion and Quality Score
Each profile is a folder containing:
- A .pkl file with embeddings
- Saved cropped images of the face
- A JSON metadata file with profile history
When a new face is seen multiple times:
- If the face is sharp, well-lit, and consistent → it’s promoted to the profile DB.
- Quality is evaluated using a custom function (calculate_face_quality()), analyzing sharpness, contrast, and facial alignment.
Only high-quality images are retained, and poor ones are discarded.
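The post doesn't show calculate_face_quality() itself, so here's a toy version scoring sharpness and contrast (alignment is omitted, and the normalization constants are arbitrary assumptions):

```python
import cv2
import numpy as np

def calculate_face_quality(face_bgr: np.ndarray) -> float:
    """Toy quality score in [0, 1] combining sharpness and contrast.

    Sharpness: variance of the Laplacian (drops sharply on blurry crops).
    Contrast: standard deviation of grayscale pixel intensities.
    """
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    contrast = float(gray.std())
    # Clamp each term to a rough 0..1 range and average; constants are guesses.
    return 0.5 * min(sharpness / 500.0, 1.0) + 0.5 * min(contrast / 80.0, 1.0)
```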
🧹 Pruning Logic: Keep it Clean
Periodically, a dedicated thread (file_io_and_pruning_worker) scans all profiles and:
- Keeps only the top-N embeddings (e.g., best 5)
- Deletes low-quality or redundant samples
- Removes profiles with too few valid detections (e.g., someone passed quickly once and never blinked again)
This prevents the profile database from ballooning and helps ensure recognition stays fast and accurate.
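A minimal sketch of the keep-top-N idea (the sample layout and the MIN_DETECTIONS floor are our assumptions for illustration):

```python
TOP_N = 5           # keep only the best five embeddings, as in the example above
MIN_DETECTIONS = 3  # hypothetical floor; sparser profiles get removed entirely

def prune_samples(samples: list) -> list:
    """Keep the TOP_N highest-quality samples.

    Each sample is assumed to look like {"embedding": ..., "quality": 0.87}.
    """
    ranked = sorted(samples, key=lambda s: s["quality"], reverse=True)
    return ranked[:TOP_N]

def should_remove_profile(detection_count: int, ever_blinked: bool) -> bool:
    """Drop profiles that never accumulated enough confirmed, live sightings."""
    return detection_count < MIN_DETECTIONS or not ever_blinked
```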
🔄 Dynamic Updates
Every time a user is seen again:
- Their profile is updated with the new embedding if it’s of better quality
- The system performs a smart merge, keeping the best representations of the person over time
- Embeddings are cached to speed up comparison when many profiles are active
This creates a kind of living memory, where each profile evolves over time — getting sharper and more robust with use.
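The caching part can be as simple as stacking the profile embeddings into one matrix: since InsightFace's normed_embedding vectors are unit length, a single matrix multiply scores every profile at once. The class below is our own sketch, not the project's API:

```python
import numpy as np

class EmbeddingCache:
    """Stack normalized profile embeddings so matching is one matrix multiply."""

    def __init__(self, profiles: dict):
        # profiles maps profile_id -> unit-length 512-dim embedding
        self.ids = list(profiles)
        self.matrix = np.stack([profiles[pid] for pid in self.ids])  # shape (N, 512)

    def best_match(self, embedding: np.ndarray) -> tuple:
        # For unit vectors, the dot product equals cosine similarity.
        scores = self.matrix @ embedding
        idx = int(np.argmax(scores))
        return self.ids[idx], float(scores[idx])
```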
🧠 Under the Hood: Key Tech Explained
- YOLO (You Only Look Once): A fast object detection model that's ideal for real-time person detection. We use a lightweight YOLOv8 variant to balance speed and accuracy, allowing for efficient detection even on consumer-grade GPUs or Raspberry Pi setups with acceleration.
- InsightFace: A face recognition library that converts faces into embeddings, numerical vectors that represent facial features. We store these embeddings in a local profile database (files and folders) and use cosine similarity to determine matches.
- Blink Detection: Liveness is verified using the Eye Aspect Ratio (EAR). For each detected face, we extract eye landmarks using a pre-trained facial landmarks model. If the EAR drops below a certain threshold for a brief period, it's considered a blink. This rules out static spoofing attempts like printed photos.
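For reference, this is the standard EAR computation from Soukupová and Čech's 2016 paper; the constants below are typical illustrative values, not necessarily this project's exact settings:

```python
import numpy as np

EAR_THRESHOLD = 0.21  # typical literature value; tune per camera and face size
CONSEC_FRAMES = 2     # EAR must stay low this many frames to count as a blink

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR for one eye given six landmarks p1..p6 in the standard ordering.

    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|): it collapses toward 0
    when the eye closes and stays roughly constant while it is open.
    """
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)
```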
🙏 Credits to the Giants
None of this would have been possible without the incredible work behind two major open-source tools:
- YOLO (You Only Look Once) – for real-time object detection. We used it to efficiently localize people in live video streams. Special thanks to the Ultralytics team for making YOLOv8 accessible and powerful.
- InsightFace – for state-of-the-art face recognition and facial landmark analysis. InsightFace provides robust embeddings that make face comparison incredibly accurate. Explore their amazing work on the InsightFace GitHub (https://github.com/deepinsight/insightface).
We’re simply connecting pieces. The hard part — the deep models — was already done by brilliant people. Thanks for sharing them with the world.
⚙️ Getting Started
- Clone the repo

```bash
git clone https://github.com/your-repo/face-liveness-system
cd face-liveness-system
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Configure your .env file
- Run the main pipeline

```bash
python main.py
```
You’ll start seeing video frames with boxes, landmarks, names, and blinks detected — all in real time.
🛡️ Privacy Concerns
This project is a proof of concept built for educational and experimental purposes. We created it to play with exciting technologies like YOLO for object detection and the excellent open-source contributions from InsightFace. It’s a learning exercise, not a product.
You're free to use, modify, and build upon this code. However, always respect privacy laws and ethical standards when working with face recognition and personal data. Any use of this system in real environments must comply with local regulations such as GDPR or similar.
Faces are biometric data — treat them with care.
🌐 Try It or Fork It
Check out the GitHub repository and make it yours.
Let us know if you build something cool on top of it!
Built with ❤️ by the MojaLab team.
Disclaimer: At MojaLab, we aim to provide accurate and useful content, but hey, we’re human (well, mostly)! If you spot an error, have questions, or think something could be improved, feel free to reach out—we’d love to hear from you. Use the tutorials and tips here with care, and always test in a safe environment. Happy learning!!!
No AI was mistreated in the making of this tutorial—every LLM was used with the respect it deserves.