In so many movies, there’s a tired security guard sitting in a dark room, watching dozens of monitors. But what if an AI could watch those streams instead? An AI that could alert the guard when someone appears, warn if that person is unknown, and—if it’s a real person (not just a photo on a smartphone)—automatically capture high-quality pictures to build a profile.
What if this AI could track individuals as they move within a single camera's view, ensuring they are recognized consistently over time?
That’s the kind of system we wanted to build—not for production, but to see what’s possible using modern tools like YOLO and InsightFace.
👀 Why We Built It
Because we believe learning by doing is the best way to grow.
We weren’t aiming to ship a product or solve a client's problem. We simply wanted to explore: What can modern face recognition models truly do in real-time? How far can we push the boundaries by combining powerful open-source tools?
This project grew out of that curiosity. What began as a playful experiment quickly turned into a complete processing pipeline—one that detects people, recognizes faces, verifies liveness through blink detection, tracks individuals within a stream, and even prunes its own memory to stay sharp.
We built this as tech enthusiasts, eager to discover what's possible. The answer is: a lot.
Seriously—give it a try. Just keep in mind: this is a proof of concept, not production-ready software.
And yes, to save time, we made extensive use of AI-assisted coding, so don’t blame only us for any mistakes or things we could’ve done better. 😉
🔧 How It Works: The Pipeline
At a high level, here's what happens under the hood (a simplified sketch of the loop follows the list):
- Video Input: The system connects to RTSP streams or local video files using OpenCV.
- Person Detection with YOLO: Each frame is analyzed with a YOLOv8 model to find people.
- Face Analysis with InsightFace: For each person detected, faces are located, and key features (embeddings and landmarks) are extracted.
- Face Recognition: The facial embedding is compared to a database of known profiles using cosine similarity.
- Liveness Detection: A custom blink detection mechanism, based on the Eye Aspect Ratio (EAR), is used to prevent simple spoofing attacks (like holding up a photo).
- Automatic Profile Management: New profiles are created for unknown individuals who pass the liveness check. Existing profiles are continuously refined with new, high-quality face samples.
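To make that flow concrete, here is a heavily simplified sketch of the main loop. The model choices, stream URL, and variable names are our illustrative assumptions, not necessarily the project's actual code:

```python
import cv2
from ultralytics import YOLO
from insightface.app import FaceAnalysis

# Illustrative setup: model files and names are assumptions for this sketch.
person_detector = YOLO("yolov8n.pt")        # lightweight person detector
face_app = FaceAnalysis(name="buffalo_l")   # detection + landmarks + embeddings
face_app.prepare(ctx_id=0)                  # ctx_id=0 -> first GPU, -1 -> CPU

cap = cv2.VideoCapture("rtsp://camera/stream")  # or a local video file path
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    # 1) Find people in the frame (COCO class 0 == "person")
    persons = person_detector(frame, classes=[0], verbose=False)[0].boxes
    if len(persons) == 0:
        continue
    # 2) Locate faces and extract 512-dim embeddings plus landmarks
    for face in face_app.get(frame):
        embedding = face.normed_embedding  # unit-length vector, ready for cosine
        # 3) ...match against known profiles, run the blink check, update DB...
cap.release()
```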
⚙️ Deep Dive: Real-Time Pipeline, Profile Pruning, and Decision Logic
This system aims to be more than just "detect and recognize." It's built for multi-stream, multi-person, and dynamic environments, where people and conditions constantly change.
🎥 Multi-Stream Processing
The pipeline supports multiple video sources (like RTSP cameras or local files). Each stream is handled using concurrent threads. A global frame_queue and a save_queue decouple processing steps:
- Frame acquisition (e.g., from multiple RTSP streams)
- Detection and recognition (YOLO + InsightFace)
- I/O operations (like saving cropped faces or updating profile data)
This way, one slow stream won’t block others — it’s designed for scalability and real-time performance.
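As a rough sketch of that layout (only frame_queue and save_queue come from the actual design; the worker functions, queue sizes, and URLs are our own illustration):

```python
import queue
import threading
import cv2

frame_queue = queue.Queue(maxsize=64)  # bounded, so a slow consumer applies backpressure
save_queue = queue.Queue()

def capture_worker(stream_id: int, url: str) -> None:
    """Read frames from one source and enqueue them without blocking other streams."""
    cap = cv2.VideoCapture(url)
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        try:
            frame_queue.put((stream_id, frame), timeout=1)
        except queue.Full:
            pass  # drop this frame rather than stall the camera thread
    cap.release()

def detection_worker() -> None:
    """Consume frames, run YOLO + InsightFace, and hand results to the I/O queue."""
    while True:
        stream_id, frame = frame_queue.get()
        # ...detection and recognition happen here...
        save_queue.put((stream_id, frame))  # e.g., face crops to persist on disk

for i, url in enumerate(["rtsp://cam-a/stream", "rtsp://cam-b/stream"]):
    threading.Thread(target=capture_worker, args=(i, url), daemon=True).start()
threading.Thread(target=detection_worker, daemon=True).start()
```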
📊 Status and State History
To keep track of what the system sees, it generates a status JSON containing:
```json
{
  "timestamp": "2025-07-14T22:31:55.211615",
  "streams": {
    "0": {
      "persons": [
        {
          "id": "user_1752443324",
          "type": "known",
          "name": "user_1752443324",
          "confidence": 0.8560171127319336
        }
      ]
    }
  }
}
```
This status is:
- Updated every time a change is detected in the streams
- Saved atomically using thread-safe locks
- Appended to a history file whenever a major change occurs
This enables audit logs, history visualization, and stream debugging — and could be extended to emit real-time alerts or update a dashboard.
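To illustrate the atomic, lock-guarded save, here's one common pattern (write to a temp file, then swap it in with os.replace()); the function name and paths are ours, not necessarily the project's:

```python
import json
import os
import threading

status_lock = threading.Lock()

def write_status(status: dict, path: str = "status.json") -> None:
    """Serialize under a lock, write to a temp file, then atomically swap it in.

    os.replace() is an atomic rename, so readers never see a half-written file.
    """
    with status_lock:
        tmp_path = path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(status, f, indent=2)
            f.flush()
            os.fsync(f.fileno())  # make sure bytes hit disk before the swap
        os.replace(tmp_path, path)
```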
🧬 Face Recognition and Similarity
Once YOLO has detected a person and InsightFace has located their face, the face is:
- Aligned to a canonical view (eyes horizontal, normalized scale)
- Encoded into a 512-dim vector (embedding) by InsightFace
- Compared with saved profiles using cosine similarity
If the similarity exceeds a threshold (e.g., 0.6), the person is considered known. Otherwise, if the face passed the liveness check (blink detection), a new profile is created.
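In code, that decision boils down to a few lines. This sketch uses the 0.6 example threshold from above; the function names are ours:

```python
import numpy as np

SIMILARITY_THRESHOLD = 0.6  # example value from the text; tune per deployment

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity in [-1, 1]; higher means more alike."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_profile(embedding: np.ndarray, profiles: dict) -> tuple:
    """Return (profile_id, score) for the best match, or (None, score) if unknown."""
    best_id, best_score = None, -1.0
    for profile_id, stored_embedding in profiles.items():
        score = cosine_similarity(embedding, stored_embedding)
        if score > best_score:
            best_id, best_score = profile_id, score
    if best_score >= SIMILARITY_THRESHOLD:
        return best_id, best_score
    return None, best_score  # unknown: create a profile if liveness passed
```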
🌱 Profile Promotion and Quality Score
Each profile is a folder containing:
- A .pkl file with embeddings
- Saved cropped images of the face
- A JSON metadata file with profile history
When a new face is seen multiple times:
- If the face is sharp, well-lit, and consistent → it’s promoted to the profile DB.
- Quality is evaluated using a custom function (calculate_face_quality()), analyzing sharpness, contrast, and facial alignment.
Only high-quality images are retained, and poor ones are discarded.
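The post doesn't show calculate_face_quality() itself, so here's a toy version scoring sharpness and contrast (alignment is omitted, and the normalization constants are arbitrary assumptions):

```python
import cv2
import numpy as np

def calculate_face_quality(face_bgr: np.ndarray) -> float:
    """Toy quality score in [0, 1] combining sharpness and contrast.

    Sharpness: variance of the Laplacian (drops sharply on blurry crops).
    Contrast: standard deviation of grayscale pixel intensities.
    """
    gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
    sharpness = cv2.Laplacian(gray, cv2.CV_64F).var()
    contrast = float(gray.std())
    # Clamp each term to a rough 0..1 range and average; constants are guesses.
    return 0.5 * min(sharpness / 500.0, 1.0) + 0.5 * min(contrast / 80.0, 1.0)
```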
🧹 Pruning Logic: Keep it Clean
Periodically, a dedicated thread (file_io_and_pruning_worker) scans all profiles and:
- Keeps only the top-N embeddings (e.g., best 5)
- Deletes low-quality or redundant samples
- Removes profiles with too few valid detections (e.g., someone passed quickly once and never blinked again)
This prevents the profile database from ballooning and helps ensure recognition stays fast and accurate.
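A minimal sketch of the keep-top-N idea (the sample layout and the MIN_DETECTIONS floor are our assumptions for illustration):

```python
TOP_N = 5           # keep only the best five embeddings, as in the example above
MIN_DETECTIONS = 3  # hypothetical floor; sparser profiles get removed entirely

def prune_samples(samples: list) -> list:
    """Keep the TOP_N highest-quality samples.

    Each sample is assumed to look like {"embedding": ..., "quality": 0.87}.
    """
    ranked = sorted(samples, key=lambda s: s["quality"], reverse=True)
    return ranked[:TOP_N]

def should_remove_profile(detection_count: int, ever_blinked: bool) -> bool:
    """Drop profiles that never accumulated enough confirmed, live sightings."""
    return detection_count < MIN_DETECTIONS or not ever_blinked
```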
🔄 Dynamic Updates
Every time a user is seen again:
- Their profile is updated with the new embedding if it’s of better quality
- The system performs a smart merge, keeping the best representations of the person over time
- Embeddings are cached to speed up comparison when many profiles are active
This creates a kind of living memory, where each profile evolves over time — getting sharper and more robust with use.
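The caching part can be as simple as stacking the profile embeddings into one matrix: since InsightFace's normed_embedding vectors are unit length, a single matrix multiply scores every profile at once. The class below is our own sketch, not the project's API:

```python
import numpy as np

class EmbeddingCache:
    """Stack normalized profile embeddings so matching is one matrix multiply."""

    def __init__(self, profiles: dict):
        # profiles maps profile_id -> unit-length 512-dim embedding
        self.ids = list(profiles)
        self.matrix = np.stack([profiles[pid] for pid in self.ids])  # shape (N, 512)

    def best_match(self, embedding: np.ndarray) -> tuple:
        # For unit vectors, the dot product equals cosine similarity.
        scores = self.matrix @ embedding
        idx = int(np.argmax(scores))
        return self.ids[idx], float(scores[idx])
```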
🧠 Under the Hood: Key Tech Explained
- YOLO (You Only Look Once): A fast object detection model that's ideal for real-time person detection. We use a lightweight YOLOv8 variant to balance speed and accuracy, allowing for efficient detection even on consumer-grade GPUs or Raspberry Pi setups with acceleration.
- InsightFace: A face recognition library that converts faces into embeddings, numerical vectors that represent facial features. We store these embeddings in a local profile database (files and folders) and use cosine similarity to determine matches.
- Blink Detection: Liveness is verified using the Eye Aspect Ratio (EAR). For each detected face, we extract eye landmarks using a pre-trained facial landmarks model. If the EAR drops below a certain threshold for a brief period, it's considered a blink. This rules out static spoofing attempts like printed photos.
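For reference, this is the standard EAR computation from Soukupová and Čech's 2016 paper; the constants below are typical illustrative values, not necessarily this project's exact settings:

```python
import numpy as np

EAR_THRESHOLD = 0.21  # typical literature value; tune per camera and face size
CONSEC_FRAMES = 2     # EAR must stay low this many frames to count as a blink

def eye_aspect_ratio(eye: np.ndarray) -> float:
    """EAR for one eye given six landmarks p1..p6 in the standard ordering.

    EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|): it collapses toward 0
    when the eye closes and stays roughly constant while it is open.
    """
    v1 = np.linalg.norm(eye[1] - eye[5])
    v2 = np.linalg.norm(eye[2] - eye[4])
    h = np.linalg.norm(eye[0] - eye[3])
    return (v1 + v2) / (2.0 * h)
```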
🙏 Credits to the Giants
None of this would have been possible without the incredible work behind two major open-source tools:
- YOLO (You Only Look Once) – for real-time object detection. We used it to efficiently localize people in live video streams. Special thanks to the Ultralytics team for making YOLOv8 accessible and powerful.
- InsightFace – for state-of-the-art face recognition and facial landmark analysis. InsightFace provides robust embeddings that make face comparison incredibly accurate. Explore their amazing work on the InsightFace GitHub (https://github.com/deepinsight/insightface).
We’re simply connecting pieces. The hard part — the deep models — was already done by brilliant people. Thanks for sharing them with the world.
⚙️ Getting Started
- Clone the repo

```bash
git clone https://github.com/your-repo/face-liveness-system
cd face-liveness-system
```

- Install dependencies

```bash
pip install -r requirements.txt
```

- Configure your .env file
- Run the main pipeline

```bash
python main.py
```
You’ll start seeing video frames with boxes, landmarks, names, and blinks detected — all in real time.
🛡️ Privacy Concerns
This project is a proof of concept built for educational and experimental purposes. We created it to play with exciting technologies like YOLO for object detection and the excellent open-source contributions from InsightFace. It’s a learning exercise, not a product.
You're free to use, modify, and build upon this code. However, always respect privacy laws and ethical standards when working with face recognition and personal data. Any use of this system in real environments must comply with local regulations such as GDPR or similar.
Faces are biometric data — treat them with care.
🌐 Try It or Fork It
Check out the GitHub repository and make it yours.
Let us know if you build something cool on top of it!
Built with ❤️ by the MojaLab team.
Disclaimer: At MojaLab, we aim to provide accurate and useful content, but hey, we’re human (well, mostly)! If you spot an error, have questions, or think something could be improved, feel free to reach out—we’d love to hear from you. Use the tutorials and tips here with care, and always test in a safe environment. Happy learning!!!
No AI was mistreated in the making of this tutorial—every LLM was used with the respect it deserves.