Wait, I don't need OpenAI?!

This is a live article that I’ll update from time to time. When I first read the feature doc I was handed, my first thought was that MidJourney and Hailuo should be able to do this. It is easy to forget, given how impressive the newest models are, but I had to go back to “older” libraries to build the feature. Thanks to the countless articles, blogs, and videos (and maybe a few late-night snacks) that helped me get this far. I hope this write-up will be a guiding light for anyone else rediscovering AI beyond API calls to the current big 3 models.

Introduction

Facial recognition and augmentation technologies have seen significant advancements, enabling applications in entertainment, healthcare, security, and more. In this article, we’ll explore how to build an application that detects emotions and applies visual effects to specific facial features using OpenCV, MediaPipe, and DeepFace.

Our focus will be on verifying a user’s emotion and, upon confirmation, applying a visual effect—such as recoloring the lips or eyes—to enhance the facial image.

Prerequisites

Before diving in, ensure you have the following installed: Python 3.8 or later, OpenCV (opencv-python and opencv-contrib-python), MediaPipe, DeepFace, NumPy, Pillow, FastAPI, Uvicorn, and python-multipart (which FastAPI needs to parse file uploads).

Install the required packages using pip:

pip install opencv-python opencv-contrib-python mediapipe deepface numpy Pillow fastapi uvicorn python-multipart

Setting Up the Application
We’ll use FastAPI to create a web application that can handle video uploads and processing. This allows for easy integration with web interfaces or other services.


from fastapi import FastAPI, File, UploadFile
from fastapi.responses import StreamingResponse, JSONResponse
import uvicorn
import cv2
import numpy as np
import mediapipe as mp
from deepface import DeepFace
from PIL import Image
import tempfile
import io

app = FastAPI()

Initializing MediaPipe Face Mesh
MediaPipe provides a robust solution for detecting facial landmarks, which is crucial for applying precise visual effects.

mp_face_mesh = mp.solutions.face_mesh
# static_image_mode=False treats the input as a video stream and tracks the face between frames
face_mesh = mp_face_mesh.FaceMesh(
    static_image_mode=False,
    max_num_faces=1,
    min_detection_confidence=0.5,
    min_tracking_confidence=0.5,
)

Defining Facial Landmarks
Facial landmarks are specific points on the face that correspond to different facial features. We’ll define landmarks for features we want to manipulate. For lip recoloring, we’ll use the following indices:

LIPS = [
    61, 146, 91, 181, 84, 17, 314, 405, 321, 375, 291, 308, 324, 318, 402, 317,
    14, 87, 178, 88, 95,
]

Emotion Verification with DeepFace
Before applying any effects, we’ll verify the user’s emotion using DeepFace. This ensures that the effect is applied only when the desired emotion is detected.

def verify_emotion(frame, target_emotion='happy'):
    try:
        # DeepFace expects NumPy images in BGR order, which is what OpenCV provides
        result = DeepFace.analyze(
            frame, actions=['emotion'], enforce_detection=False
        )
        # Recent DeepFace versions return a list with one dict per detected face
        if isinstance(result, list):
            result = result[0] if result else {}
        dominant_emotion = result.get('dominant_emotion', '').lower()
        return dominant_emotion == target_emotion.lower()
    except Exception as e:
        print(f"Emotion detection error: {e}")
        return False
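
The dominant emotion can win by a slim margin, so a stricter check may be worth considering. The sketch below assumes each per-face result carries an emotion dictionary of percentage scores alongside dominant_emotion. The helper name emotion_is_confident and the 60 percent threshold are illustrative choices, not part of DeepFace.

```python
def emotion_is_confident(face_result, target_emotion="happy", min_score=60.0):
    """Return True when target_emotion is dominant and scores at least min_score percent.

    face_result is assumed to be one per-face dict as returned by DeepFace.analyze.
    """
    target = target_emotion.lower()
    dominant = str(face_result.get("dominant_emotion", "")).lower()
    scores = {k.lower(): float(v) for k, v in face_result.get("emotion", {}).items()}
    return dominant == target and scores.get(target, 0.0) >= min_score


# Hand-written sample in the assumed shape (not real model output).
sample = {
    "dominant_emotion": "happy",
    "emotion": {"happy": 72.5, "neutral": 20.1, "sad": 7.4},
}
print(emotion_is_confident(sample))                  # True
print(emotion_is_confident(sample, min_score=80.0))  # False
```

A threshold like this trades missed frames for fewer false positives, so the right value depends on the footage.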

Creating a Mask for Facial Features
To apply effects accurately, we’ll create a mask for the targeted facial feature.

def create_feature_mask(landmarks, indices, image_shape):
    h, w = image_shape[:2]
    points = [
        (int(landmarks.landmark[i].x * w), int(landmarks.landmark[i].y * h))
        for i in indices
    ]
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.fillPoly(mask, [np.array(points, dtype=np.int32)], 255)
    return mask
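
The coordinate conversion inside create_feature_mask can be checked in isolation. MediaPipe reports landmark x and y normalized by image width and height, and in practice values can fall slightly outside 0 to 1 near the image border. The sketch below adds clamping as a precaution; the Point class and to_pixel helper are hypothetical stand-ins, not MediaPipe types.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Stand-in for a MediaPipe landmark; only x and y are used here."""
    x: float
    y: float


def to_pixel(point, width, height):
    """Scale a normalized landmark to integer pixel coordinates, clamped to the image."""
    px = min(max(int(point.x * width), 0), width - 1)
    py = min(max(int(point.y * height), 0), height - 1)
    return px, py


print(to_pixel(Point(0.5, 0.25), 640, 480))    # (320, 120)
print(to_pixel(Point(1.02, -0.01), 640, 480))  # (639, 0)
```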

Applying Visual Effects

Recoloring the Lips
With the mask created, we can now apply a color overlay to the lips.

def apply_lip_color(frame, mask, color=(0, 0, 255)):
    colored_lips = np.zeros_like(frame)
    colored_lips[:] = color
    colored_lips = cv2.bitwise_and(colored_lips, colored_lips, mask=mask)
    # Blend the colored lips with the original image
    output = cv2.addWeighted(frame, 1.0, colored_lips, 0.4, 0)
    return output
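
To see what the blend does to a single pixel, here is a worked sketch of the arithmetic. cv2.addWeighted computes src1 * alpha + src2 * beta + gamma per channel and saturates the result to the 0 to 255 range. The add_weighted_pixel helper and the sample pixel value are made up for illustration.

```python
def add_weighted_pixel(base, overlay, alpha, beta, gamma=0.0):
    """Per-channel base*alpha + overlay*beta + gamma, rounded and clipped to 0..255."""
    return tuple(
        min(255, max(0, round(b * alpha + o * beta + gamma)))
        for b, o in zip(base, overlay)
    )


lip_pixel = (90, 80, 200)  # hypothetical BGR value for a lip pixel
red = (0, 0, 255)          # the default overlay color, in BGR

# Weights used in apply_lip_color: the overlay is added on top of the full frame.
print(add_weighted_pixel(lip_pixel, red, 1.0, 0.4))  # (90, 80, 255)

# A conventional alpha blend keeps the weights summing to 1.
print(add_weighted_pixel(lip_pixel, red, 0.6, 0.4))  # (54, 48, 222)
```

With weights of 1.0 and 0.4 the red channel saturates quickly on lips that are already red. The 0.6 and 0.4 variant avoids that, but applied to the whole frame it would also darken pixels outside the mask, so it would need to be restricted to the masked region.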

Complete Lip Recoloring Function

def recolor_lips(frame, landmarks):
    mask = create_feature_mask(landmarks, LIPS, frame.shape)
    result_frame = apply_lip_color(frame, mask)
    return result_frame

Processing Video Frames
We’ll read the video frames, verify the emotion, and apply the visual effect to each frame.

def process_video(frames, target_emotion='happy'):
    processed_frames = []
    for frame in frames:
        if verify_emotion(frame, target_emotion):
            rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            results = face_mesh.process(rgb_frame)
            if results.multi_face_landmarks:
                landmarks = results.multi_face_landmarks[0]
                frame = recolor_lips(frame, landmarks)
        processed_frames.append(frame)
    return processed_frames
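
Running the emotion model on every frame is the most expensive part of this loop. One possible variation, sketched here with plain callables so it runs without any models, checks the emotion only on every Nth frame and reuses the answer in between. The function name process_with_sampling and the every parameter are assumptions for illustration.

```python
def process_with_sampling(frames, check, apply_effect, every=5):
    """Run check on every `every`-th frame and reuse its result for the frames in between."""
    processed = []
    matched = False
    for index, frame in enumerate(frames):
        if index % every == 0:
            matched = check(frame)
        processed.append(apply_effect(frame) if matched else frame)
    return processed


# Toy run with integers standing in for frames.
calls = []


def fake_check(frame):
    calls.append(frame)
    return frame < 5


out = process_with_sampling(list(range(10)), fake_check, lambda f: f * 100, every=5)
print(out)    # [0, 100, 200, 300, 400, 5, 6, 7, 8, 9]
print(calls)  # [0, 5]
```

In the real pipeline, check would wrap verify_emotion and apply_effect would wrap the face mesh and recolor_lips steps. The cost is that the effect can lag an expression change by up to N frames.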

Reading and Writing Videos
Reading Video Frames

def read_video_frames(video_path):
    cap = cv2.VideoCapture(video_path)
    frames = []
    ret, frame = cap.read()
    while ret:
        frames.append(frame)
        ret, frame = cap.read()
    cap.release()
    return frames

Writing Video Frames

def write_video(frames, output_path, fps=24):
    height, width, _ = frames[0].shape
    fourcc = cv2.VideoWriter_fourcc(*'mp4v')
    out = cv2.VideoWriter(output_path, fourcc, fps, (width, height))
    for frame in frames:
        out.write(frame)
    out.release()
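
Because write_video defaults to 24 fps, a source recorded at a different rate would play back at the wrong speed. OpenCV exposes the source rate through cap.get(cv2.CAP_PROP_FPS), though some files report 0 or an unusable value. The resolve_fps helper below is a hypothetical guard for that case, kept free of OpenCV so it can be run on its own.

```python
import math


def resolve_fps(reported_fps, default=24.0):
    """Return reported_fps if it looks usable, otherwise the default."""
    try:
        fps = float(reported_fps)
    except (TypeError, ValueError):
        return default
    if math.isnan(fps) or math.isinf(fps) or fps <= 0:
        return default
    return fps


# With OpenCV this would be called as: resolve_fps(cap.get(cv2.CAP_PROP_FPS))
print(resolve_fps(29.97))  # 29.97
print(resolve_fps(0.0))    # 24.0
```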

API Endpoint for Video Processing
We’ll create an endpoint that accepts a video file, processes it, and returns the modified video.

@app.post("/apply-lipstick")
async def apply_lipstick_effect(video: UploadFile = File(...)):
    # Save uploaded video to a temporary file
    with tempfile.NamedTemporaryFile(delete=False, suffix=".mp4") as temp_video_file:
        temp_video_path = temp_video_file.name
        temp_video_file.write(await video.read())

    # Read video frames
    frames = read_video_frames(temp_video_path)

    if not frames:
        return JSONResponse(content={"error": "No frames found in the video."}, status_code=400)

    # Process frames
    processed_frames = process_video(frames, target_emotion='happy')

    # Save processed video
    processed_video_path = temp_video_path.replace(".mp4", "_processed.mp4")
    write_video(processed_frames, processed_video_path)

    # Return the processed video
    def iterfile():
        with open(processed_video_path, mode="rb") as file_like:
            yield from file_like

    return StreamingResponse(iterfile(), media_type="video/mp4")
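
As written, the endpoint leaves both temporary files on disk after the response is sent. One way to handle the processed file, sketched here with only the standard library, is a generator that streams the file in fixed-size chunks and deletes it afterwards. The name iter_file_then_delete and the 64 KB chunk size are assumptions, and the uploaded temp file would still need its own cleanup.

```python
import os
import tempfile


def iter_file_then_delete(path, chunk_size=64 * 1024):
    """Yield the file in fixed-size chunks, then remove it from disk."""
    try:
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs once a started iteration ends, whether the file was
        # fully read or the consumer stopped early.
        if os.path.exists(path):
            os.remove(path)


# Small demonstration with a throwaway file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 10)
    demo_path = tmp.name

chunks = list(iter_file_then_delete(demo_path, chunk_size=4))
print([len(c) for c in chunks])   # [4, 4, 2]
print(os.path.exists(demo_path))  # False
```

A generator like this could be passed to StreamingResponse in place of iterfile().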

To run the FastAPI application, use the following command (this assumes the code above is saved as main.py):

uvicorn main:app --host 0.0.0.0 --port 8000

Conclusion
In this article, we’ve built a foundational application capable of emotion detection and facial feature augmentation. By combining OpenCV for image processing, MediaPipe for facial landmark detection, and DeepFace for emotion analysis, we can create engaging visual effects that respond to the user’s emotional state. Understanding these tools and how they interact allows for the development of more advanced features and applications in the future.

I will continue to refine and expand this article as I make progress.

And remember, you can’t learn everything at once. Take it one line of code at a time!