How to Build a Security Surveillance Camera Using Python OpenCV and ESP32 Cam

Last Updated on May 11, 2026 by Engr. Shahzada Fahad

Python OpenCV & ESP32 Cam:

Python OpenCV & ESP32 Cam based DIY Security Surveillance Camera: Using Python, OpenCV, MediaPipe, and an ESP32 Cam module, we are going to build an advanced DIY security or surveillance camera. Because detection is done in software on the camera image, this system eliminates the need to physically install laser sensors and motion sensors.

Instead, we draw a virtual laser, which is simply a line, on the video frame. If anyone crosses this virtual laser, a buzzer connected to an Arduino board is activated. Currently, I have used just one virtual laser, but you can use multiple virtual lasers if you wish. You can also draw rectangular, circular, and even irregular shapes to define a specific area. Then all you need is to track the landmarks within those particular areas.

You can also place this virtual laser above a wall, so if an intruder comes over the wall, the buzzer will be activated. Not only that, you can define a specific area, and whenever someone enters or exits that area, the buzzer will be activated. The same concept can be applied to different objects and animals as well.
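
The area idea above can be sketched in a few lines. This is a minimal, pure-Python ray-casting test and not the code used in this project; the function name point_in_zone and the zone coordinates are made up for illustration. OpenCV also offers cv2.pointPolygonTest for the same kind of check.

```python
def point_in_zone(x, y, zone):
    """Return True if pixel (x, y) lies inside the polygon `zone`.

    `zone` is a list of (x, y) corner points in drawing order.
    Uses ray casting: count how many edges a horizontal ray crosses.
    """
    inside = False
    n = len(zone)
    for i in range(n):
        x1, y1 = zone[i]
        x2, y2 = zone[(i + 1) % n]
        # Does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x position where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical rectangular zone, in pixels
zone = [(100, 100), (300, 100), (300, 400), (100, 400)]
print(point_in_zone(200, 250, zone))  # point inside the rectangle
print(point_in_zone(350, 250, zone))  # point to the right of it
```

Comparing the result for the current frame with the result for the previous frame tells you whether the tracked landmark has just entered or just left the area.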

The system is based on pose landmark detection, which lets it detect and monitor people within a given space. Pose landmark detection involves identifying and tracking distinctive features of a person’s body pose, such as the positions of joints and body parts. Because the alarm reacts to a tracked body part rather than to any movement in the scene, this approach can reduce false alarms compared with a simple motion sensor, although detection quality still depends on lighting and camera placement.

By using pose landmark detection, the system adds an extra layer of protection on top of conventional sensors. In this article, we explore how to integrate the ESP32 Cam with Python OpenCV to build a pose landmark based security system.

Later in this article, I will talk about pose landmarks and the keypoints that represent the positions of various joints and body parts within a human pose.

Anyway, this project is entirely based on my previous two tutorials. In the first tutorial, “ESP32 Object detection and Identification”, I explained the most basic things, such as:

How to perform wireless live video streaming using the ESP32 Cam module.

How to install Python, OpenCV, and Yolo V3.

How to detect and identify different objects.

In my studio, I detected and identified various objects, and not only did I identify and track birds and cats, but I also displayed alert messages on the screen.

In the second tutorial “ESP32 Cam based Car Barrier/Gate control system”, I created an automatic car barrier/gate opening and closing system. I used the ESP32 Camera module along with Python OpenCV Yolo V3 for car identification and tracking. In this project, I used two lines to control the car barrier. When the car crossed the first line, the barrier would open, and when the car crossed the second line, the barrier would close.

So, I have already explained all of these things, and I won’t repeat them today. Today, I will only explain the new things, including:

Pose landmarks.
How to install MediaPipe, a powerful library that provides easy-to-use Python APIs for various tasks, including landmark detection.
The Arduino circuit diagram and programming.
Python programming.

So, without any further delay let’s get started!!!

Amazon Links:

ESP32 Camera Module

ESP32 CAM W-BT Board

Arduino Nano USB C type (Recommended)

Disclosure: These are affiliate links. As an Amazon Associate I earn from qualifying purchases.

Types of Landmarks in Python:

First, let’s start with the types of landmarks. There are mainly three types:

Pose Landmarks.
Facial Landmarks.
Hand Landmarks.

I will explain and use the Facial and Hand Landmarks in one of my upcoming videos. In this particular project, we will only focus on Pose Landmarks.

Pose Landmarks:

In Python, pose landmarks refer to the specific points or keypoints that represent the positions of various joints and body parts within a human pose. These landmarks are typically detected and tracked using computer vision techniques and libraries.

Pose estimation involves identifying and tracking the positions of joints, such as shoulders, elbows, wrists, hips, knees, and ankles, to infer the overall body pose and its orientation. By detecting and analyzing pose landmarks, it becomes possible to understand the structural configuration and movement of a person’s body.

Python provides several libraries and frameworks that enable pose landmark detection and analysis. One popular library for this purpose is OpenCV (Open Source Computer Vision Library). OpenCV offers a range of functionalities, including pose estimation using pre-trained models and algorithms.

Another widely used library for pose estimation in Python is MediaPipe. MediaPipe provides a set of pre-trained models and tools specifically designed for landmark detection and tracking in real-time applications, including pose estimation.

Pose landmarks can be used for various applications, such as activity recognition, motion analysis, human-computer interaction, sports analytics, and augmented reality. By understanding the body pose and its movements, it becomes possible to develop applications that can respond to or analyze human actions and gestures.

Using Python’s libraries and tools for pose landmark detection, developers and researchers can build sophisticated applications that rely on understanding and interpreting human poses. These applications have the potential to revolutionize areas such as healthcare, gaming, sports, animation, and human-computer interaction.

Anyway, we have got multiple keypoints (MediaPipe Pose reports 33 of them, numbered 0 to 32), and the good thing is that we can detect and track any of them. You are free to use all the keypoints, some of them, or a single keypoint; it’s totally up to you. In my case, I am going to detect one specific landmark, landmark 31 (the left foot index, that is, the tip of the left foot), using the MediaPipe library in Python. So, whenever this landmark crosses a line or virtual laser, the buzzer is turned ON.

If you learn how to track the X-axis and Y-axis location of a single keypoint on the body, then you can do it for all these keypoints, and you will be able to detect and identify any pose.

Let’s say you want to turn ON the buzzer when the person’s hand is up in the air and turn it OFF when the hand is down. For this, we track the X-axis and Y-axis location of one of the keypoints on the hand. When that landmark’s Y-axis location is above keypoint 7 (the left ear), or any other keypoint you choose as a reference, the buzzer turns ON, and when its Y-axis location is below keypoint 23 (the left hip), the buzzer turns OFF.
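
The hand-up idea can be sketched as a small function. This is only an illustration, not code from this project: update_buzzer is a hypothetical name and the y values are made-up sample numbers. Keep in mind that image y-coordinates grow downward, so "above" means a smaller y value.

```python
def update_buzzer(wrist_y, ear_y, hip_y, buzzer_on):
    """Return the new buzzer state from three landmark y values.

    Image y grows downward, so "above" means a smaller y value.
    """
    if wrist_y < ear_y:   # hand raised above the ear (keypoint 7)
        return True
    if wrist_y > hip_y:   # hand dropped below the hip (keypoint 23)
        return False
    return buzzer_on      # in between: keep the previous state


# Made-up normalized y values for a hand going up and coming back down
state = False
for wrist_y in (0.80, 0.40, 0.10, 0.40, 0.80):
    state = update_buzzer(wrist_y, ear_y=0.20, hip_y=0.60, buzzer_on=state)
    print(wrist_y, state)
```

Using two different thresholds means the buzzer does not flicker while the hand sits between the ear and the hip. In MediaPipe Pose, the wrists are landmarks 15 (left) and 16 (right).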

Using a similar technique, you can find out whether the person is standing or sitting, whether the arms are stretched, or whether the person is walking; you can even count biceps reps, and so on.
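
As a rough sketch of the rep-counting idea, the angle at the elbow can be computed from three keypoints (shoulder, elbow, wrist; landmarks 11, 13, and 15 on the left side in MediaPipe Pose). The function names and the 160 and 40 degree thresholds below are my own assumptions for illustration and would need tuning on real footage.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at point b, formed by the segments b->a and b->c."""
    ang = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0])
        - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    ang = abs(ang)
    return 360 - ang if ang > 180 else ang


def count_reps(angles, down=160, up=40):
    """Count curls: one rep is arm extended (> down) followed by flexed (< up)."""
    reps, extended = 0, False
    for ang in angles:
        if ang > down:
            extended = True
        elif ang < up and extended:
            reps += 1
            extended = False
    return reps


# Shoulder, elbow, wrist forming a right angle at the elbow
print(joint_angle((0, 1), (0, 0), (1, 0)))
# Hypothetical elbow angles measured over successive frames
print(count_reps([170, 100, 30, 100, 170, 30]))
```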

Now, let’s go ahead and take a look at the Arduino circuit diagram.

Buzzer interfacing with Arduino:

The 5V buzzer is connected to the Arduino digital pin D8. I am using a 2N2222 NPN transistor and a 10K ohm resistor as a driver to control the buzzer. Now, let’s go ahead and take a look at the Arduino programming.

Arduino Programming:

const int buzzerPin = 8; // Connect the buzzer to pin 8

void setup() {
  pinMode(buzzerPin, OUTPUT);
  Serial.begin(9600);
}

void loop() {
  if (Serial.available()) {
    char signal = Serial.read();
    if (signal == '1') {
      // Activate the buzzer
      digitalWrite(buzzerPin, HIGH);
     
    } else if (signal == '0') {
      // Deactivate the buzzer
      digitalWrite(buzzerPin, LOW);
    }
  }
}

For this project, you don’t need to add any library. As you can see, the buzzer is connected to the Arduino digital pin D8.

In the void setup() function, I set the buzzer pin as an output using the pinMode() function, and I also started serial communication at a baud rate of 9600.

In the void loop() function, we constantly check the serial port. If data sent from the Python program is available on the serial port, we read it and store the received character in the variable signal. Then, using two if conditions, we check whether the received character is 1 or 0. If it’s 1, someone has crossed the line or virtual laser, and the buzzer is turned ON. If there is no one, the Python program sends 0 to the Arduino, and the Arduino turns OFF the buzzer. So, that’s all about the Arduino programming.
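
The PC side of this one-byte protocol can be sketched as follows. The helper name send_buzzer is hypothetical, and io.BytesIO stands in for the serial port so the snippet runs without any hardware; with a real board you would pass a pyserial serial.Serial object opened at 9600 baud instead.

```python
import io


def send_buzzer(port, on):
    """Write the one-byte command the Arduino sketch expects.

    b'1' turns the buzzer ON, b'0' turns it OFF.
    `port` can be any object with a write() method.
    """
    port.write(b'1' if on else b'0')


# Stand-in for serial.Serial so the sketch runs without an Arduino
fake_port = io.BytesIO()
send_buzzer(fake_port, True)   # buzzer ON
send_buzzer(fake_port, False)  # buzzer OFF
print(fake_port.getvalue())
```

Note that the bytes are the ASCII characters '1' and '0', which is why the Arduino sketch compares against the character constants '1' and '0' rather than the numbers 1 and 0.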

ESP32 Cam with Python:

For the live video streaming, I am using the ESP32 Cam module. As I said earlier, I am not going to explain how to set up the ESP32 camera for live video streaming, because I have already explained it in my previous article on object detection and identification using ESP32 Cam, Python, OpenCV, and Yolo V3. I am using the same setup and nothing has changed. In that article, I also explained how to install Python and OpenCV.

The only thing I didn’t cover is the MediaPipe library installation, because in the previous two projects I used Yolo V3. This time around I am not using Yolo V3; I am just using Python, OpenCV, and MediaPipe. So, if you have installed Python and OpenCV, you can continue reading this article. If not, go back and read that article, and once you have installed Python and OpenCV you can resume from here. Anyway, let’s go ahead and install the MediaPipe library.

MediaPipe Library installation:

MediaPipe is a powerful library for building multimodal (audio, video, and sensor) applied machine learning pipelines. It provides easy-to-use Python APIs for various tasks, including landmark detection. You can install MediaPipe using pip.

Simply open the command prompt on your PC or laptop, paste the command below, and press Enter:

pip install mediapipe

As you can see, pip reports “Requirement already satisfied” because I have already installed it. Now, let’s go ahead and take a look at the Python program that detects and tracks a specific landmark and sends commands to the Arduino to control a buzzer.
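
Before running that program, a quick check like the one below can confirm that Python can find every module the program imports. This is just a convenience sketch; the helper name missing_modules is my own, and it only uses the standard library.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that Python cannot find."""
    return [n for n in names if importlib.util.find_spec(n) is None]


# Top-level modules imported by the program in this article
required = ["cv2", "mediapipe", "numpy", "serial"]
print("Missing:", missing_modules(required))
```

An empty list means everything is in place. Note that the cv2 module comes from the opencv-python package and the serial module from the pyserial package, so those are the names to pass to pip install if something is missing.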

Python Landmarks programming:

import cv2
import mediapipe as mp
import numpy as np
import urllib.request
import serial

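url = 'http://192.168.43.219/cam-hi.jpg'  # JPEG snapshot served by the ESP32 Cam; use your camera's IP
# Open the PC's default webcam. Its frames are not processed below (the ESP32 Cam
# image is); it only gates the loop, so the script stops if no webcam is available.
cap = cv2.VideoCapture(0)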

mpPose = mp.solutions.pose
pose = mpPose.Pose()
mpDraw = mp.solutions.drawing_utils

# Define line coordinates
line1_x1, line1_y1, line1_x2, line1_y2 = 400, 0, 400, 600  # Coordinates for line 1 (x1, y1, x2, y2)

# Establish serial connection with Arduino
ser = serial.Serial('COM5', 9600)  # Replace 'COM5' with your Arduino's port; 9600 must match Serial.begin()

# Flag to track if the landmark has crossed the line
buzzer_active = False

while cap.isOpened():
    img_resp = urllib.request.urlopen(url,timeout=10)
    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
    im = cv2.imdecode(imgnp, -1)
    ret, frame = cap.read()
    if not ret:
        break

    imgRGB = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)
    results = pose.process(imgRGB)
    cv2.line(im, (line1_x1, line1_y1), (line1_x2, line1_y2), (0, 255, 0), 2)

    if results.pose_landmarks:
        mpDraw.draw_landmarks(im, results.pose_landmarks, mpPose.POSE_CONNECTIONS)
        landmark_31 = results.pose_landmarks.landmark[31]
        dot_x = int(landmark_31.x * im.shape[1])
        dot_y = int(landmark_31.y * im.shape[0])
        cv2.circle(im, (dot_x, dot_y), 10, (0, 99, 255), cv2.FILLED)

        # Only trust the landmark when MediaPipe reports it as clearly visible
        if landmark_31.visibility > 0.5:
            if dot_x < line1_x1 and dot_x < line1_x2:
                # Landmark is left of the line: turn the buzzer ON once
                if not buzzer_active:
                    ser.write(b'1')  # Send signal to activate buzzer
                    buzzer_active = True
            elif dot_x > line1_x1 and dot_x > line1_x2:
                # Landmark is right of the line: turn the buzzer OFF once
                if buzzer_active:
                    ser.write(b'0')  # Send signal to deactivate buzzer
                    buzzer_active = False
        else:
            # Landmark not reliably visible: keep the buzzer OFF
            buzzer_active = False
            ser.write(b'0')  # Send signal to deactivate buzzer

    # Display the frame
    cv2.imshow("Video", im)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
ser.close()  # Release the serial port
cv2.destroyAllWindows()

Code Explanation:

This code is designed to detect a specific landmark (landmark 31) on a person’s body using the MediaPipe library in Python. The purpose is to activate a buzzer when the landmark crosses a defined line, which I call a virtual laser. Let’s go through the code step by step.

First, the necessary libraries are imported: cv2 for computer vision operations, mediapipe for pose estimation, numpy for numerical operations, urllib.request for opening a URL, and serial for establishing a serial connection with an Arduino.

Next, the URL for the camera image is specified. In this case, it is set to ‘http://192.168.43.219/cam-hi.jpg’, the address at which the ESP32 Cam serves a JPEG snapshot; the code requests it repeatedly to build up a live video feed.

A video capture object is also created using cv2.VideoCapture(0) to access the default camera of the PC. The frames from this camera are not processed; the object only controls how long the main loop runs.

The code then initializes the mpPose object for pose estimation using Mediapipe and creates an instance of the pose estimation model.

The line coordinates for the line that needs to be crossed are defined. In this case, they are set to (400, 0) and (400, 600), indicating a vertical line at x-coordinate 400 that spans a height of 600 pixels.

A serial connection is established with an Arduino board using the serial.Serial function. Make sure you select the correct communication port and baud rate. You can check in the Device Manager which port your Arduino board is connected to. In my case, it’s connected to COM5.

A boolean variable buzzer_active is initialized as False. This variable keeps track of whether the buzzer is currently ON, that is, whether the landmark has crossed the line.

Inside the main loop, the code retrieves an image from the specified URL using urllib.request.urlopen, converts the bytes to a NumPy array, and decodes it into an image with cv2.imdecode.

A frame is also read from the local camera using cap.read(). If that read fails, the loop breaks.

The RGB image is obtained by converting the ESP32 Cam image from BGR to RGB using cv2.cvtColor, because MediaPipe expects RGB input.

The pose estimation model processes the RGB image to detect the landmarks using pose.process. The detected landmarks are stored in the results variable.

The line is drawn on the image using cv2.line based on the defined line coordinates.

If there are pose landmarks detected in the results, the code proceeds to draw the landmarks on the image using mpDraw.draw_landmarks.

The specific landmark of interest, landmark 31, is extracted, and its x and y coordinates on the image are calculated. A circle is drawn on the image at the location of landmark 31 using cv2.circle.
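
The coordinate calculation is worth spelling out, because MediaPipe returns x and y as fractions of the image size rather than pixels. Here is a minimal sketch; the helper name to_pixels and the 800 by 600 frame size are assumptions for illustration.

```python
def to_pixels(norm_x, norm_y, width, height):
    """Convert MediaPipe's normalized coordinates to integer pixel coordinates."""
    return int(norm_x * width), int(norm_y * height)


# A landmark halfway across and a quarter of the way down an 800x600 frame
print(to_pixels(0.5, 0.25, 800, 600))  # (400, 150)
```

In the program, the width and height come from im.shape, which is ordered (height, width, channels). That is why x is multiplied by im.shape[1] and y by im.shape[0].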

If the visibility of landmark 31 is greater than 0.5, the landmark is considered clearly visible. The code then checks whether the x-coordinate of the landmark is less than both x-coordinates of the line. If this condition is met, the landmark is on the left side of the line, which means it has crossed the virtual laser.

In that case, if the buzzer_active variable is False, the code sends the signal ‘1’ to the Arduino board to turn ON the buzzer, and the buzzer_active flag is set to True.

If the landmark is visible and its x-coordinate is greater than both x-coordinates of the line, the landmark is back on the right side. If buzzer_active is True, the code sends the signal ‘0’ to turn OFF the buzzer and sets the flag to False. If the visibility of landmark 31 is 0.5 or lower, the buzzer_active variable is also set to False, and the code sends ‘0’ to the Arduino board to turn OFF the buzzer.
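
The comparison described above only works for a vertical line. For a slanted virtual laser, one possible approach (not used in this project) is to look at the sign of a cross product, which tells you on which side of the line a point lies. The function names below are hypothetical.

```python
def side_of_line(px, py, x1, y1, x2, y2):
    """Return +1, -1, or 0 for which side of the line (x1, y1)-(x2, y2) the point is on."""
    cross = (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1)
    return (cross > 0) - (cross < 0)


def crossed(prev_side, new_side):
    """A crossing happened when the point moved from one side to the other."""
    return prev_side != 0 and new_side != 0 and prev_side != new_side


# Same vertical line as in the program: (400, 0) to (400, 600)
left = side_of_line(300, 200, 400, 0, 400, 600)
right = side_of_line(500, 200, 400, 0, 400, 600)
print(left, right, crossed(right, left))
```

Storing the side from the previous frame and comparing it with the current one also tells you the direction of the crossing, which the simple less-than check cannot do.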

The resulting image with drawn landmarks and the line is displayed using cv2.imshow.

The loop continues until the ‘q’ key is pressed. At that point, the video capture is released using cap.release() and all windows are closed using cv2.destroyAllWindows().

So, that’s all about the programming. For the practical demonstration, watch the video tutorial, and don’t forget to like, share, and subscribe if you don’t want to miss any of my upcoming videos and articles.