
Advanced Image Analysis via Camera: Person Detection, Gaze Direction, and Clothing Color Analysis using WebRTC, TensorFlow.js, and JavaScript

Sep 7, 2024

Introduction

In today’s technology landscape, image processing and computer vision applications are gaining increasing importance. In this article, we will examine the process of developing a browser-based application that analyzes camera footage to detect the number of people, determine their gaze directions, and analyze the color of their clothing.

In this project, we will harness the power of modern web technologies to perform complex image analysis tasks. The key technologies we will use are:

  1. WebRTC: For camera access
  2. TensorFlow.js: For object detection and image analysis
  3. JavaScript: To manage all logic and operations

Project Objectives

The main objectives of this project are:

  1. Capturing live footage from a web camera
  2. Detecting the number of people in the image
  3. Determining the gaze direction of detected individuals
  4. Analyzing the dominant color of the clothing worn by detected individuals

While achieving these objectives, we will utilize the capabilities of the web browser to their fullest extent and perform all analyses on the client-side without requiring any server-side processing.

Technical Infrastructure

HTML Structure

The HTML structure forming the foundation of our project is quite simple. It contains a video element, a canvas element, and a paragraph element to display information.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Camera Analysis</title>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow/tfjs"></script>
  <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/coco-ssd"></script>
  <style>
    #videoElement {
      width: 100%;
      max-width: 600px;
    }
    canvas {
      position: absolute;
      top: 0;
      left: 0;
    }
  </style>
</head>
<body>
  <video id="videoElement" autoplay></video>
  <canvas id="outputCanvas"></canvas>
  <p id="info"></p>
  <script src="script.js"></script>
</body>
</html>

This structure contains all the necessary elements to display the camera footage, draw analysis results, and print information.
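The JavaScript snippets that follow assume a script.js file that begins by grabbing references to these elements. The variable names (video, canvas, context, info) match the IDs defined in the HTML above:

// script.js — references to the elements defined in the HTML
const video = document.getElementById('videoElement');
const canvas = document.getElementById('outputCanvas');
const context = canvas.getContext('2d');
const info = document.getElementById('info');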

Application Logic with JavaScript

The heart of our application beats in our JavaScript code. Let’s examine step by step how this code works.

1. Camera Access

Our first step is to access the user’s camera. We do this using the WebRTC API:

navigator.mediaDevices.getUserMedia({ video: true })
  .then(stream => {
    video.srcObject = stream;
    video.play();
    detectObjects();
  })
  .catch(error => {
    console.error("Camera access error: ", error);
  });

This code snippet requests camera access permission from the user and, when granted, connects the camera stream to the video element.
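If more control over the captured stream is needed, getUserMedia also accepts a constraints object instead of video: true. The specific values below (front-facing camera, 720p) are only an example and can be tuned to the use case:

navigator.mediaDevices.getUserMedia({
  video: {
    facingMode: 'user',       // prefer the front-facing camera
    width: { ideal: 1280 },   // request 720p if the device supports it
    height: { ideal: 720 }
  }
})
  .then(stream => {
    video.srcObject = stream;
    video.play();
    detectObjects();
  })
  .catch(error => console.error("Camera access error: ", error));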

2. Object Detection

After obtaining the camera footage, we perform object detection using TensorFlow.js’s Coco-SSD model:

async function detectObjects() {
  const model = await cocoSsd.load();
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  setInterval(async () => {
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    const predictions = await model.detect(video);
    drawPredictions(predictions);
  }, 500);
}

This function loads the Coco-SSD model and analyzes the camera footage every 500 milliseconds. The detected objects are stored in the predictions array.
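One detail to be aware of: video.videoWidth and video.videoHeight report 0 until the stream's metadata has loaded, so sizing the canvas immediately can silently produce an empty canvas. A small sketch of one way to guard against this, waiting for the loadedmetadata event before starting the analysis loop:

async function detectObjects() {
  const model = await cocoSsd.load();

  // Wait until the video reports its real dimensions before sizing the canvas
  await new Promise(resolve => {
    if (video.videoWidth > 0) return resolve();
    video.addEventListener('loadedmetadata', resolve, { once: true });
  });

  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;

  setInterval(async () => {
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    const predictions = await model.detect(video);
    drawPredictions(predictions);
  }, 500);
}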

3. Processing Detection Results

We use the drawPredictions function to process and visualize the detected objects:

function drawPredictions(predictions) {
  context.clearRect(0, 0, canvas.width, canvas.height);

  // Count only the detections classified as a person
  const people = predictions.filter(prediction => prediction.class === 'person');
  info.innerText = `Number of people: ${people.length}`;

  people.forEach(prediction => {
    const [x, y, width, height] = prediction.bbox;

    // Sample the lower quarter of the bounding box, where clothing is most likely visible
    const clothingColor = getClothingColor(x, y + height * 0.75, width, height * 0.25);
    info.innerText += `, Clothing Color: ${clothingColor}`;

    const lookingDirection = getLookingDirection(prediction);
    info.innerText += `, Gaze Direction: ${lookingDirection}`;

    context.strokeStyle = "#00FF00";
    context.lineWidth = 4;
    context.strokeRect(x, y, width, height);
  });
}

This function draws a box for each detected person, analyzes the clothing color, and determines the gaze direction.
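As an optional touch, each box can also be labeled directly on the canvas with the standard text API. This is a small illustrative addition, placed inside the people.forEach loop after strokeRect:

// Inside the people.forEach loop, after strokeRect:
context.fillStyle = "#00FF00";
context.font = "16px sans-serif";
context.fillText(
  `person ${Math.round(prediction.score * 100)}%`,  // COCO-SSD predictions include a confidence score
  x,
  y > 20 ? y - 6 : y + 16                           // keep the label on-canvas near the top edge
);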

4. Clothing Color Analysis

To analyze the clothing color, we calculate the average color of pixels in the lower part of the detected person:

function getClothingColor(x, y, width, height) {
  // getImageData works with pixel coordinates, so round the box values
  const imageData = context.getImageData(Math.round(x), Math.round(y), Math.round(width), Math.round(height));
  const data = imageData.data;
  let r = 0, g = 0, b = 0;

  // data is a flat RGBA array, so step through it four bytes at a time
  for (let i = 0; i < data.length; i += 4) {
    r += data[i];
    g += data[i + 1];
    b += data[i + 2];
  }

  const pixelCount = data.length / 4;
  r = Math.round(r / pixelCount);
  g = Math.round(g / pixelCount);
  b = Math.round(b / pixelCount);
  return `rgb(${r}, ${g}, ${b})`;
}

This function approximates the dominant color by averaging the RGB values of the pixels in the sampled area; the result reflects the overall tone of the region rather than a single exact garment color.
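The raw rgb(...) string is not very reader-friendly. One option is to map the averaged color to the nearest entry in a small, hand-picked palette; the palette and distance metric below are illustrative choices, not part of the original code:

// Map an averaged RGB value to the closest of a few named colors (illustrative palette)
const PALETTE = {
  black: [0, 0, 0],
  white: [255, 255, 255],
  red: [200, 30, 30],
  green: [30, 160, 60],
  blue: [40, 70, 200],
  yellow: [220, 200, 40],
  gray: [128, 128, 128]
};

function nearestColorName(r, g, b) {
  let bestName = 'unknown';
  let bestDistance = Infinity;
  for (const [name, [pr, pg, pb]] of Object.entries(PALETTE)) {
    // Squared Euclidean distance in RGB space
    const distance = (r - pr) ** 2 + (g - pg) ** 2 + (b - pb) ** 2;
    if (distance < bestDistance) {
      bestDistance = distance;
      bestName = name;
    }
  }
  return bestName;
}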

5. Gaze Direction Detection

We use a simple approach to estimate the gaze direction: we compare the horizontal center of the detected person's bounding box with the center of the camera image:

function getLookingDirection(personPrediction) {
  // bbox is [x, y, width, height], so this is the horizontal center of the person box
  const faceCenterX = personPrediction.bbox[0] + (personPrediction.bbox[2] / 2);
  const canvasCenterX = canvas.width / 2;

  return faceCenterX < canvasCenterX ? 'Left' : 'Right';
}

While this approach is simple, it can be improved by using more complex face analysis algorithms.
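One possible refinement, sketched below under the assumption that the @tensorflow-models/blazeface package is added as an extra script tag: BlazeFace returns six facial landmarks (commonly ordered right eye, left eye, nose, mouth, ears; check the version you use), and comparing the nose position with the midpoint of the eyes gives a rough head-turn estimate. Treat this as a starting point rather than a drop-in replacement:

// Assumes: <script src="https://cdn.jsdelivr.net/npm/@tensorflow-models/blazeface"></script>
let faceModel = null;

async function getFaceDirection() {
  if (!faceModel) {
    faceModel = await blazeface.load();   // load the face detector once and reuse it
  }
  const faces = await faceModel.estimateFaces(video);
  if (faces.length === 0) return 'Unknown';

  // Landmarks are [x, y] pairs; the first two are the eyes, the third is the nose tip
  const [rightEye, leftEye, nose] = faces[0].landmarks;
  const eyeCenterX = (rightEye[0] + leftEye[0]) / 2;

  // If the nose tip is clearly offset from the eye midpoint, the head is turned
  if (nose[0] < eyeCenterX - 5) return 'Left';
  if (nose[0] > eyeCenterX + 5) return 'Right';
  return 'Center';
}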

Challenges and Solutions

Some challenges we may encounter while developing this project and their solutions are:

  1. Performance: Continuous image analysis can strain low-powered devices. To mitigate this, we can lower the frequency of analysis or only run it under certain conditions (see the sketch after this list).
  2. Lighting Conditions: Different lighting conditions can affect color analysis. Color normalization techniques can be used to solve this problem.
  3. Multiple Person Detection: When there are multiple people, separate analysis may be required for each. In this case, a separate analysis loop can be created for each detected person.
  4. Privacy Concerns: Camera usage can lead to privacy concerns. It’s important to inform users and obtain necessary permissions.
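For the performance point above, one option (a sketch, assuming the COCO-SSD model has already been loaded into a model variable) is to replace setInterval with a self-throttling requestAnimationFrame loop, which also pauses automatically while the tab is hidden:

let lastAnalysisTime = 0;
const ANALYSIS_INTERVAL_MS = 500;   // raise this value on slower devices

async function analysisLoop(timestamp) {
  if (timestamp - lastAnalysisTime >= ANALYSIS_INTERVAL_MS) {
    lastAnalysisTime = timestamp;
    context.drawImage(video, 0, 0, canvas.width, canvas.height);
    const predictions = await model.detect(video);
    drawPredictions(predictions);
  }
  // requestAnimationFrame stops firing while the tab is hidden, pausing the analysis for free
  requestAnimationFrame(analysisLoop);
}
requestAnimationFrame(analysisLoop);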

Future Developments

This project has many interesting areas for development:

  1. Face Recognition: By adding TensorFlow.js face recognition models, we can identify and track individuals.
  2. Emotion Analysis: We can detect people’s emotions by analyzing facial expressions.
  3. Motion Tracking: By tracking people’s movements, we can detect specific movements or gestures.
  4. Augmented Reality: We can create AR experiences by adding virtual objects on top of detected objects.

Conclusion

In this project, we developed a complex image analysis application using modern web technologies. We provided camera access with WebRTC, performed object detection with TensorFlow.js, and processed all this data with JavaScript to obtain meaningful results.

Such applications can be used in a wide range of areas, from security systems to interactive advertisements, from educational applications to entertainment applications. With the continuous development of web technologies, the capabilities and usage areas of such applications will continue to expand.

Remember that when developing image analysis and artificial intelligence applications, it is extremely important to consider ethical rules and respect user privacy. Adopting a responsible and sensitive approach while using the power of technology will ensure that our projects are beneficial to society.
