AI Video Proctoring System
A browser-native AI proctoring system built on TensorFlow.js and MediaPipe Face Mesh. Real-time face detection, gaze estimation, and head-pose anomaly detection running entirely inside the candidate's browser tab. No video data ever leaves the device. No GPU server required.
94%+ Face Detection Accuracy
&lt;30ms Per-Frame Inference
91% Anomaly Recall
0 bytes Video Transmitted

The Problem We Set Out to Solve
Online examinations exploded post-pandemic, but the proctoring solutions available in 2024 fell into one of two broken categories: enterprise platforms charging $5–$15 per session with invasive screen recording, or basic honor-system forms with zero integrity enforcement. For Indian EdTech platforms and corporate recruitment teams running hundreds of assessments daily, neither option was viable.
The challenge DarsLab accepted was to build a production-ready AI proctoring system that operates entirely inside the browser using open-source ML models, with no video data transmission, no per-session cost, and no invasive desktop agent installation. The result had to match enterprise-grade cheating detection accuracy while being deployable on any MERN stack web application.
We built a full MERN application where the exam session React component simultaneously renders the webcam feed, runs MediaPipe Face Mesh at 30fps, and feeds 468 facial landmark coordinates into a TensorFlow.js gaze estimation model, all without a single GPU server in the loop.
How the Browser-Native AI Pipeline Works
MediaPipe Face Mesh (Face Detection)
The webcam stream is captured via the browser's getUserMedia API and piped into MediaPipe Face Mesh, which runs a two-stage pipeline: the lightweight BlazeFace anchor-based face detector followed by a 468-point 3D landmark regression network. This extracts the precise 3D position of every facial feature at 30fps with sub-5ms latency per frame on modern hardware.
TensorFlow.js Gaze Estimation Model
The 468 landmark coordinates are passed into a custom TensorFlow.js model that computes normalized eye-corner vectors and pupil centroid positions. These are compared against a baseline "focus" gaze established at session start. Deviation beyond configurable thresholds (horizontal: 25 degrees, vertical: 15 degrees) triggers an anomaly event. The WebGL backend runs this inference in under 25ms on the client GPU.
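As a minimal sketch of the threshold check described above (the angle math, function names, and vector fields here are illustrative; the production model operates on the raw landmark features, not this simplified geometry):

```javascript
// Gaze-deviation check against the session-start baseline.
// Thresholds mirror the text: 25 degrees horizontal, 15 degrees vertical.
const THRESHOLDS = { horizontalDeg: 25, verticalDeg: 15 };

function gazeAngles(eyeVector) {
  // eyeVector: normalized {x, y, z} direction derived from eye-corner and
  // pupil-centroid landmarks; z points from the eye toward the screen.
  const yawDeg = Math.atan2(eyeVector.x, eyeVector.z) * (180 / Math.PI);
  const pitchDeg = Math.atan2(eyeVector.y, eyeVector.z) * (180 / Math.PI);
  return { yawDeg, pitchDeg };
}

function isGazeDeviation(current, baseline) {
  const dYaw = Math.abs(current.yawDeg - baseline.yawDeg);
  const dPitch = Math.abs(current.pitchDeg - baseline.pitchDeg);
  return dYaw > THRESHOLDS.horizontalDeg || dPitch > THRESHOLDS.verticalDeg;
}
```

A single frame crossing this boundary is not enough to flag an anomaly; the per-frame result feeds into the frame-count debouncing described below.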
Client-Side Anomaly Classification
Four anomaly types are classified locally: face_absent (no face detected for 3+ seconds), gaze_deviation (sustained eye movement off-screen), secondary_presence (two faces detected for 15+ consecutive frames), and head_pose_extreme (head rotation greater than 30 degrees). Each classification applies a frame-count threshold to suppress false positives from natural eye movements.
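The frame-count debouncing can be sketched as a small state machine. Thresholds mirror the text at an assumed 30fps; the class name, rule table, and the gaze_deviation duration are illustrative, not the production values:

```javascript
const FPS = 30;
const RULES = {
  face_absent:        { frames: 3 * FPS, minConfidence: 0 },    // no face for 3+ s
  gaze_deviation:     { frames: 2 * FPS, minConfidence: 0 },    // duration assumed
  secondary_presence: { frames: 15, minConfidence: 0.85 },      // 15 consecutive frames
  head_pose_extreme:  { frames: FPS, minConfidence: 0 },        // sustained > 30 deg
};

class AnomalyDebouncer {
  constructor(rules = RULES) {
    this.rules = rules;
    this.counts = Object.fromEntries(Object.keys(rules).map((k) => [k, 0]));
  }
  // frameSignals: { [type]: confidence } for conditions raw-detected this frame.
  // Returns the anomaly types whose frame-count threshold was just crossed.
  update(frameSignals) {
    const fired = [];
    for (const [type, rule] of Object.entries(this.rules)) {
      const conf = frameSignals[type];
      if (conf !== undefined && conf >= rule.minConfidence) {
        this.counts[type] += 1;
        if (this.counts[type] === rule.frames) fired.push(type);
      } else {
        this.counts[type] = 0; // any clean frame resets the counter
      }
    }
    return fired;
  }
}
```

Any clean frame resets a counter, so a single noisy detection never produces an event, and each sustained condition fires exactly once.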
Structured Event API and MongoDB Storage
When an anomaly is classified, a structured JSON event object containing session ID, candidate ID, event type, confidence score, timestamp, and frame count is AES-256 encrypted and sent to the Node.js/Express API via a secure WebSocket. No video data is included. MongoDB stores events with session and user references for post-exam admin review.
Technical Note: Why WebGL Backend
TensorFlow.js supports three compute backends: CPU, WebGL, and WebGPU. Our performance benchmarks showed the WebGL backend outperforming CPU by 11x on the gaze estimation model (from 330ms to 28ms per inference). WebGPU offers further speedups but lacks browser support coverage for production deployment as of 2024. The WebGL backend is supported on all Chromium-based browsers, Safari 14+, and Firefox 90+.
What Made This Hard to Build
30fps Inference Without Video Frame Drops
Running a multi-stage ML pipeline (face detection, landmark extraction, gaze estimation) at 30fps inside a single React component while rendering the live webcam feed required aggressive model quantization. We used 8-bit integer quantization on the gaze model, reducing it from 4.2MB to 1.1MB with only a 1.3% accuracy degradation. We also offloaded inference scheduling to a Web Worker to keep the main thread available for UI rendering.
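The quantization idea can be illustrated with a symmetric int8 scheme. In practice this conversion is done offline by model tooling, not hand-rolled; this sketch only shows the arithmetic that trades 4 bytes per weight for 1:

```javascript
// Symmetric 8-bit quantization: map floats in [-maxAbs, maxAbs] to [-127, 127].
function quantizeInt8(weights) {
  const maxAbs = weights.reduce((m, w) => Math.max(m, Math.abs(w)), 0);
  const scale = maxAbs / 127 || 1; // guard against an all-zero tensor
  const q = Int8Array.from(weights, (w) => Math.round(w / scale));
  return { q, scale };
}

// Inference-time dequantization: recover an approximation of the original weights.
function dequantizeInt8({ q, scale }) {
  return Float32Array.from(q, (v) => v * scale);
}
```

The round-trip error is bounded by half the scale step per weight, which is the mechanism behind the small (1.3%) accuracy degradation quoted above.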
Ambient Light Variance and Webcam Quality
MediaPipe Face Mesh degrades significantly in poor lighting conditions, leading to false positive face_absent events. We implemented a client-side lux estimation step using the canvas API to analyze webcam frame brightness histograms. If mean luminance falls below a calibrated threshold, the session onboarding flow displays an explicit warning and refuses to start the exam until lighting is adequate.
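The brightness gate reduces to a mean-luma computation over the RGBA buffer returned by the canvas getImageData call. The 60/255 cutoff below is an assumed stand-in for the calibrated threshold mentioned above:

```javascript
const MIN_MEAN_LUMA = 60; // assumed threshold; calibrated empirically in production

// Mean Rec. 601 luma over an RGBA pixel buffer (4 bytes per pixel).
function meanLuma(rgba) {
  let sum = 0;
  for (let i = 0; i < rgba.length; i += 4) {
    sum += 0.299 * rgba[i] + 0.587 * rgba[i + 1] + 0.114 * rgba[i + 2];
  }
  return sum / (rgba.length / 4);
}

function lightingAdequate(rgba) {
  return meanLuma(rgba) >= MIN_MEAN_LUMA;
}
```

In the onboarding flow, the buffer would come from `ctx.getImageData(0, 0, w, h).data` on a canvas the webcam frame was drawn to.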
Multi-Face Detection False Positives
Brief reflections in glasses, posters on walls, and ambient screen light could trigger secondary_presence events incorrectly. Our solution was a 15-frame persistence threshold: the secondary face must be detected with confidence greater than 0.85 for 15 consecutive frames (0.5 seconds at 30fps) before flagging. This eliminated all false positives from reflections in our test dataset while maintaining 100% detection of genuine secondary persons.
JWT Session Security Across WebSocket and REST
The exam session requires two simultaneous authenticated connections: a REST API for exam content delivery and a WebSocket for real-time anomaly event streaming. Both connections share the same JWT token, which carries session ID, candidate ID, and exam ID as custom claims. Token refresh logic was implemented to handle sessions longer than the token expiry window without disrupting the ML inference pipeline.
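Reading the shared claims on the WebSocket side might look like the sketch below. It only decodes the payload; production code must verify the signature first (for example with a JWT library), and the claim names are illustrative:

```javascript
// Decode the custom claims from a JWT payload (base64url-encoded JSON).
// NOTE: no signature verification here -- decoding alone must never be
// treated as authentication.
function decodeClaims(token) {
  const [, payload] = token.split('.');
  if (!payload) throw new Error('malformed JWT');
  const json = Buffer.from(payload, 'base64url').toString('utf8');
  const { sessionId, candidateId, examId, exp } = JSON.parse(json);
  return { sessionId, candidateId, examId, exp };
}

// Standard `exp` claim is seconds since the Unix epoch.
function isExpired(claims, nowSeconds = Math.floor(Date.now() / 1000)) {
  return claims.exp !== undefined && claims.exp <= nowSeconds;
}
```

Because both the REST and WebSocket handlers read the same claims, the refresh logic only has to swap one token to keep both connections alive through a long exam.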
Performance Outcomes
94%+ Detection Accuracy: MediaPipe Face Mesh accuracy in standard webcam environments across varied lighting conditions.
&lt;30ms Inference Latency: WebGL-accelerated TensorFlow.js processes each video frame in under 30ms, maintaining smooth 30fps rendering.
91% Anomaly Recall: Head-pose deviation model correctly flagged 91% of simulated cheating scenarios in controlled testing.
Business Impact
For platforms running 500+ assessments per month, eliminating human proctoring fees yields a direct cost saving of roughly $2,500–$7,500/month at typical market rates. Because all ML inference runs on client devices, the system scales horizontally with no increase in server cost: 500 concurrent exams and 5,000 concurrent exams run on identical infrastructure.
The privacy-by-design architecture (zero video transmission) eliminates the legal exposure that accompanies video storage, making the platform compatible with India's DPDP Act, the EU's GDPR, and FERPA regulations in the United States without requiring separate compliance engineering.
Where This Technology Applies
EdTech and Universities
Secure remote examination for student cohorts of any size. Generate per-student integrity reports automatically. No invigilation staff required for online batches.
Technical Recruitment
Proctor live coding assessments and take-home tests for software engineering roles. Detect candidate impersonation or unauthorized resource use during the assessment window.
Certification Bodies
High-stakes professional certification exams (finance, legal, medical) that require documented integrity trails for regulatory compliance and accreditation audits.
Technology Stack Breakdown
TensorFlow.js
Client-side inference engine. Runs the quantized gaze estimation model at 30fps using GPU acceleration via WebGL. Eliminates server-side GPU compute costs entirely.
MediaPipe Face Mesh
Google's browser-native face landmark detection library. Extracts 468 3D facial keypoints per frame using BlazeFace for detection plus a lightweight landmark regression model.
React.js + Canvas API
The exam UI renders the webcam stream, overlays real-time landmark visualizations on a canvas element, and manages session state. Web Workers handle ML scheduling off-thread.
Node.js + Express.js API
RESTful API and WebSocket server. Receives structured anomaly event objects, validates JWT tokens, and persists events to MongoDB. Never processes video or audio data.
MongoDB
Document database storing candidate profiles, exam sessions, and anomaly event arrays. Schema-free design allows flexible event structure as anomaly types evolve.
JWT Authentication
Stateless authentication across both REST and WebSocket connections. Custom claims carry session context. Refresh token rotation handles long exam sessions.
Built Across Multiple Disciplines
This project combined DarsLab's machine learning integration expertise with production full-stack engineering. If your platform needs a similar AI-native feature built into an existing web product, these are the service areas that apply:
Custom AI Solutions and Integration
We integrate ML models, LLMs, and computer vision into production web apps.
Enterprise Full-Stack Web Development
MERN, Next.js, and Python-based backend systems built for scale.
Performance-Focused UI Design
Interfaces that handle real-time data streams without layout jank.
AI Chatbot Development India
Browser-native and server-side AI interfaces for Indian businesses.
AI Proctoring FAQ
Q. How does AI video proctoring ensure student privacy?
All ML inference runs locally on the student's device using TensorFlow.js. No video pixels are ever transmitted to any server. Only structured metadata (timestamps, anomaly confidence scores, event types) is sent to the API, making the system GDPR-compliant and compatible with India's DPDP Act.
Q. What is the accuracy of the gaze estimation model?
The system achieves 91% anomaly recall on simulated cheating scenarios. Face presence accuracy using MediaPipe Face Mesh exceeds 94% in standard webcam conditions. Inference latency is below 30ms per frame using the TensorFlow.js WebGL backend.
Q. Does the AI proctoring system work without a dedicated GPU server?
Yes. TensorFlow.js uses the client device's GPU via WebGL for all ML inference. This means zero server-side compute cost for video analysis. The system scales to thousands of simultaneous exam sessions with no GPU infrastructure cost increase.
Q. Which browsers are supported?
Google Chrome (recommended), Microsoft Edge, and Safari 15+. All require WebGL 2.0 support. Firefox is not recommended for production proctoring due to inconsistent WebGL performance with TensorFlow.js models.
Q. Can DarsLab build a custom AI proctoring system for our platform?
Yes. We build custom AI proctoring solutions tailored to your platform's existing authentication, exam engine, and admin stack. We integrate face detection, gaze estimation, multi-face detection, and reporting dashboards. Contact us to discuss your exact requirements.
Need a Custom AI Integration?
Build Your AI-Powered Application
From browser-native ML to LLM integrations. We engineer AI solutions that run at production scale on your existing MERN or Next.js stack.