Public security threats are an increasing concern to law enforcement, civilians, and first responders. In most cases of public attacks, first responders must take action after the attack is underway, forcing them to use lethal weapons such as firearms in defense.
Our client envisioned a security surveillance system for both home and commercial use that detects openly carried weapons (various types of guns) from a surveillance camera feed and raises alerts in real time. The video data has to be processed locally, on the edge, with the results pushed to a mobile app.
Working with our client, we built an AI solution for smart cameras: a threat detection and response monitoring system that detects weapons in real time.
In this blog post, we talk about our journey of leveraging the Insight framework to build the solution.
Below is a high-level view of the end-to-end solution deployed to the Nvidia Jetson Xavier-NX:
- An 8MP camera captures the continuous video feed. High-resolution HD recording is used so that weapons can be detected from a distance
- Nvidia's hardware-accelerated GStreamer elements process the raw camera video stream. Frames extracted from the video are resized
- The frames are sent to the weapon detection models running on the Jetson Xavier-NX edge device
- An alert is raised when gun detections cross a threshold, and the first responders are notified
- The video stream is encoded using Nvidia's hardware codecs (H.264)
- The detected frames are published as an encoded video stream over the network, using the RTMP protocol, to an Ant Media server on AWS for remote monitoring
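The alerting step in the list above can be sketched as a small gate that fires only when enough recent frames contain a confident detection, which helps suppress one-off false positives. This is a minimal illustration, not the deployed logic: the class name `AlertGate` and the values of `score_threshold`, `window`, and `min_hits` are assumptions chosen for the example.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AlertGate:
    """Fire an alert only when enough recent frames contain a confident detection."""

    score_threshold: float = 0.6  # hypothetical per-detection confidence cut-off
    window: int = 10              # number of most recent frames to consider
    min_hits: int = 6             # frames in the window that must contain a weapon
    _hits: deque = field(default_factory=deque, init=False, repr=False)

    def update(self, scores):
        """Feed the weapon confidence scores of one frame; return True to alert."""
        hit = any(s >= self.score_threshold for s in scores)
        self._hits.append(hit)
        if len(self._hits) > self.window:
            self._hits.popleft()  # drop the oldest frame from the window
        return sum(self._hits) >= self.min_hits
```

Each processed frame calls `update` with the scores of its gun detections (an empty list if there are none), and a notification would be sent whenever it returns `True`.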
We recorded videos of people carrying authorized weapons (long guns, handguns, shotguns, knives, etc.). Surveillance CCTV cameras were positioned in indoor and outdoor environments, in day and night modes. In total, around 60 hours of video were recorded with security personnel carrying their authorized weapons to mimic the threat.
|Original size||220,000 images|
The dataset was augmented to supplement the original data, and a data over-sampling technique was applied to balance it.
|Augmented size||1.2 M images|
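As an illustration of over-sampling, the sketch below duplicates randomly chosen minority-class samples until every class is as large as the biggest one. It assumes one class label per image and a non-empty input, which is a simplification for detection data; the function name `oversample` and the sample layout are hypothetical.

```python
import random


def oversample(samples, seed=0):
    """Duplicate minority-class samples until every class matches the largest one.

    `samples` is a non-empty list of (image_id, class_name) pairs.
    """
    rng = random.Random(seed)  # fixed seed keeps the result reproducible
    by_class = {}
    for image_id, label in samples:
        by_class.setdefault(label, []).append(image_id)

    target = max(len(ids) for ids in by_class.values())
    balanced = []
    for label, ids in by_class.items():
        balanced.extend((image_id, label) for image_id in ids)
        # Top up the class with random repeats of its own images.
        balanced.extend((rng.choice(ids), label) for _ in range(target - len(ids)))
    return balanced
```

In practice the repeated images would usually pass through augmentation as well, so that the duplicates are not pixel-identical.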
Image Labelling Workflow on AWS:
We leveraged the Wavelabs AI Insight framework to set up the image labelling workflow on AWS.
Below is a high-level description of the workflow:
- Video recordings (.mp4) are stored in S3
- Built code to filter out frames that did not meet the quality bar
- Built code to select frames and generate unique IDs. 1 out of every 5 frames is selected
- The CVAT tool is set up and a labelling task is created
- Defined a set of labelling instructions (for example, exclude frames where less than 20% of the gun is visible)
- Built code to augment the images
- Built code to convert the annotations to YOLO format. Labelled frames and annotations are stored in S3
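Two of the steps above, frame selection and YOLO conversion, can be sketched as follows. The function names and the pixel box layout `(x_min, y_min, x_max, y_max)` are assumptions for illustration. The YOLO label line itself follows the standard convention of a class index followed by the box centre, width, and height, all normalised by the image size.

```python
import uuid


def select_frames(frame_indices, step=5):
    """Keep 1 out of every `step` frames and give each kept frame a unique ID."""
    return [(uuid.uuid4().hex, idx) for idx in frame_indices if idx % step == 0]


def to_yolo(class_id, box, img_w, img_h):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line.

    YOLO expects: class x_center y_center width height, normalised to [0, 1].
    """
    x_min, y_min, x_max, y_max = box
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
```

For example, a box spanning pixels (100, 200) to (300, 400) in a 1000 x 800 image becomes `0 0.200000 0.375000 0.200000 0.250000` for class 0.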
Automated Annotation on Nvidia Tesla K80
We annotated 19 classes in total, of which 10 are gun-related. To reduce false positives, we included objects that look similar to guns in our object detection model. We also annotated faces so that the face can be detected along with the gun.
The annotation pipeline was set up on AWS EC2 instances. The CVAT tool was used to define the bounding boxes for the 10 different gun classes.
We ran inference for 8 COCO classes on a Tesla K80, with an inference latency of about 2 seconds per image. Frames from the video recordings are sent to a pre-trained object detection model, and the resulting annotations are stored as JSON (class name and bounding box dimensions).
We ran face detection on a Tesla K80, with an inference latency of about 1 second per image. Frames from the video recordings are sent to a pre-trained face detection model, and the resulting annotations are stored as JSON (class name and bounding box dimensions).
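A JSON record for the automated annotations might look like the sketch below. The schema is hypothetical: the post only states that the class name and bounding box dimensions are stored, so the field names (`frame_id`, `annotations`, `bbox`) and the optional `keep_classes` filter are assumptions for illustration.

```python
import json


def detections_to_json(frame_id, detections, keep_classes=None):
    """Serialise the detections of one frame as a JSON string.

    `detections` is a list of (class_name, x_min, y_min, x_max, y_max) tuples.
    If `keep_classes` is given, detections of other classes are dropped.
    """
    records = [
        {"class": name, "bbox": {"x_min": x0, "y_min": y0, "x_max": x1, "y_max": y1}}
        for name, x0, y0, x1, y1 in detections
        if keep_classes is None or name in keep_classes
    ]
    return json.dumps({"frame_id": frame_id, "annotations": records})
```

Records like these can be loaded into the labelling tool as pre-annotations, so that annotators only correct boxes instead of drawing each one from scratch.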
Data scientists trained several candidate models to identify the one that best meets the performance criteria. We leveraged the Insight training pipeline to train the models on 4 V100 GPUs using the data parallelism mechanism in Keras, which made training 3 times faster.
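To show the idea behind synchronous data parallelism, here is a dependency-free toy sketch: the batch is split into equal shards, each replica computes a gradient on its shard, and the averaged gradient updates one shared weight. It is not the Keras training code used in the project. The one-parameter model `y = w * x`, the function names, and the learning rate are assumptions, and the batch size is assumed to be divisible by the number of replicas.

```python
def gradient(w, batch):
    """Gradient of the mean squared error of the model y = w * x over one batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)


def data_parallel_step(w, batch, n_replicas=4, lr=0.01):
    """One synchronous data-parallel SGD step on a single shared weight."""
    shard_size = len(batch) // n_replicas
    shards = [batch[i * shard_size:(i + 1) * shard_size] for i in range(n_replicas)]
    grads = [gradient(w, shard) for shard in shards]  # one per GPU, run concurrently
    avg_grad = sum(grads) / n_replicas                # the all-reduce (averaging) step
    return w - lr * avg_grad
```

With equal shard sizes, the averaged gradient equals the full-batch gradient, so the update matches single-device training while each replica processes only a quarter of the batch.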
Distributed training pipeline on AWS
Model flow diagram
The table below provides details of the algorithm and hyper-parameters used for model training:
|Model learning strategy||Open-source model fine-tuned on custom dataset|
|Training dataset||1.2 M images|
|Training GPU||4 x Tesla V100|
|Training duration||5 days on 4 V100 GPUs|
Model performance for the long gun class is shown below:
AI engineers optimized the model using TensorRT to maximize throughput and minimize inference latency. We leveraged the Insight serving pipeline to deploy the optimized model to the Nvidia Jetson Xavier-NX.
We ran inference on the edge using Nvidia Jetson family devices for both weapon and face detection. The table below summarizes the metrics for the various devices. The Xavier series performed well at high FPS with thermals under control, and the Xavier-AGX supported the highest throughput. Considering economic viability, the Xavier-NX was chosen for deployment.
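Latency and FPS figures of this kind can be collected with a simple timing loop. The sketch below is a generic illustration, not the benchmark used for the devices above: `infer` stands for any callable that runs the model on one frame, and the warm-up count is an assumption.

```python
import time


def benchmark(infer, frames, warmup=2):
    """Return (mean latency in ms, frames per second) for `infer` over `frames`."""
    for frame in frames[:warmup]:  # warm-up runs are excluded from the timing
        infer(frame)

    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start

    latency_ms = 1000 * elapsed / len(frames)
    fps = len(frames) / elapsed
    return latency_ms, fps
```

Running the same loop on each candidate device with the same frames gives comparable numbers. Thermals would have to be read separately from the device's own monitoring tools.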
We talked about how we built a weapon detection system leveraging Insight and deployed it to an Nvidia Jetson edge device. The continuous video feed from the camera is processed by the AI model to detect weapons and raise alerts to first responders.
About Wavelabs Insight:
Wavelabs Insight is an end-to-end AI framework that accelerates the journey to launching AI models. It leverages open-source software modules and custom-built components to accelerate the time-to-deployment of AI solutions and the ROI on your AI initiatives. It provides components for data processing, model development, and deployment.
Our systematic approach of identifying business needs through use case discovery, and of designing and building innovative AI solutions for deployment and scale, translates data insights into business value.