ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy

This is a follow-up work on TOD, which selects one of multiple deep neural networks (DNNs) to perform real-time video analytics (object detection) on low-end devices, e.g., on the camera itself. TOD uses the median object size to determine which one out of 4 YOLO DNNs will meet the real-time requirement best, with respect to object size and speed of the objects. TOD requires specific knowledge of the device to select appropriate thresholds on the median object sizes, and needs to be retuned for each computing device.

ROMA removes the limitation that TOD imposes by performing a more detailed analysis of the image content. In particular, ROMA separately estimates the impact of the selected DNN on object size and on object speed. Its formulation is sufficiently flexible to adapt to changes in the computational power of the device such that it does not need to be retrained when migrating across hardware. Moreover, this way, ROMA can adapt to runtime changes in computational power, which may arise from power management features on the device, or from other workloads which share the device. ROMA does however have hyper-parameters that are dependent on the YOLO DNNs.

ROMA will be presented at the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).

You can read the post-print on arXiv.

TOD: Transprecise Object Detection

Real-time video analytics on the edge is challenging as the computationally constrained resources typically cannot analyse video streams at full fidelity and frame rate, which results in loss of accuracy. We propose a Transprecise Object Detector (TOD) which maximises the real-time object detection accuracy on an edge device by selecting an appropriate Deep Neural Network (DNN) on the fly with negligible computational overhead.

TOD makes two key contributions over the state of the art: (1) TOD leverages characteristics of the video stream such as object size and speed of movement to identify networks with high prediction accuracy for the current frames; (2) it selects the best-performing network based on projected accuracy and computational demand using an effective and low-overhead decision mechanism.

Experimental evaluation on a Jetson Nano demonstrates that TOD improves the average object detection precision by 34.7 % over the YOLOv4-tiny-288 model on average over the MOT17Det dataset. In the MOT17-05 test dataset, TOD utilises only 45.1 % of GPU resource and 62.7 % of the GPU board power without losing accuracy, compared to YOLOv4-416 model. We expect that TOD will maximise the application of edge devices to real-time object detection, since TOD maximises real-time object detection accuracy given edge devices according to dynamic input features without increasing inference latency in practice.

TOD was presented at the 5th International Conference on Fog and Edge Computing (ICFEC). Read the paper on arXiv.