This is a follow-up work on TOD, which selects one of multiple deep neural networks (DNNs) to perform real-time video analytics (object detection) on low-end devices, e.g., on the camera itself. TOD uses the median object size to determine which one out of 4 YOLO DNNs will meet the real-time requirement best, with respect to object size and speed of the objects. TOD requires specific knowledge of the device to select appropriate thresholds on the median object sizes, and needs to be retuned for each computing device.
ROMA removes the limitation that TOD imposes by performing a more detailed analysis of the image content. In particular, ROMA separately estimates the impact of the selected DNN on object size and on object speed. Its formulation is sufficiently flexible to adapt to changes in the computational power of the device such that it does not need to be retrained when migrating across hardware. Moreover, this way, ROMA can adapt to runtime changes in computational power, which may arise from power management features on the device, or from other workloads which share the device. ROMA does however have hyper-parameters that are dependent on the YOLO DNNs.
ROMA will be presented at the 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
You can read the post-print on arXiv.