Active-Learning method : An effective way to generate ground truth data to test & validate ADAS functions

Rashmi Katariya; Anita Kumari

In today’s autonomous driving industry, machine learning plays an important role in developing functions for self-driving vehicles. Learning models are trained on data, and their performance is determined by the quality of that data. In autonomous driving, valid data is sometimes missing for a given situation, and the model then does not know how to respond. The models must therefore be trained on data that is as diverse as possible so that they can handle unusual situations in the real world. Cars equipped with multiple sensors collect huge amounts of unlabeled data as they drive, capturing the surrounding environment frame by frame. Annotating every frame manually, and finding the frames that carry the most information, is difficult. Automated methods are therefore needed to identify and extract the most informative data from what has been collected. Active learning is a powerful data selection technique that improves data efficiency for learning models. In active learning, an uncertainty score is calculated for each frame using an uncertainty sampling strategy such as least confident, margin-based, or entropy-based sampling. Frames with high uncertainty scores are sampled from the pool of unlabeled data and annotated manually. These labeled frames are then added to the model's training dataset, and the cycle is repeated until the required performance is reached after a certain number of iterations. This paper evaluates a model using active learning to sample informative data frames and to train the model on those labeled frames, reducing human effort. It also provides a comprehensive review and testing of the main families of active learning algorithms, including pool-based, least confident, and entropy-based approaches. A pre-trained YOLOv3 model, trained on the COCO dataset, is used to calculate the confidence value for each object detected in each frame. The least confident and entropy-based sampling strategies are used to calculate the uncertainty scores, which give an informativeness measure for each frame in the dataset. Only frames with a high informativeness measure are considered for manual labeling, instead of the whole unlabeled dataset. Depending on the complexity of the dataset, manual labeling effort is observed to be reduced by 5% to 35%. With the least confident sampling strategy, an accuracy of 63% is achieved in 9 iterations, whereas with the entropy-based sampling strategy, an accuracy of 73% is achieved in 11 iterations. Open problems and future directions are also described in this study.
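
The frame selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that each detection contributes a single confidence value in [0, 1], that the entropy score is the binary entropy of that value, and that a frame's score is the mean over its detections; the abstract does not specify any of these choices. The function names, the frame identifiers, and the confidence values are hypothetical.

```python
import math


def least_confident_score(conf):
    """Least confident score for one detection: 1 minus its confidence."""
    return 1.0 - conf


def entropy_score(conf):
    """Binary entropy (in bits) of one detection confidence.

    Assumption: the confidence is treated as a Bernoulli probability.
    """
    if conf <= 0.0 or conf >= 1.0:
        return 0.0
    return -(conf * math.log2(conf) + (1.0 - conf) * math.log2(1.0 - conf))


def frame_uncertainty(confidences, score_fn):
    """Mean per-detection score for a frame (assumed aggregation rule).

    Frames without detections get 0.0 here, which is an arbitrary choice.
    """
    if not confidences:
        return 0.0
    return sum(score_fn(c) for c in confidences) / len(confidences)


def select_frames(pool, score_fn, budget):
    """Return the ids of the `budget` most uncertain frames in the pool."""
    scored = {fid: frame_uncertainty(confs, score_fn) for fid, confs in pool.items()}
    return sorted(scored, key=scored.get, reverse=True)[:budget]


# Hypothetical unlabeled pool: frame id -> detection confidences from a detector.
pool = {
    "frame_001": [0.95, 0.90],
    "frame_002": [0.55, 0.60],
    "frame_003": [0.80],
    "frame_004": [0.99, 0.52],
}

print(select_frames(pool, least_confident_score, budget=2))
# -> ['frame_002', 'frame_004']
print(select_frames(pool, entropy_score, budget=2))
# -> ['frame_002', 'frame_003']
```

In a pool-based loop, the selected frames would be sent for manual annotation, added to the training set, and the model retrained before the remaining pool is scored again. The two strategies can rank the same pool differently, as the toy example shows.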

2024-26-0364

1/16/2024