Synthetic Data for Machine Learning

Abdulrahman Kerim; Lancaster University (United Kingdom)

Supervised machine learning methods require large-scale training datasets to converge. Collecting and annotating training data is expensive, time-consuming, error-prone, and not always practical. Synthetic data is therefore often used as a feasible source of additional training data. However, using synthetic data directly may harm the model's performance or may be less effective than it could be. This thesis addresses the challenges of generating large-scale synthetic data, improving domain adaptation in semantic segmentation, advancing video stabilization in adverse conditions, and conducting a rigorous assessment of synthetic data usability in classification tasks. By contributing novel solutions to these multifaceted problems, this work bolsters the field of computer vision, offering strong foundations for a broad range of applications that use synthetic data in computer vision tasks.

In this thesis, we divide the study into three main problems: (i) tackling the problem of generating diverse and photorealistic synthetic data; (ii) exploring synthetic-aware computer vision solutions for semantic segmentation and video stabilization; and (iii) assessing the usability of synthetically generated data for different computer vision tasks.

We developed a new synthetic data generator called Silver. Photo-realism, diversity, scalability, and full 3D virtual world generation at run-time are the key aspects of this generator. Photo-realism was approached by utilizing the state-of-the-art High Definition Render Pipeline (HDRP) of the Unity game engine. In parallel, the Procedural Content Generation (PCG) concept was employed to create a full 3D virtual world at run-time, while the scalability (expansion and adaptability) of the system was attained through the modular approach we followed in building the system from scratch. Silver can be used to provide clean, unbiased, and large-scale training and testing data for various computer vision t
