Synthetic data is a game changer for Deep Learning

Alexey Mitin
Jul 6, 2022

Dedication. To my father, Vladimir Mitin, for his interest in technology and science that formed my way of thinking.

Synthetic data generation will make it possible to unleash the power of Deep Learning as Software 2.0 at a significantly greater scale.

A learned model is a program “written” by a DL algorithm: a neuron is the base unit of this program, and the neuron's weights are the code of that unit. This code is learned from data. Millions to billions of “micro-developers” automatically program the neurons' code by crunching the data passed to their inputs and the corrections propagated backward.
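To make the "weights are code" idea concrete, here is a minimal sketch of a single linear neuron whose two weights are written by gradient descent rather than by hand. The target rule `y = 2x + 1`, the learning rate, and the function name `train_neuron` are illustrative assumptions, not something taken from a real system.

```python
def train_neuron(samples, lr=0.05, epochs=2000):
    """Learn w and b for a single linear neuron, y ≈ w * x + b."""
    w, b = 0.0, 0.0  # the neuron's "code" starts out empty
    for _ in range(epochs):
        for x, y in samples:
            pred = w * x + b      # forward pass: data goes into the input
            err = pred - y        # correction that is propagated backward
            w -= lr * err * x     # each weight is nudged by its share of the error
            b -= lr * err
    return w, b

# Hypothetical data produced by the rule y = 2x + 1 (an assumption for this demo).
samples = [(x, 2.0 * x + 1.0) for x in (-1.0, 0.0, 1.0, 2.0)]
w, b = train_neuron(samples)
print(f"learned w={w:.4f}, b={b:.4f}")
```

Nobody typed the rule into the neuron: the values of `w` and `b` were derived entirely from the examples.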

Feeding a DL algorithm adjusted datasets makes it possible to address model issues. If a model performs worse on a specific data subset, tuning it with data from that subset will improve the situation. But preparing the adjusted subset is a manual bottleneck in an otherwise extremely efficient automatic process.
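A toy sketch of that tuning step, under stated assumptions: the "model" is a one-neuron linear fit, the task `target(x) = x * x` is hypothetical, and the weak subset is simply the range of inputs the model never saw. All names (`fit`, `mse`, `subset_train`, `subset_eval`) are made up for the illustration.

```python
def mse(w, b, data):
    """Mean squared error of the linear model w * x + b on a dataset."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)

def fit(w, b, data, lr=0.1, steps=500):
    """Full-batch gradient descent on mean squared error, starting from (w, b)."""
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def target(x):
    return x * x  # hypothetical task, assumed only for this demo

# Original training data covers x in [0, 1]; the weak subset is x in [1, 2].
general = [(i / 20, target(i / 20)) for i in range(21)]
subset_train = [(1 + i / 20, target(1 + i / 20)) for i in range(21)]
subset_eval = [(1.025 + i / 20, target(1.025 + i / 20)) for i in range(20)]  # held out

w, b = fit(0.0, 0.0, general)
before = mse(w, b, subset_eval)   # quality on the weak subset before tuning
w, b = fit(w, b, subset_train)    # tune with data belonging to that subset
after = mse(w, b, subset_eval)
print(f"subset MSE before tuning: {before:.3f}, after: {after:.3f}")
```

This sketch tunes on the subset alone and measures only the subset; a real setup would likely mix the new data with the original set and track quality on both.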

Synthetic data generation looks like a good way to remove this bottleneck. It potentially makes it possible to create a fully automated model improvement loop, where:
- generated data is used to train a model;
- model quality is assessed;
- adjusted data is generated to address the issues found.
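The three steps of the loop can be sketched in a few dozen lines. Everything here is a toy assumption: `simulator` stands in for whatever produces labelled synthetic data, the "model" is a 1-nearest-neighbour lookup rather than a neural network, and the data subsets are four equal slices of the input range.

```python
import math
import random

REGIONS = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]  # data subsets

def simulator(x):
    """Hypothetical ground truth that the generator can label for free."""
    return math.sin(2 * math.pi * x)

def generate(region, n, rng):
    """Generate n labelled synthetic samples inside one region (stratified)."""
    lo, hi = region
    step = (hi - lo) / n
    xs = [lo + (i + rng.random()) * step for i in range(n)]
    return [(x, simulator(x)) for x in xs]

def train(dataset):
    """Stand-in for training: build a 1-nearest-neighbour predictor."""
    data = list(dataset)
    def predict(x):
        return min(data, key=lambda sample: abs(sample[0] - x))[1]
    return predict

def assess(model, points_per_region=25):
    """Mean absolute error per region on a fixed evaluation grid."""
    errors = []
    for lo, hi in REGIONS:
        step = (hi - lo) / points_per_region
        xs = [lo + (i + 0.5) * step for i in range(points_per_region)]
        errors.append(sum(abs(model(x) - simulator(x)) for x in xs) / len(xs))
    return errors

def improvement_loop(rounds=5, batch=10, seed=0):
    rng = random.Random(seed)
    dataset = generate(REGIONS[0], batch, rng)  # initial data covers one subset only
    history = []                                # worst per-region error at each round
    for _ in range(rounds):
        model = train(dataset)                            # 1. learn from generated data
        errors = assess(model)                            # 2. assess model quality
        worst = max(range(len(REGIONS)), key=errors.__getitem__)
        history.append(errors[worst])
        dataset += generate(REGIONS[worst], batch, rng)   # 3. generate adjusted data
    history.append(max(assess(train(dataset))))
    return history

history = improvement_loop()
print([round(e, 3) for e in history])
```

No human picks the data at any point: the assessment step decides which subset the generator should target next.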
