Let's not underestimate Synthetic Data!
In 2018 (which feels like ages ago in technology time) I wrote a post about how leveraging synthetic, augmented data helped me get a better score in a Kaggle challenge.
Simple, context- and domain-aware synthetic data can do wonders for our ability to scale ML into problem areas where human-captured data is limited in quantity or quality.
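As a minimal sketch of what "simple synthetic augmentation" can mean for tabular data, here is a SMOTE-style approach: interpolate between random pairs of real rows and add a little jitter. The function name, the jitter level, and the sample values are all illustrative assumptions, not details from the original post.

```python
import random

def synthesize(rows, n_new, jitter=0.05, seed=0):
    """Create synthetic rows by interpolating between random pairs of
    real rows, then applying small multiplicative jitter.
    Illustrative SMOTE-like sketch; not a production implementation."""
    rng = random.Random(seed)
    out = []
    for _ in range(n_new):
        a, b = rng.sample(rows, 2)          # pick two distinct real rows
        t = rng.random()                     # interpolation weight in [0, 1)
        row = [x + t * (y - x) for x, y in zip(a, b)]
        row = [x * (1 + rng.uniform(-jitter, jitter)) for x in row]
        out.append(row)
    return out

# Hypothetical 2-feature dataset, e.g. a class too small to train on alone.
real = [[5.1, 3.5], [4.9, 3.0], [6.2, 2.9], [5.8, 2.7]]
fake = synthesize(real, n_new=10)
print(len(fake))  # 10 synthetic rows
```

Because every synthetic row lies near the convex hull of real rows, the augmented set stays plausible for the domain; the bias trade-off the post mentions shows up in choices like the jitter size and which rows you pair.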
In the next two years, we will see more and more domain-specific foundation models released, where the moat will be a team's ability to create synthetic data while keeping its bias in check. Even beyond the hyperscaling equations, the age of AI will need domain experts who can build the fodder of synthetic data that helps models improve as benchmarks get harder and only the hard, real problems remain, the kind most AI is still quite bad at solving.