Generative AI - a more colorful take

It’s been 7 months since I started spending weekends and late nights in this space of generative AI, thanks to the release of Midjourney, ChatGPT and Stable Diffusion. What I’ve learned from this journey is what I’m sharing today, and hopefully more in the future. I’ll try to keep it in simple English, and if anything is factually incorrect, please forgive me.

The Big Step Changes (at least to me)

The three big breakthrough moments in this AI space in recent history:

  1. “Attention is all you need”
  2. GPT-3 Moment - [Large] Size matters
  3. Stable Diffusion - [Small] Size matters


“Attention is all you need”

A language model is a probability distribution over sequences of tokens, symbols or words in a language. Its main purpose is to predict, with as high a probability as possible, the correct next word, based on the trained model and the prompted inputs.

For example, a good language model of English will be able to understand and generate ‘correct’ responses with high probability.
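To make the “probability distribution over the next token” idea concrete, here is a minimal sketch, assuming the Hugging Face transformers library and the public GPT-2 checkpoint (illustrative only, not anything we use internally), that prints the most likely next words for a prompt:

```python
# A minimal sketch of next-token prediction: the model outputs a probability
# distribution over its whole vocabulary, and we look at the top candidates.
# Assumes the Hugging Face `transformers` library and the public "gpt2" weights.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (batch, sequence, vocab)

next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(token_id):>10}  p={prob:.3f}")
```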

The paper “Attention is all you need” in 2017 was a big breakthrough due to 2 main innovations:

  • It proposes the transformer architecture with multi-headed self-attention, which allows training to be parallelized and thus harnesses the power of multiple GPUs
  • Multi-headed attention allows the neural network to learn multiple ways to capture relationships between words and to choose which parts it pays more ‘attention’ to (see the sketch after this list)
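As an illustration only, here is a minimal sketch of scaled dot-product attention, the building block that multi-head attention runs several times in parallel, written in plain PyTorch with toy dimensions chosen just for the example:

```python
# A toy sketch of scaled dot-product attention (the core of self-attention).
# Multi-head attention runs several of these in parallel on different learned
# projections of the same input, then concatenates the results.
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    # How much each token should "attend" to every other token.
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5
    weights = F.softmax(scores, dim=-1)
    return weights @ v, weights

# Toy example: a sequence of 4 tokens, each with an 8-dimensional representation.
x = torch.randn(1, 4, 8)
out, attn = scaled_dot_product_attention(x, x, x)  # self-attention: q = k = v = x
print(out.shape, attn.shape)  # torch.Size([1, 4, 8]) torch.Size([1, 4, 4])
```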

Transformers give us sequence-to-sequence (seq2seq) models, which are comprised of an encoder and a decoder. The encoder takes in a sequence of input data and creates a fixed-length representation of that input known as a context vector. The decoder then uses that context vector to generate a sequence of output data. We can use different models either as “encoders” or “decoders”, or combine multiple of them to solve specific tasks.
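As a rough sketch only (plain PyTorch, with made-up toy dimensions), this is the encoder-decoder shape being described: the encoder turns the input sequence into a representation (“memory”), and the decoder generates the output sequence while attending to it:

```python
# A minimal encoder-decoder (seq2seq) sketch using PyTorch's built-in
# transformer layers. Dimensions are toy values picked for the example.
import torch
import torch.nn as nn

d_model, n_heads = 32, 4

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True),
    num_layers=2,
)
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=n_heads, batch_first=True),
    num_layers=2,
)

src = torch.randn(1, 10, d_model)   # input sequence (e.g. embedded source tokens)
tgt = torch.randn(1, 7, d_model)    # output sequence generated so far

memory = encoder(src)               # the encoder's representation of the input
out = decoder(tgt, memory)          # the decoder attends to that memory at every step
print(out.shape)                    # torch.Size([1, 7, 32])
```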

Seq2seq models have been used successfully in a variety of applications, e.g. BERT, ChatGPT, DALL-E and Stable Diffusion!


[Large] Size matters

GPT-2 was interesting, but couldn’t make an impact. GPT-3 in 2020, with its 175 billion trained parameters, was the watershed moment and the breakthrough in LLMs that we needed. And 2 years later, ChatGPT, an interface overlaid on top with a rich-affordance UX, coupled with fine-tuning via reinforcement learning from human feedback (RLHF), gave us the amazing product we have all come to know and use today.

In LLMs, large SIZE indeed does matter. What awaits us now is the exciting arms race to deliver extremely large models trained with trillions of parameters, coupled with further RLHF. We will all have Microsoft, Facebook, Google, OpenAI, and Moore’s Law driving AI compute power and energy efficiency to thank for it!

For context, GPT-2 XL has 1.5B parameters, GPT-3 has 175B parameters, and GPT-4 is rumored to have around 100,000B parameters (its actual size has not been disclosed), a roughly 600x increase in size.

Internally, we are already using ChatGPT to help us significantly in the narrative and creative writing process: exploring multiple options and iterating very quickly on writing style, plot changes and more. We then feed the results into our own Stable Diffusion workflow to convert our written words into visual language to communicate our ideas internally.

These images are created by us using a combination of sketches, AI tools and prompt engineering.


[Small] Size matters

The next significant step change came with the release of Stable Diffusion in 2022. Being a small model, it allowed people to see that:

  • we don’t need to own a huge model to benefit, thanks to its open-source nature
  • we can run the model on a home computer or mobile device (the SD 1.5 checkpoint is about 5 GB in size). We can experiment, and build things around and on top of it. We can train it using accessible hardware (see the sketch after this list)
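As a rough sketch of what “running it on a home computer” looks like in practice, assuming the Hugging Face diffusers library and the public runwayml/stable-diffusion-v1-5 weights (illustrative only, not our internal workflow):

```python
# A minimal sketch of running Stable Diffusion 1.5 locally with `diffusers`
# on a consumer GPU. Model name and prompt are illustrative only.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,   # half precision keeps VRAM usage modest
)
pipe = pipe.to("cuda")

image = pipe(
    "concept art of a sci-fi character, clean line work, studio lighting",
    num_inference_steps=30,
    guidance_scale=7.5,
).images[0]

image.save("concept.png")
```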

And within the span of less than 1 year, we saw:

Different diffusion models trained on different visual datasets (from realism to anime)

Open-source communities releasing hundreds of add-ons to SD WebUIs (namely Automatic1111) that allow us to:

  • control the output in a much better way than before using ControlNet, Inpaint & Outpaint (see the sketch after this list)
  • train cheap and effective LoRA models to combine with base models to further control outputs
  • and so many other amazing plugins being released on a weekly basis
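To give a feel for what that control looks like in code, here is a hedged sketch using the diffusers ControlNet pipeline with the public Canny edge model; the model names and input image are assumptions for the example, and the WebUI add-ons expose the same idea through a UI:

```python
# A minimal sketch of ControlNet (Canny) guidance with `diffusers`: an edge map
# extracted from a reference image constrains the composition of the generated
# image. Model names and file paths are illustrative only.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel

# Turn a reference sketch/photo into a Canny edge map for ControlNet to follow.
reference = np.array(Image.open("reference_pose.png").convert("RGB"))
edges = cv2.Canny(reference, 100, 200)
edges = np.stack([edges] * 3, axis=-1)          # single channel -> RGB
control_image = Image.fromarray(edges)

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

image = pipe(
    "game character concept, dynamic pose, painterly style",
    image=control_image,
    num_inference_steps=30,
).images[0]
image.save("controlled_output.png")
```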

The ability to have control over outputs is exactly what got me super excited about Stable Diffusion’s potential use cases in our industry.

Here are some examples of the outputs I’m able to achieve for Sipher at Ather Labs using a combination of Stable Diffusion, ControlNet (OpenPose, Canny & Depth), Inpainting and LoRA model training.

Training our own internal model based on our own character design.
Applying this workflow in our process allows us to speed up our art creative process.

If you are interested in learning more about our efforts in making use of this amazing technology, please say hello!



What’s next?

This is an exciting space, especially now that we have efficient LLMs that can make meaningful connections between embeddings in vector space.
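As a small illustrative sketch of what I mean by embedding connections, assuming the sentence-transformers library and its public all-MiniLM-L6-v2 model (an assumption for the example): semantically related text lands close together in the vector space, which is what powers semantic search and retrieval on top of LLMs.

```python
# A minimal sketch of text embeddings: semantically similar sentences end up
# close together in vector space. Model name is an assumption for the example.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

sentences = [
    "A knight defends the castle gates.",
    "A warrior guards the fortress entrance.",
    "The quarterly report is due on Friday.",
]
embeddings = model.encode(sentences, convert_to_tensor=True)

# Cosine similarity: the two fantasy sentences should score highest together.
print(util.cos_sim(embeddings[0], embeddings[1]))  # high
print(util.cos_sim(embeddings[0], embeddings[2]))  # low
```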

Size matters here as well: smaller, more nimble indie studios like us will be able to embrace these types of technologies and fold them into our workflow much faster, with the goal of freeing up our teammates to focus on the higher-creativity work that AI cannot perform.

My future posts will be about what we at Ather Labs are using this technology for, especially in the creative, game pre-production, production & game marketing context!
