Which Generative AI Tools You Really Need: A 101 Selection Guide
Selecting the right tool is becoming increasingly challenging. AI tool repositories such as Futurepedia, Future Tools, TopAI.tools, and There’s An AI For That list up to 10,000 different AI tools, and hundreds of new ones arrive every month. These repositories are usually categorized by use case, but even within those segments it is impossible to test every option.
Most AI tool guides fail to provide clear guidance on what really matters when selecting AI tools. Therefore, I’d like to share my method of categorizing AI tools not only by their use case, but by their capacity level, which in my eyes is a more useful scheme for comparing tools.
Fundamental considerations
Learning curve
Every tool takes some time to learn. Using AI tools is a skill comparable to ‘googling’. It took most people years to learn to apply, e.g., ‘long-tail keywords’, ‘date filters’, and ‘exact-match quotes’ to get the desired results. Unfortunately, I see many people writing AI prompts with their old search techniques and getting frustrated with miserable results, without realizing that they are not interacting with a search engine. Set aside time to ‘unlearn’ and approach the situation differently. A good starting point is to treat the AI as an inexperienced, clumsy junior assistant who needs every single step explained. Given the time required to learn each tool, it is not productive to hastily jump between tools.
Survivability
In 2023, the number of AI tools exploded, often driven by startups founded that same year. We can expect that all tools that don’t attract sufficient users will die within the next two years. My approach here is simple: I first use the Similarweb browser extension to compare how many people visit a tool’s website. Additionally, you can count the tool’s likes in the aforementioned repositories. Neither is a guarantee, but together they are a sufficient quick check of long-term survivability. Beyond a revenue stream, every widespread tool collects immense data about what works, which can stack up to a competitive advantage over time.
Switching costs
Modern generative AI tools are much more than LLMs (large language models); they recombine a plethora of machine learning techniques. So the learning goes both ways: tools also get better as you interact with them. To get similar results from a different tool in the same category, you often need to transfer conversations, contexts, and data to the new tool. Unfortunately, there are no established standards yet that enable true interoperability between tools, so this can result in tedious copy-and-paste work.
Thus, if you compare the performance of multiple AI tools, take your invested time into account. Naturally, a familiar companion will outperform the new kid on the block, but that is no evidence of its long-term potential.
In a nutshell: add missing capacities vertically first, before you replace tools horizontally on the same level. In general, there are two distinct, but equally valid strategies:
7 AI capacity levels
Models
Have you ever used a crappy AI chat on a website? Usually, the root cause is that someone believed it was a good idea to develop their own model. Don’t get me wrong: there are rare cases where a custom model makes sense, but a widespread mistake is to underestimate the effort. At companies like OpenAI, Google, and Meta, hundreds of employees work on the model layer alone. It is naive to believe you can build a model of similar quality with less investment.
Reasons to build your own model could be different data structures (e.g. protein folding) or extraordinary data security requirements. But even most higher-level tools don’t use their own models; instead, they connect to the APIs of larger models. Sometimes it is hard to find out the exact version they connect to, but usually you get better results by combining the next two layers (embeddings + fine-tuning) instead of building your own model.
There are heated debates about which model is best, but form your own opinion by looking it up directly, e.g. in Hugging Face’s leaderboard. In general, for language output, GPT-4 is the benchmark, but other models may have other advantages. For example, Google’s Bard has access to the internet and keyword data, making it perfectly suitable for keyword research. I am personally not a big fan of GPT-4’s tone of voice (despite all customization options); Claude 2 is said to shine here (unfortunately not publicly available in Germany yet). Sadly, I don’t know an equally comprehensive leaderboard for image, video, music, or other output formats.
General Recommendations:
Budget Recommendations:
Embeddings
In the past, people often built their own models when they wanted results based on customized data. While that is a legitimate requirement, e.g. for a chatbot or a research tool for internal documents, they often forget that a model is only a stochastic parrot without true understanding. This means that if an association is not present in the training data, the model cannot respond intelligently, for example when you use synonyms. The training effort is traditionally underestimated, resulting in mediocre and frustrating results.
You might wonder whether you can’t simply enter the data via a prompt. Yes, you can extend the conversation context, but there are long-term memory limitations. For example, GPT-3.5 had a maximum of 4,000 tokens (~6 pages), although OpenAI has recently increased these limits and offered larger models. Yes, you can enter more data, but once you reach the limit, the data will be compressed. This is actually comparable to a real-world conversation: if you join a long meeting, no one remembers exactly what was discussed at the beginning; our brains just store the essence.
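To get a feel for these limits, here is a rough back-of-the-envelope estimate. It assumes the common ~4 characters per token rule of thumb for English text; exact counts require the tokenizer of the specific model:

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 characters per token rule of thumb.
    For exact counts, use the model's own tokenizer (e.g. tiktoken for GPT)."""
    return max(1, len(text) // 4)

# How many dense pages fit into a 4,000-token window (GPT-3.5's original limit)?
page = "x" * 3000               # ~3,000 characters ≈ one dense page of text
tokens_per_page = estimate_tokens(page)
pages_that_fit = 4000 / tokens_per_page
print(round(pages_that_fit, 1))  # → 5.3, in the ballpark of the ~6 pages above
```

Everything beyond that window has to be summarized or dropped, which is exactly the ‘compression’ described above.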
Alternatively, you can use the recently added GPTs upload function, but even here you quickly run into limitations (10 uploads / 20 via the Assistants API). So if you expect AI to respond accurately to a larger dataset, the better strategy is embeddings (together with fine-tuning). As the name suggests, you are ‘embedding’ your custom knowledge into an existing model. The major advantage: all the features and associations of the original model stay in place and are merely extended.
Usually, this step needs some computation power, because your knowledge has to be converted into vectors, and the relations to existing nodes have to be calculated. This process can be carried out locally using tools like Hugging Face, but I’m not aware of any budget option that doesn’t require at least some technical skills.
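The core mechanic behind embedding-based retrieval can be sketched in a few lines: documents and queries become vectors, and the most similar vector wins. The 3-dimensional toy vectors below merely stand in for real embeddings, which have hundreds or thousands of dimensions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (1.0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding vectors of internal documents.
docs = {
    "refund policy":  [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy notice": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]  # e.g. the embedding of "How do I get my money back?"

# Retrieve the document whose vector points in the most similar direction.
best = max(docs, key=lambda name: cosine(query, docs[name]))
print(best)  # → refund policy
```

Because similarity is geometric rather than keyword-based, a synonym-laden question can still land near the right document, which is what makes this approach more robust than training associations from scratch.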
General Recommendations:
Budget Recommendations:
Fine-Tunings
Let’s assume you build your own custom chatbot with embeddings. In a conversation, the model will base its responses on both the original training data and your custom data, indistinguishably. Usually, that is not what you want. This is where fine-tuning comes into play: you show the model with examples how it should respond, e.g. use factual data ONLY from your custom data set, or fall back to the original data only if there is nothing in your custom data.
Again: you can add these conversation examples directly to the prompt as well, but you run into the same memory limit as described above. Unlike embeddings, fine-tunings are not designed to add new data, but to steer the underlying model. With fine-tuning you can limit the data set, but you can also guide the kind of response: the tone of voice, the structure, and the style. Of course, fine-tuning can be used without embeddings as well. This becomes especially handy if you have a brand voice you would like to enforce. Even if you develop your own chatbot, this is how you can enable e.g. humorous, sarcastic answers.
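In practice, such conversation examples are collected into a training file. The snippet below builds one in the JSONL chat format that OpenAI’s fine-tuning endpoint expects (one JSON object per line); the ‘Acme’ brand and its sarcastic voice are made up for illustration:

```python
import json

# Hand-written example conversations demonstrating the desired behavior:
# answer only from the custom knowledge base, in a dry, sarcastic brand voice.
examples = [
    {"messages": [
        {"role": "system", "content": "You answer only from the Acme knowledge base."},
        {"role": "user", "content": "Do you ship to the moon?"},
        {"role": "assistant", "content": "Sadly, no. Our drones max out at sea level."},
    ]},
    {"messages": [
        {"role": "system", "content": "You answer only from the Acme knowledge base."},
        {"role": "user", "content": "What's your return window?"},
        {"role": "assistant", "content": "Thirty days. We counted them ourselves."},
    ]},
]

# Write one JSON object per line — the JSONL layout used for fine-tuning uploads.
with open("finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```

A real fine-tune needs far more than two examples, but the shape stays the same: the model learns the pattern of the answers, not new facts.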
General Recommendations:
Budget Recommendations:
Instructions + Parameters
A prompt may change during the conversation, while an instruction stays the same. An instruction therefore performs a similar function to fine-tuning, but unlike fine-tuning, you don’t use conversation examples; you descriptively state what type of response you expect. Imagine instructions as ‘role characters’ that guide how the AI should respond in a conversation. You could include them in the prompt, so instructions are mostly a comfort feature that ensures they always stay on top of the memory and do not fade out. In addition to verbal instructions, there are also sampling parameters (GPT only: temperature, top_p, …) that change the behavior of the model. This is also the reason why Bing AI gives much shorter answers than ChatGPT, even though both are powered by GPT-4.
OpenAI recently revealed its new GPTs feature, which created a lot of buzz. But actually, it is mostly just a convenient way to manage instructions. There is some debate about whether it is possible to change sampling parameters via chat input, but you probably need to use either the playground or a third-party tool.
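The separation between pinned instruction, changing prompt, and sampling parameters becomes visible in a typical chat-completion request body. This is only a sketch of the payload shape (field names follow OpenAI’s API), not a runnable API call:

```python
# How an instruction and sampling parameters travel alongside the prompt
# in a typical chat-completion request body.
request = {
    "model": "gpt-4",
    "messages": [
        # The instruction ("system" role) stays pinned at the top of the context...
        {"role": "system", "content": "You are a terse assistant. Answer in one sentence."},
        # ...while user prompts change turn by turn.
        {"role": "user", "content": "Why is the sky blue?"},
    ],
    "temperature": 0.2,  # lower = more deterministic, focused wording
    "top_p": 1.0,        # nucleus-sampling cut-off
}
```

A product like Bing AI ships with its own fixed system message and parameter settings, which is why the same underlying model can behave so differently across tools.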
General Recommendations:
Budget recommendation:
Prompts
Prompts are the heart of generative AI and usually the biggest lever for getting good results. If someone is diving into AI, I would always recommend learning prompting techniques first. But most people are lazy, which explains the existence of most high-level AI tools: most of them started as mere prompt template collections. There is nothing wrong with reusing working templates instead of reinventing the wheel yourself. Prompt templates are like convenience food: a solid starting point, but with a little effort you get better results by doing it manually.
Instead of only consuming convenience prompt templates, I would recommend building your own prompt collection over time. A good starting point could be Maximilian Vogel’s collections here: 3000+ prompts, courses, and techniques.
Recommendations:
Budget Recommendation:
Integrations
Using ChatGPT as a standalone tool is already the Pareto solution, but you quickly get annoyed by copying and pasting your prepared prompts, exchanging instructions, and posting the results to the target platform (e.g. social media). That’s just a lot of tedious context switching. It is much more convenient and less distracting if your entire workflow is truly integrated into one tool and you never need to leave the window.
ChatGPT plugins appear to be a tempting solution, as they can communicate with third-party tools. However, their implementation is really disappointing; they often crash and are slower than using the tools directly (not sure why). On the other hand, the native integration in Bard is actually surprisingly good. You can create and export tables to Google Sheets and have access to the internet and to recent keyword data, which makes it my go-to tool for keyword or any other market research.
Moreover, I use Notion AI. Unfortunately, it is still powered by GPT-3.5, but that is sufficient e.g. for summarization and creating action items. You can even integrate it into your Notion databases and automate it, e.g. when you add a new item, it automatically adds tags. Notion also just released Notion AI Q&A, which allows you to have a conversation with your complete Notion knowledge base. For grammar and spelling, I use the free version of Grammarly, whose plugin integrates smoothly with most writing tools.
For coding, I prefer GitHub Copilot and Codium, which integrate smoothly via plugins into my preferred editor (VS Code). I only recently added Codium to the list, because I think their test case generation is beautiful and helps you focus on the enjoyable part of the job. Numerous studies show that development speed can increase and mental energy consumption can decrease, without sacrificing quality. With AI’s explanation and improvement skills, these tools can act like a peer programmer. This has already started to change the IT industry fundamentally. For example, I don’t think there will be extensive technical documentation stored in a wiki anymore in the future, because AI will be able to generate it on demand, in real time, based on the actual running code. Low-code tools will also run into problems, because coding itself is becoming much simpler without sacrificing capabilities.
Recommendations:
Budget Options
Agents
Maybe you have already realized that creating a holistic master prompt doesn’t guarantee the best results. On the one hand, you run into memory limitations; on the other hand, you get better results if you lead the AI step by step through the process. This is sometimes called ‘chain of thought’ and has long been known to help if you want logically consistent answers. An AI follows stochastically probable pathways, which limits its ability to think laterally. But this principle also applies to other tasks. For example, you can get surprisingly different results if you request a conclusion with an explanation in one step (e.g. ”Mac vs PC: Which is better, and why?”), or if you split it into two steps (”1. List reasons why one operating system may be better than another.” ”2. Draw a conclusion whether Mac or PC is better.”).
Agents automate this, leading the AI through multiple conversation steps until you get a sufficient answer, e.g. for planning a trip or preparing a family party. An agent chains prompts in a meaningful way; it is an autonomous prompt driver. An AI for AI.
These kinds of tools are still in their infancy, but I expect to see many new ones in this category in 2024.
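The chaining idea itself is small enough to sketch: each step’s answer becomes part of the next step’s prompt. The stub `llm` function below just stands in for a real model call, so the loop structure is the point, not the answers:

```python
# A minimal prompt chain; a real agent would call a model API inside llm().
def llm(prompt: str) -> str:
    """Placeholder for a real model call — echoes the prompt it received."""
    return f"<model answer to: {prompt!r}>"

def chain(steps, topic):
    """Feed each step's answer into the next step, like a tiny agent loop."""
    context = topic
    for step in steps:
        context = llm(step.format(context=context))
    return context

result = chain(
    ["List reasons why one operating system may be better than another: {context}",
     "Given these reasons, draw a conclusion: {context}"],
    "Mac vs PC",
)
```

Real agents add a stopping criterion (keep iterating until the answer is ‘sufficient’) and tool use on top of this loop, but the prompt-driving core is the same.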
Budget Recommendation: