Getting started with your LLM evaluations layer can feel like a chicken-and-egg problem: you don't really know what to evaluate for before you launch, but do you want to launch without evals? And if you're using techniques like LLM-as-Judge or Agent-as-Judge, how do you know you've correctly instrumented your models to catch the errors you do know about?

We've found that creating evaluation agents from a dataset of example events is a great way to get started. Both the literature and our experience building evaluations for customers point to the benefits of seeding your evaluators with high-signal events.

We spun up a tactical deep dive on how to find high-signal example events and turn them into your first evaluation agents: https://lnkd.in/g_fjnfck
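To make the idea concrete, here's a minimal sketch of an LLM-as-Judge evaluator seeded with high-signal example events as few-shot judgments. This is an illustration, not the method from the deep dive: the OpenAI client, the model name, the event fields, and the PASS/FAIL rubric are all assumptions.

```python
# Minimal LLM-as-Judge sketch: fold a few high-signal example events
# (clear failures and clear successes from production logs) into a
# few-shot judging prompt. All names and fields here are illustrative.
from openai import OpenAI

client = OpenAI()

# Hypothetical high-signal events pulled from production logs.
HIGH_SIGNAL_EXAMPLES = [
    {
        "input": "What is your refund policy?",
        "output": "We offer refunds within 90 days.",
        "verdict": "FAIL",
        "reason": "Policy is 30 days; the model hallucinated a number.",
    },
    {
        "input": "How do I reset my password?",
        "output": "Go to Settings > Security > Reset Password.",
        "verdict": "PASS",
        "reason": "Accurate, actionable answer.",
    },
]

def build_judge_prompt(event: dict) -> str:
    """Turn the high-signal examples into a few-shot judging prompt."""
    shots = "\n\n".join(
        f"Input: {ex['input']}\nOutput: {ex['output']}\n"
        f"Verdict: {ex['verdict']} ({ex['reason']})"
        for ex in HIGH_SIGNAL_EXAMPLES
    )
    return (
        "You are an evaluation agent. Judge whether the model output is "
        "accurate and helpful. Answer PASS or FAIL with a one-line reason.\n\n"
        f"Examples of past judgments:\n\n{shots}\n\n"
        f"Now judge this event:\nInput: {event['input']}\n"
        f"Output: {event['output']}\nVerdict:"
    )

def judge(event: dict) -> str:
    """Run one event through the judge model and return its verdict."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed judge model; swap in your own
        messages=[{"role": "user", "content": build_judge_prompt(event)}],
        temperature=0,  # deterministic verdicts for repeatable evals
    )
    return response.choices[0].message.content

if __name__ == "__main__":
    print(judge({
        "input": "Can I get a refund after 45 days?",
        "output": "Yes, our refund window is 90 days.",
    }))
```

The few-shot examples are doing the real work here: each high-signal event shows the judge what a failure actually looks like in your domain, which is what makes the evaluator catch the errors you already know about.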