Does your chatbot or AI agent need large-scale evaluation? We partnered with a team working on chatbot improvements to scale human eval by 250x and achieve stable quality at 85% precision and recall. Here’s how we did it: ✅ Rapidly built eval pipelines tailored to the client’s chatbot dialogs ✅ Onboarded and trained vetted experts to conduct evaluations ✅ Ran in-depth evaluation across 5 criteria: transparency, groundedness, accuracy, responsiveness, helpfulness ✅ Adjusted the data pipeline dynamically to meet specific needs ✨ Quality control is our superpower. Read the case study: https://lnkd.in/gDzG8gSW #AIevaluation #Chatbots #CaseStudy
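For readers unfamiliar with the two quality metrics quoted in the post, here is a minimal sketch of how precision and recall could be computed for one evaluator's verdicts against a trusted reference set. The function name, the boolean "flagged" labels, and the sample data are all hypothetical; this is not taken from the case study and does not describe Toloka's actual pipeline.

```python
def precision_recall(predicted, gold):
    """Compare evaluator verdicts with reference verdicts.

    Both arguments are equal-length lists of booleans, where True means
    "this chatbot response was flagged as failing a criterion".
    """
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p and g)        # flagged, and truly bad
    fp = sum(1 for p, g in pairs if p and not g)    # flagged, but actually fine
    fn = sum(1 for p, g in pairs if not p and g)    # missed a truly bad response
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical sample: five responses judged by one evaluator vs. the reference.
evaluator = [True, True, False, True, False]
reference = [True, False, False, True, True]
precision, recall = precision_recall(evaluator, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "of the responses flagged, how many were truly problematic", while recall answers "of the truly problematic responses, how many were flagged".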
Toloka
IT Services and IT Consulting
Your high quality data partner for all stages of AI development
About us
Toloka empowers businesses to build high quality, safe, and responsible AI. We are the trusted data partner for all stages of AI development from training to evaluation. Toloka has over a decade of experience supporting clients with our unique methodology and optimal combination of machine learning technology and human expertise, offering the highest quality and scalability in the market.
- Website
- https://toloka.ai/
- Industry
- IT Services and IT Consulting
- Company size
- 51-200 employees
- Headquarters
- Amsterdam
- Type
- Public company
- Founded
- 2014
- Specialties
- Data Annotation, Data Labeling, Machine Learning, Computer Vision, Autonomous Driving, Training Data, Deep Learning, Search, Data Collection, Text creation, Crowdsourcing, Web research, Tagging, Categorization, Surveys, Sentiment analysis, AI Training Data, Natural Language Processing (NLP), LLM Benchmarking, and AI Red Teaming
Products
Toloka
Data science and machine learning platforms
Empower AI Development and LLM Fine-Tuning Elevate your ML with next-level expert data for SFT and RLHF. Access skilled experts in 20+ domains and 40+ languages with unlimited scalability, backed by an advanced technology platform.
Locations
Employees at Toloka
-
Andrew Braun
Global Accounts at Toloka, a global leader in crowd science and AI
-
Dmitriy Kachin
VP of SaaS Product, Toloka AI | Ex COO of Chatfuel (YC W'16)
-
Tania Ignatova
Director of Finance @ Toloka | Financial Planning and Analysis | ex-Microsoft
-
Oleg Levchuk
CPO at Toloka AI, ex-Yandex
Updates
-
See you next week at #COLING2025! Our team will be participating in the following events along with our partners at MBZUAI (Mohamed bin Zayed University of Artificial Intelligence). Don’t miss out! January 19th @ 14:00 - Hands-on Tutorial: Labeling With LLMs and Human-in-the-Loop. Co-organizers: Akim Tsvigun (University of Amsterdam and Nebius), Dominik Schlechtweg (University of Stuttgart), with Natalia Fedorova, Boris Obmoroshev, Sergei Tilga, Ekaterina Artemova, and Konstantin Chernyshev from Toloka. January 20th @ 11:00 - Ekaterina Artemova joins Panel Discussion 1 at SUMEval 2025: Challenges of Collecting Culturally Grounded Multilingual Data for Training and Evaluation of NLP Systems. January 20th @ 11:15 - Shared task at Gen AI detection workshop on Binary Multilingual Machine-Generated Text Detection (Human vs. Machine). Co-organizers: Prof. Preslav Nakov, Prof. Iryna Gurevych (MBZUAI), Prof. Nizar Habash (NYU Abu Dhabi), and Ekaterina Artemova (Toloka). #AI #ML #AIContentDetection #COLING2025 #Partnership #GenerativeAI #Research
-
*2025 trends: AI Agents* AI agents are the buzzword of the moment—systems designed to make decisions and take actions independently. While we’re seeing early iterations, the industry is still figuring out what’s feasible and how best to approach building and applying agentic systems. In 2025, the industry will distill the concept of AI agents and define what they are capable of. At Toloka, we’re delivering data to enhance the capabilities of agents in real-world applications. Our current explorations are focused on three areas: effective methods for post-training, high-quality data collection, and safety evaluation and red teaming to identify and mitigate risks inherent in agent applications.
-
✨ 3 benchmarking initiatives from Toloka Research: math, science, and AI detection ✨ As AI models become more fluent, it’s getting harder to identify errors and biases. Our research team has released several domain-targeted benchmarks that match the practical demands of real-world AI evaluation. Read about our latest benchmarks: https://lnkd.in/gh_gpxA7 ➡️ University-level math (U-MATH and μ-MATH benchmarks) ➡️ Natural sciences / STEM ➡️ Artificial text detection (Beemo benchmark) But benchmarks are just a starting point for model evaluation. To pinpoint model weaknesses and areas for fine-tuning, our in-depth approach: - Leverages a global network of domain experts for deep insights. - Augments and assists human experts with fine-tuned LLMs for efficiency and precision. We can help you go beyond the basics of evaluation and fine-tuning. Connect with our team. #AIevaluation #MachineLearning #AIModels #AIbenchmarks
-
Toloka reposted this
In the AI industry, precision in niche domains is overtaking broad knowledge as the benchmark for success. Check out Toloka's CEO Olga Megorskaya's latest article for Forbes to dive into how SFT is shaping the future of AI and making real-world applications more effective.
From Generalist to Specialist: the latest article for Forbes explores how Supervised Fine-Tuning (SFT) is the key to transforming large language models (LLMs) into domain-specific experts. At Toloka, we are helping developers fine-tune LLMs with high-quality, domain-specific data provided by our Mindrift community of experts. From coding to finance and beyond, SFT enables models to deliver precise insights tailored to specialized fields. https://lnkd.in/eh8NDiYJ
Council Post: From Generalist To Specialist: The Role Of SFT In LLM Evolution
social-www.forbes.com
-
*2025 trends: Generative AI in the enterprise* As the new year gets rolling, our CEO, Olga Megorskaya, shared realistic expectations for GenAI adoption and how we're addressing it in 2025. The rapid evolution of frontier AI models presents a universal challenge: each new iteration, like the next GPT, raises the bar unpredictably, making it difficult for companies to determine where to invest. Legacy enterprises are still struggling to translate GenAI experimentation into real-world applications. Many view it as risky and lack reliable methods to evaluate solutions built on GenAI models. At Toloka, we’re devoting experiments and research to improving evaluation methods and benchmarks that can inform enterprise decisions and support GenAI adoption.
-
The truth about LLM math reasoning skills: The best problem solvers are not the best judges. 🔹 Our U-MATH benchmark ranks LLMs on university-level math 🔹 Our μ-MATH meta-benchmark ranks LLMs as judges of math solutions We tested large and small models to uncover their strengths and weaknesses. What we found: ➡️ Smaller specialized models like Qwen2.5-Math-7B outperform the larger LLaMA-3.1-70B in terms of judging skills. ➡️ Gemini’s performance as a judge is surprisingly poor, due to over-tuning. Do you know where your model stands? Download U-MATH and μ-MATH on Hugging Face to test the LLMs you are using or building: https://lnkd.in/gVd8bA_t Learn more: https://lnkd.in/guXbUmXB
U-MATH & μ-MATH: Assessing LLMs on university-level math
toloka.ai
-
Attending #COLING2025 in Abu Dhabi? Join us for a Hands-on Tutorial: Labeling With LLMs and Human-in-the-Loop. You’ll learn how to speed up data annotation and reduce costs and human workload. 🔹 When to use synthetic training data, active learning, and hybrid labeling 🔹 Real-life case studies and practical applications 🔹 Best practices to optimize quality of the final dataset 🔹 Hands-on workshop to set up hybrid annotation Co-organizers: Akim Tsvigun (University of Amsterdam and Nebius), Dominik Schlechtweg (University of Stuttgart), with Natalia Fedorova, Boris Obmoroshev, Sergei Tilga, Ekaterina Artemova, and Konstantin Chernyshev from Toloka. Learn more: https://lnkd.in/ghc3vh-m 🗓️ January 19, 2025. See you there! #AI #ML #COLING2025 #annotation #DataLabeling
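As a rough illustration of what hybrid labeling can look like in practice, the sketch below accepts an LLM's label when its confidence is high and routes everything else to human annotators. It is a generic pattern under stated assumptions, not the tutorial's material: the `ModelLabel` type, the `route_labels` function, the 0.9 threshold, and the sample items are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ModelLabel:
    item_id: str
    label: str
    confidence: float  # assumed to be a model-reported score in [0, 1]


def route_labels(labels, threshold=0.9):
    """Split LLM labels into auto-accepted ones and ones needing human review."""
    auto_accepted, needs_human = [], []
    for item in labels:
        if item.confidence >= threshold:
            auto_accepted.append(item)   # trust the model's label
        else:
            needs_human.append(item)     # send to a human annotator
    return auto_accepted, needs_human


# Hypothetical batch of three items labeled by an LLM.
batch = [
    ModelLabel("a1", "positive", 0.97),
    ModelLabel("a2", "negative", 0.62),
    ModelLabel("a3", "positive", 0.91),
]
accepted, review_queue = route_labels(batch)
print(len(accepted), "auto-accepted;", len(review_queue), "sent to humans")
```

Raising the threshold sends more items to humans (higher cost, typically higher quality), while lowering it does the opposite, so the value would normally be tuned on a small human-verified sample.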
-
Highlights of 2024 from our parent company, Nebius Group — what an exciting year it's been!
It has been an amazing year for Nebius Group, and as we head into the holiday season, we’ve been reflecting on what we’ve achieved in 2024. This year we launched as a group, but it has been so much more than that. It’s hard to capture everything in one post, but if we had to choose a few key highlights: 💹 Restarting trading on Nasdaq – our shares restarted trading in October under our new ticker symbol “NBIS”. 💰 Securing USD 700M – in December we announced a strategic equity financing with investors including NVIDIA, Accel and Orbis Investments to accelerate our full-stack AI infrastructure rollout. 🚀 Launching our AI-native cloud – in October Nebius unveiled its new cloud computing platform, built from scratch for the age of AI to manage the full ML lifecycle all in one place. 🤝 Avride x Uber – also in October, our self-driving business partnered with Uber, bringing Avride’s delivery robots and vehicles to Uber and Uber Eats. 🏆 TripleTen scoops bootcamp award – Fortune Magazine in May named TripleTen as the best software engineering boot camp in the US to teach total beginners the language and tools needed to become a full-stack developer. 💫 Toloka moves into GenAI – Our data business evolved to become a trusted partner for leading global foundation model producers and Big Tech companies. Thank you to all of our colleagues, friends and partners who have been on this journey with us so far. See you in 2025 to continue to build on this year’s successes!