Attending #COLING2025 in Abu Dhabi? Join us for a Hands-on Tutorial: Labeling With LLMs and Human-in-the-Loop. You’ll learn how to speed up data annotation and reduce costs and human workload.
🔹 When to use synthetic training data, active learning, and hybrid labeling
🔹 Real-life case studies and practical applications
🔹 Best practices to optimize the quality of the final dataset
🔹 Hands-on workshop to set up hybrid annotation
Co-organizers: Akim Tsvigun (University of Amsterdam and Nebius), Dominik Schlechtweg (University of Stuttgart), with Natalia Fedorova, Boris Obmoroshev, Sergei Tilga, Ekaterina Artemova, and Konstantin Chernyshev from Toloka.
Learn more: https://lnkd.in/ghc3vh-m
🗓️ January 19, 2025. See you there!
#AI #ML #COLING2025 #annotation #DataLabeling
Toloka
IT Services and Consulting
Your high-quality data partner for all stages of AI development
About us
Toloka empowers businesses to build high-quality, safe, and responsible AI. We are the trusted data partner for all stages of AI development, from training to evaluation. Toloka has over a decade of experience supporting clients with our unique methodology and optimal combination of machine learning technology and human expertise, offering the highest quality and scalability in the market.
- Website
- https://toloka.ai/
- Industry
- IT Services and Consulting
- Company size
- 51-200 employees
- Headquarters
- Amsterdam
- Type
- Public company
- Founded
- 2014
- Specialties
- Data Annotation, Data Labeling, Machine Learning, Computer Vision, Autonomous Driving, Training Data, Deep Learning, Search, Data Collection, Text creation, Crowdsourcing, Web research, Tagging, Categorization, Surveys, Sentiment analysis, AI Training Data, Natural Language Processing (NLP), LLM Benchmarking, and AI Red Teaming
Products
Toloka
Data Science and Machine Learning Platforms
Empower AI Development and LLM Fine-Tuning
Elevate your ML with next-level expert data for SFT and RLHF. Access skilled experts in 20+ domains and 40+ languages with unlimited scalability, backed by an advanced technology platform.
Locations
Toloka employees
- Andrew Braun
Global Accounts at Toloka, a global leader in crowd science and AI
- Dmitriy Kachin
VP of SaaS Product, Toloka AI | Ex COO of Chatfuel (YC W'16)
- Tania Ignatova
Director of Finance @ Toloka | Financial Planning and Analysis | ex-Microsoft
- Oleg Levchuk
CPO at Toloka AI, ex-Yandex
Updates
Highlights of 2024 from our parent company, Nebius Group. What an exciting year it's been!
It has been an amazing year for Nebius Group, and as we head into the holiday season, we’ve been reflecting on what we’ve achieved in 2024. This year we launched as a group, but it has been so much more than that. It’s hard to capture everything in one post, but if we had to choose a few key highlights:
💹 Restarting trading on Nasdaq – our shares restarted trading in October under our new ticker symbol “NBIS”.
💰 Securing USD 700M – in December we announced a strategic equity financing with investors including NVIDIA, Accel and Orbis Investments to accelerate our full-stack AI infrastructure rollout.
🚀 Launching our AI-native cloud – in October Nebius unveiled its new cloud computing platform, built from scratch for the age of AI to manage the full ML lifecycle all in one place.
🤝 Avride x Uber – also in October, our self-driving business partnered with Uber, bringing Avride’s delivery robots and vehicles to Uber and Uber Eats.
🏆 TripleTen scoops bootcamp award – in May, Fortune Magazine named TripleTen the best software engineering bootcamp in the US for teaching total beginners the languages and tools needed to become a full-stack developer.
💫 Toloka moves into GenAI – our data business evolved to become a trusted partner for leading global foundation model producers and Big Tech companies.
Thank you to all of our colleagues, friends and partners who have been on this journey with us so far. See you in 2025 to continue to build on this year’s successes!
Last week our team members in Amsterdam and Belgrade came together to celebrate the holidays and a great year for Toloka. We are gearing up for 2025 and excited for what's to come! Happy holidays from our team to yours.
#TolokaAI #TolokaTeam
Researchers from Toloka and CERN evaluated LLMs on complex science questions, using a new benchmark dataset created by domain experts.
Highlights:
🏆 Llama outperformed every other model in Bioinformatics
🏆 GPT-4o won overall
Summary of the benchmark:
- 10 subjects in the natural sciences
- 10 evaluation criteria
- 5 LLMs tested: Qwen2-7B-Instruct, Llama-3-8B-Instruct, Mixtral-8x7B, Gemini-1.0-pro, and GPT-4o
Where all LLMs struggle:
- Depth and Breadth
- Reasoning and Problem-Solving
- Conceptual and Factual Accuracy
What does it mean?
- Accuracy varies across science domains.
- All tested LLMs underperform on complex questions.
- LLM responses can be misleading to non-experts.
👉 Read the article to find out more: https://lnkd.in/g-dWGtsP
#AI #STEM #NaturalSciences #LLM #Benchmarking #GPT4o #Llama #GeminiPro #Mixtral #Qwen2
How much will your SFT data improve LLM performance? Find out before you start training.
Our custom SFT dataset had a win rate of 74% when compared to similar open-source data. Here’s how we test the data we generate to evaluate quality and estimate fine-tuning results before delivery.
Link in comments 👇
#GenAI #LLMs #FineTuning #DataQuality
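For readers curious what a win rate like the 74% above means in practice, here is a minimal sketch of how a pairwise win rate is typically computed. The judge_prefers callback is a hypothetical placeholder for whichever preference judge (human annotators or an LLM judge) is used; this is an illustration, not Toloka's actual evaluation pipeline.

```python
# Minimal sketch of a pairwise win-rate evaluation.
# judge_prefers() is a hypothetical stand-in for the preference judge.

from typing import Callable, Sequence


def win_rate(
    responses_a: Sequence[str],
    responses_b: Sequence[str],
    judge_prefers: Callable[[str, str], str],  # returns "a", "b", or "tie"
) -> float:
    """Fraction of prompts where model A's response is preferred over model B's.

    Ties count as half a win, a common convention in pairwise evaluations.
    """
    assert len(responses_a) == len(responses_b)
    wins = 0.0
    for a, b in zip(responses_a, responses_b):
        verdict = judge_prefers(a, b)
        if verdict == "a":
            wins += 1.0
        elif verdict == "tie":
            wins += 0.5
    return wins / len(responses_a)


# A 74% win rate means the model trained on dataset A is preferred on
# roughly 74 out of every 100 prompts in the evaluation set.
```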
Can multilingual data be effective for LLM alignment?
One of the standout papers from #EMNLP caught our attention: "RLHF Can Speak Many Languages: Unlocking Multilingual Preference Optimization for LLMs" from Cohere & Cohere For AI (John Dang, Arash Ahmadian, Kelly Marchisio, Julia Kreutzer, Ahmet Üstün, Sara Hooker).
Findings and insights:
✅ Multilingual LLMs benefit greatly from alignment via preference optimization with multilingual data.
✅ Preference optimization allows for cross-language generalization, and adding multilingual data significantly enhances these effects.
✅ Data quality pays off: the authors put in extra effort to guide synthetic data generation and avoid common pitfalls.
Why this resonates with us at Toloka:
➡️ The findings are in line with what we’ve seen firsthand in our multilingual studies and production projects.
➡️ Preference data is tremendously valuable, especially for specific domains and languages, where it can help achieve best-in-class performance for a given niche: outperforming large proprietary models on a client's use case, or outperforming generalist models in a specialized domain.
➡️ Data quality is crucial, and it requires purposefully designed, intelligent pipelines. At Toloka, we carefully craft synthetic data honed to our needs, and we push data quality one step further by incorporating human signals into our pipelines.
Ready to collect the best reinforcement learning data for your needs? Come talk to us: https://lnkd.in/gv8m5eY2
Read the paper: https://lnkd.in/ge8n4eCy
#EMNLP24 #GenAI #MultilingualData #LLMalignment #RLHF
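For context on the technique behind the paper, below is a minimal sketch of a DPO-style preference-optimization loss over (chosen, rejected) response pairs. It illustrates the general idea only and is not the authors' exact training recipe; multilingual data enters simply as preference pairs written in many languages.

```python
# Minimal sketch of a Direct Preference Optimization (DPO) loss.
# The log-probabilities are assumed to be computed elsewhere for a batch of
# (prompt, chosen, rejected) triples under the policy and a frozen reference model.

import torch
import torch.nn.functional as F


def dpo_loss(
    policy_chosen_logps: torch.Tensor,    # log p_theta(chosen | prompt), shape (batch,)
    policy_rejected_logps: torch.Tensor,  # log p_theta(rejected | prompt)
    ref_chosen_logps: torch.Tensor,       # same quantities under the reference model
    ref_rejected_logps: torch.Tensor,
    beta: float = 0.1,
) -> torch.Tensor:
    """DPO loss: push the policy to widen the margin between chosen and rejected responses."""
    chosen_rewards = policy_chosen_logps - ref_chosen_logps
    rejected_rewards = policy_rejected_logps - ref_rejected_logps
    logits = beta * (chosen_rewards - rejected_rewards)
    return -F.logsigmoid(logits).mean()
```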
U-MATH and μ-MATH: our new LLM benchmarks for university-level mathematics are available now on Hugging Face.
Here’s what sets the benchmarks apart:
🔹 Multimodality. 1,100 problems across 6 math subjects, with 20% of problems including graphs or visuals to interpret.
🔹 Complexity. Academic experts collaborating with #Gradarius designed university-level math problems based on curricula from top US universities.
🔹 A subset to test judging skills. We use our meta-benchmark, μ-MATH, to choose the best LLM to judge U-MATH evaluations and to uncover judging biases.
We tested small, large, and proprietary LLMs. Our findings:
➡️ LLMs need substantial improvements in math reasoning and visual math tasks, across the board.
➡️ Gemini 1.5 Pro performed best out of the models we tested, but even Gemini could only solve 63% of text-only math tasks and just 45% of tasks with image processing, for an overall score of 60% on U-MATH.
➡️ Small, specialized models have an edge: #Qwen2.5-Math beat large models like #Llama and can even compete with #GPT4 and #Gemini.
We created U-MATH in collaboration with Gradarius, an online platform for learning calculus, and academic experts from Stevens Institute of Technology.
Learn more on the U-MATH page: https://lnkd.in/gyAQCSEv
Explore on Hugging Face (link in comments)
#GenAI #LLMeval #AIbenchmarks #MathAI
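For anyone who wants to try the benchmark, here is a minimal sketch of loading it with the Hugging Face datasets library. The repository id "toloka/u-math" and the split name are assumptions; check the Hugging Face link in the comments for the exact dataset name and fields.

```python
# Minimal sketch of pulling the benchmark from the Hugging Face Hub.
# The repo id and split below are assumed, not confirmed by this post.

from datasets import load_dataset

u_math = load_dataset("toloka/u-math", split="test")  # assumed repo id and split name

print(u_math)     # dataset size and column names
print(u_math[0])  # one university-level problem with its reference solution
```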
📊 Poll Results Are In: Benchmarks alone aren't enough, but new benchmarks could help bridge the gap.
Our recent poll confirmed what many in the AI community already know: measuring model performance requires more than current benchmarks can offer. Most teams are blending multiple evaluation types or leaning on human evaluation for deeper insights.
The truth is, existing benchmarks fall short when it comes to real-world performance. To bridge the gap, we’re working on domain-targeted benchmarks that reflect industry needs. Here are some recent projects:
➡️ University-level math (U-MATH and μ-MATH benchmarks)
➡️ Natural sciences / STEM
➡️ Artificial text detection (Beemo benchmark)
Read about our benchmarks: https://lnkd.in/eTjc7xbD
The best evaluation strategies are nuanced and purpose-driven, tailored to the domain and end goals. Benchmarks are just a starting point. That’s why we take a hybrid approach to model evaluation:
- Leveraging a global network of domain experts for deep insights.
- Augmenting and assisting their work with fine-tuned LLMs for efficiency and precision.
We can help you go beyond the basics of evaluation and fine-tuning. Connect with our team.
#AIevaluation #MachineLearning #AIModels #AIbenchmarks
We’re so excited to be at #NeurIPS24! Stop by Booth #617 to say hi to our team: Mikhail Lazovskiy, Jessica Sargent, David Garabedian, Natasha Kazachenko, and Dmitriy Kachin.
Students and researchers, come talk to us about our research fellowship! We’d also love to chat about:
- Everything related to human and hybrid annotation
- Synthetic data generation and automated feedback
- Data quality and its impact on model performance
- Data-efficient training techniques
- Evaluation metrics and benchmarks
- Agent systems and their applications
- Safety and biases
- LLM advancements in multimodality, reasoning, coding, multilinguality, and domain-specific tasks
Enjoy the conference! 🎉
In case you missed EMNLP this year, we’ll be highlighting impactful research from the conference that made a big impression on us.
💡 First in line is research presented by Google: "Beyond Human-Only: Evaluating Human-Machine Collaboration for Collecting High-Quality Translation Data" by Zhongtao Liu, Parker Riley, Daniel Deutsch, Alison Lui, Mengmeng Niu, Apurva Chamaria, and Markus Freitag.
This study evaluates 11 methods for collecting translation data, comparing human-only, machine-only, and hybrid approaches. The findings show that machine enhancements are uniquely effective with human-written translations, and vice versa: translations sourced from automatic systems benefit the most from human editing. As a result, hybrid methods come out on top in terms of quality and can even surpass human-only approaches at just 60% of the cost.
Interestingly, we take a similar approach at Toloka. Some of the techniques we use are very close to those described by the authors, combining human expertise with automation to build cost-effective and scalable pipelines for post-training datasets, including high-quality SFT and RLHF data.
Respect to the authors for their excellent work. 👏 👏
Stay tuned for more #EMNLP2024 highlights!
https://lnkd.in/ghyfFCWz
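To make the hybrid idea concrete, here is a minimal sketch of one possible routing policy in the spirit of the paper's findings: start from machine output and escalate low-confidence segments to human post-editing. All function names are hypothetical placeholders, not the authors' or Toloka's actual pipeline.

```python
# Minimal sketch of a hybrid human-machine translation pipeline:
# machine-translate every segment, then route low-confidence drafts to a human editor.
# machine_translate(), confidence(), and request_human_post_edit() are hypothetical.

from typing import Callable, List


def hybrid_translate(
    segments: List[str],
    machine_translate: Callable[[str], str],
    confidence: Callable[[str, str], float],        # e.g. a quality-estimation score in [0, 1]
    request_human_post_edit: Callable[[str, str], str],
    threshold: float = 0.8,
) -> List[str]:
    """Translate each segment by machine, escalating uncertain ones to a human editor."""
    results = []
    for source in segments:
        draft = machine_translate(source)
        if confidence(source, draft) >= threshold:
            results.append(draft)  # machine output accepted as-is
        else:
            results.append(request_human_post_edit(source, draft))  # human fixes the draft
    return results
```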