Data-Juicer: A One-Stop Data Processing System for Large Language Models

Daoyuan Chen, Yilun Huang, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou. Data-Juicer: A One-Stop Data Processing System for Large Language Models. In Pablo Barceló, Nayat Sánchez Pi, Alexandra Meliou, S. Sudarshan, editors, Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago, Chile, June 9-15, 2024. pages 120-134, ACM, 2024.

Abstract

Abstract is missing.
