Data-Juicer: A One-Stop Data Processing System for Large Language Models

Daoyuan Chen, Yilun Huang 0004, Zhijian Ma, Hesen Chen, Xuchen Pan, Ce Ge, Dawei Gao, Yuexiang Xie, Zhaoyang Liu, Jinyang Gao, Yaliang Li, Bolin Ding, Jingren Zhou. Data-Juicer: A One-Stop Data Processing System for Large Language Models. In Pablo Barceló, Nayat Sánchez Pi, Alexandra Meliou, S. Sudarshan 0001, editors, Companion of the 2024 International Conference on Management of Data, SIGMOD/PODS 2024, Santiago AA, Chile, June 9-15, 2024. pages 120-134, ACM, 2024.

Authors

Daoyuan Chen

Yilun Huang 0004

Zhijian Ma

Hesen Chen

Xuchen Pan

Ce Ge

Dawei Gao

Yuexiang Xie

Zhaoyang Liu

Jinyang Gao

Yaliang Li

Bolin Ding

Jingren Zhou
