In this advanced use case, #VeniceDB is used to merge batch and stream inputs and to transport the data to serving nodes, which load it onto GPUs, updating their ML models in near real time. Check out the paper for more details!
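As a rough illustration of the "near real time" part (purely a sketch, not the Venice client API; `stream_of_updates` is a hypothetical stand-in for whatever delivers the merged batch/stream records), each update carries an item id and a new embedding, and the serving node writes it straight into the GPU-resident index tensor:

```python
import torch

def apply_streaming_updates(index: torch.Tensor, stream_of_updates):
    """Sketch: keep a GPU-resident item-embedding tensor fresh by applying
    (item_id, embedding) records as they arrive from the merged batch/stream feed.
    `stream_of_updates` is hypothetical -- it stands in for the real transport."""
    for item_id, embedding in stream_of_updates:
        # In-place row update: the serving model sees the new vector immediately.
        index[item_id] = torch.as_tensor(embedding, device=index.device)

# Hypothetical usage with a toy index and a couple of fake update records.
device = "cuda" if torch.cuda.is_available() else "cpu"
index = torch.zeros(1_000, 64, device=device)
fake_updates = [(42, torch.randn(64)), (7, torch.randn(64))]
apply_streaming_updates(index, fake_updates)
```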
This week we are presenting our paper "LiNR: Model Based Neural Retrieval on GPUs at LinkedIn", accepted at CIKM 2024 (https://lnkd.in/gUrWqRcD). Please stop by and say hi to Aman Gupta, who will be there in person :) We discuss our experiences and challenges in building scalable, differentiable search indexes with TensorFlow and PyTorch at production scale. In LiNR, both the items and the model weights are integrated into the model binary. Viewing index construction as a form of model training, we describe scaling our system to large indexes, incorporating full scans and efficient filtering. We believe LiNR represents one of the industry's first live-updated, model-based retrieval indexes at production scale. Talented co-authors include Fedor Borisyuk, Qingquan Song, Mingzhou Zhou, Ganesh Parameswaran, Madhulekha Arun, Siva P., Tugrul Bingol, Zhoutao Pei, Stanley (Kuang) Lee, Lu Z., Hugh Shao, Syed Ali Naqvi, Sen Zhou, and Aman Gupta.
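For readers curious what a model-based, full-scan retrieval index looks like in spirit, here is a minimal PyTorch sketch (not the LiNR implementation; the item embeddings, filter mask, and top-k choice are illustrative assumptions): item embeddings live inside the model as weights, every item is scored with a single matrix multiply on the GPU, an attribute filter mask is applied, and the top-k results are returned.

```python
import torch

class FullScanRetrievalIndex(torch.nn.Module):
    """Illustrative sketch: item embeddings are part of the model binary,
    and scoring is an exhaustive (full-scan) matrix multiply on the GPU."""

    def __init__(self, item_embeddings: torch.Tensor):
        super().__init__()
        # Items are stored as model weights, as described in the post.
        self.items = torch.nn.Parameter(item_embeddings, requires_grad=False)

    def forward(self, query: torch.Tensor, allowed: torch.Tensor, k: int = 10):
        # Score every item against the query (full scan).
        scores = query @ self.items.T                       # (batch, num_items)
        # Filtering: mask out items that fail the attribute predicates.
        scores = scores.masked_fill(~allowed, float("-inf"))
        return torch.topk(scores, k, dim=-1)                # (values, indices)

# Hypothetical usage with random data.
device = "cuda" if torch.cuda.is_available() else "cpu"
num_items, dim = 100_000, 64
index = FullScanRetrievalIndex(torch.randn(num_items, dim)).to(device)
query = torch.randn(2, dim, device=device)
allowed = torch.rand(2, num_items, device=device) > 0.5    # per-query filter mask
values, indices = index(query, allowed, k=10)
```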