- ElasticRoom: Multi-Tenant DNN Inference Engine via Co-design with Resource-constrained Compilation and Strong Priority Scheduling. Lixian Ma, Haoruo Chen, En Shao, Leping Wang, Quan Chen 0002, Guangming Tan. 1-14
- GNNOne: A Unified System Optimizations for GNN Kernels. Yidong Gong, Pradeep Kumar. 15-27
- Efficient all-to-all Collective Communication Schedules for Direct-connect Topologies. Prithwish Basu, Liangyu Zhao, Jason Fantl, Siddharth Pal, Arvind Krishnamurthy, Joud Khoury. 28-41
- ESG: Pipeline-Conscious Efficient Scheduling of DNN Workflows on Serverless Platforms with Shareable GPUs. Xinning Hui, Yuanchao Xu 0001, Zhishan Guo, Xipeng Shen. 42-55
- ETS: Deep Learning Training Iteration Time Prediction based on Execution Trace Sliding Window. Zichao Yang, Hao Guo, Heng Wu, Yuewen Wu, Hua Zhong 0007, Wenbo Zhang 0006, Chuan Zhou, Yan Liu. 56-68
- IDT: Intelligent Data Placement for Multi-tiered Main Memory with Reinforcement Learning. Juneseo Chang, Wanju Doh, Yaebin Moon, Eojin Lee, Jung Ho Ahn. 69-82
- FPBOXer: Efficient Input-Generation for Targeting Floating-Point Exceptions in GPU Programs. Anh Tran, Ignacio Laguna, Ganesh Gopalakrishnan. 83-93
- FaaSKeeper: Learning from Building Serverless Services with ZooKeeper as an Example. Marcin Copik, Alexandru Calotoiu, Pengyu Zhou, Konstantin Taranov, Torsten Hoefler. 94-108
- A Portable, Fast, DCT-based Compressor for AI Accelerators. Milan Shah, Xiaodong Yu 0001, Sheng Di, Michela Becchi, Franck Cappello. 109-121
- Accelerating Function-Centric Applications by Discovering, Distributing, and Retaining Reusable Context in Workflow Systems. Thanh Son Phung, Colin Thomas, Logan T. Ward, Kyle Chard, Douglas Thain. 122-134
- ADTopk: All-Dimension Top-k Compression for High-Performance Data-Parallel DNN Training. Zhangqiang Ming, Yuchong Hu, Wenxiang Zhou, Xinjue Zheng, Chenxuan Yao, Dan Feng 0001. 135-147
- EvoStore: Towards Scalable Storage of Evolving Learning Models. Robert Underwood, Meghana Madhyastha, Randal C. Burns, Bogdan Nicolae. 148-159
- HAM-SpMSpV: an Optimized Parallel Algorithm for Masked Sparse Matrix-Sparse Vector Multiplications on multi-core CPUs. Lei Xu 0023, Haipeng Jia, Yunquan Zhang, Luhan Wang, Xianmeng Jiang. 160-173
- Faast: An Efficient Serverless Framework Made Snapshot-based Function Response Fast. Yongshu Bai, Zhihui Yang, Feng Gao. 174-185
- DLHT: A Non-blocking Resizable Hashtable with Fast Deletes and Memory-awareness. Antonios Katsarakis, Vasilis Gavrielatos, Nikos Ntarmos. 186-199
- Extending Sparse Patterns to Improve Inverse Preconditioning on GPU Architectures. Sergi Laut, Ricard Borrell, Marc Casas. 200-213
- FaaSRail: Employing Real Workloads to Generate Representative Load for Serverless Research. Christos Katsakioris, Chloe Alverti, Konstantinos Nikas, Dimitrios Siakavaras, Stratos Psomadakis, Nectarios Koziris. 214-226
- DataStates-LLM: Lazy Asynchronous Checkpointing for Large Language Models. Avinash Maurya, Robert Underwood, M. Mustafa Rafique, Franck Cappello, Bogdan Nicolae. 227-239
- Reinforcement Learning-based Adaptive Mitigation of Uncorrected DRAM Errors in the Field. Isaac Boixaderas, Sergi Moré, Javier Bartolome, David Vicente, Petar Radojkovic, Paul M. Carpenter, Eduard Ayguadé. 240-252
- FASOP: Fast yet Accurate Automated Search for Optimal Parallelization of Transformers on Heterogeneous GPU Clusters. Sunyeol Hwang, Eungyeong Lee, Hongseok Oh, Youngmin Yi. 253-266
- Loki: A System for Serving ML Inference Pipelines with Hardware and Accuracy Scaling. Sohaib Ahmad, Hui Guan 0001, Ramesh K. Sitaraman. 267-280
- Can Large Language Models Write Parallel Code? Daniel Nichols, Joshua Hoke Davis, Zhaojun Xie, Arjun Rajaram, Abhinav Bhatele. 281-294
- ScaleDFS: Accelerating Decentralized and Private File Sharing via Scaling Directed Acyclic Graph Processing. Mansub Song, Lan Anh Nguyen, Sunggon Kim, Hyeonsang Eom, Yongseok Son. 295-308
- CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2. Shihui Song, Yafan Huang, Peng Jiang, Xiaodong Yu 0001, Weijian Zheng, Sheng Di, Qinglei Cao, Yunhe Feng, Zhen Xie, Franck Cappello. 309-321
- SIMCoV-GPU: Accelerating an Agent-Based Model for Exascale. Kirtus G. Leyba, Steven A. Hofmeyr, Stephanie Forrest, Judy L. Cannon, Melanie E. Moses. 322-333
- Near-Optimal Wafer-Scale Reduce. Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler. 334-347
- A Practical Introduction to Quantum Computing and Networking. Claudio Cicconetti. 348-349
- Tutorial on Variational Quantum Algorithms for Resource Management in Cloud/Edge Architectures. Carlo Mastroianni, Andrea Vinci. 350-351
- Programming Tools for High-Performance Data Analysis. Domenico Talia, Paolo Trunfio. 352-355
- Network Management and Orchestration with Data Engineering: A Practical Guide. Engin Zeydan, Josep Mangues, Jorge Baranda. 356-357
- k-Dispatch: Enabling Cost-Optimized Biomedical Workflow Offloading. Marta Jaros, Jirí Jaros. 358-360
- Techniques for Efficient Fourier Transform Computation in Ultrasound Simulations. Ondrej Olsak, Jirí Jaros. 361-363
- K-RAF: A Kubernetes-based Resource Augmentation Framework for Edge Devices. Youngwoo Jang, Jiseob Byun, Soonbeom Kwon, Illyoung Choi, Dukyun Nam, Byungchul Tak, Gap-Joo Na, Young-Kyoon Suh. 364-366
- Swarm Storm: An Automated Chaos Tool for Docker Swarm Applications. Travis Higgins, Devki Nandan Jha, Rajiv Ranjan 0001. 367-369
- Acceleration of Ultrasound Neurostimulation Using Mixed-Precision Arithmetic. Jirí Jaros, Radek Duchon. 370-372
- Constrained Approximate Query Processing with Error and Response Time-Bound Guarantees for Efficient Big Data Analytics. Sungsoo Kim, Choon Seo Park, Taewhi Lee, Kihyuk Nam. 373-376
- Seamless HW-accelerated AI serving in heterogeneous MEC Systems with AI@EDGE. Achilleas Tzenetopoulos, George Lentaris, Aimilios Leftheriotis, Panos Chrysomeris, Javier Palomares, Estefanía Coronado, Raman Kazhamiakin, Dimitrios Soudris. 377-380
- TEACHING Platform for Human-Centric Autonomous Applications: Design and Overview. Valerio De Caro, Christos Chronis, Massimo Coppola, Vincenzo Lomonaco, Claudio Gallicchio, Konstantinos Tserpes, Davide Bacciu. 381-384
- EMPYREAN: Trustworthy, Cognitive and AI-driven Collaborative Associations of IoT Devices and Edge Resources for Data Processing. Aristotelis Kretsis, Panagiotis C. Kokkinos, Emmanouel A. Varvarigos, Dimitris Syrivelis, Paraskevas Bakopoulos, Márton Sipos, Marcel Fehér, Daniel Enrique Lucani, José Manuel Bernabé Murcia, Antonio F. Skarmeta, Ivan Paez, Luca Cominardi, Michael Mercier, Pedro Velho, Yiannis Georgiou, Charalampos Mainas, Anastassios Nanos, Javier Martin, Aitor Fernández Gómez, Roberto Gonzalez, Panos Ilias, Theodoros Chalazas, Keshav Chintamani. 385-388
- Fast, Accurate and Distributed Simulation of novel HPC systems incorporating ARM and RISC-V CPUs. Nikolaos Tampouratzis, Ioannis Papaefstathiou. 389-392
- EDGELESS: A Software Architecture for Stateful FaaS at the Edge. Claudio Cicconetti, Emanuele Carlini, Raphael Hetzel, Richard Mortier, Antonio Paradell, Markus Sauer. 393-396
- Towards a Comprehensive Approach to Resource and Conflict Management in Cloud-Edge Settings. Jacopo Massa. 397-400
- A runtime infrastructure for the Continuum of Computing. Edoardo Tinto, Tullio Vardanega. 401-404
- Trade-off Analysis between Knowledge Distillation and Federated Learning in Distributed Edge System. Molo Mbasa Joaquim. 405-408
- Efficient Stream Join Processing: Novel Approaches and Challenges. Adeel Aslam, Giovanni Simonini. 409-412
- Semantic-Aware Log Understanding and Analysis. Shaohan Huang, Zhongzhi Luan. 413-416
- Full-Stack Revision of Memory and Data Management in PDES on Multi-Core Machines. Federica Montesano. 417-420