- Sparsity in Deep Neural Nets (Keynote). Nir Shavit. 1
- Scaling Up Transactions with Slower Clocks. Pedro Ramalhete, Andreia Correia. 2-16
- Locks as a Resource: Fairly Scheduling Lock Occupation with CFL. Jonggyu Park, Young Ik Eom. 17-29
- Are Your Epochs Too Epic? Batch Free Can Be Harmful. Daewoo Kim, Trevor Brown 0001, Ajay Singh 0002. 30-41
- Liger: Interleaving Intra- and Inter-Operator Parallelism for Distributed Large Model Inference. Jiangsu Du, Jinhui Wei, Jiazhi Jiang, Shenggan Cheng, Dan Huang, Zhiguang Chen, Yutong Lu. 42-54
- A Holistic Approach to Automatic Mixed-Precision Code Generation and Tuning for Affine Programs. Jinchen Xu, Guanghui Song, Bei Zhou, Fei Li, Jiangwei Hao, Jie Zhao 0002. 55-67
- Language-Agnostic Static Deadlock Detection for Futures. Stefan K. Muller. 68-79
- Recurrence Analysis for Automatic Parallelization of Subscripted Subscripts. Akshay Bhosale, Rudolf Eigenmann. 80-93
- OsirisBFT: Say No to Task Replication for Scalable Byzantine Fault Tolerant Analytics. Kasra Jamshidi, Keval Vora. 94-108
- Towards Scalable Unstructured Mesh Computations on Shared Memory Many-Cores. Haozhong Qiu, Chuanfu Xu, Jianbin Fang, Liang Deng, Jian Zhang, Qingsong Wang, Yue Ding 0001, Zhe Dai, Yonggang Che, Shizhao Chen, Jie Liu. 109-119
- Extreme-scale Direct Numerical Simulation of Incompressible Turbulence on the Heterogeneous Many-core System. Jiabin Xie, Guangnan Feng, Han Huang, Junxuan Feng, Zhiguang Chen, Yutong Lu. 120-132
- Pure: Evolving Message Passing To Better Leverage Shared Memory Within Nodes. James Psota, Armando Solar-Lezama. 133-146
- INFINEL: An efficient GPU-based processing method for unpredictable large output graph queries. Sungwoo Park, Seyeon Oh, Min-Soo Kim. 147-159
- GraphCube: Interconnection Hierarchy-aware Graph Processing. Xinbiao Gan, Guang Wu, Shenghao Qiu, Feng Xiong, Jiaqi Si, Jianbin Fang, Dezun Dong, Chunye Gong, Tiejun Li, Zheng Wang. 160-174
- Exploiting Fine-Grained Redundancy in Set-Centric Graph Pattern Mining. Zhiheng Lin, Ke Meng, Chaoyang Shui, Kewei Zhang, Junmin Xiao, Guangming Tan. 175-187
- Memory Bounds for Concurrent Bounded Queues. Vitaly Aksenov, Nikita Koval, Petr Kuznetsov, Anton Paramonov. 188-199
- VERLIB: Concurrent Versioned Pointers. Guy E. Blelloch, Yuanhao Wei. 200-214
- Practical Hardware Transactional vEB Trees. Mohammad Khalaji, Trevor Brown 0001, Khuzaima Daudjee, Vitaly Aksenov. 215-228
- Tetris: Accelerating Sparse Convolution by Exploiting Memory Reuse on GPU. Xiaoyan Liu, Xuegui Zheng, Hailong Yang, Zhongzhi Luan, Depei Qian. 229-242
- Shared Memory-contention-aware Concurrent DNN Execution for Diversely Heterogeneous System-on-Chips. Ismet Dagli, Mehmet E. Belviranli. 243-256
- Training one DeePMD Model in Minutes: a Step towards Online Learning. Siyu Hu, Tong Zhao, Qiuchen Sha, Enji Li, Xiangyu Meng, Liping Liu, Lin-Wang Wang, Guangming Tan, Weile Jia. 257-269
- ParlayANN: Scalable and Deterministic Parallel Graph-Based Approximate Nearest Neighbor Search Algorithms. Magdalen Dobson Manohar, Zheqi Shen, Guy E. Blelloch, Laxman Dhulipala, Yan Gu 0001, Harsha Vardhan Simhadri, Yihan Sun 0001. 270-285
- Parallel k-Core Decomposition with Batched Updates and Asynchronous Reads. Quanquan C. Liu, Julian Shun, Igor Zablotchi. 286-300
- Parallel Integer Sort: Theory and Practice. Xiaojun Dong 0001, Laxman Dhulipala, Yan Gu 0001, Yihan Sun. 301-315
- Fast American Option Pricing using Nonlinear Stencils. Zafar Ahmad, Reilly Browne, Rezaul Chowdhury, Rathish Das, Yushen Huang, Yimin Zhu. 316-332
- ConvStencil: Transform Stencil Computation to Matrix Multiplication on Tensor Cores. Yuetao Chen, Kun Li, Yuhao Wang, Donglin Bai, Lei Wang, Lingxiao Ma, Liang Yuan, Yunquan Zhang, Ting Cao, Mao Yang. 333-347
- CPMA: An Efficient Batch-Parallel Compressed Set Without Pointers. Brian Wheatman, Randal C. Burns, Aydin Buluç, Helen Xu 0001. 348-363
- Gallatin: A General-Purpose GPU Memory Manager. Hunter McCoy, Prashant Pandey 0001. 364-376
- A Row Decomposition-based Approach for Sparse Matrix Multiplication on GPUs. Meng Pang, Xiang Fei, Peng Qu, Youhui Zhang, Zhaolin Li. 377-389
- Fast Kronecker Matrix-Matrix Multiplication on GPUs. Abhinav Jangda, Mohit Yadav. 390-403
- Arrow Matrix Decomposition: A Novel Approach for Communication-Efficient Sparse Matrix Multiplication. Lukas Gianinazzi, Alexandros Nikolaos Ziogas, Langwen Huang, Piotr Luczynski, Saleh Ashkboosh, Florian Scheidl, Armon Carigiet, Chio Ge, Nabil Abubaker, Maciej Besta, Tal Ben-Nun, Torsten Hoefler. 404-416
- FastFold: Optimizing AlphaFold Training and Inference on GPU Clusters. Shenggan Cheng, Xuanlei Zhao, Guangyang Lu, Jiarui Fang, Tian Zheng, Ruidong Wu, Xiwen Zhang, Jian Peng 0007, Yang You 0001. 417-430
- AGAThA: Fast and Efficient GPU Acceleration of Guided Sequence Alignment for Long Read Mapping. Seongyeon Park, JungUk Hong, Jaeyong Song, Hajin Kim, Youngsok Kim, Jinho Lee. 431-444
- POSTER: Accelerating High-Precision Integer Multiplication used in Cryptosystems with GPUs. Zhuoran Ji, Zhaorui Zhang, Jiming Xu, Lei Ju. 445-447
- POSTER: Enabling Extreme-Scale Phase Field Simulation with In-situ Feature Extraction. Zhichen Feng, Jialin Li, Yaqian Gao, Shaobo Tian, Huang Ye, Jian Zhang. 448-450
- POSTER: FineCo: Fine-grained Heterogeneous Resource Management for Concurrent DNN Inferences. Lixian Ma, Haoruo Chen, En Shao, Leping Wang, Quan Chen, Guangming Tan. 451-453
- POSTER: Optimizing Collective Communications with Error-bounded Lossy Compression for GPU Clusters. Jiajun Huang, Sheng Di, Xiaodong Yu 0001, Yujia Zhai, Jinyang Liu, Yafan Huang, Ken Raffenetti, Hui Zhou, Kai Zhao 0008, Zizhong Chen, Franck Cappello, Yanfei Guo, Rajeev Thakur. 454-456
- POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design. Guofeng Feng, Weile Jia, Ninghui Sun, Guangming Tan, Jiajia Li 0001. 457-459
- POSTER: LLM-PQ: Serving LLM on Heterogeneous Clusters with Phase-Aware Partition and Adaptive Quantization. Juntao Zhao, Borui Wan, Chuan Wu, Yanghua Peng, Haibin Lin. 460-462
- POSTER: OCToPus: Semantic-aware Concurrency Control for Blockchain Transactions. dePaul Miller, Henry F. Korth, Roberto Palmieri. 463-465
- POSTER: Pattern-Aware Sparse Communication for Scalable Recommendation Model Training. Jiaao He, Shengqi Chen 0001, Jidong Zhai. 466-468
- POSTER: ParGNN: Efficient Training for Large-Scale Graph Neural Network on GPU Clusters. Shunde Li, Junyu Gu, Jue Wang 0013, Tiechui Yao, Zhiqiang Liang, Yumeng Shi, Shigang Li, Weiting Xi, Shushen Li, Chunbao Zhou, Yangang Wang, Xuebin Chi. 469-471
- POSTER: RadiK: Scalable Radix Top-K Selection on GPUs. Yifei Li, Bole Zhou, Jiejing Zhang, Xuechao Wei, Yinghan Li, Yingda Chen. 472-474
- POSTER: RELAX: Durable Data Structures with Swift Recovery. Almog Zur, Nachshon Cohen, Michal Friedman 0001, Erez Petrank. 475-476
- POSTER: StructMG: A Fast and Scalable Structured Multigrid. Yi Zong, Xinliang Wang, Haopeng Huang, Chensong Zhang, Xiaowen Xu, Jian Sun, Bowen Yan, Qin Wang, Sicong Li, Zhaohui Ding, Wei Xue. 478-480