- Interval-based memory reclamation. Haosen Wen, Joseph Izraelevitz, Wentao Cai, H. Alan Beadle, Michael L. Scott. 1-13
- Harnessing epoch-based reclamation for efficient range queries. Maya Arbel-Raviv, Trevor Brown. 14-27
- A persistent lock-free queue for non-volatile memory. Michal Friedman, Maurice Herlihy, Virendra J. Marathe, Erez Petrank. 28-40
- Superneurons: dynamic GPU memory management for training deep neural networks. Linnan Wang, Jinmian Ye, Yiyang Zhao, Wei Wu, Ang Li, Shuaiwen Leon Song, Zenglin Xu, Tim Kraska. 41-53
- Juggler: a dependence-aware task-based execution framework for GPUs. Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, Laxmi N. Bhuyan. 54-67
- HPVM: heterogeneous parallel virtual machine. Maria Kotsifakou, Prakalp Srivastava, Matthew D. Sinclair, Rakesh Komuravelli, Vikram S. Adve, Sarita V. Adve. 68-80
- Hierarchical memory management for mutable state. Adrien Guatto, Sam Westrick, Ram Raghunathan, Umut A. Acar, Matthew Fluet. 81-93
- Bridging the gap between deep learning and sparse matrix format selection. Yue Zhao, Jiajia Li, Chunhua Liao, Xipeng Shen. 94-108
- Optimizing N-dimensional, winograd-based convolution for manycore CPUs. Zhen Jia, Aleksandar Zlateski, Fredo Durand, Kai Li. 109-123
- vSensor: leveraging fixed-workload snippets of programs for performance variance detection. Xiongchao Tang, Jidong Zhai, Xuehai Qian, Bingsheng He, Wei Xue, Wenguang Chen. 124-136
- Cache-tries: concurrent lock-free hash tries with constant-time operations. Aleksandar Prokopec. 137-151
- Featherlight on-the-fly false-sharing detection. Milind Chabbi, Shasha Wen, Xu Liu. 152-167
- Register optimizations for stencils on GPUs. Prashant Singh Rawat, Fabrice Rastello, Aravind Sukumaran-Rajam, Louis-Noël Pouchet, Atanas Rountev, P. Sadayappan. 168-182
- FlashR: parallelize and scale R for machine learning using SSDs. Da Zheng, Disa Mhembere, Joshua T. Vogelstein, Carey E. Priebe, Randal C. Burns. 183-194
- DisCVar: discovering critical variables using algorithmic differentiation for transient faults. Harshitha Menon, Kathryn Mohror. 195-206
- Practical concurrent traversals in search trees. Dana Drachsler-Cohen, Martin T. Vechev, Eran Yahav. 207-218
- Communication-avoiding parallel minimum cuts and connected components. Lukas Gianinazzi, Pavel Kalvoda, Alessandro De Palma, Maciej Besta, Torsten Hoefler. 219-232
- Safe privatization in transactional memory. Artem Khyzha, Hagit Attiya, Alexey Gotsman, Noam Rinetzky. 233-245
- Making pull-based graph processing performant. Samuel Grossman, Heiner Litz, Christos Kozyrakis. 246-260
- An effective fusion and tile size model for optimizing image processing pipelines. Abhinav Jangda, Uday Bondhugula. 261-275
- Lazygraph: lazy data coherency for replicas in distributed graph-parallel computation. Lei Wang, Liangji Zhuang, Junhang Chen, Huimin Cui, Fang Lv, Ying Liu, Xiaobing Feng. 276-289
- PAM: parallel augmented maps. Yihan Sun, Daniel Ferizovic, Guy E. Blelloch. 290-304
- Efficient shuffle management with SCache for DAG computing frameworks. Zhouwang Fu, Tao Song, Zhengwei Qi, Haibing Guan. 305-316
- High-performance genomic analysis framework with in-memory computing. Xueqi Li, Guangming Tan, Bingchen Wang, Ninghui Sun. 317-328
- Griffin: uniting CPU and GPU in information retrieval systems for intra-query parallelism. Yang Liu, Jianguo Wang, Steven Swanson. 327-337
- swSpTRSV: a fast sparse triangular solve with sparse level tile layout on sunway architectures. Xinliang Wang, Weifeng Liu, Wei Xue, Li Wu. 338-353
- VerifiedFT: a verified, high-performance precise dynamic race detector. James R. Wilcox, Cormac Flanagan, Stephen N. Freund. 354-367
- Efficient parallel determinacy race detection for two-dimensional dags. Yifan Xu, I-Ting Angelina Lee, Kunal Agrawal. 368-380
- Performance challenges in modular parallel programs. Umut A. Acar, Vitaly Aksenov, Arthur Charguéraud, Mike Rainey. 381-382
- Reducing the burden of parallel loop schedulers for many-core processors. Mahwish Arif, Hans Vandierendonck. 383-384
- Reducing transaction aborts by looking to the future. Nachshon Cohen, Erez Petrank, James R. Larus. 385-386
- Strong trylocks for reader-writer locks. Andreia Correia, Pedro Ramalhete. 387-388
- SecureMR: secure mapreduce using homomorphic encryption and program partitioning. Yao Dong, Ana Milanova, Julian Dolby. 389-390
- A scalable distance-1 vertex coloring algorithm for power-law graphs. Jesun Sahariar Firoz, Marcin Zalewski, Andrew Lumsdaine. 391-392
- Shared-memory parallelization of MTTKRP for dense tensors. Koby Hayashi, Grey Ballard, Yujie Jiang, Michael J. Tobia. 393-394
- Revealing parallel scans and reductions in sequential loops through function reconstruction. Peng Jiang, Gagan Agrawal. 395-396
- Performance modeling for GPUs using abstract kernel emulation. Changwan Hong, Aravind Sukumaran-Rajam, Jinsung Kim, Prashant Singh Rawat, Sriram Krishnamoorthy, Louis-Noël Pouchet, Fabrice Rastello, P. Sadayappan. 397-398
- Two concurrent data structures for efficient datalog query processing. Herbert Jordan, Bernhard Scholz, Pavle Subotic. 399-400
- A scalable queue for work distribution on GPUs. Bernhard Kerbl, Jörg Müller, Michael Kenzel, Dieter Schmalstieg, Markus Steinberger. 401-402
- Designing scalable FPGA architectures using high-level synthesis. Johannes de Fine Licht, Michaela Blott, Torsten Hoefler. 403-404
- Layrub: layer-centric GPU memory reuse and data migration in extreme-scale deep learning systems. Bo Liu, Wenbin Jiang, Hai Jin, Xuanhua Shi, Yang Ma. 405-406
- Register-based implementation of the sparse general matrix-matrix multiplication on GPUs. Junhong Liu, Xin He, Weifeng Liu, Guangming Tan. 407-408
- Quantifying and reducing execution variance in STM via model driven commit optimization. Girish Mururu, Ada Gavrilovska, Santosh Pande. 409-410
- Transparent GPU memory management for DNNs. Jung-Ho Park, Hyungmin Cho, Wookeun Jung, Jaejin Lee. 411-412
- Stamp-it, amortized constant-time memory reclamation in comparison to five other schemes. Manuel Pöter, Jesper Larsson Träff. 413-414
- A predictable synchronisation algorithm. Stefan Reif, Wolfgang Schröder-Preikschat. 415-416
- Automated code acceleration targeting heterogeneous openCL devices. Heinrich Riebler, Gavin Vaz, Tobias Kenter, Christian Plessl. 417-418
- Graph partitioning applied to DAG scheduling to reduce NUMA effects. Isaac Sánchez Barrera, Marc Casas, Miquel Moretó, Eduard Ayguadé, Jesús Labarta, Mateo Valero. 419-420
- A microbenchmark to study GPU performance models. Vasily Volkov. 421-422
- SIMD code generation for stencils on brick decompositions. Tuowen Zhao, Mary W. Hall, Protonu Basu, Samuel Williams, Hans Johansen. 423-424