- High performance supercomputers: should the individual processor be more than a brick? Yale N. Patt. 1-2
- Distributed replay protocol for distributed uniprocessors. Mengjie Mao, Hong An, Bobin Deng, Tao Sun, Xuechao Wei, Wei Zhou, Wenting Han. 3-14
- Characterizing and improving the use of demand-fetched caches in GPUs. Wenhao Jia, Kelly A. Shaw, Margaret Martonosi. 15-24
- One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation. Ziyu Guo, Bo Wu, Xipeng Shen. 25-36
- Fast loop-level data dependence profiling. Hongtao Yu, Zhiyuan Li. 37-46
- Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors. Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar. 47-58
- UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique. Somayeh Sardashti, David A. Wood. 59-68
- Fault tolerant preconditioned conjugate gradient for sparse linear system solution. Manu Shantharam, Sowmyalatha Srinivasmurthy, Padma Raghavan. 69-78
- Data-driven fault tolerance for work stealing computations. Wenjing Ma, Sriram Krishnamoorthy. 79-90
- Fault resilience of the algebraic multi-grid solver. Marc Casas-Guix, Bronis R. de Supinski, Greg Bronevetsky, Martin Schulz. 91-100
- Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture. Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, José F. Martínez. 101-110
- CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern. Mingxing Tan, Xianhua Liu, Tong Tong, Xu Cheng. 111-120
- Congestion avoidance on manycore high performance computing systems. Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu. 121-132
- Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration. Yi Xu, Jun Yang, Rami G. Melhem. 133-142
- HiRe: using hint & release to improve synchronization of speculative threads. Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu, James Tuck. 143-152
- Enhancing the performance of assisted execution runtime systems through hardware/software techniques. Gokcen Kestor, Roberto Gioiosa, Osman S. Unsal, Adrián Cristal, Mateo Valero. 153-162
- CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures. Quan Chen, Minyi Guo, Zhiyi Huang. 163-172
- CRQ-based fair scheduling on composable multicore architectures. Tao Sun, Hong An, Tao Wang, Haibo Zhang, Xiufeng Sui. 173-184
- Quantifying the effectiveness of load balance algorithms. Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato. 185-194
- Sparse matrix-vector multiply on the HICAMP architecture. John P. Stevenson, Amin Firoozshahian, Alex Solomatnikov, Mark Horowitz, David R. Cheriton. 195-204
- On the communication complexity of 3D FFTs and its implications for Exascale. Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard W. Vuduc. 205-214
- Composable, non-blocking collective operations on power7 IH. Gabriel Ilie Tanase, Gheorghe Almasi, Hanhong Xue, Charles Archer. 215-224
- Collective algorithms for sub-communicators. Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar. 225-234
- Space-round tradeoffs for MapReduce computations. Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal. 235-244
- Blue Gene/Q: design for sustained multi-petaflop computing. Michael Gschwind. 245-246
- An analysis of computational workloads for the ORNL Jaguar system. Wayne Joubert, Shi-Quan Su. 247-256
- Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities. Nagendra Dwarakanath Gulur, R. Manikantan, Mahesh Mehendale, R. Govindarajan. 257-266
- Unified memory optimizing architecture: memory subsystem control with a unified predictor. Yasuo Ishii, Mary Inaba, Kei Hiraki. 267-278
- Locality & utility co-optimization for practical capacity management of shared last level caches. Dongyuan Zhan, Hong Jiang, Sharad C. Seth. 279-290
- Exploiting communication and packaging locality for cost-effective large scale networks. Keith D. Underwood, Eric Borch. 291-300
- Hardware support for enforcing isolation in lock-based parallel programs. Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin G. Zorn. 301-310
- High-performance code generation for stencil computations on GPU architectures. Justin Holewinski, Louis-Noël Pouchet, P. Sadayappan. 311-320
- An efficient work-distribution strategy for gridding radio-telescope data on GPUs. John W. Romein. 321-330
- GPU merge path: a GPU merging algorithm. Oded Green, Robert McColl, David A. Bader. 331-340
- SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee. 341-352
- clSpMV: a cross-platform OpenCL SpMV framework on GPUs. Bor-Yiing Su, Kurt Keutzer. 353-364
- Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. Fengguang Song, Stanimire Tomov, Jack Dongarra. 365-376
- An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs. Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun. 377-386