International Conference on Supercomputing, ICS'12, Venice, Italy, June 25-29, 2012

Utpal Banerjee, Kyle A. Gallivan, Gianfranco Bilardi, Manolis Katevenis, editors, International Conference on Supercomputing, ICS'12, Venice, Italy, June 25-29, 2012. ACM, 2012.

Conference: ics


High performance supercomputers: should the individual processor be more than a brick? Yale N. Patt. 1-2

Distributed replay protocol for distributed uniprocessors. Mengjie Mao, Hong An, Bobin Deng, Tao Sun, Xuechao Wei, Wei Zhou, Wenting Han. 3-14

Characterizing and improving the use of demand-fetched caches in GPUs. Wenhao Jia, Kelly A. Shaw, Margaret Martonosi. 15-24

One stone two birds: synchronization relaxation and redundancy removal in GPU-CPU translation. Ziyu Guo, Bo Wu, Xipeng Shen. 25-36

Fast loop-level data dependence profiling. Hongtao Yu, Zhiyuan Li. 37-46

Apricot: an optimizing compiler and productivity tool for x86-compatible many-core coprocessors. Nishkam Ravi, Yi Yang, Tao Bao, Srimat T. Chakradhar. 47-58

UniFI: leveraging non-volatile memories for a unified fault tolerance and idle power management technique. Somayeh Sardashti, David A. Wood. 59-68

Fault tolerant preconditioned conjugate gradient for sparse linear system solution. Manu Shantharam, Sowmyalatha Srinivasmurthy, Padma Raghavan. 69-78

Data-driven fault tolerance for work stealing computations. Wenjing Ma, Sriram Krishnamoorthy. 79-90

Fault resilience of the algebraic multi-grid solver. Marc Casas-Guix, Bronis R. de Supinski, Greg Bronevetsky, Martin Schulz. 91-100

Overcoming single-thread performance hurdles in the core fusion reconfigurable multicore architecture. Janani Mukundan, Saugata Ghose, Robert Karmazin, Engin Ipek, José F. Martínez. 101-110

CVP: an energy-efficient indirect branch prediction with compiler-guided value pattern. Mingxing Tan, Xianhua Liu, Tong Tong, Xu Cheng. 111-120

Congestion avoidance on manycore high performance computing systems. Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu. 121-132

Channel borrowing: an energy-efficient nanophotonic crossbar architecture with light-weight arbitration. Yi Xu, Jun Yang 0002, Rami G. Melhem. 133-142

HiRe: using hint & release to improve synchronization of speculative threads. Liang Han, Xiaowei Jiang, Wei Liu, Youfeng Wu, James Tuck. 143-152

Enhancing the performance of assisted execution runtime systems through hardware/software techniques. Gokcen Kestor, Roberto Gioiosa, Osman S. Unsal, Adrián Cristal, Mateo Valero. 153-162

CATS: cache aware task-stealing based on online profiling in multi-socket multi-core architectures. Quan Chen, Minyi Guo, Zhiyi Huang 0001. 163-172

CRQ-based fair scheduling on composable multicore architectures. Tao Sun, Hong An, Tao Wang, Haibo Zhang, Xiufeng Sui. 173-184

Quantifying the effectiveness of load balance algorithms. Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato. 185-194

Sparse matrix-vector multiply on the HICAMP architecture. John P. Stevenson, Amin Firoozshahian, Alex Solomatnikov, Mark Horowitz, David R. Cheriton. 195-204

On the communication complexity of 3D FFTs and its implications for Exascale. Kenneth Czechowski, Casey Battaglino, Chris McClanahan, Kartik Iyer, P.-K. Yeung, Richard W. Vuduc. 205-214

Composable, non-blocking collective operations on POWER7 IH. Gabriel Ilie Tanase, Gheorghe Almasi, Hanhong Xue, Charles Archer. 215-224

Collective algorithms for sub-communicators. Anshul Mittal, Nikhil Jain, Thomas George, Yogish Sabharwal, Sameer Kumar 0001. 225-234

Space-round tradeoffs for MapReduce computations. Andrea Pietracaprina, Geppino Pucci, Matteo Riondato, Francesco Silvestri, Eli Upfal. 235-244

Blue Gene/Q: design for sustained multi-petaflop computing. Michael Gschwind. 245-246

An analysis of computational workloads for the ORNL Jaguar system. Wayne Joubert, Shi-Quan Su. 247-256

Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities. Nagendra Dwarakanath Gulur, R. Manikantan, Mahesh Mehendale, R. Govindarajan. 257-266

Unified memory optimizing architecture: memory subsystem control with a unified predictor. Yasuo Ishii, Mary Inaba, Kei Hiraki. 267-278

Locality & utility co-optimization for practical capacity management of shared last level caches. Dongyuan Zhan, Hong Jiang, Sharad C. Seth. 279-290

Exploiting communication and packaging locality for cost-effective large scale networks. Keith D. Underwood, Eric Borch. 291-300

Hardware support for enforcing isolation in lock-based parallel programs. Paruj Ratanaworabhan, Martin Burtscher, Darko Kirovski, Benjamin G. Zorn. 301-310

High-performance code generation for stencil computations on GPU architectures. Justin Holewinski, Louis-Noël Pouchet, P. Sadayappan. 311-320

An efficient work-distribution strategy for gridding radio-telescope data on GPUs. John W. Romein. 321-330

GPU merge path: a GPU merging algorithm. Oded Green, Robert McColl, David A. Bader. 331-340

SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters. Jungwon Kim, Sangmin Seo, Jun Lee, Jeongho Nah, Gangwon Jo, Jaejin Lee. 341-352

clSpMV: A Cross-Platform OpenCL SpMV Framework on GPUs. Bor-Yiing Su, Kurt Keutzer. 353-364

Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems. Fengguang Song, Stanimire Tomov, Jack Dongarra. 365-376

An optimized large-scale hybrid DGEMM design for CPUs and ATI GPUs. Jiajia Li, Xingjian Li, Guangming Tan, Mingyu Chen, Ninghui Sun. 377-386
