- HPC for the Human Brain Project. Thomas Lippert. 1
- LAWS: locality-aware work-stealing for multi-socket multi-core architectures. Quan Chen, Minyi Guo, Haibing Guan. 3-12
- Effective automatic computation placement and data allocation for parallelization of regular programs. Chandan Reddy, Uday Bondhugula. 13-22
- On the conditions for efficient interoperability with threads: an experience with PGAS languages using Cray communication domains. Khaled Z. Ibrahim, Katherine A. Yelick. 23-32
- HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects. Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat Sharmin Islam, Dhabaleswar K. Panda. 33-42
- DTail: a flexible approach to DRAM refresh management. Zehan Cui, Sally A. McKee, Zhongbin Zha, Yungang Bao, Mingyu Chen. 43-52
- Last-level cache deduplication. Yingying Tian, Samira Manabi Khan, Daniel A. Jiménez, Gabriel H. Loh. 53-62
- Block value based insertion policy for high performance last-level caches. Lingda Li, Junlin Lu, Xu Cheng. 63-72
- Multi-stage coordinated prefetching for present-day processors. Sanyam Mehta, Zhenman Fang, Antonia Zhai, Pen-Chung Yew. 73-82
- Evaluation of methods to integrate analysis into a large-scale shock physics code. Ron A. Oldfield, Kenneth Moreland, Nathan Fabian, David H. Rogers. 83-92
- Input-adaptive parallel sparse fast Fourier transform for stream processing. Shuo Chen, Xiaoming Li. 93-102
- Thread-cooperative, bit-parallel computation of Levenshtein distance on GPU. Alejandro Chacón, Santiago Marco-Sola, Antonio Espinosa, Paolo Ribeca, Juan Carlos Moure. 103-112
- Load balancing n-body simulations with highly non-uniform density. Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Tom Arsenlis, Nancy M. Amato. 113-122
- 21st century computer architecture keynote at 2014 international conference on supercomputing (ICS). Mark D. Hill. 123
- MT-MPI: multithreaded MPI for many-core environments. Min Si, Antonio J. Peña, Pavan Balaji, Masamichi Takagi, Yutaka Ishikawa. 125-134
- Implementing a classic: zero-copy all-to-all communication with MPI datatypes. Jesper Larsson Träff, Antoine Rougier, Sascha Hunold. 135-144
- Value influence analysis for message passing applications. Philip C. Roth, Jeremy S. Meredith. 145-154
- Scalable performance analysis of exascale MPI programs through signature-based clustering algorithms. Amir Bahmani, Frank Mueller. 155-164
- An optimal distributed load balancing algorithm for homogeneous work units. Akhil Langer. 165
- Addressing bandwidth contention in SMT multicores through scheduling. Josué Feliu, Julio Sahuquillo, Salvador Petit, José Duato. 167
- An adaptive cross-architecture combination method for graph traversal. Yang You, Shuaiwen Leon Song, Darren J. Kerbyson. 169
- Accelerating cache coherence mechanism with speculation. Jun Ohno, Kei Hiraki. 171
- Reducing energy consumption of NoC by router bypassing. Takahiro Naruko. 173
- Hardware-assisted scalable flow control of shared receive queue. Teruo Tanimoto, Takatsugu Ono, Kohta Nakashima, Takashi Miyoshi. 175
- Automating and optimizing data transfers for many-core coprocessors. Bin Ren, Nishkam Ravi, Yi Yang, Min Feng, Gagan Agrawal, Srimat T. Chakradhar. 177
- Parallelizing and optimizing sparse tensor computations. Muthu Manikandan Baskaran, Benoît Meister, Richard Lethin. 179
- Revealing applications' access pattern in collective I/O for cache management. Yin Lu, Yong Chen, Rob Latham, Yu Zhuang. 181-190
- Supporting storage configuration for I/O intensive workflows. Lauro Beltrão Costa, Samer Al-Kiswany, Hao Yang, Matei Ripeanu. 191-200
- Understanding the impact of threshold voltage on MLC flash memory performance and reliability. Wei Wang, Tao Xie, Deng Zhou. 201-210
- DWC: dynamic write consolidation for phase change memory systems. Fei Xia, Dejun Jiang, Jin Xiong, Mingyu Chen, Lixin Zhang, Ninghui Sun. 211-220
- Palm: easing the burden of analytical performance modeling. Nathan R. Tallent, Adolfy Hoisie. 221-230
- An end-to-end analysis of file system features on sparse virtual disks. Ruijin Zhou, Sankaran Sivathanu, Jinpyo Kim, Bing Tsai, Tao Li. 231-240
- Improving performance by matching imbalanced workloads with heterogeneous platforms. Jie Shen, Ana Lucia Varbanescu, Peng Zou, Yutong Lu, Henk J. Sips. 241-250
- Long-term resource fairness: towards economic fairness on pay-as-you-use computing systems. Shanjiang Tang, Bu-Sung Lee, Bingsheng He, Haikun Liu. 251-260
- The future of supercomputing. Marc Snir. 261-262
- Acceleration of derivative calculations with application to radial basis function: finite-differences on the Intel MIC architecture. Gordon Erlebacher, Erik Saule, Natasha Flyer, Evan F. Bollig. 263-272
- An efficient two-dimensional blocking strategy for sparse matrix-vector multiplication on GPUs. Arash Ashari, Naser Sedaghati, John Eisenlohr, P. Sadayappan. 273-282
- A programming system for Xeon Phis with runtime SIMD parallelization. Xin Huo, Bin Ren, Gagan Agrawal. 283-292
- Unified on-chip memory allocation for SIMT architecture. Ari B. Hayes, Eddy Z. Zhang. 293-302
- Galaxy: a high-performance energy-efficient multi-chip architecture using photonic interconnects. Yigit Demir, Yan Pan, Seokwoo Song, Nikos Hardavellas, John Kim, Gokhan Memik. 303-312
- A performance perspective on energy efficient HPC links. Karthikeyan P. Saravanan, Paul M. Carpenter, Alex Ramírez. 313-322
- Verifying micro-architecture simulators using event traces. Hui Meen Nyew, Nilufer Onder, Soner Önder, Zhenlin Wang. 323-332
- Scaling up matrix computations on shared-memory manycore systems with 1000 CPU cores. Fengguang Song, Jack Dongarra. 333-342
- Collective memory transfers for multi-core chips. George Michelogiannakis, Alexander Williams, Samuel Williams, John Shalf. 343-352
- Scalable analysis of multicore data reuse and sharing. Miquel Pericàs, Kenjiro Taura, Satoshi Matsuoka. 353-362