PROGRAM

Displayed time zone: GMT+9 (KST)

Monday, September 27

00:00 – 05:00
Workshop

The 2nd International Workshop on Machine Learning for Software Hardware Co-Design (MLSH’21)
15:00 – 16:00
Keynote

Speaker: Saman Amarasinghe, Massachusetts Institute of Technology
Title: TBD
Abstract: TBD
16:00 – 16:30
Break
16:30 – 18:10
Session 1: Tuning and Lifting

A Flexible Approach to Autotuning Multi-Pass Machine Learning Compilers

Phitchaya Mangpo Phothilimthana, Amit Sabne, Nikhil Sarda, Karthik Srinivasa Murthy, Yanqi Zhou, Christof Angermueller, Mike Burrows, Sudip Roy, Ketan Mandke, Rezsa Farahani, Yu Emma Wang, Berkin Ilbeyi, Blake Hechtman, Bjarke Roune, Shen Wang, Yuanzhong Xu, Samuel J. Kaufman



PolyGym: Polyhedral Optimizations as an Environment for Reinforcement Learning

Alexander Brauckmann, Andrés Goens, Jeronimo Castrillon



Union: A Unified HW-SW Co-Design Ecosystem in MLIR for Evaluating Tensor Operations on Spatial Accelerators

Geonhwa Jeong, Gokcen Kestor, Prasanth Chatarasi, Angshuman Parashar, Po-An Tsai, Sivasankaran Rajamanickam, Roberto Gioiosa, Tushar Krishna



Polygeist: Raising C to Polyhedral MLIR

William S. Moses, Lorenzo Chelini, Ruizhe Zhao, Oleksandr Zinenko



Program Lifting using Gray-Box Behavior

Bruce Collie, Michael O’Boyle


Tuesday, September 28

00:00 – 01:00
Keynote (mirrored session)
01:00 – 01:30
Break
01:30 – 03:10
Session 1: Tuning and Lifting (mirrored session)
03:10 – 04:00
Break
04:00 – 08:00
Tutorial

Title: How to parallelize your own language using OpenCilk components
15:00 – 16:40
Session 2: Heterogeneous Systems

NLP-Fast: A Fast, Scalable, and Flexible System to Accelerate Large-Scale Heterogeneous NLP Models

Joonsung Kim, Suyeon Hur, Eunbok Lee, Seungho Lee, Jangwoo Kim



HERTI: a Reinforcement Learning-Augmented System for Efficient Real-Time Inference on Heterogeneous Embedded Systems

Myeonggyun Han, Woongki Baek



X-Layer: Building Composable Pipelined Dataflows for Low-Rank Convolutions

Naveen Vedula, Reza Hojabr, Ahmad Khonsari, Arrvindh Shriraman



InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-aware Inner Product Processing

Daehyeon Baek, Soojin Hwang, Taekyung Heo, Daehoon Kim, Jaehyuk Huh



PrecisionBatching: Bitserial Decomposition for Efficient Neural Network Inference on GPUs

Maximilian Lam, Zachary Yedidia, Colby R Banbury, Vijay Janapa Reddi


16:40 – 17:00
Break
17:00 – 18:40
Session 3: Characterization and Near-Memory Computing

AIBench Scenario: Scenario-distilling AI Benchmarking

Wanling Gao, Fei Tang, Jianfeng Zhan, Xu Wen, Lei Wang, Zheng Cao, Chuanxin Lan, Chunjie Luo, Xiaoli Liu, Zihan Jiang



Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks

Amirali Boroumand, Saugata Ghose, Berkin Akin, Ravi Narayanaswami, Geraldo F. Oliveira, Xiaoyu Ma, Eric Shiu, Onur Mutlu



SEER: A Time Prediction Model for CNNs from GPU Kernel’s View

Guodong Liu, Sa Wang, Yungang Bao



PIM-DL: Boosting DNN Inference on Digital Processing In-Memory Architectures via Data Layout Optimizations

Minxuan Zhou, Guoyang Chen, Mohsen Imani, Saransh Gupta, Weifeng Zhang, Tajana Rosing



Ultra Efficient Acceleration for De Novo Genome Assembly via Near-Memory Computing

Minxuan Zhou, Lingxi Wu, Muzhou Li, Niema Moshiri, Kevin Skadron, Tajana Rosing


Wednesday, September 29

00:00 – 01:40
Session 2: Heterogeneous Systems (mirrored session)
01:40 – 02:00
Break
02:00 – 03:40
Session 3: Characterization and Near-Memory Computing (mirrored session)
15:00 – 16:40
Session 4: Memory Hierarchy


CBP: Coordinated management of cache partitioning, bandwidth partitioning and prefetch throttling

Nadja Ramhöj Holtryd, Madhavan Manivannan, Per Stenström, Miquel Pericás



Invalidate or Update? Revisiting Coherence for Tomorrow’s Cache Hierarchies

Mingcan Zhu, Amna Shahab, Antonios Katsarakis, Boris Grot



Write Prediction for Persistent Memory Systems

Suyash Mahar, Sihang Liu, Korakit Seemakhupt, Vinson Young, Samira Khan



nuKSM: NUMA-aware Memory De-duplication on Multi-socket Servers

Akash Panda, Ashish Panwar, Arkaprava Basu



CoPlace: Effectively Mitigating Cache Conflicts in Modern Clouds

Xiaowei Shang, Weiwei Jia, Jianchen Shan, Xiaoning Ding


16:40 – 17:00
Break
17:00 – 18:40
Session 5: Graphs and Applications


Dryadic: Flexible and Fast Graph Pattern Matching at Scale

Daniel Mawhirter, Sam Reinehr, Wei Han, Noah Fields, Miles Claver, Connor Holmes, Jedidiah McClurg, Tongping Liu, Bo Wu



Skywalker: Efficient Alias-method-based Graph Sampling and Random Walk on GPUs

Pengyu Wang, Chao Li, Jing Wang, Taolei Wang, Lu Zhang, Jingwen Leng, Quan Chen, Minyi Guo



SumPA: Efficient Pattern-Centric Graph Mining with Pattern Abstraction

Chuangyi Gui, Xiaofei Liao, Long Zheng, Pengcheng Yao, Qinggang Wang, Hai Jin



SURFNet: Super-resolution of Turbulent Flows with Transfer Learning using Small Datasets

Octavi Obiols-Sales, Abhinav Vishnu, Nicholas Malaya, Aparna Chandramowlishwaran



Accelerating Fourier and Number Theoretic Transforms using Tensor Cores and Warp Shuffles

Sultan Durrani, Muhammad Saad Chughtai, Mert Hidayetoglu, Rashid Tahir, Abdul Dakkak, Lawrence Rauchwerger, Fareed Zaffar, Wen-mei Hwu


18:40 – 18:50
ACM SRC Award and Closing Remarks

Thursday, September 30

00:00 – 01:40
Session 4: Memory Hierarchy (mirrored session)
01:40 – 02:00
Break
02:00 – 03:40
Session 5: Graphs and Applications (mirrored session)
03:40 – 03:50
ACM SRC Award and Closing Remarks (mirrored session)