PROJECTS | Ameer Haj Ali

LATEST RESEARCH PROJECTS

NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning (Intel Labs)

NeuroVectorizer takes a novel approach to automatic loop vectorization: an end-to-end framework built on deep reinforcement learning (RL). Deep RL can capture different instructions, dependencies, and access patterns, which lets it learn a sophisticated model that can better predict the actual performance cost and determine the optimal vectorization factors than the fixed-cost models currently in use, which rely on heuristics. The framework runs end to end, from code to vectorization, and integrates deep RL into the LLVM compiler. It takes benchmark programs as input and extracts their loops. The loops are then fed to a loop embedding generator that learns an embedding for them. The learned embeddings are the input to a deep RL agent, which dynamically determines the vectorization factors for all the loops.
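As a rough illustration of the last step, the sketch below shows one way a chosen vectorization factor could be handed to Clang: by placing a `#pragma clang loop` hint in front of each loop. This is an assumption about the mechanism, not a description of the framework's internals. The names `annotate_loops` and `fixed_policy` are hypothetical, the line-based regular expression stands in for real loop extraction, and the fixed policy stands in for the trained agent.

```python
import re

# Matches a line that starts a C `for` loop and captures its indentation.
FOR_LOOP = re.compile(r"^(\s*)for\s*\(")


def annotate_loops(source, choose_factors):
    """Insert a Clang loop pragma before every `for` loop in `source`.

    `choose_factors(loop_index, loop_header)` is a hypothetical policy that
    returns (vectorize_width, interleave_count) for one loop.
    """
    out = []
    loop_index = 0
    for line in source.splitlines():
        match = FOR_LOOP.match(line)
        if match:
            width, interleave = choose_factors(loop_index, line)
            indent = match.group(1)
            out.append(
                f"{indent}#pragma clang loop "
                f"vectorize_width({width}) interleave_count({interleave})"
            )
            loop_index += 1
        out.append(line)
    return "\n".join(out)


def fixed_policy(loop_index, loop_header):
    # Placeholder for a learned agent: always pick the same factors.
    return 4, 2


if __name__ == "__main__":
    src = (
        "void scale(float *a, int n) {\n"
        "    for (int i = 0; i < n; i++)\n"
        "        a[i] *= 2.0f;\n"
        "}"
    )
    print(annotate_loops(src, fixed_policy))
```

A learned policy would replace `fixed_policy` and choose the factors from the loop's embedding rather than return constants.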

Ameer Haj-Ali, Nesreen Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica, “NeuroVectorizer: End-to-End Vectorization with Deep Reinforcement Learning”, International Symposium on Code Generation and Optimization 2020 (CGO 2020), February 2020.
Ameer Haj-Ali, Nesreen Ahmed, Theodore L Willke, Sophia Shao, Krste Asanovic, and Ion Stoica, Workshop on ML for Systems at NeurIPS 2019, December 2019.
Ameer Haj-Ali, Nesreen Ahmed, Ted Willke, Sophia Shao, Krste Asanovic, Ion Stoica, “End-to-End Vectorization with Deep Reinforcement Learning”, Compiler, Architecture, and Tools Conference 2019 (CATC 2019), December 2019.
Available on arXiv: https://arxiv.org/abs/1909.13639.

AutoPhase: Overcoming the Compiler Phase-Ordering with Deep Reinforcement Learning (Adept & RISE Labs)

The performance of the code a compiler generates depends on the order in which it applies the optimization passes. Choosing a good order, often referred to as the phase-ordering problem, is NP-hard. As a result, existing solutions rely on a variety of heuristics. In this project, a new technique to address the phase-ordering problem is proposed: deep reinforcement learning.
To this end, a framework is proposed. It takes a program and finds a sequence of passes that optimizes the performance of the generated circuit.
Without loss of generality, this framework is instantiated in the context of an LLVM compiler and targets high-level synthesis programs.
Random forests are used to quantify the correlation between the effectiveness of a given pass and the program's features. This helps reduce the search space by avoiding orderings that are unlikely to improve the performance of a given program.
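To make the ordering dependence concrete, here is a minimal sketch on a toy instruction list. Everything in it is an assumption made for illustration: the three-field IR, the two passes, and the exhaustive search are hypothetical and are not LLVM passes or the project's method. The brute-force search only stands in for the search that the framework performs with deep RL, and it grows exponentially with the sequence length.

```python
from itertools import product

# Toy IR: (destination, opcode, arguments). "ret" has no destination.
PROGRAM = [
    ("a", "const", (2,)),
    ("b", "const", (3,)),
    ("c", "add", ("a", "b")),
    (None, "ret", ("c",)),
]


def constant_fold(prog):
    """Replace an add of two known constants with a single constant."""
    consts = {}
    out = []
    for dest, op, args in prog:
        if op == "const":
            consts[dest] = args[0]
        elif op == "add" and all(arg in consts for arg in args):
            value = consts[args[0]] + consts[args[1]]
            consts[dest] = value
            op, args = "const", (value,)
        out.append((dest, op, args))
    return out


def dead_code_elim(prog):
    """Drop instructions whose result is never read (single sweep)."""
    used = {arg for _, op, args in prog if op != "const" for arg in args}
    return [ins for ins in prog if ins[0] is None or ins[0] in used]


PASSES = {"fold": constant_fold, "dce": dead_code_elim}


def run(prog, order):
    """Apply the named passes to `prog` in the given order."""
    for name in order:
        prog = PASSES[name](prog)
    return prog


def best_order(prog, length):
    """Try every pass sequence of `length` and keep the shortest result."""
    return min(product(PASSES, repeat=length),
               key=lambda order: len(run(prog, order)))


if __name__ == "__main__":
    print(len(run(PROGRAM, ("fold", "dce"))))  # folding first exposes dead code
    print(len(run(PROGRAM, ("dce", "fold"))))  # nothing is dead before folding
    print(best_order(PROGRAM, 2))
```

With only two passes and sequences of length two there are four orderings to try; with n passes and sequences of length k there are n^k, which is why exhaustive search does not scale.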

Ameer Haj-Ali*, Qijing Huang*, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek, "AutoPhase: Compiler Phase-Ordering for HLS with Deep Reinforcement Learning," IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM 2019), San Diego, CA, USA, 2019, https://ieeexplore.ieee.org/abstract/document/8735549
Ameer Haj-Ali*, Qijing Huang*, William Moses, John Xiang, John Wawrzynek, Krste Asanovic, Ion Stoica, "Juggling HLS Phase Orderings in Random Forests with Deep Reinforcement Learning," Submitted.

Gemmini: An Agile Systolic Array Generator Enabling Systematic Evaluations of Deep Learning Architectures (Adept Lab)

Advances in deep learning and neural networks have resulted in rapid development of hardware accelerators that support them. Gemmini is an open source and agile systolic array generator enabling systematic evaluations of deep-learning architectures. Gemmini generates a custom ASIC accelerator for matrix multiplication based on a systolic array architecture, complete with additional functions for neural network inference. Gemmini runs with the RISC-V ISA, and is integrated with the Rocket Chip System-on-Chip generator ecosystem, including Rocket in-order cores and BOOM out-of-order cores.
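For readers unfamiliar with systolic arrays, the following is a small software model of how a grid of processing elements can compute a matrix product cycle by cycle. It is a sketch under stated assumptions and not Gemmini's hardware or its API: it assumes an output-stationary dataflow, in which the processing element at position (i, j) accumulates one output entry while skewed operands stream past it, and the function name `systolic_matmul` is hypothetical.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    PE (i, j) holds the accumulator for C[i][j]. Rows of A enter from the
    left and columns of B from the top, each skewed by one cycle per row or
    column, so at cycle t PE (i, j) sees A[i][s] and B[s][j] with s = t - i - j.
    """
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    total_cycles = n + m + k - 2  # the last operands reach PE (n-1, m-1)
    for t in range(total_cycles):
        for i in range(n):
            for j in range(m):
                s = t - i - j
                if 0 <= s < k:  # operands have arrived and are not yet exhausted
                    C[i][j] += A[i][s] * B[s][j]
    return C


if __name__ == "__main__":
    print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))
```

In hardware all processing elements work in parallel within a cycle; the two inner loops here only emulate that parallelism sequentially.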

AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs (BWRC Lab)

Domain specialization under energy constraints in deeply-scaled CMOS has been driving the need for agile development of Systems on a Chip (SoCs). While digital subsystems have design flows that are conducive to rapid iterations from specification to layout, analog and mixed-signal modules face the challenge of a long human-in-the-middle iteration loop that requires expert intuition to verify that post-layout circuit parameters meet the original design specification. Existing automated solutions that optimize circuit parameters for a given target design specification are schematic-only, inaccurate, sample-inefficient, or not generalizable. This work presents AutoCkt, a deep reinforcement learning tool that not only finds post-layout circuit parameters for a given target specification but also gains knowledge about the entire design space through a sparse subsampling technique. Our results show that for multiple circuit topologies, the trained AutoCkt agent is able to converge and meet all target specifications on at least 96.3% of tested design goals in schematic simulation, on average 40X faster than a traditional genetic algorithm. Using the Berkeley Analog Generator, AutoCkt is able to design 40 LVS-passed operational amplifiers in 68 hours, 9.6X faster than the state-of-the-art when considering layout parasitics.

Keertana Settaluri, Ameer Haj-Ali, Qijing Huang, Suhong Moon, Kourosh Hakhamaneshi, Ion Stoica, Krste Asanovic, Borivoje Nikolic, “AutoCkt: Deep Reinforcement Learning of Analog Circuit Designs,” Design, Automation & Test in Europe Conference & Exhibition (DATE 2020), March 2020.


memristive Memory Processing Unit (mMPU, ASIC^2 Lab)

This project aims to develop a new computer architecture that enables true in-memory processing based on a unit that can both store and process data using the same cells. This unit, called a memristive memory processing unit (mMPU), will substantially reduce the need to move data in computing systems and, as a result, will address the two main bottlenecks in current computing systems: speed (‘memory wall’) and energy efficiency (‘power wall’). Emerging memory technologies, namely memristive devices, are the enablers of the mMPU. While memristors are naturally used as memory, they can also perform logical operations using a technique we have invented called Memristor Aided Logic (MAGIC), and this combination is the basis of the mMPU. The goal of this research is to design a fully functional mMPU and thereby demonstrate a real computing system with improved performance and energy efficiency. This research covers all aspects of the full system, including mMPU circuit design, system architecture and software, modeling and evaluation, and fabrication.
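The idea of computing inside the memory cells can be illustrated with a purely logical model. The sketch below assumes the basic MAGIC gate is a NOR whose output cell is first initialized to logic 1 and then switches to 0 if any input cell holds 1; it ignores all electrical behavior. The function names and the cell layout are hypothetical and chosen only for this example.

```python
def magic_nor(row, inputs, out):
    """Logical model of one MAGIC NOR step inside a memory row.

    The output cell is initialized to 1, then switches to 0 if any of the
    input cells stores a 1. Inputs and output live in the same row.
    """
    row[out] = 1  # initialization step (assumed)
    if any(row[i] for i in inputs):
        row[out] = 0  # evaluation step


def in_memory_and(a, b):
    """Compute a AND b using only NOR steps on cells of one row."""
    # Cell layout (hypothetical): 0 = a, 1 = b, 2 = NOT a, 3 = NOT b, 4 = result
    row = [a, b, 0, 0, 0]
    magic_nor(row, [0], 2)     # NOT a
    magic_nor(row, [1], 3)     # NOT b
    magic_nor(row, [2, 3], 4)  # NOR(NOT a, NOT b) = a AND b
    return row[4]


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, in_memory_and(a, b))
```

Because NOR is functionally complete, any Boolean function can be built from such steps without moving the operands out of the array, at the cost of extra cells and cycles for intermediate values.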

Ameer Haj-Ali, Ronny Ronen, Rotem Ben-Hur, Nimrod Wald, and Shahar Kvatinsky, “Memristor-Based Processing-in-Memory and Its Application On Image Processing,” Elsevier.
Nishil Talati, Rotem Ben-Hur, Nimrod Wald, Ameer Haj-Ali, John Reuben, and Shahar Kvatinsky, “mMPU - a Real Processing-in-Memory Architecture to Combat the von Neumann Bottleneck,” Springer, 2020.
Rotem Ben-Hur, Ronny Ronen, Ameer Haj-Ali, Debjyoti Bhattacharjee, Adi Eliahu, Natan Peled, and Shahar Kvatinsky, “SIMPLER MAGIC: Synthesis and Mapping of In-Memory Logic Executed in a Single Row to Improve Throughput,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), July 2019.
Tzofnat Greenberg-Toledo, Roee Mazor, Ameer Haj-Ali, and Shahar Kvatinsky, “Supporting the Momentum Training Algorithm Using a Memristor-Based Synapse,” IEEE Transactions on Circuits and Systems I: Regular Papers, January 2019.
Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, and Shahar Kvatinsky, “Not in Name Alone: a Memristive Memory Processing Unit for Real In-Memory Processing,” IEEE Micro, September 2018.
Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, and Shahar Kvatinsky, “IMAGING: In-Memory AlGorithms for Image processiNG,” IEEE Transactions on Circuits and Systems I: Regular Papers, June 2018.
Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, and Shahar Kvatinsky, “Efficient Algorithms for In-memory Fixed Point Multiplication Using MAGIC,” IEEE International Symposium on Circuits and Systems (ISCAS 2018), May 2018.
Nishil Talati, Ameer Haj-Ali, Rotem Ben-Hur, Nimrod Wald, Ronny Ronen, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky, “Practical Challenges in Delivering the Promises of Real Processing-in-Memory Machines,” Design, Automation & Test in Europe Conference & Exhibition (DATE 2018), March 2018.
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, Ameer Haj-Ali, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky, “Memristive Logic: A Framework for Evaluation and Comparison,” 27th International Symposium on Power and Timing Modeling, Optimization and Simulation (PATMOS 2017), September 2017.
John Reuben, Rotem Ben-Hur, Nimrod Wald, Nishil Talati, Ameer Haj-Ali, Pierre-Emmanuel Gaillardon, and Shahar Kvatinsky, “A Taxonomy and Evaluation Framework for Memristive Logic,” Springer, 2017.

To see more or discuss possible work let's talk >>
