Abstract: Matrix-matrix multiplication is a core operation in linear algebra computations and in various applications such as finite element analysis and deep neural networks.
Abstract: Machine learning has been widely applied in various emerging data-intensive applications, and it must be optimized and accelerated by powerful computing engines to process very large-scale data.