Abstract: Matrix multiplication is a fundamental computational operation used widely across engineering applications. To accelerate large-scale matrix multiplication, computing tasks are commonly ...
Abstract: N:M sparsity is an emerging model compression method, supported by a growing number of accelerators, that speeds up sparse matrix multiplication in deep neural networks. Most existing N:M sparsity ...