CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment

Abstract

Genome analysis is a critical tool in medical and bioscience research, clinical diagnostics and treatment, and disease control and prevention. Seed and extension-based alignment is the main approach in the genome analysis pipeline, and BWA-MEM2, a widely acknowledged tool for genome alignment, performs seeding by searching for super maximal exact match (SMEM). The computation of SMEM searching requires high memory bandwidth and energy consumption, which becomes the main performance bottleneck in BWA-MEM2. State-of-the-Art designs like ERT and GenAx have achieved impressive speed-ups of SMEM-based genome alignment. However, they are constrained by frequent DRAM fetches or computationally intensive intersection calculations for all possible k-mers at every read position. We present a CAM-based SMEM seeding accelerator for genome alignment (CASA), which circumvents the major throughput and power bottlenecks brought by data fetches and frequent position intersections through the co-design of a novel CAM-based computing architecture and a new SMEM search algorithm. CASA mainly consists of a pre-seeding filter table and a SMEM computing unit. The former expands the k-mer size to 19 using limited on-chip memory, which enables the efficient filtration of non-SMEM pivots. The latter applies a new algorithm to filter out disposable SMEMs that are already contained in other SMEMs. We evaluated a 28nm CASA implementation using the human and mouse genome references. CASA achieves 1.2 × and 5.47 × throughput improvement over ERT and GenAx while only requiring less than 30GB/s DRAM bandwidth and keeping the same alignment results as BWA-MEM2. Moreover, CASA provides 2.57 × and 6.69 × higher energy efficiency than ERT and GenAx.

Publication
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture
Avatar
Zhiyu Chen
PhD 2023, now at Apple
Avatar
Kaiyuan Yang
Associate Professor of ECE