rdiaz.dev
ricardo díaz·software engineer·machine learning

FlashAttention FPGA Accelerator

2025–26 · github.com

Tiled FlashAttention IP core in Vitis HLS for FPGA/SoC platforms. Avoids materializing the full N×N attention matrix; seven pipelined modules behind AXI4 memory and AXI-Lite control interfaces.
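What lets a tiled design skip the N×N matrix is online softmax. Scores are produced one key tile at a time, and a running maximum and running sum allow earlier partial results to be rescaled rather than stored. Below is a minimal plain C++ sketch for a single query row. It is not the project's HLS source: the function name `flash_attention_row`, the row-major layout, fp32 arithmetic, and the tile size `B` are assumptions chosen for illustration. At most `B` scores are live at any time.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical helper: attention output for one query row q (length d)
// against N keys and values stored row-major as N x d. Keys are processed
// in tiles of B, so only B scores are held at once (never a full row/matrix).
std::vector<float> flash_attention_row(const std::vector<float>& q,
                                       const std::vector<float>& K,
                                       const std::vector<float>& V,
                                       std::size_t N, std::size_t d,
                                       std::size_t B) {
    assert(B > 0 && q.size() == d && K.size() == N * d && V.size() == N * d);
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    float m = -std::numeric_limits<float>::infinity();  // running max of scores
    float l = 0.0f;                                      // running sum of exp(score - m)
    std::vector<float> acc(d, 0.0f);                     // unnormalized output
    std::vector<float> s(B);                             // scores for one tile

    for (std::size_t t0 = 0; t0 < N; t0 += B) {
        const std::size_t tn = std::min(B, N - t0);  // last tile may be short

        // Scaled dot-product scores for this tile, plus the tile maximum.
        float m_tile = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < tn; ++j) {
            float dot = 0.0f;
            for (std::size_t k = 0; k < d; ++k) dot += q[k] * K[(t0 + j) * d + k];
            s[j] = dot * scale;
            m_tile = std::max(m_tile, s[j]);
        }

        // Rescale the previous state to the new maximum.
        // On the first tile m is -inf, so alpha = exp(-inf) = 0.
        const float m_new = std::max(m, m_tile);
        const float alpha = std::exp(m - m_new);
        l *= alpha;
        for (std::size_t k = 0; k < d; ++k) acc[k] *= alpha;

        // Accumulate this tile's softmax weights and weighted values.
        for (std::size_t j = 0; j < tn; ++j) {
            const float p = std::exp(s[j] - m_new);
            l += p;
            for (std::size_t k = 0; k < d; ++k) acc[k] += p * V[(t0 + j) * d + k];
        }
        m = m_new;
    }

    // Final normalization by the softmax denominator.
    for (std::size_t k = 0; k < d; ++k) acc[k] /= l;
    return acc;
}
```

Calling it with `B = 1`, `B = 2`, or `B = N` should give the same result up to floating-point rounding, which is a quick way to check the rescaling logic.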

  • Designed a tiled FlashAttention IP core in Vitis HLS for FPGA/SoC platforms, avoiding materialization of the full N×N attention matrix to reduce external memory bandwidth and improve inference throughput.
  • Implemented seven pipelined modules (Q, K, and V Loaders, Dot-Product Engine, Online Softmax, Weighted Accumulator, Output Writeback) behind AXI4 memory and AXI-Lite control interfaces; applied DATAFLOW pragmas for task-level pipelining across modules.
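A rough software model of how seven modules like these could be chained under DATAFLOW is sketched below. It is an assumption-laden illustration, not the actual IP: `std::queue` stands in for `hls::stream`, every function name is hypothetical, one loader function is instantiated twice (for K and for V), and the modules run one after another here instead of concurrently as they would in hardware. Because scores arrive one at a time, the online softmax effectively uses a tile of a single key.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <queue>
#include <vector>

// std::queue stands in for hls::stream<float> so this builds with any C++ compiler.
using Fifo = std::queue<float>;

static float pop(Fifo& f) { const float x = f.front(); f.pop(); return x; }

// Module 1: Q loader (an AXI4 read burst in hardware).
void load_q(const std::vector<float>& q, std::size_t d, Fifo& out) {
    for (std::size_t k = 0; k < d; ++k) out.push(q[k]);
}

// Modules 2 and 3: K and V loaders (same hypothetical function, two instances).
void load_rows(const std::vector<float>& M, std::size_t N, std::size_t d, Fifo& out) {
    for (std::size_t i = 0; i < N * d; ++i) out.push(M[i]);
}

// Module 4: dot-product engine, one scaled score per key.
void dot_product(Fifo& q_in, Fifo& k_in, std::size_t N, std::size_t d, Fifo& s_out) {
    std::vector<float> q(d);
    for (std::size_t k = 0; k < d; ++k) q[k] = pop(q_in);
    const float scale = 1.0f / std::sqrt(static_cast<float>(d));
    for (std::size_t j = 0; j < N; ++j) {
        float dot = 0.0f;
        for (std::size_t k = 0; k < d; ++k) dot += q[k] * pop(k_in);
        s_out.push(dot * scale);
    }
}

// Module 5: online softmax. Per score, emits the rescale factor alpha for the
// previous state and the unnormalized weight p, both relative to the running max.
void online_softmax(Fifo& s_in, std::size_t N, Fifo& alpha_out, Fifo& p_out) {
    float m = -std::numeric_limits<float>::infinity();
    for (std::size_t j = 0; j < N; ++j) {
        const float s = pop(s_in);
        const float m_new = std::max(m, s);
        alpha_out.push(std::exp(m - m_new));  // 0 for the first score
        p_out.push(std::exp(s - m_new));
        m = m_new;
    }
}

// Module 6: weighted accumulator. Keeps the running denominator l and the
// d-wide output, rescaling both by alpha before adding p * v.
void weighted_accumulate(Fifo& alpha_in, Fifo& p_in, Fifo& v_in,
                         std::size_t N, std::size_t d, Fifo& o_out) {
    float l = 0.0f;
    std::vector<float> acc(d, 0.0f);
    for (std::size_t j = 0; j < N; ++j) {
        const float alpha = pop(alpha_in);
        const float p = pop(p_in);
        l = l * alpha + p;
        for (std::size_t k = 0; k < d; ++k) acc[k] = acc[k] * alpha + p * pop(v_in);
    }
    for (std::size_t k = 0; k < d; ++k) o_out.push(acc[k] / l);
}

// Module 7: output writeback (an AXI4 write burst in hardware).
void writeback(Fifo& o_in, std::size_t d, std::vector<float>& out) {
    out.resize(d);
    for (std::size_t k = 0; k < d; ++k) out[k] = pop(o_in);
}

// Hypothetical top level for one query row. In Vitis HLS this body would carry
// a DATAFLOW pragma so the modules overlap; in this model they run sequentially.
std::vector<float> attention_top(const std::vector<float>& q, const std::vector<float>& K,
                                 const std::vector<float>& V, std::size_t N, std::size_t d) {
    assert(N > 0 && q.size() == d && K.size() == N * d && V.size() == N * d);
    Fifo q_s, k_s, v_s, s_s, a_s, p_s, o_s;
    load_q(q, d, q_s);
    load_rows(K, N, d, k_s);
    load_rows(V, N, d, v_s);
    dot_product(q_s, k_s, N, d, s_s);
    online_softmax(s_s, N, a_s, p_s);
    weighted_accumulate(a_s, p_s, v_s, N, d, o_s);
    std::vector<float> out;
    writeback(o_s, d, out);
    return out;
}
```

Splitting softmax from accumulation means the softmax stage only forwards two scalars per key, so the accumulator can update its d-wide output without ever holding the full score row. In an actual Vitis HLS build, the top function would typically also declare `m_axi` and `s_axilite` INTERFACE pragmas for the AXI4 memory and AXI-Lite control ports.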
fpga · hls · c++