Tiled FlashAttention IP core in Vitis HLS for FPGA/SoC platforms that avoids materializing the full N×N attention matrix; seven pipelined modules connected over AXI4.
- Designed a tiled FlashAttention IP core in Vitis HLS for FPGA/SoC platforms that never materializes the full N×N attention matrix, reducing external memory bandwidth and improving inference throughput.
- Implemented seven pipelined modules (Q Loader, K Loader, V Loader, Dot-Product Engine, Online Softmax, Weighted Accumulator, Output Writeback) behind AXI4 memory-mapped and AXI4-Lite control interfaces; applied DATAFLOW pragmas for task-level parallelism across the modules.
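The tiling and online-softmax scheme described above can be illustrated with a plain C++ software model. This is a minimal sketch under stated assumptions, not the IP's actual source. It assumes a single attention head, float32 data, row-major `Q`, `K`, `V` of shape N×D, and a hypothetical tile size `TILE = 4`. The function names are also hypothetical, and the HLS pragmas, AXI interfaces and stream-based module boundaries are omitted. Comments mark which stage loosely corresponds to which module. A naive reference is included so the tiled result can be checked against it.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical tile size: number of K/V rows processed per step.
constexpr std::size_t TILE = 4;

// Tiled attention with online softmax (software model, hypothetical name).
// Only TILE scores per query row are live at once; no N×N matrix is stored.
void flash_attention_tiled(const std::vector<float>& Q, const std::vector<float>& K,
                           const std::vector<float>& V, std::vector<float>& O,
                           std::size_t N, std::size_t D) {
    assert(Q.size() == N * D && K.size() == N * D && V.size() == N * D);
    const float scale = 1.0f / std::sqrt(static_cast<float>(D));
    O.assign(N * D, 0.0f);
    std::vector<float> s(TILE);                 // scores for the current tile only
    for (std::size_t i = 0; i < N; ++i) {       // one query row at a time
        float m = -std::numeric_limits<float>::infinity();  // running max
        float l = 0.0f;                         // running softmax denominator
        std::vector<float> acc(D, 0.0f);        // unnormalized output row
        for (std::size_t j0 = 0; j0 < N; j0 += TILE) {      // stream K/V tiles
            const std::size_t tn = std::min(TILE, N - j0);
            // "Dot-Product Engine": scaled scores for this tile.
            float new_m = m;
            for (std::size_t t = 0; t < tn; ++t) {
                float dot = 0.0f;
                for (std::size_t d = 0; d < D; ++d) dot += Q[i * D + d] * K[(j0 + t) * D + d];
                s[t] = dot * scale;
                new_m = std::max(new_m, s[t]);
            }
            // "Online Softmax": rescale earlier state to the new running max.
            const float corr = std::exp(m - new_m);  // exp(-inf) = 0 on the first tile
            l *= corr;
            for (std::size_t d = 0; d < D; ++d) acc[d] *= corr;
            // "Weighted Accumulator": add this tile's contribution p * V.
            for (std::size_t t = 0; t < tn; ++t) {
                const float p = std::exp(s[t] - new_m);
                l += p;
                for (std::size_t d = 0; d < D; ++d) acc[d] += p * V[(j0 + t) * D + d];
            }
            m = new_m;
        }
        // "Output Writeback": final normalization by the softmax denominator.
        for (std::size_t d = 0; d < D; ++d) O[i * D + d] = acc[d] / l;
    }
}

// Reference softmax(Q K^T / sqrt(D)) V, computing each full score row of length N.
void attention_naive(const std::vector<float>& Q, const std::vector<float>& K,
                     const std::vector<float>& V, std::vector<float>& O,
                     std::size_t N, std::size_t D) {
    const float scale = 1.0f / std::sqrt(static_cast<float>(D));
    O.assign(N * D, 0.0f);
    std::vector<float> row(N);
    for (std::size_t i = 0; i < N; ++i) {
        float m = -std::numeric_limits<float>::infinity();
        for (std::size_t j = 0; j < N; ++j) {
            float dot = 0.0f;
            for (std::size_t d = 0; d < D; ++d) dot += Q[i * D + d] * K[j * D + d];
            row[j] = dot * scale;
            m = std::max(m, row[j]);
        }
        float l = 0.0f;
        for (std::size_t j = 0; j < N; ++j) { row[j] = std::exp(row[j] - m); l += row[j]; }
        for (std::size_t j = 0; j < N; ++j)
            for (std::size_t d = 0; d < D; ++d) O[i * D + d] += row[j] / l * V[j * D + d];
    }
}
```

The running max `m` and denominator `l` are what allow K/V to be consumed tile by tile. When a later tile raises the max, the previously accumulated sums are rescaled by `exp(m_old - m_new)` rather than recomputed, so the result matches the naive reference up to floating-point rounding.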