Paper Notes: OPU - FPGA-Based Overlay Processor for CNNs
Last updated on August 7, 2023
Info
OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks
Authors: Yunxuan Yu, Chen Wu, Tiandong Zhao, Kun Wang, Lei He
Published in: IEEE TVLSI, vol. 28, no. 1, Jan. 2020
DOI: 10.1109/TVLSI.2019.2939726
Keywords: CNN overlay processor, FPGA acceleration, hardware-software codesign
Background & Problems
FPGA acceleration for CNNs -> automatic compilers that generate a network-specific FPGA accelerator for each CNN
- Cannot achieve the best performance
- Impractical for edge computing, where re-synthesizing and reconfiguring the FPGA for every new network is infeasible
Key Ideas
An RTL-based, hand-coded FPGA overlay domain-specific processor unit (OPU) with software programmability and fast compilation time, targeting general CNN acceleration.
Features:
- User-friendliness comparable to CPUs/GPUs
- Domain-specific ISA with optimized granularity: flexibility, efficiency and lower complexity
- FPGA-based high-performance microarchitecture: computation, data communication and reorganization
- Compiler with comprehensive optimization
Implementation
ISA
Conditional instructions (C-type)
Unconditional instructions (U-type)
```mermaid
flowchart LR
    A["1 C-type"]
    B["1 to n U-type"]
    PE["1 Processing Element (PE)<br>module"]
    subgraph IB [Instruction Block]
        subgraph IU [Instruction Unit]
            A
            B
        end
        subgraph IU1 [many instruction units]
            C[Instructions]
        end
    end
    IB --- PE
```
Each instruction block is fetched as a whole and distributed to one processing element (PE) module.
C-type
Specify the target operation and set its trigger condition
- Operation code (opcode): the target operation
- Trigger condition: when the operation is ready to execute
6 types in total:
- Memory Read
- Memory Write
- Data Fetch
- Compute
- Post Process: a combination of pooling, activation, data quantization, intermediate-result addition, and residual operations
- Instruction Read
Each one corresponds to a dedicated operation module in the PE.
U-type
Delivers the operation parameters for its paired C-type instruction
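The C-type/U-type pairing can be modeled in software. Below is a minimal Python sketch, using illustrative class and field names rather than the paper's actual encoding, of an instruction unit whose operation fires only once its trigger condition is met:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Op(Enum):
    """The six C-type operation types listed above."""
    MEMORY_READ = auto()
    MEMORY_WRITE = auto()
    DATA_FETCH = auto()
    COMPUTE = auto()
    POST_PROCESS = auto()
    INSTRUCTION_READ = auto()

@dataclass
class CType:
    op: Op          # target operation
    trigger: str    # symbolic trigger condition, e.g. "fetch_done"

@dataclass
class UType:
    params: dict    # operation parameters for the paired C-type

@dataclass
class InstructionUnit:
    c: CType        # 1 C-type ...
    u: list         # ... paired with 1-to-n U-type instructions

def dispatch(unit: InstructionUnit, flags: set) -> bool:
    """Run the unit's operation module only if its trigger condition holds."""
    if unit.c.trigger not in flags:
        return False  # condition not yet satisfied; keep waiting
    print(f"run {unit.c.op.name} with {[u.params for u in unit.u]}")
    return True

# Example: a Compute unit that waits for the data-fetch-done flag.
unit = InstructionUnit(CType(Op.COMPUTE, "fetch_done"),
                       [UType({"ker_size": 3, "stride": 1})])
dispatch(unit, flags={"mem_read_done"})  # False: still waiting
dispatch(unit, flags={"fetch_done"})     # True: trigger satisfied, executes
```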
Microarchitecture
Overlay: a reconfigurable architecture implemented on top of an FPGA. Overlays are regular designs described in structural HDL but retain reconfigurable capabilities; they can be viewed as "soft-core FPGA IPs" (semiconductor intellectual property cores).
Compiler
Given an input CNN configuration, the compiler performs:
- Operation fusion
- Network slicing
- Throughput optimization
Divided into two major stages: translation and optimization.
Extract the necessary information from the model definition files and reorganize it into a unified intermediate representation (IR); the compiler then:
- performs operation fusion to combine closely related operations
- performs data quantization (generating dynamic fixed-point representations)
- performs network slicing
- performs throughput optimization
- rearranges the processed weights
A minimal sketch of this flow is given below.
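The two-stage flow might look roughly like the following runnable sketch, assuming a simple dict-based IR and illustrative pass names (the paper's actual data structures are not detailed in these notes):

```python
def translate(model_def):
    """Translation: extract layer info into a unified IR and fuse operations."""
    ir = [{"type": t, "fused_with": []} for t in model_def]
    fused = []
    for layer in ir:
        # Illustrative p-fusion: attach a pooling layer to the preceding conv
        # so the intermediate result never leaves the chip.
        if layer["type"] == "pool" and fused and fused[-1]["type"] == "conv":
            fused[-1]["fused_with"].append("pool")
        else:
            fused.append(layer)
    return fused

def optimize(ir):
    """Optimization: quantize, slice, and schedule the fused IR."""
    for layer in ir:
        layer["precision"] = "int8"  # dynamic fixed point (see Data Quantization)
        layer["slices"] = 1          # placeholder for network slicing choices
    return ir

print(optimize(translate(["conv", "pool", "conv", "fc"])))
```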
Translation: Operation Fusion
Merge or concatenate related layer operations
- p-fusion: contributes only to off-chip memory access reduction
- r-fusion:
  - avoids communication latency
  - reduces the total number of operations and the inference time
Major layers: Convolution and Fully Connected (FC) layers
Affiliated layers: Pooling, Padding, Activation, Residual and Output Concatenation layers
r-fusion-I: batch normalization elimination, avoiding a separate computation of batch normalization (a folding sketch follows below)
r-fusion-II: input sharing, identifying layers that share the same input and reassembling them
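Batch normalization elimination is commonly done by folding the BN parameters into the preceding convolution's weights and bias; below is a minimal numpy sketch under that assumption (the paper's exact transformation may differ):

```python
import numpy as np

def fold_bn(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold BN(gamma, beta, mean, var) into conv weights w and bias b.

    w: (out_ch, in_ch, kh, kw), b: (out_ch,); BN params are per output channel.
    """
    scale = gamma / np.sqrt(var + eps)         # per-channel scale factor
    w_folded = w * scale[:, None, None, None]  # scale each output channel
    b_folded = (b - mean) * scale + beta       # fold shift into the bias
    return w_folded, b_folded

# Sanity check on a single input patch: conv + BN == folded conv.
rng = np.random.default_rng(0)
out_ch = 4
w = rng.standard_normal((out_ch, 3, 3, 3))
b = rng.standard_normal(out_ch)
gamma, beta = rng.random(out_ch), rng.standard_normal(out_ch)
mean, var = rng.standard_normal(out_ch), rng.random(out_ch)

x = rng.standard_normal((3, 3, 3))
conv_out = np.tensordot(w, x, axes=3) + b
bn_out = gamma * (conv_out - mean) / np.sqrt(var + 1e-5) + beta

wf, bf = fold_bn(w, b, gamma, beta, mean, var)
assert np.allclose(bn_out, np.tensordot(wf, x, axes=3) + bf)
```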
Data Quantization
Uses a dynamic quantization scheme to obtain 8-bit fixed-point values.
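One common dynamic fixed-point scheme searches per-layer fraction lengths and keeps the one with the smallest reconstruction error; a minimal sketch under that assumption (the paper's exact selection rule may differ):

```python
import numpy as np

def quantize_dynamic_fixed(x, bits=8):
    """Quantize array x to `bits`-bit fixed point with a chosen fraction length."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    best_fl, best_err = 0, np.inf
    for fl in range(-8, 16):                        # candidate fraction lengths
        q = np.clip(np.round(x * 2.0 ** fl), qmin, qmax)
        err = np.mean((q / 2.0 ** fl - x) ** 2)     # reconstruction error
        if err < best_err:
            best_fl, best_err = fl, err
    q = np.clip(np.round(x * 2.0 ** best_fl), qmin, qmax).astype(np.int8)
    return q, best_fl                               # x is approx. q * 2**(-best_fl)

w = np.random.randn(64) * 0.1                       # e.g. one layer's weights
q, fl = quantize_dynamic_fixed(w)
print(f"fraction length {fl}, max abs error "
      f"{np.max(np.abs(q / 2.0 ** fl - w)):.4f}")
```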
Intermediate Representation
Experiment
MAC: multiply–accumulate operation (see Wikipedia: "Multiply–accumulate operation")
Limitation
Y. Yu, C. Wu, T. Zhao, K. Wang and L. He, “OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 1, pp. 35-47, Jan. 2020, doi: 10.1109/TVLSI.2019.2939726.
Unfinished