Paper Notes: OPU - FPGA-Based Overlay Processor for CNNs

Last updated: August 7, 2023


Info

OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks

Authors: Yunxuan Yu, Chen Wu, Tiandong Zhao, Kun Wang, Lei He

Published in: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 1, Jan. 2020

DOI: 10.1109/TVLSI.2019.2939726

Keywords: CNN overlay processor, FPGA acceleration, hardware-software codesign

Background & Problems

FPGA acceleration for CNNs has moved toward automatic compilers that generate a network-specific FPGA accelerator, but this approach has two problems:

  • Auto-generated, network-specific designs cannot achieve the best performance
  • Re-generating and re-synthesizing a bitstream for every new network makes deployment impractical for edge computing

Key Ideas

An RTL-based, hand-coded FPGA overlay domain-specific processor unit (OPU) with software programmability and fast compilation time, targeting general CNN acceleration.

(Figure: OPU workflow)

Features:

  • User-friendliness comparable to CPU/GPU
  • Domain-specific ISA with optimized granularity: flexibility, efficiency and lower complexity
  • FPGA-based high-performance microarchitecture: computation, data communication and reorganization
  • Compiler with comprehensive optimization

Implementation

ISA

Conditional instructions (C-type)

Unconditional instructions (U-type)

```mermaid
flowchart LR
    A["1 C-type"]
    B["1 to n U-type"]
    PE["1 Processing Element (PE)<br>module"]

    subgraph IB [Instruction Block]
        subgraph IU [Instruction Unit]
            A
            B
        end

        subgraph IU1 [many instruction units]
            C[Instructions]
        end
    end

    IB --- PE
```

One instruction block is fetched together and distributed to one processing element module.

C-type

Specifies the target operation and sets its trigger condition

  • Operation code (opcode): the target operation
  • Trigger condition: when the operation is ready to execute

6 types in total:

  • Memory Read
  • Memory Write
  • Data Fetch
  • Compute
  • Post Process: combination of pooling, activation, data quantization, intermediate result addition and residual operations
  • Instruction Read

Each one corresponds to a dedicated operation module in the PE.

U-type

Delivers the operation parameters for its paired C-type instruction
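The C-type/U-type pairing can be sketched as follows. This is my own illustration of the mechanism, not the paper's actual encoding: class and field names here are hypothetical, and the trigger condition is modeled as a simple membership check.

```python
# Hypothetical sketch of an OPU instruction unit: one C-type instruction
# carries the opcode and trigger condition; its 1..n paired U-type
# instructions carry the operation parameters. (Names are mine.)
from dataclasses import dataclass

@dataclass
class CType:
    opcode: str   # one of the 6 operation types, e.g. "MemoryRead"
    trigger: str  # condition under which the operation may fire

@dataclass
class UType:
    params: dict  # operation parameters for the paired C-type

@dataclass
class InstructionUnit:
    c: CType
    u_list: list  # 1 to n U-type instructions

def dispatch(unit, trigger_state):
    """Fire the operation only once its trigger condition is satisfied."""
    if unit.c.trigger in trigger_state:
        merged = {}
        for u in unit.u_list:   # collect parameters from all paired U-types
            merged.update(u.params)
        return {"op": unit.c.opcode, "params": merged}
    return None  # trigger not met: the unit waits

unit = InstructionUnit(
    CType("MemoryRead", "prev_compute_done"),
    [UType({"addr": 0x1000}), UType({"length": 256})],
)
print(dispatch(unit, {"prev_compute_done"}))
# → {'op': 'MemoryRead', 'params': {'addr': 4096, 'length': 256}}
```

The trigger-based model is what lets one instruction block drive a whole PE module without per-cycle instruction fetch.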

Microarchitecture

Overlay: reconfigurable architectures implemented on top of FPGAs. They are regular designs described using structural HDL, but have reconfigurable capabilities. They may be considered as “softcore FPGA IPs (Semiconductor intellectual property core)”.

Compiler

The compiler performs the following on the input CNN configuration:

  • Operation fusion
  • Network slicing
  • Throughput optimization

Divided into two major stages: Translation and Optimization.

Translation extracts the necessary information from the model definition files and reorganizes it into a unified intermediate representation (IR). The compiler then:

  1. performs operation fusion to combine closely related operations;
  2. performs data quantization (generating dynamic fixed-point representations);
  3. performs network slicing;
  4. performs throughput optimization;
  5. rearranges the processed weights.
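The network-slicing step (step 3) can be illustrated with a toy tiler. This is my own simplification, not the paper's algorithm: it halves a layer's feature-map tile until a full-channel tile fits a hypothetical on-chip input buffer, so that each tile can be handled by one instruction block.

```python
# Illustrative network-slicing sketch (an assumption, not the paper's
# procedure): shrink the tile until it fits the on-chip input buffer.
import math

def slice_feature_map(height, width, channels, bytes_per_elem, buffer_bytes):
    """Return (tile_h, tile_w, n_tiles) such that one tile fits on chip."""
    tile_h, tile_w = height, width
    # halve the larger tile dimension until the tile fits the buffer
    while tile_h * tile_w * channels * bytes_per_elem > buffer_bytes:
        if tile_h >= tile_w:
            tile_h = math.ceil(tile_h / 2)
        else:
            tile_w = math.ceil(tile_w / 2)
    n_tiles = math.ceil(height / tile_h) * math.ceil(width / tile_w)
    return tile_h, tile_w, n_tiles

# 56x56x256 int8 feature map with a hypothetical 64 KiB input buffer
print(slice_feature_map(56, 56, 256, 1, 64 * 1024))
# → (14, 14, 16): sixteen 14x14 full-channel tiles
```

The real compiler additionally accounts for kernel overlap at tile borders and co-optimizes slice sizes with throughput; this sketch only shows the capacity constraint.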

Translation: Operation Fusion

Merge or concatenate related layer operations:

  1. p-fusion: contributes only to off-chip memory-access reduction
  2. r-fusion:
    • avoids communication latency
    • reduces the total number of operations and the inference time

Major layers: Convolution and Fully Connected (FC) layers

Affiliated layers: Pooling, Padding, Activation, Residual and Output Concatenation layers

r-fusion-I: batch normalization elimination, avoiding separate computation of batch normalization

r-fusion-II: input sharing, identifying input sharing layers and reassembling them
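r-fusion-I relies on the standard batch-norm folding identity: BN's per-channel scale and shift can be absorbed into the preceding convolution's weights and bias, so no separate BN computation runs at inference. A minimal sketch (variable names are mine; the math is the usual folding identity, which the paper's BN elimination is an instance of):

```python
# r-fusion-I sketch: fold batch normalization into the preceding conv.
# BN(y) = gamma * (y - mean) / sqrt(var + eps) + beta, applied per channel.
import numpy as np

def fold_batchnorm(w, b, gamma, beta, mean, var, eps=1e-5):
    """w: (out_ch, in_ch, kh, kw) conv weights; b: (out_ch,) conv bias."""
    scale = gamma / np.sqrt(var + eps)         # per-output-channel scale
    w_folded = w * scale[:, None, None, None]  # scale each output filter
    b_folded = (b - mean) * scale + beta       # fold the shift into the bias
    return w_folded, b_folded

# tiny check on a 1x1 conv: conv followed by BN equals the folded conv
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 3, 1, 1)); b = rng.normal(size=4)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)
x = rng.normal(size=3)
conv = w[:, :, 0, 0] @ x + b
bn = gamma * (conv - mean) / np.sqrt(var + 1e-5) + beta
wf, bf = fold_batchnorm(w, b, gamma, beta, mean, var)
assert np.allclose(wf[:, :, 0, 0] @ x + bf, bn)
```

Since the folded network computes identical outputs, this fusion removes operations for free, which is why r-fusion reduces both operation count and inference time.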

Data Quantization

Uses a dynamic quantization scheme to obtain 8-bit fixed-point values.
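"Dynamic" here means the fixed-point format is chosen per layer rather than globally. A sketch of the idea, under my own assumptions (the paper's exact error criterion and search may differ): for each tensor, try every fraction length and keep the one minimizing reconstruction error.

```python
# Dynamic fixed-point sketch (illustrative, not the paper's exact method):
# pick the fraction length that best represents this tensor in 8 bits.
import numpy as np

def quantize_dynamic_fixed(x, bits=8):
    """Return (int8 values, fraction length) minimizing mean squared error."""
    best = None
    for frac in range(bits):                  # candidate fraction lengths
        scale = 2.0 ** frac
        q = np.clip(np.round(x * scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
        err = np.mean((q / scale - x) ** 2)   # reconstruction error
        if best is None or err < best[0]:
            best = (err, frac, q.astype(np.int8))
    _, frac, q = best
    return q, frac                            # values share one exponent

w = np.array([0.73, -0.12, 0.05, -0.96])
q, frac = quantize_dynamic_fixed(w)
print(q, frac)   # each float is approximated by q / 2**frac
```

Because all magnitudes here are below 1, the search settles on the largest fraction length (7), i.e. the finest resolution that avoids clipping; a layer with larger activations would get a smaller fraction length.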

Intermediate Representation

Experiment

MAC: multiply–accumulate operation

Limitation


Y. Yu, C. Wu, T. Zhao, K. Wang and L. He, “OPU: An FPGA-Based Overlay Processor for Convolutional Neural Networks,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 28, no. 1, pp. 35-47, Jan. 2020, doi: 10.1109/TVLSI.2019.2939726.

Unfinished


Paper Notes: OPU - FPGA-Based Overlay Processor for CNNs
https://lingkang.dev/2023/08/07/read-paper-1-yu/
Author: Lingkang · Posted on August 6, 2023