Paper Notes: FlipBit - Save Energy for Flash Memory Write with Approximation
Last updated on November 11, 2024
Alexander Buck, Karthik Ganesan, and Natalie Enright Jerger. FlipBit: Approximate Flash Memory for IoT Devices. IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024. https://doi.org/10.1109/HPCA57654.2024.00072
1. Observation and Motivation
- Flash memory writes are the energy and latency bottleneck for IoT devices.
- Flash writes exhibit an asymmetry: setting a bit from `0` to `1` requires erasing the entire block to all `1`s first and then programming the unneeded `1`s back to `0`, which is slow and energy-consuming. The opposite direction, flipping a `1` to `0`, is quick and energy-efficient.
- This erase-then-program write method also contributes to flash memory wear-out.
- The data processed and stored in flash on IoT devices is often noisy, leaving room for approximate computing.
Data collected by sensors, which is the common case on IoT devices, typically carries inherent uncertainty or variability.
For example, GPS data may contain noise on speed measurements due to limited sensor precision or signal interference, occasionally yielding unrealistic speed values.
Approximate computing treats this inherent noisiness as an opportunity to save energy, provide error tolerance, and improve performance.
2. FlipBit: Approximate Flash Memory Write
The proposed method, FlipBit, saves energy on flash writes by approximating the data to be written:
- If the new data only requires `1`-to-`0` changes relative to the block's current contents, it is written directly.
- If the write contains `0`-to-`1` changes, the operation is approximated to the nearest value reachable with `1`-to-`0` changes only, which can be done in a single step.
- If the approximation error exceeds a threshold, the block performs a precise write instead, to avoid data divergence.
This method saves energy and reduces wear-out while maintaining acceptable data integrity.
For example, consider an 8-bit flash memory block whose current content is `11010100` (212 in base 10), and suppose the data to be written is `11001111` (207 in base 10). Under FlipBit, the write is approximated to `11010000` (208 in base 10), which requires only `1`-to-`0` changes and has an error of (208 - 207) / 207 ≈ 0.483%.
| Bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | # 0 → 1 | # 1 → 0 | Base 10 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Original | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | \ | \ | 212 |
| Exact | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 3 | 1 | 207 |
| Approx | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 208 |
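As a quick sanity check of these numbers in Python (using the values from the table above):

```python
original, exact, approx = 0b11010100, 0b11001111, 0b11010000

# The approximation must not require any 0 -> 1 flip relative to the original.
only_1_to_0 = (approx & ~original & 0xFF) == 0
error_rate = abs(approx - exact) / exact

print(only_1_to_0)            # True
print(f"{error_rate:.3%}")    # 0.483%
```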
3. Implementation and Realization
3.1. Exhaustive Search
The intuitive (baseline) algorithm for FlipBit iterates through the search domain and picks the value that minimizes the error (i.e., is closest to the exact value). The search domain consists of all bit combinations in which every bit that is `0` in the original data stays `0`; bits that are `1` in the original may stay `1` or flip to `0`. As the memory block size increases, the search domain grows exponentially, leading to high computational complexity and implementation difficulty.
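A minimal sketch of this baseline in Python (the function name and interface are illustrative, not from the paper):

```python
def exhaustive_search(original: int, exact: int, bits: int = 8) -> int:
    # Enumerate every value reachable from `original` using 1->0 flips only
    # (i.e., every "submask" of the original) and return the one closest to
    # the exact value. Exponential in the block size.
    best, best_err = original, abs(original - exact)
    for candidate in range(1 << bits):
        if candidate & ~original & ((1 << bits) - 1):
            continue                              # would need a 0->1 flip; skip
        err = abs(candidate - exact)
        if err < best_err:
            best, best_err = candidate, err
    return best


print(bin(exhaustive_search(0b11010100, 0b11001111)))   # 0b11010000 (208)
```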
3.2. One-Bit Approximation
Iterating through the bits from the most significant bit (MSB) to the least significant, the one-bit approximation algorithm works as follows: whenever an exact bit of `1` has to be output as `0` in the approximation (because the corresponding original bit is `0` and cannot be raised), all subsequent approx bits whose original bits are `1` are set to `1` to compensate for the loss. This implies that the approximated value is never larger than the exact value.
- `set_ones` initialized to `False`
- for `i` in range(block_size):
  - `original_bit` is the `i`-th bit of the original data
  - `exact_bit` is the `i`-th bit of the exact data
  - `approx_bit` is the `i`-th bit of the approximate data
  - if `original_bit` is `1`:
    - if `exact_bit` is `1` or `set_ones` is `True`: `approx_bit` is `1`
    - else: `approx_bit` is `0`
  - else if `exact_bit` is `1`: `set_ones` set to `True`; `approx_bit` is `0`
  - else: `approx_bit` is `0`
This algorithm is easier to implement but may not always provide the optimal solution.
A Python sketch might make the flow clearer (the function wrapper and names are illustrative, following the pseudocode above):
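```python
def one_bit_approx(original: int, exact: int, bits: int = 8) -> int:
    # Scan from the MSB down. Where the exact data needs a 0->1 change that the
    # block cannot provide, output 0 and raise set_ones so that later positions
    # whose original bit is 1 are kept at 1 to compensate for the loss.
    approx = 0
    set_ones = False
    for i in reversed(range(bits)):               # MSB -> LSB
        original_bit = (original >> i) & 1
        exact_bit = (exact >> i) & 1
        if original_bit == 1:
            if exact_bit == 1 or set_ones:
                approx |= 1 << i                  # keep the 1 (no 0->1 flip needed)
            # else: the 1 is dropped, which flash can do cheaply
        elif exact_bit == 1:
            set_ones = True                       # a 1 was lost here; compensate below
            # approx bit stays 0 (original is 0 and cannot be raised)
    return approx


print(bin(one_bit_approx(0b11010100, 0b11001111)))   # 0b11000100 (196)
```

On the running example this returns `0b11000100` (196) rather than the optimal 208, illustrating the non-optimality mentioned above.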
3.3. N-Bit Approximation
Combining the previous two algorithms, the N-bit approximation algorithm checks `n` bits at once.
It relies on a truth table to determine the optimal approximation.
In the truth table, `x` represents a context-agnostic bit and `_` represents an undecided bit.
| Original | Exact | Approx | Changes from Exact to Approx | Remark |
|---|---|---|---|---|
| `0x` | `xx` | `0_` | Smaller or equal | As long as the original bit is `0`, the approx bit cannot be `1` |
| `1x` | `1x` | `1_` | Equal | Just like one-bit approx: if the original bit and the exact bit are both `1`, the approx bit should be `1` |
| `10` | `00` | `0_` | Equal | It is easy to set `1` to `0` |
| `10` | `01` | `1_` | Larger | Keep the `1` bit, as it is difficult to get a `1` back |
| `11` | `00` | `0_` | Equal | It is easy to set `1` to `0` |
| `11` | `01` | `0_` | Equal | It is easy to set `1` to `0` |
The algorithm is as follows:
- `set_ones` initialized to `False`
- `set_zeros` initialized to `False`
- for `i` in range(block_size):
  - `original_bit` is the `i`-th bit of the original data
  - `exact_bit` is the `i`-th bit of the exact data
  - `approx_bit` is the `i`-th bit of the approximate data
  - if `set_zeros` is `False`:
    - if `original_bit` is `1`:
      - if `exact_bit` is `1` or `set_ones` is `True`: `approx_bit` is `1`
      - else if ReadTruthTable(original_bits, exact_bits) is `1`: `approx_bit` is `1`; `set_zeros` set to `True`
      - else: `approx_bit` is `0`
    - else if `exact_bit` is `1`: `set_ones` set to `True`; `approx_bit` is `0`
    - else: `approx_bit` is `0`
  - else: `approx_bit` is `0`
Basically, when the condition "original `0x`, exact `1x`" (a subset of the first row in the truth table) is triggered, which makes the approx smaller than the exact, the `set_ones` flag is raised to compensate for the loss by feeding `1`s into the approx wherever the original allows.
Similarly, when the condition "original `10`, exact `01`" is triggered, the `set_zeros` flag is raised to feed `0`s into the approx and bring it back down toward the exact value.
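A minimal Python sketch of the 2-bit case (n = 2) might look like the following; `read_truth_table`, the bit-pair encoding, and the function wrapper are my own reading of the truth table above, not the paper's implementation:

```python
def read_truth_table(orig_pair, exact_pair):
    # 2-bit truth-table lookup (n = 2): keep the current 1 (and then pad the
    # rest with 0s) only for the "original 10, exact 01" row, where dropping
    # the 1 cannot be compensated by the next original bit.
    return orig_pair == (1, 0) and exact_pair == (0, 1)


def n_bit_approx(original: int, exact: int, bits: int = 8) -> int:
    approx = 0
    set_ones = False
    set_zeros = False
    for i in reversed(range(bits)):               # MSB -> LSB
        o, e = (original >> i) & 1, (exact >> i) & 1
        o_next = (original >> (i - 1)) & 1 if i > 0 else 0
        e_next = (exact >> (i - 1)) & 1 if i > 0 else 0
        if set_zeros:
            continue                              # everything below stays 0
        if o == 1:
            if e == 1 or set_ones:
                approx |= 1 << i
            elif read_truth_table((o, o_next), (e, e_next)):
                approx |= 1 << i                  # keep the 1; approx is now larger
                set_zeros = True                  # so pad the remaining bits with 0s
            # else: drop the 1 (cheap for flash)
        elif e == 1:
            set_ones = True                       # a 1 was lost; compensate with later 1s
    return approx


print(bin(n_bit_approx(0b11010100, 0b11001111)))  # 0b11010000 (208), optimal here
```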
While the N-bit approximation is presented iteratively, it can be parallelized in hardware with replicated logic units, making it hardware-friendly.
4. Evaluation
FlipBit is evaluated under two common IoT use cases: 1) sense and send data, and 2) compute and send data, with video capture and deep neural network (DNN) inference as the representative applications.
5. Some Thoughts
The mean absolute error (MAE) of each write is tracked and compared against a threshold that determines whether a precise write is necessary. The threshold is configurable, allowing the system to adapt to different application scenarios by trading energy savings against accuracy. However, a question arises regarding the accumulation of errors over time: will these approximations, compounded across multiple writes, accumulate enough error to ultimately affect the correctness of the system's outcome?
In the use cases tested in the paper, error accumulation is not significant. For video capture, approximations are applied to individual frames, so any approximation-induced error stays within those frames. Because each frame is approximated independently, errors do not propagate across frames, and video tolerates some per-frame error without significantly impacting overall quality. Peak Signal-to-Noise Ratio (PSNR) is an appropriate metric from this perspective.
Similarly, in the case of Deep Neural Network (DNN) inference, each inference is a distinct computation that resets the error. In this scenario, the approximation effectively “resets” after each inference pass, which prevents significant propagation of errors between inferences. This makes FlipBit well-suited to inference-only use cases where maintaining acceptable accuracy at low energy costs is more important than ensuring precise state across multiple iterations.
However, this characteristic of error handling may not generalize to all applications. In particular, scenarios where persistent state is critical, such as local neural network training on IoT devices (e.g., federated learning for data privacy), pose a different set of challenges. In such cases, data from one iteration directly influences subsequent iterations, which can lead to error propagation or accumulation. For example, during local model training, errors introduced into the weights during updates are carried forward into subsequent training iterations. This can lead to accumulated errors that degrade the model’s performance over time, potentially rendering it inaccurate or even divergent. Unlike inference, where the data is essentially “consumed” after each computation, training relies on continuously refining model parameters, making it more vulnerable to compounding errors.
There might be a few ways to address this issue:
- Adaptive approximation management: decide whether to accept an approximation based not only on the current write's MAE, but also on the historical error accumulation (see the sketch after this list).
- Periodic evaluation: periodically check whether the threshold is still appropriate, and adjust it if necessary (for example, to break model training out of non-convergence).
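As a rough illustration of the first idea, a write controller might track a running error budget alongside the per-write check; the class, method names, and budget values below are purely hypothetical, not from the paper:

```python
class AdaptiveApproxPolicy:
    # Hypothetical policy sketch: accept an approximate write only when both the
    # per-write error and the running accumulated error stay under their budgets;
    # otherwise request a precise write, which resets the accumulated error.
    def __init__(self, per_write_budget: float = 0.01, accumulated_budget: float = 0.05):
        self.per_write_budget = per_write_budget
        self.accumulated_budget = accumulated_budget
        self.accumulated_error = 0.0

    def accept_approx(self, exact: int, approx: int) -> bool:
        error = abs(approx - exact) / max(abs(exact), 1)
        if error > self.per_write_budget or self.accumulated_error + error > self.accumulated_budget:
            self.accumulated_error = 0.0          # precise write resets the history
            return False
        self.accumulated_error += error
        return True
```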