Paper Notes: FlipBit - Save Energy for Flash Memory Write with Approximation

Last updated on November 11, 2024


Alexander Buck, Karthik Ganesan, and Natalie Enright Jerger. FlipBit: Approximate Flash Memory for IoT Devices. IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2024. https://doi.org/10.1109/HPCA57654.2024.00072

1. Observation and Motivation

  1. Flash memory writes are the energy and latency bottleneck for IoT devices.
  2. Specifically, flash writes exhibit a write asymmetry property: setting a bit from 0 to 1 requires charging the entire block to 1 first (an erase) and then draining the unnecessary 1s back to 0, which is energy-consuming. The opposite operation, setting a bit from 1 to 0, is quick and energy-efficient.
  3. This erase-before-write procedure also contributes to flash memory wear-out.
  4. The data processed and stored in flash by IoT devices is often noisy, providing room for approximate computing.

Data collected by sensors, as is common in IoT devices, tends to carry inherent uncertainty or variability.

For example, GPS speed measurements may contain noise due to limited sensor precision or signal interference, which can lead to unrealistic speed values.

Approximate computing leverages this inborn noisiness, regarding it as an opportunity to save energy, tolerate errors, and improve performance.

2. FlipBit: Approximate Flash Memory Write

Thus, the proposed method, named FlipBit, approximates the data to be written so that flash writes become energy-efficient:

  1. If the data written to the block only requires 1-to-0 changes, it is written directly.
  2. If the write contains 0-to-1 changes, the operation is approximated to the nearest value reachable with 1-to-0 changes only, which can be done in a single step.
  3. If the approximation error exceeds a threshold, the block performs a precise write to avoid data divergence.

This method can save energy and reduce wear-out, while maintaining overall data integrity.

For example, for an 8-bit flash memory block, assume it currently holds 11010100 (212 in base 10) and the data to be written is 11001111 (207 in base 10). Under the FlipBit method, the write is approximated to 11010000 (208 in base 10), which requires only 1-to-0 changes, giving an error rate of (208 - 207) / 207 = 0.00483 = 0.483%.

| Bit | 7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 | # 0 → 1 | # 1 → 0 | Base 10 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Original | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | \ | \ | 212 |
| Exact | 1 | 1 | 0 | 0 | 1 | 1 | 1 | 1 | 3 | 1 | 207 |
| Approx | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 208 |
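
A quick sanity check of this arithmetic in plain Python (my own snippet, not from the paper):

original = int("11010100", 2)  # 212
exact = int("11001111", 2)     # 207
approx = int("11010000", 2)    # 208

# FlipBit only clears bits relative to the original (no 0 -> 1 flips needed):
assert approx & ~original == 0
print((approx - exact) / exact)  # 0.00483... ~ 0.483% error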

3. Implementation and Realization

3.1. Exhaustive Searching

The intuitive (baseline) algorithm for FlipBit is to iterate through the search domain and find the value that minimizes the error (i.e., the value closest to the exact one). The search domain consists of every bit combination in which bits that are 0 in the original data remain 0, while bits that are 1 in the original may either stay 1 or flip to 0. As the memory block size increases, the search domain grows exponentially, leading to high computational complexity and implementation difficulty.
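
A brute-force sketch of this baseline (my own illustration, not code from the paper): enumerate every value whose set bits are a subset of the original's set bits, and keep the candidate closest to the exact value.

def exhaustive_approx(origin: int, exact: int, bits: int = 8) -> int:
    """Try every value that only clears 1s of origin and return the one closest to exact."""
    best = None
    for candidate in range(2 ** bits):
        if candidate & ~origin:  # would require a 0 -> 1 flip, not allowed
            continue
        if best is None or abs(candidate - exact) < abs(best - exact):
            best = candidate
    return best

print(exhaustive_approx(0b11010100, 0b11001111))  # 208, matching the example above

The loop over all 2^N candidates is exactly why this baseline becomes impractical as the block size grows.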

3.2. One-Bit Approximation

Iterating through the bits from the most significant bit (MSB), the one-bit approximation algorithm works as follows: whenever an exact bit of 1 has to be approximated as 0 (because the corresponding original bit is 0), every later approximate bit whose original bit is 1 is set to 1, as compensation for the loss. This implies that the approximated value is never larger than the exact value.

  • set_ones initialized to False
  • for i in range(block_size):
    • original_bit is the i-th bit of the original data
    • exact_bit is the i-th bit of the exact data
    • approx_bit is the i-th bit of the approximate data
    • if original_bit is 1
      • if exact_bit is 1 or set_ones is True
        • approx_bit is 1
      • else:
        • approx_bit is 0
    • else if exact_bit is 1
      • set_ones set to True
      • approx_bit is 0
    • else:
      • approx_bit is 0

This algorithm is easier to implement but may not always provide the optimal solution.

A Python implementation might make the flow clearer:

def one_bit_approx(origin: str, exact: str) -> str:
    """One-bit approximation: MSB-first bit strings in, approximate bits out."""
    output = []
    set_ones = False  # becomes True once an exact 1 had to be dropped
    for orig_bit, exact_bit in zip(origin, exact):
        if orig_bit == "1":
            # A 1 in the original can stay 1 (cheap) or be cleared to 0.
            if exact_bit == "1" or set_ones:
                output.append("1")
            else:
                output.append("0")
        elif exact_bit == "1":
            # The original bit is 0, so this exact 1 cannot be written;
            # remember the loss and compensate with later 1-able bits.
            set_ones = True
            output.append("0")
        else:
            output.append("0")
    return "".join(output)

3.3. N-Bit Approximation

Combining the previous two algorithms, the N-bit approximation algorithm checks n bits at a time.

It relies on truth tables to determine the optimal approximation.

In the truth table, x represents a context-agnostic (don't-care) bit, and _ represents a bit the row leaves undecided.

| Original | Exact | Approx | Change from Exact to Approx | Remark |
| --- | --- | --- | --- | --- |
| 0x | xx | 0_ | Smaller or equal | As long as the original bit is 0, the approx bit cannot be 1 |
| 1x | 1x | 1_ | Equal | Just like one-bit approx, if the original bit and the exact bit are both 1, the approx bit should be 1 |
| 10 | 00 | 0_ | Equal | It's easy to set 1 to 0 |
| 10 | 01 | 1_ | Larger | Keep the 1 bit, as it's difficult to get a 1 back |
| 11 | 00 | 0_ | Equal | It's easy to set 1 to 0 |
| 11 | 01 | 0_ | Equal | It's easy to set 1 to 0 |

The algorithm is as follows:

  • set_ones initialized to False
  • set_zeros initialized to False
  • for i in range(block size):
    • original_bit is the i-th bit of the original data
    • exact_bit is the i-th bit of the exact data
    • approx_bit is the i-th bit of the approximate data
    • if set_zeros is False:
      • if original_bit is 1:
        • if exact_bit is 1 or set_ones is True:
          • approx_bit is 1
        • else if ReadTruthTable(original_bits, exact_bits) is 1:
          • approx_bit is 1
          • set_zeros set to True
        • else:
          • approx_bit is 0
      • else if exact_bit is 1:
        • set_ones set to True
        • approx_bit is 0
      • else:
        • approx_bit is 0
    • else:
      • approx_bit is 0

Basically, when the original bit is 0 but the exact bit is 1 (a subset of the first truth-table row, 0x / xx), the approximation may become smaller than the exact value, so the set_ones flag is raised to compensate for the loss by feeding 1s into the approximation.

Similarly, for the 10 (original) / 01 (exact) condition, the set_zeros flag is raised to feed 0s into the approximation and bring it closer to the exact value. A Python sketch, analogous to the one-bit version above, makes the flow clearer:

def n_bit_approx(origin: str, exact: str) -> str:
    """N-bit approximation: MSB-first bit strings in, approximate bits out.
    read_truth_table is the lookup described by the truth table above."""
    output = []
    set_ones = False   # compensate a dropped exact 1 with later 1-able bits
    set_zeros = False  # compensate a kept surplus 1 with later 0s

    for i, (orig_bit, exact_bit) in enumerate(zip(origin, exact)):
        if not set_zeros:
            if orig_bit == "1":
                if exact_bit == "1" or set_ones:
                    output.append("1")
                elif read_truth_table(exact, origin, i):
                    # Keep this 1 even though the exact bit is 0 (the 10 / 01
                    # row); zero out the remaining bits to stay close.
                    output.append("1")
                    set_zeros = True
                else:
                    output.append("0")
            elif exact_bit == "1":
                # The original bit is 0, so this exact 1 cannot be written.
                set_ones = True
                output.append("0")
            else:
                output.append("0")
        else:
            output.append("0")
    return "".join(output)

While the N-bit approximation is presented as an iterative procedure, it can be parallelized in hardware with repeated logic units, and is thus hardware-friendly.

4. Evaluation

FlipBit is evaluated under two common IoT use cases: 1) sense and send data, and 2) compute and send data, with video capture and deep neural network (DNN) inference as the representative applications.

5. Some Thoughts

The mean absolute error (MAE) of each write is tracked and compared against a threshold that determines whether a precise write is necessary. The threshold is designed to be configurable, allowing the system to adapt to different application scenarios where energy savings can be traded off against accuracy. However, a question arises regarding the accumulation of errors over time: will these approximations, compounded across multiple writes, accumulate enough error to affect the correctness of the system's outcome?
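
As a rough illustration of that decision (my own sketch, not the paper's exact policy), the check might look like:

def needs_precise_write(exact_words, approx_words, threshold: float) -> bool:
    """Fall back to a precise (erase + program) write when the mean absolute
    error of the proposed approximate write exceeds the threshold."""
    mae = sum(abs(a - e) for e, a in zip(exact_words, approx_words)) / len(exact_words)
    return mae > threshold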

In the use cases tested in the paper, the accumulation of errors is not significant. For video capture, the approximations are applied to individual frames, so any approximation-induced error stays within those frames. The nature of video data allows some degree of error without significantly impacting the overall quality: each frame is independently approximated, and the error does not propagate across frames. Peak Signal-to-Noise Ratio (PSNR) is an appropriate metric from this perspective.

Similarly, in the case of Deep Neural Network (DNN) inference, each inference is a distinct computation that resets the error. In this scenario, the approximation effectively “resets” after each inference pass, which prevents significant propagation of errors between inferences. This makes FlipBit well-suited to inference-only use cases where maintaining acceptable accuracy at low energy costs is more important than ensuring precise state across multiple iterations.

However, this characteristic of error handling may not generalize to all applications. In particular, scenarios where persistent state is critical, such as local neural network training on IoT devices (e.g., federated learning for data privacy), pose a different set of challenges. In such cases, data from one iteration directly influences the subsequent iterations, which could lead to error propagation or accumulation. For example, during local model training, errors introduced into the weights during updates are carried forward into subsequent training iterations. This can lead to accumulated errors that degrade the model's performance over time, potentially rendering it inaccurate or even divergent. Unlike inference, where the data is essentially "consumed" after each computation, training relies on continuously refining model parameters, making it more vulnerable to compounding errors.

There might be a few ways to address this issue:

  • Adaptive approximation management: decide whether to accept an approximation not only based on the current write's MAE, but also on the historically accumulated error (see the sketch below).
  • Periodic evaluation: periodically check whether the threshold is still appropriate, and adjust it if necessary (for example, to interrupt model training that is failing to converge).
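
A minimal sketch of the adaptive idea, purely hypothetical and not from the paper: accept an approximate write only while both the per-write error and a running error budget hold, and let a forced precise write reset the accumulated drift.

class AdaptiveApproxPolicy:
    def __init__(self, per_write_limit: float, cumulative_limit: float):
        self.per_write_limit = per_write_limit
        self.cumulative_limit = cumulative_limit
        self.cumulative_error = 0.0

    def accept(self, error: float) -> bool:
        """Return True to allow the approximate write, False to force a precise one."""
        if error > self.per_write_limit or self.cumulative_error + error > self.cumulative_limit:
            self.cumulative_error = 0.0  # the forced precise write clears accumulated drift
            return False
        self.cumulative_error += error
        return True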
