Performance Issue of Memory Alignment
Last updated on July 26, 2023
We have already discussed some tricks for aligning numbers. In this post, we will explore the difference in code performance between aligned and unaligned memory access, especially when accessing and assigning to the members of a `struct` (and possibly a `class`). A piece of ready-to-go C++ code is provided in the appendix.
Design
We plan to use a `struct` to demonstrate the performance difference. Basically, it contains three members: a `char`, an `int` and a `double`. We will try to arrange them in different ways and see the performance difference.
The size of each type is as follows. Note that the size of `Struct1` is not guaranteed to equal the sum of the sizes of its members (1 + 4 + 8 = 13 bytes).
Data Type | Size in Bytes |
---|---|
char | 1 |
int | 4 |
double | 8 |
Struct1 | 16 |
If we evaluate `sizeof(Struct1)`, the result is actually 16 rather than 13, because the compiler inserts padding so that each member sits at an offset that is a multiple of its own alignment, and so that the size of `Struct1` is a multiple of its strictest member alignment (8 bytes for `double` here). We won't touch this part for simplicity's sake.
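To make the padding concrete, here is a minimal sketch of how `Struct1` could be laid out. The member names `c`, `i`, `d` and the declaration order (`char`, `int`, `double`) are assumptions, and the offsets in the comments are what a typical x86-64 ABI produces, not something the language guarantees.

```cpp
#include <cstddef>  // offsetof, std::size_t

// Assumed layout of Struct1: member names and order are illustrative.
struct Struct1 {
    char   c;  // offset 0, followed by 3 padding bytes
    int    i;  // offset 4 (int is 4-byte aligned)
    double d;  // offset 8 (double is 8-byte aligned)
};

// Sum of the member sizes, ignoring any padding: 1 + 4 + 8 = 13.
constexpr std::size_t kMemberBytes =
    sizeof(char) + sizeof(int) + sizeof(double);

// Bytes the compiler added as padding inside Struct1.
constexpr std::size_t kPaddingBytes = sizeof(Struct1) - kMemberBytes;

// Offsets of the second and third members, computed by the compiler.
constexpr std::size_t kOffsetOfInt    = offsetof(Struct1, i);
constexpr std::size_t kOffsetOfDouble = offsetof(Struct1, d);
```

Printing `sizeof(Struct1)`, `kPaddingBytes` and the two offsets on the target machine is a quick way to check where the three extra bytes went.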
By permutation without repetition, there are 3! = 6 ways to order the members of our `struct`, namely `Struct1` to `Struct6` (see `test.hh` in the appendix).
Implementation
The idea is simple:
- Initialize an object;
- Assign values to its members;
- Repeat the assignments many times in a loop;
- Record the total time consumed for that type;
- Compare the time across different types.
The basic code times such a loop for `Struct1`. Replacing `Struct1` with `Struct2` … `Struct6` then gives the time consumed by each type.
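A minimal sketch of such a timing loop, assuming the member names `c`, `i`, `d` and microseconds as the time unit (neither is confirmed by the post). The function name `time_struct1` is hypothetical, and the `volatile` qualifier is an addition of this sketch so that the stores survive an optimizing build; the post itself builds with `-O0`.

```cpp
#include <chrono>
#include <cstdint>

// Assumed layout of Struct1: member names and order are illustrative.
struct Struct1 {
    char   c;
    int    i;
    double d;
};

// Hypothetical helper: time `iter` rounds of member assignments
// and return the elapsed time in microseconds.
long long time_struct1(std::uint64_t iter) {
    using clock = std::chrono::steady_clock;

    // volatile forces every store to really happen, even with -O2.
    volatile Struct1 s{};

    const auto start = clock::now();
    for (std::uint64_t n = 0; n < iter; ++n) {
        // Members are assigned in the reverse order of their declaration.
        s.d = 3.14;
        s.i = 42;
        s.c = 'a';
    }
    const auto stop = clock::now();

    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}
```

Calling `time_struct1(100000000)` would correspond to one measurement for one type; the other five types need the same loop with a different struct.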
A few changes make the code more readable and the data easier to collect:
- Implemented a `template` with a generic type to avoid code duplication.
- Used a vector `time` to store the time consumed by each type.
- For each value of `iter`, we run the test 5 times, or 5 `epoch`s.
- Used two arrays `min_arr` and `max_arr` to keep track of which type contributes to the max and min consumed time.
- Saved the result as a csv file for further analysis.
- Printed the current iteration and epoch so that we can follow the progress.
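The changes listed above could be sketched roughly as follows. This is not the code from the appendix: `run_test`, `run_epoch` and `csv_row` are hypothetical names, the member names and the CSV column order are assumptions, and only two of the six layouts are spelled out to keep the sketch short.

```cpp
#include <chrono>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Assumed layouts; member names and order are illustrative.
struct Struct1 { char c; int i; double d; };
struct Struct5 { double d; char c; int i; };

// Generic timing loop: works for any layout T with members c, i and d.
// Returns the elapsed time in microseconds (unit is an assumption).
template <typename T>
long long run_test(std::uint64_t iter) {
    using clock = std::chrono::steady_clock;
    volatile T s{};  // volatile keeps the stores from being optimized away
    const auto start = clock::now();
    for (std::uint64_t n = 0; n < iter; ++n) {
        s.d = 3.14;  // fixed access order here, a simplification
        s.i = 42;
        s.c = 'a';
    }
    const auto stop = clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
        .count();
}

// One epoch: time every layout once and collect the results in `time`.
std::vector<long long> run_epoch(std::uint64_t iter) {
    std::vector<long long> time;
    time.push_back(run_test<Struct1>(iter));
    time.push_back(run_test<Struct5>(iter));
    return time;
}

// Format one CSV row as: iter,epoch,time of each type...
std::string csv_row(std::uint64_t iter, int epoch,
                    const std::vector<long long>& time) {
    std::ostringstream out;
    out << iter << ',' << epoch;
    for (long long t : time) {
        out << ',' << t;
    }
    return out.str();
}
```

A driver would loop over the values of `iter`, call `run_epoch` five times for each, and write `csv_row(iter, epoch, time)` to a file after every epoch.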
For example, say that after one `epoch` the `time` vector holds the values `294146, 276858, 265321, 267835, 278274, 282499`. Then `Struct3` is the fastest and `Struct1` is the slowest, so we increment `min_arr[2]` and `max_arr[0]` (indices start at 0) by `1`. This is done by the `count()` function.
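One possible shape for this bookkeeping is sketched below. The name `count_min_max` is a hypothetical stand-in for the post's `count()`, and the signature and the use of `std::array` for `min_arr` and `max_arr` are assumptions.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <vector>

constexpr std::size_t kTypes = 6;  // Struct1 .. Struct6

// Hypothetical stand-in for count(): find the fastest and slowest type
// in one epoch and bump the matching counters.
void count_min_max(const std::vector<long long>& time,
                   std::array<int, kTypes>& min_arr,
                   std::array<int, kTypes>& max_arr) {
    if (time.size() != kTypes) {
        return;  // ignore malformed input instead of indexing out of range
    }
    const auto min_it = std::min_element(time.begin(), time.end());
    const auto max_it = std::max_element(time.begin(), time.end());

    // Iterator distance from begin() is the zero-based type index.
    ++min_arr[static_cast<std::size_t>(min_it - time.begin())];
    ++max_arr[static_cast<std::size_t>(max_it - time.begin())];
}
```

With the six values from the example, this increments `min_arr[2]` and `max_arr[0]` and leaves every other counter untouched.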
Note also that the order in which the members are accessed matters. For example, accessing `char` first, then `int`, then `double` is different from accessing `double` first, then `int`, then `char`, and this also affects the performance. Here, we simply access them in the reverse order of their declaration.
Implementation details can be found in `test.cpp` in the appendix.
Result
We ran `iter` from 100000000 to 100001000 with step 10, and `epoch` from 1 to 5. This gives about 500 rows of data (the counts in the table below sum to 505, that is, 101 values of `iter` times 5 epochs), which is a reasonably large sample.
The final `min_arr` and `max_arr` are as follows:
Count | Struct1 | Struct2 | Struct3 | Struct4 | Struct5 | Struct6 |
---|---|---|---|---|---|---|
MAX Count | 288 | 209 | 2 | 4 | 0 | 2 |
MIN Count | 0 | 0 | 50 | 67 | 331 | 57 |
We can see that in general `Struct5` is the fastest (331 of the 505 runs) and `Struct1` is the slowest (288 of the 505 runs), and they are quite dominant over the other types.
The full results can be found here.
Analysis
This result is quite self-explanatory. If we check the sizes of these `struct`s, they are not all 16 bytes.
Data Type | Size in Bytes |
---|---|
Struct1 | 16 |
Struct2 | 24 |
Struct3 | 16 |
Struct4 | 24 |
Struct5 | 16 |
Struct6 | 16 |
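We can see that `Struct2` and `Struct4` are 24 bytes, 8 bytes more than the others. With these three member types, that size only occurs when `double` is the middle member, because padding is then needed both before it and at the end of the struct. The compiler aligns each member to its own alignment requirement and pads the whole structure to a multiple of the strictest one, which is 8 bytes (for `double`) here. GCC/g++ also has an alignment attribute that can be used to control the alignment of structure members. For example, we can use `__attribute__((aligned(8)))` to ensure that the structure is aligned to at least 8 bytes when using GCC/g++.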
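The sizes in the table can be reproduced with the six permutations below. The member order of each struct is an assumption: it is simply the lexicographic order of the permutations of (`char`, `int`, `double`), which happens to give exactly the sizes listed above on a typical x86-64 ABI. The member names and `kSizes` are illustrative as well.

```cpp
#include <array>
#include <cstddef>

// Assumed member orders; byte counts in the comments are for an ABI with
// a 4-byte int and an 8-byte, 8-byte-aligned double.
struct Struct1 { char c; int i; double d; };  // 1 + 3 pad + 4 + 8         = 16
struct Struct2 { char c; double d; int i; };  // 1 + 7 pad + 8 + 4 + 4 pad = 24
struct Struct3 { int i; char c; double d; };  // 4 + 1 + 3 pad + 8         = 16
struct Struct4 { int i; double d; char c; };  // 4 + 4 pad + 8 + 1 + 7 pad = 24
struct Struct5 { double d; char c; int i; };  // 8 + 1 + 3 pad + 4         = 16
struct Struct6 { double d; int i; char c; };  // 8 + 4 + 1 + 3 pad         = 16

// Sizes as computed by the compiler, in the order Struct1 .. Struct6.
constexpr std::array<std::size_t, 6> kSizes = {
    sizeof(Struct1), sizeof(Struct2), sizeof(Struct3),
    sizeof(Struct4), sizeof(Struct5), sizeof(Struct6)};

// Alignment of a layout with double in the middle, for comparison.
constexpr std::size_t kStruct2Align = alignof(Struct2);
```

Every layout holds the same 13 bytes of data; only the padding differs, so the two 24-byte layouts spend 11 bytes on padding instead of 3.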
References
- Data alignment for speed: myth or reality? – Daniel Lemire’s blog
- Memory Alignment and Performance · Fylux
- Data alignment: Straighten up and fly right - IBM Developer
- c - How to determine if memory is aligned? - Stack Overflow
- Attribute Syntax (Using the GNU Compiler Collection (GCC))
- Common Type Attributes (Using the GNU Compiler Collection (GCC))
Appendix
Environment and Compilation
- Compiler: MinGW32 GCC 9.2.0
- OS: Windows 10
- Arch: X86_64
Build with the command `g++ -O0 test.cpp -o test`.
test.hh
test.cpp
plot.py
Also include a Python script to plot the result, as a backup.