False Sharing

Recently, we wanted to enable -mavx2 in our project, but doing so caused core dumps in our test suites. Investigating with gdb, we found that every core dump was caused by an unaligned memory access in Folly. The assembly generated by the compiler showed that Clang emits vmovaps (Move Aligned Packed Single-Precision Floating-Point Values). This instruction requires its memory operand to be aligned to the operand size: 16 bytes for an XMM register and 32 bytes for a YMM register. Building with GCC, by contrast, produced no core dumps. That does not mean the code is correct, though: Clang simply optimizes more aggressively and exposes the misalignment.

One tempting but questionable fix is to remove alignas from Folly. But why does Folly use alignas at all, given that it also relies on techniques such as placement new, which can put objects at addresses that violate their declared alignment? And what will happen if we simply remove alignas? The most important reason for alignas is probably to prevent false sharing.

Definition

False sharing is a performance problem that arises in multithreaded programs when threads access different variables that happen to reside on the same cache line (typically 64 bytes on x86-64). Although the threads operate on separate data, the cache coherence protocol tracks cache lines, not individual variables. A write by one core therefore invalidates the whole line in the other cores' caches. The result is unnecessary invalidations and cache misses that degrade performance. Folly therefore uses alignas to place such variables on separate cache lines, so that threads accessing different variables never contend for the same line.

Solution

We found two issues about unaligned access on GitHub. However, it seems unlikely that Folly will resolve them in the near future. We therefore had to make a tradeoff between enabling AVX and keeping alignas. In the end, we decided to remove alignas.