On READ_ONCE and Compiler Opts¶
Version History
Date | Description |
---|---|
Apr 13, 2020 | Initial Version |
I decide to write this blog after I once again got tricked by GCC optimizations. I was designing a simple single-producer-single-consumer ring buffer. Since there is a small time gap between slot-being-allocated and slot-being-usable (i.e., data filled), the producer will set a non-atomic flag once the data is filled thus usable. The consumer, running on a seperate CPU, will repeatly checking the usable flag after it has grabbed the slot.
Simple, right? Yet I ran into a lot random stuck during testing.
I didn’t even check the ring buffer design as I was so confident.
There was no timeout checking either. After some digging,
I realized I missed using READ_ONCE
when consumer thread is polling for the usable flag.
Yeah, once again, gcc -O2
tricked me:
it will optmize away repeated memory accesses
if it thinks the accessed variable/data is thread-local.
For instance, the following code snippet shows how gcc -O2 removes the memory access part.
Without -O2, a simple assembly loop is generated. With -O2, gcc generates a deadlock itself.
Original C Assembly Assembly
(gcc -S) (gcc -S -O2)
int x; | |
| .L2: | .L2:
/* Spin until x becomes true */ | movl x(%rip), %eax | jmp .L2
void wait_for_x(void) | cmpl $1, %eax |
{ | je .L2 |
while (x == 1) | |
; | |
} | |
Why this is happening? Because gcc thinks vairable x
is thread-local and will not be accessed
by multiple threads at the same time. Thus gcc thinks the above while (x == 1) ;
check will never break,
so generating an assembly deadlock jmp loop.
Why does this matter? Assume x
is a shared variable.
In the following code snippet, there are two threads, A and B.
Thread A wait until B change x
to 1.
If we compile with -O2, thread A will deadlock.
And this was my bug above.
int x; /* a global shared variable*/
Thread A Thread B
/* Spin until x becomes true */ | /* Set x at some point */
void wait_for_x(void) | x = 1;
{ |
while (x == 1) |
; |
} |
The common approach, is to add volatile
modifier, to explicitly express the concurrency issue.
But volatile is considered harmful by linux kernel, and I agree with it.
I generally use READ_ONCE
, WRITE_ONCE
, ACCESS_ONCE
macros.
They “tell” gcc that the particualr variable is a shared global variable,
thus for each time a C statment is running, the variable should be accessed once and exactly once.
The fix for above case is: while (READ_ONCE(x == 1)) ;
.
I will not go into details about why and how those macros are implemented. For more information, refers to source code, ktsan wiki.
Hope you enjoyed this simple bug-documentation blog.