An unsound cell implementation
Pantos May 08, 2024 #rust #subtyping #variance #ubIntroduction
I recently came across this RustFest 2016 talk giving an example of a manual cell implementation that is inherently unsound.
Let's look at this short example:
Pretty straight forward - our cell is holding a value, that we can set(), despite the fact that
self is immutable. We do this by getting a reference to the contained value, create a *const
raw pointer from it, and cast that to its mutable version *mut. This is then used to directly
overwrite the value inside an unsafe block.
This is unsound for two reasons.
Reason #1: Subtyping? What subtyping?
Consider the following test:
This does not seem right...
The cell content was set to the reference to newval, even though it should only hold references
with static lifetime... Even worse, the reference can still be used after its referenced value
is freed!
So this test should fail hard, right? Well, let's try to...
Execute the test
We run it with cargo test:
running 1 test
test tests::this_is_fine ... ok
Wow, this does not even fail! What the decoherence is going on?
Well, when looking at the comments in the test, we see that we got us a neat use-after-free bug. I still remember getting PTSD from those when I was still coding C - the memory is freed, but the content is still there! We read it, even though we should not be able to, and the test succeeds. But on other occasions, the memory content might have changed.
Let's try to verify this by running cargo test --release:
running 1 test
test tests::this_is_fine ... FAILED
failures:
---- tests::this_is_fine stdout ----
thread 'tests::this_is_fine' panicked at src/lib.rs:42:9:
assertion `left == right` failed
left: 0
right: 13
A-ha! So the implementation of MyCell really is broken. The borrow checker left us stranded in
the desert, letting us compile this junk. But why?
Covariance
The reason why assigning a non-static reference to a MyCell holding a static reference is
possible because MyCell<T> is covariant over T.
Looking at the rustonomicon link, Covariance means:
If
Tis subtype ofU,MyCell<T>will be a subtype ofMyCell<U>.
What? Subtypes? I thought Rust does not have that? Well, not quite - subtyping is used with lifetimes, in order to enable us to use a "more useful" lifetime (covering a bigger region) in a place where only a "less useful" lifetime (covering a smaller region) is required. This is the reason why
'staticis always subtype of any'a.
Covariance is assumed (see rustonomicon table above), because T is owned by MyCell. But with
MyCell<T> being covariant over T, this means that when a MyCell<'a> is required, we can also
use a MyCell<'static>. Furthermore, immutable references are also covariant. So when a
&MyCell<'a> is required, a &MyCell<'static> will do.
Let's look again at the signature of set():
When T is 'a, then &self has to be &MyCell<'a>. But as we learned, Rust is also fine with
&MyCell<'static> in its place, as it thinks it is "more useful". So this is why assigning a
lifetime 'a to MyCell<'static> works!
Note that mutable references are not covariant, but invariant over T. So when using
&mut self instead, the above does not apply. As the "covariance-chain" is interrupted, the
compiler won't accept MyCell<'a> and reject the above code:
38 | cell.set(&newval);
| ^^^^^^^ borrowed value does not live long enough
So how can we fix this, without requiring a mutable self?
PhantomData to the rescue
With PhantomData, we can have an
additional member in MyCell, that prevents the struct from being covariant over T. This won't
add any more actual data to our struct, but will change the way the compiler treats it.
When we check the table in the rustonomicon, we can see that we can use one of:
PhantomData<*mut T>PhantomData<fn(T) -> T>
Both make our struct invariant over T, the main difference is if the struct should be
Send and Sync or not. As we want to tackle
the problem at hand and not agonize over multithreading, we will pick PhantomData<*mut T>.
Now this looks promising! When running cargo test, the compiler will refuse the test and give
us the same error as before:
43 | cell.set(&newval);
| ^^^^^^^ borrowed value does not live long enough
We did it, we fixed our Cell implementation! We are done now, right?
Right...?
Reason #2: Nasal demons
Nope.
There is a nasty thing called Undefined Behavior, about which I will probably do an extra blog post in the future. But here is a short explanation, despite my C PTSD coming back again...
A quick overview
There is a contract between you and the compiler. You are only allowed to do certain things, and so does the compiler. If you do a mistake, the compiler is obliged to complain and not compile the program. But when you do stuff that is considered UB by the standard, the compiler is not obliged to do anything. It can do what it wants! It might nag at you, or just compile the program, but change the actual commands that are executed. Like formatting your hard drive.
Well, usually it doesn't. The compiler creators are on your side (and they use their tool themselves), so they won't put harmful behavior in there. But your UB infested program might just do something that is not described by the source code. Or maybe it will at first, but will do something different after a compiler update. Considering this behavior, you see that it is actually very troublesome to detect UB, because your programming compiling and working like it should does not necessarily mean that your code is sound.
UB in our implementation
Luckily, you can't run into UB when sticking to safe rust, but as we use unsafe blocks, we have to take extra care.
Looking at Rust's list of things considered UB, we can identify one problem: The mutation of bytes pointed to by a shared (i.e. immutable) reference. The only exception allowed is inside an UnsafeCell.
This means that the compiler implicitly "knows" that data inside an (immutably referenced)
UnsafeCell might be mutated, while this is not the case with our implementation.
Actually the compiler would have detected the UB and would have complained, and even proposed
to use an unsafe cell. But for the
sake of example, we used the #[allow] statement to disable the error, to be able to tinker around
with the unsound implementation. If we remove it, our code won't compile. Nice!
Miri to the rescue
There might be other cases that are not as obvious as the one shown here. To detect several different forms of UB, we can use the very neat tool Miri.
First let's delete the previous test and add a new one:
Install Miri with
Run Miri with
This gives us the following error:
running 1 test
test tests::this_is_also_fine ... error: Undefined Behavior: attempting a write access using
<178867> at alloc62087[0x0], but that tag only grants SharedReadOnly permission for this location
So even with the fix, our implementation is still unsound. But at least now we know why and how to detect these kind of errors.
A final try
Apart from the possibility of using UnsafeCell and just going down the path of the stdlib, we
have another option: We can store a raw pointer to the data and use it for access.
When running the above test with Miri, it does not find any UB. And when trying to compile the dangling pointer test, it actually does not compile, nice!
But this has a big drawback: Instead of having the data on the stack, it is now stored on the heap. Not only does this mean we always have to do an additional indirection and follow a pointer, it can also be quite bad for cache efficiency.
Variance yet again
As a sidenote, we could also use a *const instead of a *mut to store value. Casting a
*const raw pointer to a *mut and writing to it is actually not considered UB, as there is no
immutable reference (i.e. &T) pointing to the data while it is mutated.
But with *const, the unsound test from above would compile again. You guessed it - because
*const is covariant over T. We can fix it the same way again, by adding PhantomData:
But as using *mut achieves the same, there is no real reason using *const with Phantomdata.