SE-0512: Document that `Mutex.withLockIfAvailable(_:)` cannot spuriously fail

Proposal: SE-0512
Authors: Jonathan Grynspan
Review Manager: John McCall
Status: Implemented (Swift 6.0)
Implementation: swiftlang/swift#85497
Review: (pre-pitch)

Summary of changes

Modifies the documentation of Mutex.withLockIfAvailable(:)) to add a guarantee that it is not subject to spurious failures. The implementation already cannot spuriously fail.

Motivation

The documentation for Mutex.withLockIfAvailable(_:) does not specify if it can spuriously fail to acquire the mutex. Normally, the lack of any discussion wouldn't be an issue since the current implementation cannot, in fact, spuriously fail. Functions are generally expected to document the failure conditions they actually have, not everything they don't. However, some peer languages of Swift do allow their analogous APIs to spuriously fail. Programmers therefore could reasonably be confused about whether this is possible in Swift.

What is a "spurious failure"?

Consider the following code snippet:

let optionalResult: T? = mutex.withLockIfAvailable { value in ... }

Mutex.withLockIfAvailable(_:) allows a thread to check whether a lock is available. It never waits for another thread to be finished with the lock. Instead, if it observes that another thread has successfully acquired the lock, it must immediately return nil. If it returns nil, it is said to have failed.

Most mutex interfaces offer an operation like this. Most of these APIs, such as POSIX's pthread_mutex_trylock() and the mutexes included in various languages' standard libraries, only allow this operation to fail when another thread has successfully acquired the lock. However some APIs, specifically C's mtx_trylock() and C++'s std::mutex::try_lock(), permit this operation to fail for arbitrary other reasons. If the operation fails for some reason other than that another thread has successfully acquired the lock, it is said to have spuriously failed.

Why should we document our behavior?

There is a split between the C/C++ standards, POSIX, and other language standards as to whether tryLock() might fail spuriously. We should document our behavior to make it clear to developers whether Mutex behaves like a C/C++ mutex or like one from another language. Without this documentation, Swift developers cannot even infer the behavior of Swift based on other languages because they do not all agree.

Why do C and C++ allow spurious failure at all?

C11's mtx_trylock() and C++11's std::mutex::try_lock() are both allowed to spuriously fail[^doThey]. Per the C++11 standard §30.4.1.1/16:

An implementation may fail to obtain the lock even if it is not held by any other thread. [ Note: This spurious failure is normally uncommon, but allows interesting implementations based on a simple compare and exchange [...] ]

The standard is referring to weak compare-and-exchange operations which can fail spuriously at the hardware level; strong compare-and-exchange operations cannot fail in this manner. The intent is to allow conforming implementations to use weak compare-and-exchange operations in their mutexes to improve performance on CPU architectures where strong compare-and-exchange operations are more complex to implement (typically involving retry loops at the assembly level).

[^doThey]: I'm not aware of any real-world C/C++ standard library implementation that actually has this failure mode. But if you choose to code defensively to these language standards, you have chosen to accept spurious failures into your life, and I don't know any software engineer who enjoys dealing with those.

Other languages

POSIX and the standard libraries of other languages have equivalent API:

None of these methods document that they spuriously fail (other than the Rust-specific concept of a "poisoned" mutex which is not germane to this proposal). A reader without further context has no reason to expect that they can spuriously fail because any API that can spuriously fail would surely document it.

Where does Swift stand?

Swift's current documentation is silent on the matter. As a developer who uses the Swift language and its standard library, if I were approaching this documentation in a vacuum, I wouldn't even consider the possibility of a spurious failure. As a rule we don't concern ourselves with "oh it might randomly fail" as something we need to guard against.

But a developer coming from another language will be bringing with knowledge of that language and its standard library. A developer coming to Swift from C or C++ might rightly ask if Mutex.withLockIfAvailable(_:) is safe to use.

I've reviewed the platform-specific implementations of _MutexHandle._tryLock() in the Swift repository and have confirmed that none of these implementations is subject to spurious failure (or, at least, none documents any such failure mode):

| Platform | Implementation Based On | Uses cmpxchg in Swift? | Documents Possible Spurious Failures? | Confirmed from Source Inspection? | |-|-|-|-|-| | Darwin | os_unfair_lock_trylock() | No | No | Yes | | FreeBSD | UMTX_OP_MUTEX_TRYLOCK | No | No | Yes | | Linux/Android | Atomic.compareExchange()/FUTEX_TRYLOCK_PI | Strong | No | Yes | | OpenBSD | pthread_mutex_trylock() | No | No | Yes[^openBSDCAS] | | Wasm | Atomic.compareExchange() | Strong | No-7msfy) | Yes | | Windows | TryAcquireSRWLockExclusive() | No | No | No[^windowsImpl] |

[^openBSDCAS]: OpenBSD uses the GCC builtin _sync_val_compare_and_swap() in its implementation of pthread_mutex_trylock() here. This builtin compiles to a _strong compare-and-swap operation. [^windowsImpl]: Windows is, of course, closed-source, and Microsoft's implementation of TryAcquireSRWLockExclusive() is proprietary. My conclusions are based on Microsoft's documentation and a careful reading of the Wine reimplementation here.

In other words, although we don't (yet) document it, Swift's Mutex.withLockIfAvailable(:) implementations do not spuriously fail. These implementations are, of course, _implementation details and are subject to change over time, but any change to Mutex that causes us to start spuriously failing is a breaking change because it could wreak havoc on code that uses Mutex.withLockIfAvailable(_:) today.

Is there a real-world scenario where a spurious failure is a problem?

For a real-world case study, see Swift Testing's use of Mutex.withLockIfAvailable(_:) as described in this forum post.

In a nutshell: spurious failures in Mutex.withLockIfAvailable(_:) are indistinguishable from real failures (which occur due to the lock being contended).

If the lock is held by another thread, a caller can reasonably assume that the other thread will eventually release the lock and may opt to fall back to Mutex.withLock(:)) or may retry calling Mutex.withLockIfAvailable(:) in a loop (which effectively turns Mutex into a spinlock—I'm not _recommending you take this approach, just listing it as a possible solution).

If the lock is held by the current thread, any solution (other than giving up on acquiring the lock) must result in a deadlock because the caller cannot tell that the failure was real and forward progress is not possible.

Can you appeal to authority for me?

Raymond Chen over at Microsoft gave a decent description on his blog of the difference between weak and strong compare-and-exchange operations and pointed out that you can only really accept a weak compare-and-exchange if failure is cheap:

It comes down to whether spurious failures are acceptable and how expensive they are.
[...]
On the other hand, if recovering from the failure requires a lot of work, such as throwing away an object and constructing a new one, then you probably want to pay for the extra retries inside the strong compare-exchange operation in order to avoid an expensive recovery iteration.
And of course if there is no iteration at all, then a spurious failure could be fatal.

Mr. Chen isn't specifically talking about mutexes here, but rather about compare-and-exchange operations in general. If you are dealing directly with an atomic operation, you are presumably fine-tuning your code and can pick between weak or strong compare-and-exchange operations as appropriate. Swift's Mutex is a general-purpose type, so it cannot know if failure to acquire it is cheap or expensive. Any implementation of Mutex that is ultimately based on a compare-and-exchange operation must pick one or the other up front (or provide two parallel interfaces—see Alternatives considered below.)

Proposed solution

I propose that the documentation for Mutex.withLockIfAvailable(_:) should specifically and clearly indicate that it cannot spuriously fail. That way, developers who use Mutex will know what to expect from it and will (hopefully) avoid being confused if they've also read the documentation for the C/C++'s equivalents.

Detailed design

I propose we strengthen the language in the Return Value documentation section along the lines of:

- The return value, if any, of the `body` closure parameter or `nil` if the lock
- couldn’t be acquired.
+ If the lock is acquired, the return value of the `body` closure. If the lock
+ is already held by any thread (including the current thread), `nil`.

And that we explain what we mean with a callout or similar in the Discussion section such as:

+ - Note: This function cannot spuriously fail to acquire the lock. The behavior
+   of similar functions in other languages (such as C's `mtx_trylock()`) is
+   platform-dependent and may differ from Swift's behavior.

Source compatibility

This change only affects the documentation for an existing symbol. It has no impact on existing Swift source.

ABI compatibility

This proposal is purely a documentation change which can be implemented without any ABI support.

Implications on adoption

The documentation change itself is not something to be "adopted" by developers. However (subjectively speaking) the existence of this guarantee allows those developers to use Mutex.withLockIfAvailable(_:) without needing to be concerned with spurious failures.

This proposal does not change how code is compiled or how it runs. Rather, it strengthens the execution-time guarantees of the existing API. Callers that were relying on spurious failure not happening will continue to work. Callers that were protecting against spurious failure will continue to not experience it and can be simplified at their maintainers' leisure.

Future directions

When we add support in Swift for new platforms, we will need to take care that their implementations of Mutex are also not subject to spurious failures.

There may be a real-world use case for something like Mutex.withLockIfAvailable(weak: true), and it can be added in the future. On most platforms Swift targets, it will behave identically to the existing API. On a subset of platforms where we have control over the relevant compare-and-exchange operations, it will allow developers to choose a weak operation if needed.

Alternatives considered

Making no change.

This leaves our documentation ambiguous to e.g. systems programmers coming from C/C++ vs. those coming from Rust/Go/etc.

Documenting that Mutex.withLockIfAvailable(:) _is subject to spurious failures.

This lets us be more flexible in how we implement the function on current and future platforms, but it is not currently true in any of our implementations. Allowing for spurious failures would therefore be a breaking change. Besides, spurious failures violate the principle of least surprise and are not something you'd expect from a programming language that markets itself as safe and secure! To quote the documentation for GCC's __atomic_compare_exchange_n() (which clang's documentation refers us to): > When in doubt, use the strong variation.

Acknowledgments

Thanks to the community members who provided feedback when I initially raised this issue in the Swift forums.