Co-authored-by: Mathias Stearn <mathias@mongodb.com> GitOrigin-RevId: 81598169837ddb20bfc07024154e7e8358d0d0dd
# Exception Architecture

MongoDB code uses the following types of assertions:

- `uassert` and `iassert`
  - Checks for per-operation user errors. Operation-fatal.
- `tassert`
  - Like `uassert` in that it checks for per-operation user errors, but inhibits clean shutdown
    in tests. Operation-fatal, but process-fatal in testing environments during shutdown.
- `massert`
  - Checks per-operation invariants. Operation-fatal.
- `fassert`
  - Checks fatal process invariants. Process-fatal. Use to detect unexpected situations (such
    as a system function returning an unexpected error status).
- `invariant`
  - Checks process invariants. Process-fatal. Use to detect code logic errors ("pointer should
    never be null", "we should always be locked").

**Note**: Calling the C function `assert` is not allowed. Use one of the above instead.

The following types of assertions are deprecated:

- `MONGO_verify`
  - Checks per-operation invariants. A synonym for `massert` that doesn't require an error code.
    Process-fatal in debug mode. Do not use for new code; use `invariant` or `fassert` instead.
- `dassert`
  - Calls `invariant` but only in debug mode. Do not use!

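The difference between operation-fatal and process-fatal checks listed above can be sketched without any MongoDB headers. The names `operationFatalCheck`, `processFatalCheck`, and `runOperation` are hypothetical stand-ins invented for this illustration; they are not the real macros and they skip logging and counters entirely.

```cpp
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for an operation-fatal assertion such as `uassert`:
// a failed check throws, so only the current operation unwinds.
inline void operationFatalCheck(int code, const std::string& msg, bool ok) {
    if (!ok) {
        throw std::runtime_error("assertion " + std::to_string(code) + ": " + msg);
    }
}

// Hypothetical stand-in for a process-fatal assertion such as `invariant`:
// a failed check ends the whole process.
inline void processFatalCheck(bool ok) {
    if (!ok) {
        std::abort();
    }
}

// Runs one "operation" and reports whether it completed. The caller (think of
// a command dispatcher) survives a failed operation-fatal check.
inline bool runOperation(int batchSize) {
    try {
        // User-supplied input: a bad value fails only this operation.
        operationFatalCheck(1234500, "batchSize must be positive", batchSize > 0);
        // Code logic: after the check above, this can only fail because of a bug.
        processFatalCheck(batchSize > 0);
        return true;
    } catch (const std::runtime_error&) {
        return false;  // The operation failed; the process keeps serving.
    }
}
```

Under these assumptions, `runOperation(0)` returns `false` while the process keeps running, whereas a failed `processFatalCheck` would never return to the caller.
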
MongoDB uses a series of `ErrorCodes` (defined in [mongo/base/error_codes.yml][error_codes_yml]) to
identify and categorize error conditions. `ErrorCodes` are defined in a YAML file and converted to
C++ files using [MongoDB's IDL parser][idlc_py] at compile time. We also use error codes to create
`Status` objects, which convey the success or failure of function invocations across the code base.
`Status` objects are also used internally by `DBException`, MongoDB's primary exception class, and
its children (e.g., `AssertionException`) as a means of maintaining metadata for exceptions. The
proper usage of these constructs is described below.

## Assertion Counters

Some assertions increment an assertion counter. The `serverStatus` command generates an
"asserts" section that includes these counters:

- `regular`
  - Incremented by `MONGO_verify`.
- `warning`
  - Always 0. Nothing increments this anymore.
- `msg`
  - Incremented by `massert`.
- `user`
  - Incremented by `uassert`.
- `tripwire`
  - Incremented by `tassert`.
- `rollovers`
  - When any counter reaches a value of `1 << 30`, all of the counters are reset and
    the "rollovers" counter is incremented.

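The rollover rule in the list above can be illustrated with a small single-threaded sketch. `AssertCountersSketch` and `bump` are hypothetical names made up for this example, and the sketch ignores concurrency, which a real server-wide counter would have to handle.

```cpp
#include <cstdint>

// Hypothetical stand-in for the counters reported in the "asserts" section.
struct AssertCountersSketch {
    static constexpr std::int32_t kRolloverPoint = 1 << 30;

    std::int32_t regular = 0;
    std::int32_t warning = 0;
    std::int32_t msg = 0;
    std::int32_t user = 0;
    std::int32_t tripwire = 0;
    std::int32_t rollovers = 0;

    // Increments one counter, then applies the rollover rule: once any counter
    // reaches 1 << 30, every counter is reset and `rollovers` is incremented.
    void bump(std::int32_t& counter) {
        ++counter;
        if (counter >= kRolloverPoint) {
            regular = warning = msg = user = tripwire = 0;
            ++rollovers;
        }
    }
};
```

A monitoring tool that computes rates from these counters therefore needs to watch `rollovers` as well, since the other values can drop back to zero between two samples.
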
## Considerations

When per-operation invariant checks fail, the current operation fails, but the process and
connection persist. This means that `massert`, `uassert`, `iassert` and `MONGO_verify` only
terminate the current operation, not the whole process. Be careful not to corrupt process state by
mistakenly using these assertions midway through mutating process state.

`fassert` failures will terminate the entire process; this is used for low-level checks where
continuing might lead to corrupt data or loss of data on disk. Additionally, `fassert` will log
a generic assertion message with fatal severity and add a breakpoint before terminating.

To log a custom assertion message and terminate the server, use `LOGV2_FATAL`.
To avoid printing a stacktrace on failure use `fassertNoTrace` or `LOGV2_FATAL_NO_TRACE`.
Consider using them if there is only one way to reach this fatal point in code.

`tassert` will fail the operation like `uassert`, but also triggers a "deferred-fatality tripwire
flag". In testing environments, if the tripwire flag is set during shutdown, the process will
invoke the tripwire fatal assertion. In non-testing environments, there will only be a warning
during shutdown that tripwire assertions have failed.

`tassert` presents more diagnostics than `uassert`. `tassert` will log the assertion as an error,
log scoped debug info (for more info, see ScopedDebugInfoStack defined in
[mongo/util/assert_util.h][assert_util_h]), print the stack trace, and add a breakpoint.
The purpose of `tassert` is to ensure that operation failures will cause a test suite to fail
without resorting to different behavior during testing. `tassert` should only be used to check
for unexpected values produced by defined behavior.

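The deferred-fatality idea behind `tassert` can be modeled roughly as follows. Everything here (`tripwireFlag`, `tripwireCheck`, `ShutdownAction`, `onShutdown`) is a hypothetical simplification written for this document; the real implementation also logs diagnostics and prints a stack trace, which this sketch omits.

```cpp
#include <atomic>
#include <stdexcept>
#include <string>

// Hypothetical process-wide flag recording that some tripwire assertion failed.
inline std::atomic<bool>& tripwireFlag() {
    static std::atomic<bool> flag{false};
    return flag;
}

// Hypothetical stand-in for `tassert`: fail the operation now by throwing,
// and remember the failure so that shutdown can react to it later.
inline void tripwireCheck(int code, const std::string& msg, bool ok) {
    if (!ok) {
        tripwireFlag().store(true);
        throw std::runtime_error("tripwire " + std::to_string(code) + ": " + msg);
    }
}

enum class ShutdownAction { kClean, kWarn, kFatal };

// Decides what shutdown does: fatal in testing environments, only a warning
// elsewhere, and a clean exit when no tripwire assertion ever failed.
inline ShutdownAction onShutdown(bool testingEnvironment) {
    if (!tripwireFlag().load()) {
        return ShutdownAction::kClean;
    }
    return testingEnvironment ? ShutdownAction::kFatal : ShutdownAction::kWarn;
}
```

The operation fails at the moment of the check in every environment; only the shutdown outcome differs, so tests fail without the server behaving differently while it is running.
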
Both `massert` and `uassert` take error codes, so that all assertions have codes associated with
them. Currently, programmers are free to provide the error code by either [using a unique location
number](#choosing-a-unique-location-number) or choosing a named code from `ErrorCodes`. Unique location
numbers have no meaning other than a way to associate a log message with a line of code.

`massert` will log the assertion message as an error, while `uassert` will log the message at a
debug level of 1 (for more info about log debug level, see [docs/logging.md][logging_md]).

`iassert` provides similar functionality to `uassert`, but it logs at a debug level of 3 and
does not increment user assertion counters. We should always choose `iassert` over `uassert`
when we expect a failure, a failure might be recoverable, or failure accounting is not interesting.

### Choosing a unique location number

The current convention for choosing a unique location number is to use the 5 or 6 digit SERVER ticket number
for the ticket being addressed when the assertion is added, followed by a two-digit counter to distinguish
between codes added as part of the same ticket. For example, if you're working on SERVER-12345, the first
error code would be 1234500, the second would be 1234501, etc. This convention can also be used for LOGV2
logging id numbers.

The only real constraint for unique location numbers is that they must be unique across the codebase. This is
verified at compile time with a [Python script][errorcodes_py].

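The numbering convention in this section is plain arithmetic: the ticket number shifted by two decimal digits, plus the counter. A minimal sketch follows; the helper `locationNumber` is hypothetical and exists only to show the calculation, since real code writes the number as a literal.

```cpp
#include <cstdint>

// Hypothetical helper: builds a location number from a SERVER ticket number
// and a two-digit counter (0 through 99), following the convention above.
constexpr std::int64_t locationNumber(std::int64_t serverTicket, int counter) {
    return serverTicket * 100 + counter;
}

// The worked example from the text, for SERVER-12345.
static_assert(locationNumber(12345, 0) == 1234500, "first code for the ticket");
static_assert(locationNumber(12345, 1) == 1234501, "second code for the ticket");
```
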
## Exception

A failed operation-fatal assertion throws an `AssertionException` or a child of that.
The inheritance hierarchy resembles:

- `std::exception`
  - `mongo::DBException`
    - `mongo::AssertionException`
      - `mongo::UserException`
      - `mongo::MsgAssertionException`

See [mongo/util/assert_util.h][assert_util_h].

Generally, code in the server should be able to tolerate (e.g., catch) a `DBException`. Server
functions must be structured with exception safety in mind, such that `DBException` can propagate
upwards harmlessly. The code should also expect, and properly handle, `UserException`. We use
[Resource Acquisition Is Initialization][raii] heavily.

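As a generic illustration of why RAII lets an exception "propagate upwards harmlessly", consider the sketch below. `ConnectionStats`, `ActiveOperationGuard`, and `runFailingOperation` are hypothetical names, and `std::runtime_error` stands in for a `DBException` so that the example needs no MongoDB headers.

```cpp
#include <stdexcept>

// Hypothetical piece of shared state that an operation modifies temporarily.
struct ConnectionStats {
    int activeOperations = 0;
};

// RAII guard: the constructor acquires and the destructor always releases,
// including when an exception unwinds through the enclosing scope.
class ActiveOperationGuard {
public:
    explicit ActiveOperationGuard(ConnectionStats& stats) : _stats(stats) {
        ++_stats.activeOperations;
    }
    ~ActiveOperationGuard() {
        --_stats.activeOperations;
    }
    ActiveOperationGuard(const ActiveOperationGuard&) = delete;
    ActiveOperationGuard& operator=(const ActiveOperationGuard&) = delete;

private:
    ConnectionStats& _stats;
};

// Stand-in for an operation that fails partway through.
inline void runFailingOperation(ConnectionStats& stats) {
    ActiveOperationGuard guard(stats);
    throw std::runtime_error("operation failed");  // Stand-in for a DBException.
}
```

After `runFailingOperation` throws, `activeOperations` is back at its original value without any explicit cleanup code on the error path.
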
## ErrorCodes and Status

MongoDB uses `ErrorCodes` both internally and externally: a subset of error codes (e.g.,
`BadValue`) is used externally to pass errors over the wire and to clients. These error codes are
the means for MongoDB processes (e.g., _mongod_ and _mongo_) to communicate errors, and are visible
to client applications. Other error codes are used internally to indicate the underlying reason for
a failed operation. For instance, `PeriodicJobIsStopped` is an internal error code that is passed
to callback functions running inside a [`PeriodicRunner`][periodic_runner_h] once the runner is
stopped. The internal error codes are for internal use only and must never be returned to clients
(i.e., in a network response).

Zero or more error categories can be assigned to `ErrorCodes`, which allows a single handler to
serve a group of `ErrorCodes`. `RetriableError`, for instance, is an `ErrorCategory` that includes
all retriable `ErrorCodes` (e.g., `HostUnreachable` and `HostNotFound`). This implies that an
operation that fails with any error code in this category can be safely retried. We can use
`ErrorCodes::isA<${category}>(${error})` to check if `error` belongs to `category`. Alternatively,
we can use `ErrorCodes::is${category}(${error})` to check error categories. Both methods provide
similar functionality.

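To make the category check concrete, here is a miniature model of the idea. The enums `Code` and `Category` and the functions below are hypothetical and deliberately tiny; the real code and category lists are generated from `error_codes.yml` and live under `ErrorCodes`, so treat this only as a sketch of the shape of the API described above.

```cpp
// Hypothetical miniature of the generated error codes.
enum class Code { OK, BadValue, HostUnreachable, HostNotFound };

// Hypothetical categories, encoded as bits so that a code can belong to zero
// or more of them.
enum class Category : unsigned { RetriableError = 1u << 0 };

// Maps each code to the set of categories it belongs to.
constexpr unsigned categoriesOf(Code code) {
    return (code == Code::HostUnreachable || code == Code::HostNotFound)
        ? static_cast<unsigned>(Category::RetriableError)
        : 0u;
}

// Mirrors the `isA<${category}>(${error})` spelling.
template <Category category>
constexpr bool isA(Code code) {
    return (categoriesOf(code) & static_cast<unsigned>(category)) != 0;
}

// Mirrors the `is${category}(${error})` spelling.
constexpr bool isRetriableError(Code code) {
    return isA<Category::RetriableError>(code);
}

// A single handler serves the whole category instead of listing every code.
constexpr bool shouldRetry(Code code) {
    return isRetriableError(code);
}
```

With this structure, adding a new retriable code only changes the code-to-category mapping; handlers such as `shouldRetry` stay untouched.
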
To represent the status of an executed operation (e.g., a command or a function invocation), we
use `Status` objects, which represent an error state or the absence thereof. A `Status` uses the
standardized `ErrorCodes` to determine the underlying cause of an error. It also allows assigning
a textual description, as well as code-specific extra info, to the error code for further
clarification. The extra info is a subclass of `ErrorExtraInfo` and specific to `ErrorCodes`. Look
for `extra` in [mongo/base/error_codes.yml][error_codes_yml] for reference.

MongoDB provides `StatusWith` to enable functions to return an error code or a value without
requiring them to have multiple outputs. This makes exception-free code cleaner by avoiding
functions with multiple out parameters. We can either pass an error code or an actual value to a
`StatusWith` object, indicating failure or success of the operation. For examples of the proper
usage of `StatusWith`, see [mongo/base/status_with.h][status_with_h] and
[mongo/base/status_with_test.cpp][status_with_test_cpp]. It is highly recommended to use `uassert`
or `iassert` over `StatusWith`, and catch exceptions instead of checking `Status` objects
returned from functions. Using `StatusWith` to indicate exceptions, instead of throwing via
`uassert` and `iassert`, makes it very difficult to identify that an error has occurred, and
could lead to the wrong error being propagated.

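The trade-off between returning a status and throwing can be sketched as follows. `StatusWithIntSketch` is a hypothetical, heavily simplified stand-in and not the real `StatusWith` template; `parsePortStatus` and `parsePortOrThrow` are likewise invented for the illustration, with a standard exception standing in for what `uassert` would throw.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical, heavily simplified stand-in for StatusWith<int>: either a
// value or a reason for failure.
struct StatusWithIntSketch {
    bool ok;
    int value;           // Meaningful only when ok is true.
    std::string reason;  // Meaningful only when ok is false.
};

// Status-returning style: every caller must remember to check `ok`.
inline StatusWithIntSketch parsePortStatus(const std::string& text) {
    const bool wellFormed = !text.empty() && text.size() <= 5 &&
        text.find_first_not_of("0123456789") == std::string::npos;
    if (!wellFormed) {
        return {false, 0, "port must be 1 to 5 decimal digits"};
    }
    return {true, std::stoi(text), ""};
}

// Throwing style: a failure cannot be silently ignored, and callers that
// cannot handle it simply let the exception propagate.
inline int parsePortOrThrow(const std::string& text) {
    StatusWithIntSketch parsed = parsePortStatus(text);
    if (!parsed.ok) {
        throw std::invalid_argument(parsed.reason);
    }
    return parsed.value;
}
```

A caller that forgets to inspect `ok` in the first style would continue with `value == 0`, which is the kind of silently lost error the recommendation above is meant to prevent.
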
## Using noexcept

Server code should generally be written to be exception safe. Historically,
we've had bugs due to code being overzealously marked `noexcept`. In such
contexts, throwing an exception crashes the server, which can compromise
availability. However, _just_ removing `noexcept` from such code is not a viable
solution \- exception unsafe code may _need_ to crash in order to avoid causing
an even worse failure. We want to work towards ensuring that functions that
ought to be are in fact exception safe, and remove `noexcept` usage where it's
not warranted. Here, we outline guidelines for doing so.

`noexcept` is a runtime check that terminates the process rather than allowing
the function to exit because of a throw. `noexcept` may be used when it can be
thought of as a bug for any uncaught exception to be thrown. There is no
compile-time check that exceptions will not be thrown within a `noexcept`
function. Instead, putting `noexcept` on a function may be thought of as similar
to using `invariant` in the following way:

```cpp
// Example noexcept code.
void func() noexcept {
    ...
}

// Similar alternative pseudocode.
void func() try {
    ...
} catch (...) {
    invariant(!"unexpected exception");
}
```

**As with `invariant`, be very careful when putting `noexcept` on a function that
interacts with untrusted input.** This has been the root cause of serious past
bugs.

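The point that `noexcept` is not checked at compile time can be demonstrated with standard C++ alone. The functions `checkedDouble` and `doubledNoexcept` below are hypothetical examples; only the `noexcept` specifier and the `noexcept` operator are standard language features.

```cpp
#include <stdexcept>

// May throw: rejects negative input.
inline int checkedDouble(int value) {
    if (value < 0) {
        throw std::invalid_argument("negative input");
    }
    return value * 2;
}

// Compiles even though the callee can throw. If checkedDouble() throws here,
// std::terminate() ends the process instead of the exception escaping.
inline int doubledNoexcept(int value) noexcept {
    return checkedDouble(value);
}

// The noexcept operator reports only what the declarations promise; it does
// not analyze the function bodies.
static_assert(!noexcept(checkedDouble(1)), "declared potentially throwing");
static_assert(noexcept(doubledNoexcept(1)), "declared non-throwing");
```

If `value` came from untrusted input, a single negative number passed to `doubledNoexcept` would be enough to terminate the process, which is the failure mode the warning above describes.
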
### Adding or Removing noexcept

When considering removing `noexcept` from a function, the author of that change
must ensure that the function’s implementation and its callsites are not
relying on the function not throwing for correctness. Because of this, **be
careful putting `noexcept` on a function** if there’s a chance it may need to be
removed later. `noexcept` generally **should not be used** solely for reasons of
performance optimization. Aside from the cases listed in the next section, it
should not be assumed to improve performance without solid evidence.

If a part of the implementation would benefit from relying on not throwing, but
`noexcept` is not meant to be a part of the function’s contract, it is acceptable
to use a try/catch/invariant construction similar to the example above or an
internal `noexcept` helper function.

When adding or removing `noexcept`, also consider what types of exceptions are
possible in that context and in our codebase. Refer to the “Where Exceptions
are Possible” section for more details.

If you are uncertain about adding or removing `noexcept` in a given situation,
reach out to \#server-programmability on Slack.

### Cases Where noexcept is Encouraged

This list is not exhaustive and there are cases not enumerated here that are
valid uses of `noexcept`.

#### Move operations

Using `noexcept` with move operations allows callers to skip generating
exception handling code. If a type’s move operation will not throw exceptions,
it is strictly worse not to use `noexcept`. For instance, `std::vector<T>` can
use optimized versions of certain operations when `T` has `noexcept` move
operations. In these cases, **`noexcept` can be considered a requirement**. Of
course, if a move operation genuinely needs to throw exceptions, then don’t
mark it `noexcept`. This should be very rare – moves should be non-throwing in
almost all cases.

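The effect on callers can be observed with `std::move_if_noexcept`, the standard utility that picks a move only when the move constructor is `noexcept` and otherwise falls back to a copy. The types `Counts`, `NoexceptMove`, and `ThrowingMove` and the function `relocateOnce` are hypothetical and exist only for this sketch.

```cpp
#include <type_traits>
#include <utility>

// Counts how objects were constructed from other objects.
struct Counts {
    int copies = 0;
    int moves = 0;
};

// Move constructor is marked noexcept.
struct NoexceptMove {
    explicit NoexceptMove(Counts& c) : counts(&c) {}
    NoexceptMove(const NoexceptMove& other) : counts(other.counts) { ++counts->copies; }
    NoexceptMove(NoexceptMove&& other) noexcept : counts(other.counts) { ++counts->moves; }
    Counts* counts;
};

// Same type, but the move constructor is not marked noexcept.
struct ThrowingMove {
    explicit ThrowingMove(Counts& c) : counts(&c) {}
    ThrowingMove(const ThrowingMove& other) : counts(other.counts) { ++counts->copies; }
    ThrowingMove(ThrowingMove&& other) : counts(other.counts) { ++counts->moves; }
    Counts* counts;
};

static_assert(std::is_nothrow_move_constructible<NoexceptMove>::value, "moves are cheap and safe");
static_assert(!std::is_nothrow_move_constructible<ThrowingMove>::value, "moves may throw");

// Relocates one object the way exception-safe generic code does, and reports
// whether that relocation was a copy or a move.
template <typename T>
Counts relocateOnce() {
    Counts counts;
    T source(counts);
    T destination(std::move_if_noexcept(source));
    (void)destination;
    return counts;
}
```

Here `relocateOnce<NoexceptMove>()` records one move, while `relocateOnce<ThrowingMove>()` records one copy and no moves: forgetting `noexcept` silently turns the relocation into a copy.
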
#### Swap operations

Marking swap operations `noexcept` allows callers to optimize for an
exception-free pathway. **Swap operations should follow the same `noexcept`
guidelines as move operations**.

#### Hash functions

Marking hash functions `noexcept` allows some hashing library types to optimize
for an exception-free pathway. This can affect the behavior, performance, and
even layout of certain container types (such as libstdc++’s
[unordered_map](https://gcc.gnu.org/onlinedocs/libstdc++/manual/unordered_associative.html)).
**Hash functions should follow the same `noexcept` guidelines as move operations.**

#### Destructors and “Destructor-Safe” Functions

Destructors are generally implicitly `noexcept`, and are encouraged to remain
implicitly `noexcept` \- that is, by not marking them with `noexcept(false)`.
Functions where “destructor safety” is a core part of their functionality **may
be marked `noexcept`**. This is not a requirement – destructors are allowed to
call potentially-throwing functions. It is also not a blanket recommendation to
consider `noexcept` for all functions called from destructors. When calling a
potentially-throwing function from a destructor, think about whether or not it
can indeed throw in that context, and if exceptions need to be handled. If it
can indeed throw in that context, exceptions almost certainly need to be
handled \- otherwise the server will crash.

The lambda passed to `ON_BLOCK_EXIT()` and `ScopeGuard()` should be treated
similarly to destructors: it is executed in a `noexcept` context (a destructor),
and marking the lambda itself `noexcept` is discouraged as being noisy. Functions
intended to be called from such lambdas, however, may be marked `noexcept`.

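A destructor that calls a potentially-throwing function and handles the failure locally might look like the sketch below. `flushBuffer` and `BufferedWriter` are hypothetical; real server code would log the failure rather than store the message in a string.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>

// Hypothetical cleanup step that can fail with an exception.
inline void flushBuffer(bool fail) {
    if (fail) {
        throw std::runtime_error("flush failed");
    }
}

// The destructor is implicitly noexcept, so an escaping exception would
// terminate the process. It therefore handles the failure locally.
class BufferedWriter {
public:
    BufferedWriter(bool failOnFlush, std::string& lastError)
        : _failOnFlush(failOnFlush), _lastError(lastError) {}

    ~BufferedWriter() {
        try {
            flushBuffer(_failOnFlush);
        } catch (const std::exception& ex) {
            _lastError = ex.what();  // Record the failure and continue.
        }
    }

private:
    bool _failOnFlush;
    std::string& _lastError;
};

// No noexcept(false) was written, so the destructor stays non-throwing.
static_assert(std::is_nothrow_destructible<BufferedWriter>::value,
              "destructor is implicitly noexcept");
```

Without the `try`/`catch`, the same failing flush would call `std::terminate()` during destruction.
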
### Where Exceptions are Possible

In our codebase, generally `DBException` is the only type of exception that
should be crossing API boundaries. If an exception other than a `DBException`
does cross an API boundary, it should be considered a bug. Whichever component
throws the exception should handle it locally, even if only by translating it
to a `DBException`. Generally any caller you would consider to be an external
caller should be able to rely on `DBException` being the only exception type your
function will throw.

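Translating a foreign exception at the boundary can be sketched as follows. `DBExceptionSketch` is a hypothetical stand-in for `DBException` so that the example compiles on its own, and `parseBatchSize` is an invented entry point; `std::stoi` and the exceptions it throws are standard.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical stand-in for DBException, to keep the sketch self-contained.
class DBExceptionSketch : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Public entry point. std::stoi may throw std::invalid_argument or
// std::out_of_range, so both are translated before crossing the boundary.
inline int parseBatchSize(const std::string& text) {
    try {
        return std::stoi(text);
    } catch (const std::invalid_argument&) {
        throw DBExceptionSketch("batchSize is not a number: " + text);
    } catch (const std::out_of_range&) {
        throw DBExceptionSketch("batchSize is out of range: " + text);
    }
}
```

External callers of `parseBatchSize` then only ever need a handler for the one exception type, regardless of which library call failed inside.
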
Allocations using the global `new` allocator or `std::allocator` in our codebase do
not throw, instead terminating the process directly when OOM conditions are
encountered. As such, there is no need to handle exceptions from these sources.

## Gotchas

Gotchas to watch out for:

- Generally, do not throw an `AssertionException` directly. Functions like `uasserted()` do work
  beyond just that. In particular, they make sure that the `getLastError` structures are set up
  properly.
- Think about the location of your asserts in constructors, as the destructor would not be
  called. But at a minimum, use `wassert` a lot therein; we want to know if something is wrong.
- Do **not** throw in destructors or allow exceptions to leak out (if you call a function that
  may throw).

[raii]: https://en.wikipedia.org/wiki/Resource_acquisition_is_initialization
[error_codes_yml]: ../src/mongo/base/error_codes.yml
[periodic_runner_h]: ../src/mongo/util/periodic_runner.h
[status_with_h]: ../src/mongo/base/status_with.h
[idlc_py]: ../buildscripts/idl/idlc.py
[status_with_test_cpp]: ../src/mongo/base/status_with_test.cpp
[errorcodes_py]: ../buildscripts/errorcodes.py
[assert_util_h]: ../src/mongo/util/assert_util.h
[logging_md]: logging.md