366 lines
23 KiB
Markdown
366 lines
23 KiB
Markdown
# Modules API
|
|
|
|
## What is a module
|
|
|
|
A module:
|
|
|
|
- Provides a coherent public API
|
|
- Has internal details that are not intended to be directly accessed from outside the module
|
|
- Is a set of files covering the API (headers), implementations (headers and cpp files), and tests
|
|
|
|
### Submodules
|
|
|
|
TODO
|
|
|
|
## Why are we doing this?
|
|
|
|
Having a clear delineation between public and private APIs for each module will improve the
|
|
maintainability and velocity of our codebase. Teams will have more freedom to evolve their
|
|
internal implementation details without affecting consumers. Consumers will benefit from
|
|
knowing what APIs are intended for their consumption.
|
|
|
|
## Assigning files to modules
|
|
|
|
The file `modules_poc/modules.yaml` contains a list of modules, each containing
|
|
a list of files. Each file must be contained in only one module. Note that
|
|
module assignment is not required to map neatly to team ownership.
|
|
|
|
In cases where multiple globs match a file, the current rule is that the
|
|
longest glob wins. This is used as a simpler-to-implement version of
|
|
most-specific glob wins, which we may switch to in the future.
|
|
|
|
## How do I mark API visibility?
|
|
|
|
This section will just describe the basic process. Later sections will cover the tooling
|
|
available to help, along with caveats to be aware of.
|
|
|
|
First read the documentation in [src/mongo/util/modules.h](https://github.com/mongodb/mongo/blob/master/src/mongo/util/modules.h)
|
|
for the canonical list and description of visibility levels. As a brief overview of the main
|
|
levels from least to most restrictive:
|
|
|
|
- `OPEN`: This is available for usage _and inheritance_ from anywhere in the codebase
|
|
- `PUBLIC`: This is available for usage from anywhere in the codebase. For types, subclasses may
|
|
only be defined in the same module.
|
|
- `NEEDS_REPLACEMENT` and `USE_REPLACEMENT(...)`: These are collectively considered
|
|
"unfortunately public" and are available for use, but should be avoided
|
|
- `PARENT_PRIVATE`: This is similar to `PRIVATE`, but allows usage from any file in the parent
|
|
module, including other submodules
|
|
- `PRIVATE`: This may only be used from the current module or one of its submodules
|
|
- `FILE_PRIVATE`: This may only be used from the current "file family" (roughly, header \+ cpp
|
|
\+ tests). It may not be used by other files, even from the same module.
|
|
|
|
You can think of public vs private similarly to how you would the sections of a `class`: they
|
|
indicate whether something is intended to be part of the API or an implementation detail. The
|
|
difference is that they apply at a wider granularity of code than a single class, with
|
|
implementation details available to either the full module (and its submodules) for `PRIVATE`
|
|
or the file family for `FILE_PRIVATE`.
|
|
|
|
The macros in that header file are attached to declarations and set the visibility level for
|
|
that declaration and all of its "semantic children"[^1]. The macros are C++ attributes which
|
|
means that they need to go in specific places that differ based on what is being marked (for
|
|
templates, the location does not change and is always somewhere after the `template <...>` part):
|
|
|
|
- `MONGO_MOD_PUBLIC;` by itself as the first line after includes in a header sets the default
|
|
for that header (only `PUBLIC`, `PARENT_PRIVATE`, and `FILE_PRIVATE` are allowed here)
|
|
- `namespace MONGO_MOD mongo {` (this does not work with nested namespaces in a single
|
|
declaration like `namespace mongo::repl`)
|
|
- `class MONGO_MOD Foo {` (Ditto for `enum`, `struct`, and `union`)
|
|
- `MONGO_MOD void func(...);`
|
|
- `MONGO_MOD int var;`
|
|
- `concept isFooable MONGO_MOD {`
|
|
|
|
For the cases where it goes at the beginning of the line, if clang-format chooses an unfortunate
|
|
place to break the line, it usually helps to undo the formatting then put the macro on its own
|
|
line above the declaration.
|
|
|
|
APIs are marked one header at a time, by including `"mongo/util/modules.h"` in the header.
|
|
This causes the header to be treated as "modularized" which has the following effects:
|
|
|
|
- All declarations in that header (not transitive includes) default to `PRIVATE`, meaning that
|
|
the public API is what must be marked.
|
|
- Members in `private:` sections in classes default to `PRIVATE`, regardless of the visibility
|
|
of the class. The only way the language would allow them to be used from outside of the module
|
|
is if you have cross-module friendships, which should generally be avoided. If needed
|
|
temporarily, favor `NEEDS_REPLACEMENT` over `PUBLIC` for these declarations.
|
|
- Declarations ending in `_forTest` default to `FILE_PRIVATE` to support the common case where
|
|
they are only intended for testing that class. If they are actually intended to support testing
|
|
of consumers, not just the type they are defined on, they can be explicitly given `PUBLIC` or
|
|
`PRIVATE` visibility.
|
|
- Internal and detail namespaces default to `PRIVATE` and cannot be made less restricted, but
|
|
can still be marked as `FILE_PRIVATE`. Individual declarations within the namespace can be
|
|
exposed as necessary, but they cannot be exposed in bulk without changing the name of the
|
|
namespace to something that doesn't imply private.
|
|
|
|
For internal headers of a module which do not contribute to its public API, simply including
|
|
`modules.h` is sufficient. There is a [tool](#the-private-header-marker) to automate this
|
|
process. You may additionally want to consider whether any APIs should be marked `FILE_PRIVATE`,
|
|
but that is optional.
|
|
|
|
For IDL files, you mark visibility of whole types (`struct`, `enum`, and `command`) with the
|
|
`mod_visibility` option. The value should be the same as one of the `MONGO_MOD` macros, but
|
|
lowercase and without the prefix, for example `mod_visibility: public`. You can set the default
|
|
visibility for all types in that IDL file by putting that in the `global:` section. You cannot
|
|
control visibility of individual functions within the type. Please let us know if you have a
|
|
compelling use case for this.
|
|
|
|
## What tooling exists to help me?
|
|
|
|
Note that all tooling should be run from within a properly set-up python virtual environment.
|
|
This includes running `buildscripts/poetry_sync.sh` to ensure you have the correct dependencies.
|
|
|
|
### The scanner and merger
|
|
|
|
The merger generates a cross reference of all first-party usages of first-party code and stores
|
|
it in `merged_decls.json`, which is used by the rest of our tooling. It is also where we validate
|
|
that there are no disallowed accesses. It will be invoked for you by the browser when you ask it
|
|
to rescan, or you can also manually run it as `modules_poc/merge_decls.py`. If you are interested
|
|
in analyzing that file, [`jq`](https://jqlang.org/) is a powerful tool, or you can just write
|
|
some python.
|
|
|
|
As a rather extreme example of what you can do with `jq`, here is how the progress reports are
|
|
generated:
|
|
|
|
```shell
|
|
# For each mod (and TOTAL):
|
|
# For each file:
|
|
# consider it marked if it has no UNKNOWNs
|
|
# Compute a done percentage
|
|
# Format to a nice string
|
|
jq 'map(., .mod = "TOTAL") | group_by(.mod)[] | group_by(.loc | split(":")[0]) | {mod: .[0].[0].mod, total: length, marked: map(select(any(.visibility == "UNKNOWN") | not)) | length} | .done = (1000 * .marked / .total | round) / 10 | "\(.mod): \(" " * (.mod | 40-length)) \(.done)% (\(.marked) / \(.total))"' -r merged_decls.json
|
|
```
|
|
|
|
Internally, the merger will internally invoke `bazel build --config=mod-scanner //src/mongo/...`
|
|
to run the scanner over the whole codebase (or the parts that have changed since the last scan),
|
|
taking advantage of bazel remote execution to achieve very high levels of parallelism.
|
|
|
|
### The browser
|
|
|
|
The main piece of tooling to run is the browser, which is launched by running
|
|
`modules_poc/browse.py`. If you haven't scanned the codebase recently, it will offer to run it
|
|
for you which will take a few minutes. After modifying the source code, you can rescan at any
|
|
time by pressing `r`. It will only rescan files that have been modified or that transitively
|
|
include modified headers.
|
|
|
|
The browser is primarily intended to assist in labeling public APIs, so the files are sorted
|
|
with the most number of unlabeled declarations ("unknowns") first. You can search for a file
|
|
by pressing `f` or press `m` to filter the files by module.
|
|
|
|
The list of available key bindings is shown on the right. You can toggle that by pressing `?`.
|
|
Other keybinding of note are that you can press `g` to go to the currently highlighted
|
|
declaration or location in your editor (only when running in the vscode or nvim terminal),
|
|
and `p` to toggle an inline preview of the location within the browser. You can press `Tab ↹`
|
|
to toggle between the tree and the code preview. The mouse is fully supported for scrolling
|
|
and expanding rows in the tree, and there are aliases for some basic vim keybinds (`hjkl/`).
|
|
|
|
### The private header marker
|
|
|
|
Once you have scanned the codebase and produced a `merged_decls.json`,
|
|
`modules_poc/private_headers.py` can be used to find all header and IDL files where there are
|
|
no currently detected external usages and automatically mark them as fully private to the
|
|
module. This does not necessarily mean that all automatically marked headers are intended to
|
|
be private. A human should review to ensure that the marked headers match intent. You can pass
|
|
flags to filter on any/all of module, owning team, or path glob. For headers matching the filter,
|
|
the script will also warn of usages of `_forTest` external to the file family that may need to
|
|
be marked `PRIVATE` to make them available to the whole module since they default to only being
|
|
available to the file family for marked headers.
|
|
|
|
Make sure to run `buildscripts/clang_format.py format-my` or `bazel run format` after using it
|
|
to modify any C++ files.
|
|
|
|
Example usage:
|
|
|
|
```shell
|
|
./modules_poc/private_headers.py --team=server_programmability --module=core --glob="src/mongo/executor/*"
|
|
```
|
|
|
|
`--dry-run` can be added to view all of the changes without applying them.
|
|
|
|
### The PR comment generator
|
|
|
|
You can run `modules_poc/mod_diff.py` to output a brief summary of all of the API (including
|
|
visibility levels and usages counts) for each file modified in your branch. When putting up a PR
|
|
to mark API visibility, you should add a comment with its output to the PR as an aide to
|
|
reviewers. The output is intended to be close enough to C++ that you should put it in a
|
|
` ```cpp ` block when making your PR comment to make it more readable. You can also
|
|
pipe it through `bat -lcpp` to make it colorful locally. Note that it will use the last
|
|
scan output, so if you've modified any headers, you should run a rescan prior to running this
|
|
tool.
|
|
|
|
## Workflow
|
|
|
|
The general workflow for each PR will generally be the same:
|
|
|
|
1. Ensure that you are in a python virtualenv, creating one if needed, and run
|
|
`buildscripts/poetry_sync.sh` to update python deps.
|
|
2. Run [the merger](#the-scanner-and-merger) to scan the code base: `modules_poc/merge_decls.py`
|
|
3. Mark some headers
|
|
4. Rerun the merger to ensure that there are no violations, and update `merged_decls.json`
|
|
5. Run [the pr comment generator](#the-pr-comment-generator) to show the APIs that you have marked
|
|
- Look through this to ensure that everything is as you expect.
|
|
6. Put up a PR and include the generated comment in a ` ```cpp ` block
|
|
- I suggest keeping PRs small (say, no more than 10 files at a time) so that they are
|
|
manageable by reviewers. As an exception it seems reasonable to auto-mark many headers as
|
|
private in a single PR, as long as those PRs are separate from those containing any manual
|
|
marking.
|
|
|
|
When first starting to mark a module, I suggest running the [`modules_poc/private_headers.py`](#the-private-header-marker)
|
|
script with `--dry-run` (or `-n`) and `--module=YOUR_MODULE`. For larger modules (in particular,
|
|
the `query` mega module) you may want to pass a `--glob` so that you can focus on a smaller
|
|
subset of the code initially. That will give you an overview of the files that are used from
|
|
outside your module (which contain defacto public APIs today) and those that do not (which can
|
|
automatically be marked as private implementation details).
|
|
|
|
If all of the defacto private headers seem like they should be private, you can remove the
|
|
dry-run flag to have it automatically mark them as private. Be sure to validate that their
|
|
contents are actually intended to be private. Remember that the point of having a human doing
|
|
the marking is to ensure that we correctly capture intent. You can optionally mark implementation
|
|
details within each header as `FILE_PRIVATE`, if you would like to prevent them from being used
|
|
elsewhere even within the module.
|
|
|
|
You can then open [the browser](#the-browser) (`modules_poc/browse.py`) to look at the remaining
|
|
headers. It will show you what is used and from where. It will be particularly useful for things
|
|
that seem like they should be private, but are being used externally.
|
|
|
|
### What should I do when an internal API is currently being used?
|
|
|
|
1. If it is only used from a small number of external files, first check if those files should
|
|
actually belong to your module. We tried to correctly map all files in phase 1, but some files
|
|
may have been assigned to the wrong module. If that happens, try adjusting the globs in
|
|
`modules_poc/modules.yaml` to move them.
|
|
2. If there is already a public API that callers should use instead, mark it as
|
|
`USE_REPLACEMENT(better_api)`. The argument accepts any C++ tokens, but the intent is where
|
|
possible to use the name of the replacement. This will generate a ticket for all teams using
|
|
that code.
|
|
1. If there are very few users, consider just cleaning them up.
|
|
3. Reconsider making this API public if other modules need its functionality, and this is
|
|
the only way to get it.
|
|
4. Otherwise, if there is no public API that fulfills the needs of the callers, but you
|
|
don't want the current API to remain public long-term, use `NEEDS_REPLACEMENT`. This will
|
|
generate a ticket for the team that owns that code.
|
|
1. If the API was "obviously" intended to be private (eg it is in a `details` namespace)
|
|
and callers would be reasonably able to implement the functionality themselves, possibly
|
|
by writing their own version, it seems acceptable to use
|
|
`USE_REPLACEMENT(do not use internal details)`
|
|
|
|
## Caveats and Limitations
|
|
|
|
**OVERARCHING GUIDELINE**: Always try to mark declarations correctly according to intent,
|
|
even if it will not be enforced by the current tooling. This is both to provide the correct
|
|
information to human readers, as well as to avoid issues if we improve the tooling in the
|
|
future to eliminate these limitations
|
|
|
|
The rest of this section is fairly technical and probably not necessary for most readers unless
|
|
they notice something "weird" going on and want to dive into why. Most of these limitations are
|
|
more likely to affect the core modules since most of the rest of our code does not expose APIs
|
|
via macros and templates or have APIs only consumed by templates, and those are where most of
|
|
these issues come up.
|
|
|
|
- We do not track usages of namespaces at all, only the declarations within namespaces. When
|
|
a namespace is marked with a visibility, it does not affect the visibility of the namespace
|
|
itself (since it doesn't have one), it sets the default visibility for all declarations within
|
|
**that namespace block**. Each time a namespace is reopened it is a separate block and the
|
|
visibility markers on other blocks of the same namespace do not apply.
|
|
- The scanner only knows about declarations that it sees being used. For implementation reasons,
|
|
it only discovers declarations by seeing what every usage is using. This can either cause or be
|
|
caused by other limitations.
|
|
- Usages in templates may not be seen. This is especially the case for "dependent types and
|
|
values" which are things that are not known by the compiler before the template is instantiated.
|
|
- This is a problem for functions where any arguments are dependent if it can't figure out
|
|
which overload will be selected. It is even worse for free-functions called unqualified
|
|
(`f(blah)` rather than `ns::f(blah)` or `x.f(blah)`) since due to ADL, overload resolution
|
|
is _always_ delayed for them.
|
|
- Everything that results from a macro expansion is treated as-if it was written at the point
|
|
of expansion. This applies to both declarations and usages. If you have an API that should
|
|
only be used via the defined macros, mark it as `MOD_PUBLIC_FOR_TECHNICAL_REASONS` to signal
|
|
to readers that they should avoid direct usage, even if the tooling won't prevent it. We may
|
|
improve this in the future.
|
|
- Template variables are completely ignored due to some unfortunate clang bugs. Still, try
|
|
to mark them correctly since we may change this in the future.
|
|
- Method calls are assigned to the static type at the call site. This has two important effects:
|
|
- A subclass's overridden method may seem unused if it is only used via calls through a base
|
|
class pointer/reference
|
|
- Calls through a base class pointer/reference count as calls of that class's method, not of
|
|
the interface's
|
|
- Defaulted members (methods, ctors, dtors) are treated as usages of the class itself,
|
|
regardless of whether they implicitly or explicitly defaulted. This is because clang does not
|
|
provide an API to distinguish between those cases.
|
|
- Template normalization woes: we try really hard to report declarations as the template
|
|
`foo<T>` rather than separate instantiations like `foo<int>`, `foo<string>`, etc, **unless**
|
|
they are explicitly specialized, meaning that the instantiation has its own definition different
|
|
from the main template. Unfortunately, clang does a bad job at this and we have a number of
|
|
kludgy workarounds. The most important effects:
|
|
- Explicit specializations of function and variable templates are ignored and always converted
|
|
to the primary template.
|
|
- We do treat explicit specializations of types as separate (using the heuristic of having a
|
|
separate location than the main template), because they can have a different shape and API than
|
|
the main template. In general they should probably have the same visibility though, unless the
|
|
instantiation is using a private type which should be unavailable to consumers anyway.
|
|
- Clang assigns many locations to the site of explicit template instantiations and extern
|
|
template declarations, even when there is a better location that it can see. Luckily these
|
|
are fairly rare.
|
|
- Sometimes clang reports the resolved destination of `using` declarations and type alias, but
|
|
usually it reports the `using` declaration itself. A few notable cases (these are trends and
|
|
may not be absolute\!)
|
|
- `using Base::foo;` to expose a member of a base class is resolved as a usage of `Base::foo`
|
|
rather than `Derived::foo`. This is especially notable when the `Base` class is intended to be
|
|
a private implementation detail. You will need to mark all exposed methods as public.
|
|
- `using Base::Base;` to pull in the base constructors is the opposite and is recorded as a
|
|
usage of `Derived::Base(args)`, which is odd because such a declaration doesn't actually exist.
|
|
- Internal/details namespaces (currently defined as matching the regex `(detail|internal)s?$`)
|
|
implicitly have implicit default visibility of private if `modules.h` is included. It is not
|
|
possible to give the namespace a public visibility, but you can restrict it further with
|
|
`FILE_PRIVATE`. If you want declarations inside it to be usable from outside your module you
|
|
must mark children of the namespace explicitly, or rename it to not use a name that implies
|
|
that it is for internal usage only. A somewhat common case will be marking internal declarations
|
|
that are only intended to be used via macros with `PUBLIC_FOR_TECHNICAL_REASONS`.
|
|
- Be very careful with forward declarations. Try to avoid them wherever possible (unless there
|
|
is a significant benefit). Especially avoid forward declaring anything from another module\!
|
|
Where forward declarations must be used, make sure that they have the same visibility as the
|
|
real definition. As an exception, if every TU that sees the forward declaration will also see
|
|
the definition it is OK to omit marking the forward definition. This may happen when they are
|
|
both in the same header, or the forward declaration is in a private implementation detail header
|
|
which is included by the defining header. Be aware of the implicit visibility marking which also
|
|
applies to forward declaration, if they are the only declaration seen in the TU.
|
|
- Never forward declare functions to avoid including a header. They are much more problematic
|
|
than types, both in general in C++ and specifically for this tooling.
|
|
- We try to use the definition location for types defined in headers, but the "canonical"
|
|
location (clang's term for the first declaration seen in the current TU) for everything else.
|
|
If the type is defined in a .cpp, we use the canonical location.
|
|
- We only consider declarations in headers, never in .cpp files.
|
|
- Be mindful of `_forTest` functions. They default to `FILE_PRIVATE` since they are typically
|
|
intended only for use when testing the type they are defined on, not when testing consumers.
|
|
In the cases where they _are_ intended as part of the API for testing consumers, you can
|
|
explicitly mark them `PUBLIC` or `PRIVATE` depending on whether they should be usable from
|
|
outside your module or not.
|
|
- Things used implicitly (eg implicit conversion operators) are still counted as usages even
|
|
if they are not specifically named at the call site
|
|
- When merging information from multiple TUs, definitions always replace the metadata gathered
|
|
from TUs that only saw a declaration.
|
|
- Note that we aren't guaranteed to see every definition, in particular for functions that
|
|
are not called from the TU that they are defined in. So this cannot be used to find places
|
|
where we deleted the definition but forgot to delete the declaration (we wouldn't see them
|
|
anyway, since we only track things that are used, and undefined things can't really be used,
|
|
except trivially, without breaking the build).
|
|
- `private` members of classes are implicitly `PRIVATE`, and must be explicitly marked otherwise
|
|
if desired. They should probably never be made `PUBLIC` since that implies cross-module
|
|
friendship. In the few places where we have that today, they have been made one of the flavors
|
|
of unfortunately public: `NEEDS_REPLACEMENT` or `USE_INSTEAD`.
|
|
- `public` members of `private` types do not inherit the implicit `PRIVATE` and follow the
|
|
normal rule of looking for their nearest semantic parent with an explicit marker. That means
|
|
that they may be `PUBLIC`. However, the language rules still apply and as long as an
|
|
instance of the type is never handed to consumers they will have no way of accessing those
|
|
members.
|
|
- `protected` members do not default to `PRIVATE`, but because we only allow subclassing from
|
|
`OPEN` classes, the language visibility rules will disallow access from outside the module
|
|
unless you choose to allow it by use `OPEN` classes or `friend`s. Note that making any
|
|
subclass `OPEN` exposes all `protected` members of parents unless they are marked `PRIVATE`.
|
|
- `friend` declarations are mostly ignored, except when they are a definition. So the
|
|
definitions using the "hidden friend" pattern are tracked, but we ignore it if the definition
|
|
is in a cpp file.
|
|
|
|
[^1]:
|
|
Clang distinguishes between "semantic" and "lexical" parents. The primary differences
|
|
are that members of classes (including member types) are semantic children of the class even
|
|
when defined out of line, and conversely `friend` declarations are not, and instead are
|
|
considered semantic children of the nearest namespace.
|