# How to Use Antithesis

## Context

Antithesis is a third-party vendor with an environment that can perform network fuzzing. We can
upload images containing `docker-compose.yml` files, which represent various MongoDB topologies, to
the Antithesis Docker registry. Antithesis runs `docker-compose up` from these images to spin up
the corresponding multi-container application in their environment and run a test suite. Network
fuzzing is performed on the topology while the test suite runs, and Antithesis generates a report
identifying bugs. Check out
https://github.com/mongodb/mongo/wiki/Testing-MongoDB-with-Antithesis to see an example of how we
use Antithesis today.

## Base Images

The `base_images` directory consists of the building blocks for creating a MongoDB test topology.
These images are uploaded to the Antithesis Docker registry [nightly](https://github.com/mongodb/mongo/blob/6cf8b162a61173eb372b54213def6dd61e1fd684/etc/evergreen_yml_components/variants/ubuntu/test_dev_master_and_lts_branches_only.yml#L28) during the
[`antithesis image build and push`](https://github.com/mongodb/mongo/blob/020632e3ae328f276b2c251417b5a39389af6141/etc/evergreen_yml_components/definitions.yml#L2823) function.

### mongo_binaries

This image contains the latest `mongo`, `mongos` and `mongod` binaries. It can be used to
start a `mongod` instance, `mongos` instance or execute `mongo` commands. This is the main building
block for creating the System Under Test topology.

### workload

This image contains the latest `mongo` binary as well as the `resmoke` test runner. The `workload`
container is not part of the actual topology. The purpose of a `workload` container is to execute
`mongo` commands to complete the topology setup, and to run a test suite on an existing topology
like so:

```shell
buildscripts/resmoke.py run --suite antithesis_concurrency_sharded_with_stepdowns_and_balancer
```

**Every topology must have 1 workload container.**

Note: During `workload` image build, `evergreen/antithesis_image_build_and_push.sh` runs, which generates
"antithesis compatible" test suites and prefixes their names with `antithesis_`. These are the test suites
that can run in Antithesis and are available from within the `workload` container.

### Dockerfile

This assembles an image with the necessary files for spinning up the corresponding topology. It
consists of a `docker-compose.yml`, a `logs` directory, a `scripts` directory and a `data`
directory. If this is structured properly, you should be able to copy the files and directories
from this image and run `docker-compose up` to set up the desired topology.

Example of what `buildscripts/resmokelib/testing/docker_cluster_image_builder.py` generates:

```Dockerfile
FROM scratch
COPY docker-compose.yml /
ADD scripts /scripts
ADD logs /logs
ADD data /data
ADD debug /debug
```

All topology images are built and uploaded to the Antithesis Docker registry during the
`antithesis image build and push` task. Some of these directories, such as `/data` and `/logs`,
are created by the `evergreen/antithesis_image_build_and_push.sh` script.

Note: These images serve solely as a filesystem containing all necessary files for a topology,
and therefore use `FROM scratch`.

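Before building a topology image, it can help to confirm that the directory you are about to package has the layout described in this section. The following is a minimal sketch, not part of the repository: the function name `missing_topology_entries` is hypothetical, and the required entries are only the four named in the text above.

```python
from pathlib import Path

# Entries that this section lists as making up a topology image.
REQUIRED_FILES = ("docker-compose.yml",)
REQUIRED_DIRS = ("logs", "scripts", "data")


def missing_topology_entries(root: Path) -> list[str]:
    """Return the required files and directories that are absent under `root`."""
    missing = [name for name in REQUIRED_FILES if not (root / name).is_file()]
    missing += [name for name in REQUIRED_DIRS if not (root / name).is_dir()]
    return missing
```

An empty result means the directory contains everything this section names; anything returned is an entry to create before running `docker-compose up`.
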
### docker-compose.yml

This describes how to construct the corresponding topology using the
`mongo-binaries` and `workload` images.

Example from `buildscripts/antithesis/topologies/sharded_cluster/docker-compose.yml`:

```yml
version: '3.0'

services:
  configsvr1:
    container_name: configsvr1
    hostname: configsvr1
    image: mongo-binaries:evergreen-latest-master
    volumes:
      - ./logs/configsvr1:/var/log/mongodb/
      - ./scripts:/scripts/
      - ./data/configsvr1:/data/configdb/
    command: /bin/bash /scripts/configsvr_init.sh
    networks:
      antithesis-net:
        ipv4_address: 10.20.20.6
        # Set an IPv4 address of 10.20.20.130 or higher
        # to be ignored by the fault injector

  configsvr2: ...
  configsvr3: ...
  database1:
    container_name: database1
    hostname: database1
    image: mongo-binaries:evergreen-latest-master
    volumes:
      - ./logs/database1:/var/log/mongodb/
      - ./scripts:/scripts/
      - ./data/database1:/data/db/
    command: /bin/bash /scripts/database_init.sh Shard1
    networks:
      antithesis-net:
        ipv4_address: 10.20.20.3
        # Set an IPv4 address of 10.20.20.130 or higher
        # to be ignored by the fault injector
  database2: ...
  database3: ...
  database4: ...
  database5: ...
  database6: ...
  mongos:
    container_name: mongos
    hostname: mongos
    image: mongo-binaries:evergreen-latest-master
    volumes:
      - ./logs/mongos:/var/log/mongodb/
      - ./scripts:/scripts/
    command: python3 /scripts/mongos_init.py
    depends_on:
      - "database1"
      - "database2"
      - "database3"
      - "database4"
      - "database5"
      - "database6"
      - "configsvr1"
      - "configsvr2"
      - "configsvr3"
    networks:
      antithesis-net:
        ipv4_address: 10.20.20.9
        # The subnet provided here is an example
        # An alternative subnet can be used
  workload:
    container_name: workload
    hostname: workload
    image: workload:evergreen-latest-master
    volumes:
      - ./logs/workload:/var/log/resmoke/
      - ./scripts:/scripts/
    command: python3 /scripts/workload_init.py
    depends_on:
      - "mongos"
    networks:
      antithesis-net:
        ipv4_address: 10.20.20.130
        # The subnet provided here is an example
        # An alternative subnet can be used
networks:
  antithesis-net:
    driver: bridge
    ipam:
      config:
        - subnet: 10.20.20.0/24
```

Each container must have a `command` in `docker-compose.yml` that runs an init script. The init
script belongs in the `scripts` directory, which is included as a volume. The `command` should be
set like so: `/bin/bash /scripts/[script_name].sh` or `python3 /scripts/[script_name].py`. This is
a requirement for the topology to start up properly in Antithesis.

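The `command` convention above lends itself to a quick check. The sketch below assumes the `services` mapping from `docker-compose.yml` has already been loaded into a plain dictionary by a YAML parser of your choice; the function name and the regular expression are illustrative assumptions, not something the repository provides.

```python
import re

# Accepted command shapes, following the convention described above.
# Trailing arguments (for example a shard name) are allowed.
INIT_COMMAND = re.compile(
    r"^(/bin/bash /scripts/[\w.-]+\.sh|python3 /scripts/[\w.-]+\.py)(\s.*)?$"
)


def services_without_init_command(services: dict[str, dict]) -> list[str]:
    """Return the names of services whose `command` does not run an init script."""
    return sorted(
        name
        for name, spec in services.items()
        if not INIT_COMMAND.match(str(spec.get("command", "")))
    )
```

Any service name this returns either has no `command` or runs something other than a script under `/scripts`, which is worth fixing before uploading the image.
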
When creating `mongod` or `mongos` instances, route the logs with
`--logpath /var/log/mongodb/mongodb.log` and mount that directory through `volumes`, as in `database1`.
This enables us to easily retrieve logs if a bug is detected by Antithesis.

The `ipv4_address` should be set to `10.20.20.130` or higher if you do not want that container to
be affected by network fuzzing. For instance, you would likely not want the `workload` container
to be affected by network fuzzing, as shown in the example above.

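The address rule can be expressed with the standard `ipaddress` module. This is a sketch under the assumption that the topology uses the example subnet `10.20.20.0/24`; the function name `is_exempt_from_faults` is hypothetical, and a topology on a different subnet would need different constants.

```python
import ipaddress

# Threshold and subnet taken from the example topology in this document.
FAULT_EXEMPT_START = ipaddress.IPv4Address("10.20.20.130")
EXAMPLE_SUBNET = ipaddress.IPv4Network("10.20.20.0/24")


def is_exempt_from_faults(address: str) -> bool:
    """True if `address` is inside the example subnet and at or above the threshold."""
    ip = ipaddress.IPv4Address(address)
    return ip in EXAMPLE_SUBNET and ip >= FAULT_EXEMPT_START
```

With the example topology, the `workload` address `10.20.20.130` passes this check, while the `mongos` address `10.20.20.9` does not and is therefore subject to network fuzzing.
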
Use the `evergreen-latest-master` tag for all images. If needed,
`evergreen/antithesis_image_build_and_push.sh` updates this tag automatically.

### scripts

Take a look at `buildscripts/antithesis/topologies/sharded_cluster/scripts/mongos_init.py` to see
how to use util methods from `buildscripts/antithesis/topologies/sharded_cluster/scripts/utils.py`
to set up the desired topology. You can also use simple shell scripts, as in the case of
`buildscripts/antithesis/topologies/sharded_cluster/scripts/database_init.sh`. These init scripts
must not exit, so that the underlying container stays alive. You can use an infinite while
loop for `python` scripts or `tail -f /dev/null` for shell scripts.

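A Python init script can stay alive with a simple sleep loop. The sketch below is illustrative only and does not reproduce the repository's init scripts: `keep_alive` and its parameters are hypothetical names, and `max_iterations` exists purely so the loop can be exercised in a test.

```python
import time
from typing import Optional


def keep_alive(poll_seconds: float = 60.0, max_iterations: Optional[int] = None) -> int:
    """Sleep in a loop so the process, and therefore the container, stays up.

    With `max_iterations` left as None the loop never ends, which is what an
    init script wants. A finite value makes the function return the number of
    iterations it completed.
    """
    iterations = 0
    while max_iterations is None or iterations < max_iterations:
        time.sleep(poll_seconds)
        iterations += 1
    return iterations
```

An init script would perform its one-time topology setup first and then call `keep_alive()` with no arguments as its final statement.
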
## How do I create a new topology for Antithesis testing?

This should be done with care to ensure we are using our limited resources efficiently.

Create a new task that extends the `antithesis_task_template`, is tagged with `antithesis`, and passes the specified `suite` to the `antithesis image build and push` function. See other examples to get started.

## How do I test my suite in antithesis?

If you provide the Evergreen parameter `schedule_antithesis_tests` to your Evergreen patch, then once the Antithesis images are built in that patch, we send Antithesis an API request to run your newly created images for an hour. You will be emailed the report when the run finishes.

Important Note: This will happen for every Antithesis task you schedule in your patch. Please do not schedule more than 1 or 2 tasks with this parameter at a time, or it will use up a lot of our allocated Antithesis testing time.

`evergreen patch --param schedule_antithesis_tests=true`

## Types of testing in antithesis

### Normal resmoke testing

Antithesis constantly runs your resmoke suite with one random test from the suite at a time.
We support this out-of-the-box with most resmoke suites that use Python fixtures.
This is very similar to how tests run in Evergreen.
Your Antithesis tasks in Evergreen will default to this if the `antithesis_test_composer_dir` var is not specified on the task.

### Test Composer

Antithesis offers a resource called [Test Composer](https://antithesis.com/docs/test_templates/) to run "test templates" against our clusters. Test Composer enables autonomous testing by letting you define templates that guide Antithesis in generating thousands of test cases across multiple system states. Your Evergreen tasks will automatically use Test Composer if the `antithesis_test_composer_dir` var is specified in the task, as shown in "Configuring Test Composer in Evergreen" below.

#### What is Test Composer?

Test Composer uses an opinionated framework based on naming conventions to detect and run tests. Unlike traditional example-based testing, Test Composer templates tell Antithesis how to handle parallelism, test length, command order, and fault injection to explore your system's behavior comprehensively.

#### Test Composer Structure in MongoDB

MongoDB's Test Composer implementations are located in `buildscripts/antithesis/test_composer/`. The setup still uses a resmoke suite to determine cluster configuration, but test execution is controlled by Test Composer commands rather than running jstests directly.

#### Test Command Types

Test commands must be executable and placed directly under `/opt/antithesis/test/v1/<test_dir>/`. Our Evergreen tasks handle building the images and putting the tests in the correct place for you. Command names follow the naming convention `<prefix>_<command>`, where the prefix determines the command's behavior.

##### Driver Commands

Run during fault injection periods. At least one driver or anytime command is required.

- **`parallel_driver_<command>`**: Can run concurrently with other parallel drivers (including itself)
  - Example: [parallel_driver_mongod_find.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_find.sh) - Executes random find queries
  - Example: [parallel_driver_mongod_insert.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_insert.sh) - Inserts random documents
  - Use for: Concurrent client operations, continuous availability checks, parallel workloads

- **`singleton_driver_<command>`**: Runs as the only driver command in a history branch
  - Example: [singleton_driver_resmoke.sh](../../buildscripts/antithesis/test_composer/random_resmoke/singleton_driver_resmoke.sh) - Runs a single random resmoke test
  - Use for: Porting existing integration tests, running complete workloads without interference

- **`serial_driver_<command>`**: Runs when no other driver commands are active
  - Example: [serial_driver_resmoke.sh](../../buildscripts/antithesis/test_composer/random_resmoke/serial_driver_resmoke.sh) - Runs resmoke tests sequentially
  - Use for: Full failover operations, validation steps that require quiescence

##### Quiescent Commands

Run in the absence of faults.

- **`first_<command>`**: Optional setup command that runs once before any driver commands
  - Use for: Data initialization, schema setup, bootstrapping

- **`eventually_<command>`**: Runs after driver commands start. Kills all drivers and stops faults, creating a new branch
  - Use for: Testing eventual consistency, availability after recovery, final state validation
  - Note: Include retry loops for service availability

- **`finally_<command>`**: Like eventually, but only runs after all driver commands complete naturally
  - Use for: Testing subtle invariants, final consistency checks

##### Advanced Commands

- **`anytime_<command>`**: Can run at any time after first command, even during singleton/serial drivers
  - Use for: Continuous invariant checks, monitoring, low-consistency availability checks

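To make the `<prefix>_<command>` convention concrete, the sketch below maps a file name to the prefix it would be recognized under. It is an illustration of the naming rule only, not how Test Composer itself is implemented; the function name `command_prefix` is hypothetical, and the prefix list is limited to the ones described in this section.

```python
from typing import Optional

# Prefixes described in this section. None of them is a prefix of another,
# so the order of the tuple does not affect the result.
COMMAND_PREFIXES = (
    "parallel_driver",
    "singleton_driver",
    "serial_driver",
    "first",
    "eventually",
    "finally",
    "anytime",
)


def command_prefix(filename: str) -> Optional[str]:
    """Return the Test Composer prefix of `filename`, or None if it has none."""
    for prefix in COMMAND_PREFIXES:
        if filename.startswith(prefix + "_"):
            return prefix
    return None
```

For example, `parallel_driver_mongod_find.sh` maps to `parallel_driver`, while a file with no recognized prefix maps to `None` and would not be treated as a test command under this convention.
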
#### MongoDB Test Composer Examples

##### Example 1: basic_js_commands Template

This template runs parallel JavaScript operations against MongoDB with built-in retry logic for network failures.

**Commands:**

- [parallel_driver_mongod_find.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_find.sh) - Random find queries
- [parallel_driver_mongod_insert.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_insert.sh) - Random inserts
- [parallel_driver_mongod_fsync.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_fsync.sh) - fsync operations
- [parallel_driver_mongod_pitread.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_pitread.sh) - Point-in-time reads with snapshot testing
- [parallel_driver_mongod_validate_collections.sh](../../buildscripts/antithesis/test_composer/basic_js_commands/parallel_driver_mongod_validate_collections.sh) - Collection validation

**Shared Logic:** [commands.js](../../buildscripts/antithesis/test_composer/basic_js_commands/js/commands.js) provides retry mechanisms for network errors and connection helpers.

**Key Features:**

- Automatic retry on `MongoNetworkError`, `MongoServerSelectionError`, `RetryableWriteError`
- Random test data generation
- Connection string discovery via `/scripts/print_connection_string.sh`

##### Example 2: random_resmoke Template

This template runs resmoke tests with randomization, adapting existing test infrastructure for Test Composer.

**Commands:**

- [singleton_driver_resmoke.sh](../../buildscripts/antithesis/test_composer/random_resmoke/singleton_driver_resmoke.sh) - Single random resmoke test
- [serial_driver_resmoke.sh](../../buildscripts/antithesis/test_composer/random_resmoke/serial_driver_resmoke.sh) - Sequential random resmoke tests

Both use random seeds and shuffling: `--seed $(od -vAn -N4 -tu4 < /dev/urandom) --shuffle --sanityCheck`

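The `od` invocation above reads four bytes from `/dev/urandom` and prints them as one unsigned 32-bit integer. If a command is written in Python rather than shell, roughly the same seed can be produced as sketched below. The helper names are hypothetical, and only the three flags quoted above are used.

```python
import os
import sys


def random_seed() -> int:
    """Read 4 bytes from the OS random source as an unsigned 32-bit integer."""
    return int.from_bytes(os.urandom(4), sys.byteorder)


def randomized_resmoke_flags() -> list[str]:
    """Build the randomization flags quoted above with a fresh seed."""
    return ["--seed", str(random_seed()), "--shuffle", "--sanityCheck"]
```

The returned list can be appended to whatever argument list the command already passes to resmoke.
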
#### Creating a New Test Template

1. **Create a test directory:** `buildscripts/antithesis/test_composer/<your_template_name>/`

2. **Write test commands:** Create executable scripts with appropriate prefixes:

   ```bash
   #!/usr/bin/env bash
   # buildscripts/antithesis/test_composer/<template>/parallel_driver_mytest.sh

   # Your test logic here
   # This can run in parallel with other parallel_driver commands
   ```

3. **Make scripts executable:** `chmod +x buildscripts/antithesis/test_composer/<template>/*.sh`

4. **Helper files:** Use the `helper_` prefix or subdirectories for shared code. These are ignored by Test Composer.

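A forgotten `chmod +x` is easy to miss, so a small pre-flight check over a template directory may be useful. The sketch below is an assumption-laden illustration: `non_executable_commands` is a hypothetical helper, the prefix list covers only the command types described in this document, and the permission test looks at POSIX mode bits.

```python
from pathlib import Path

# Command prefixes described in this document, each with its trailing underscore.
COMMAND_PREFIXES = (
    "parallel_driver_",
    "singleton_driver_",
    "serial_driver_",
    "first_",
    "eventually_",
    "finally_",
    "anytime_",
)


def non_executable_commands(template_dir: Path) -> list[str]:
    """Names of top-level command files in `template_dir` with no execute bit set.

    Subdirectories and files without a command prefix (such as `helper_*`)
    are skipped, mirroring what the steps above say is ignored.
    """
    return sorted(
        entry.name
        for entry in template_dir.iterdir()
        if entry.is_file()
        and entry.name.startswith(COMMAND_PREFIXES)
        and not entry.stat().st_mode & 0o111
    )
```

An empty list means every command script in the directory is executable; any names returned still need `chmod +x`.
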
#### Best Practices

- **Retry logic**: Always include retry mechanisms for network and transient errors (see [commands.js](../../buildscripts/antithesis/test_composer/basic_js_commands/js/commands.js) for examples)
- **Add randomization**: The more randomization you add to your tests, the more Antithesis can explore. Antithesis can control and reproduce the randomization, so if it finds an interesting path it can explore it further.
- **Start simple**: Begin with a `singleton_driver` to adapt existing tests, then evolve to parallel/serial commands
- **Idempotency**: Design tests to handle being killed and restarted at any time

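As a rough illustration of the retry practice, here is a generic retry helper with exponential backoff. It is a Python sketch, not a port of `commands.js`: the name `retry`, its parameters, and the default exception types are assumptions chosen for the example, and a real command would list whichever driver errors it considers transient.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    operation: Callable[[], T],
    retryable: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    attempts: int = 5,
    base_delay: float = 0.5,
) -> T:
    """Call `operation`, retrying on `retryable` errors with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except retryable:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2**attempt)
    raise ValueError("attempts must be at least 1")
```

Errors outside `retryable` propagate immediately, so genuine test failures are not hidden behind retries.
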
#### Configuring Test Composer in Evergreen

To use Test Composer instead of normal resmoke testing, set the `antithesis_test_composer_dir` variable in your Evergreen task:

```yaml
- <<: *antithesis_task_template
  name: antithesis_resmoke_suite_with_test_template
  tags: ...
  commands:
    ...
    - func: "antithesis image build and push"
      vars:
        suite: concurrency_sharded_replication_with_balancer_and_config_transitions_and_add_remove_shard # Still used for cluster topology
        resmoke_args: >- # any args that change the cluster topology can still be used
          --runAllFeatureFlagTests
        antithesis_test_composer_dir: basic_js_commands # Directory name under buildscripts/antithesis/test_composer/
```

## Additional Resources

If you are interested in leveraging Antithesis, feel free to reach out to #ask-devprod-correctness or #server-testing on Slack.