mongo/buildscripts/cost_model/ce_data_settings.py at 1a5cf4b82fd08b052d99fb7ec29f22ef4a444dbb


Timour Katchaounov 3ade0b9fe3 SERVER-72236 Generate random integer data for CE

Generate random data with integers. The approach is as follows:
- There is one collection for each different cardinality. All collections contain the same fields.
- Each field contains data generated from a certain data distribution. The data could be anything: a single type, mixed types, a single mathematical distribution (e.g. normal), or a mixed distribution.
- The committed configuration file and the corresponding data file are reduced to only two small collections, so that Evergreen tests run fast and the git repository stays small. For actual experiments, add more data sizes and regenerate the data locally.
- All data is saved in a single JavaScript file: jstests/query_golden/libs/data/ce_accuracy_test.data, with a corresponding schema file jstests/query_golden/libs/data/ce_accuracy_test.schema.
- The data file is a JavaScript file that can be loaded directly inside a JS test. Loading it creates a global variable dataSet. This is the only way to load external JSON data without installing external tools in Evergreen.
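
The approach in the list above can be illustrated with a minimal Python sketch. It is not the code from this commit: the function names, field names, distribution parameters, and the "collectionName"/"documents" keys are all hypothetical, chosen only to show one collection per cardinality, shared fields drawn from different distributions, and a JavaScript wrapper that defines the global dataSet.

```python
import json
import random


def generate_field(distribution, size, rng):
    """Return `size` integers drawn from a named distribution (names are hypothetical)."""
    if distribution == "uniform":
        return [rng.randint(0, 1000) for _ in range(size)]
    if distribution == "normal":
        return [round(rng.gauss(500, 100)) for _ in range(size)]
    if distribution == "mixed":
        # Mixed distribution: each value is drawn from one of the two above.
        return [
            rng.randint(0, 1000) if rng.random() < 0.5 else round(rng.gauss(500, 100))
            for _ in range(size)
        ]
    raise ValueError(f"unknown distribution: {distribution}")


def generate_collections(cardinalities, fields, seed=0):
    """Build one collection per cardinality; all collections share the same fields."""
    rng = random.Random(seed)  # fixed seed so regenerated data is reproducible
    collections = []
    for card in cardinalities:
        columns = {name: generate_field(dist, card, rng) for name, dist in fields.items()}
        docs = [{name: columns[name][i] for name in fields} for i in range(card)]
        # The key names below are assumptions, not the schema used by the commit.
        collections.append({"collectionName": f"c_{card}", "documents": docs})
    return collections


def to_js_source(collections):
    """Wrap the JSON in an assignment so that loading the file defines global `dataSet`."""
    return "dataSet = " + json.dumps(collections) + ";\n"


# Two small collections, mirroring the reduced committed configuration.
fields = {"uniform_int": "uniform", "normal_int": "normal", "mixed_int": "mixed"}
collections = generate_collections([100, 1000], fields)
js_source = to_js_source(collections)
```

Writing js_source to a file yields something a JS test could load directly, since the file body is a plain assignment to a global rather than bare JSON. For larger experiments, the same sketch would only need more entries in the cardinality list.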

2023-01-10 12:51:54 +00:00

5.6 KiB
