Generate random data with integers. The approach is as follows: - There is one collection for each different cardinality. All collections contain the same fields. - Each field contains the data generated from a certain data distribution. The data could be anything - same type, mixed types, same mathematical distribution (e.g. normal), or a mixed distribution. - The committed configuration file, and the corresponding data file are reduced to only two small collections. For actual experiments one needs to add more data sizes, and re-generate the data locally. This is done so that Evergreen tests can run fast, and to reduce the size of the git repository. - All data is saved in a single JavaScript file: jstests/query_golden/libs/data/ce_accuracy_test.data, with a corresponding schema file jstests/query_golden/libs/data/ce_accuracy_test.schema. - The data file is a JavaScript file that can be loaded directly inside a JS test. When loading this file, it creates a global variable dataSet. The reason is that this is the only way to load an external JSON file that doesn't need to install external tools in Evergreen.
5.6 KiB
5.6 KiB