The validation toolkit for the Population Health Environmental Surveillance Open Data Model (PHES-ODM) ensures your ODM data is complete and interoperable. Users can check whether their data meets the ODM dictionary format.
At a glance¶
Validation is performed based on a set of rules defined in a schema. The
validate_data
function is used to perform the validation on data
with the schema
specified.
schema = import_schema("schema.yml")
errors = validate_data(schema, data)
Overview¶
The ODM library has the following features:
Validate any ODM data table as a CSV file.
Generate a report with warning and errors that indicate which data field(s) contain invalid data.
Validate any version of the ODM dicionary.
Users can add rules for their specific program. For example, you can add a list of valid testing sites for your surveillance program.
Users can request additional default validation rules.
There are three parts to the ODM validation toolkit.
Python functions to validate ODM-formatted data. These functions check ODM data for missing, incomplete, or incompatible data. The functions return a list of errors and warnings.
A list of validation rules. The rules define what are valid ODM data. Examples of rules include mandatory data fields and valid data types. For example, every measurent must have a date the measurement was performed, and the date type must follow ISO 8601 format. Rules are combined together in a validation schema.
A standard schema method. The validation toolkit extends the cerebrus validation method to provide a standard approach to defining validation rules that allows the rule list to be easily extended. For example, a wastewater surveillance program can can generate a list of the sample sites within their program; and then use the ODM validation toolkit to ensure all their data include a
siteID
with a valid idenfitication. See tutorial X for how to extend the ODM schema.
Installation¶
Make sure you have the latest version of Python installed on your system before starting. The required version can be found in pyproject.toml.
If you only want to install the package, you can run the following command:
pip install "git+https://github.com/Big-Life-Lab/PHES-ODM-Validation.git@main"
If you want to set up a development environment instead, run the following commands:
git clone https://github.com/Big-Life-Lab/PHES-ODM-Validation.git odm-validation
cd odm-validation
pip install -r ./requirements.txt
pip install -e .
Tests can be run with the following commands (after setting up a dev. env.):
pip install -r ./tests/requirements.txt
python -m unittest discover ./tests
The package documentation can be generated by following the instructions in docs/README.md.
Usage¶
Quick ‘how to’ - defined as the steps to solve a specific problem. These compliment our tutorials that are learning oriented documentation for newcommers.
Validate a specific ODM version¶
Validate with you own schema¶
Generate a schema from the ODM dictionary¶
You can genereate your own schema directly from the ODM parts table.
parts = import_dataset("parts.csv")
schema = generate_cerberus_schema(parts)
errors = validate_data(schema, data)
Make you own validation rule¶
{example of validating a list of siteIDs}
Fequently asked questions and support¶
Get support on the ODM Discourse community
forum. Use the #validate
in
your question. Use the #faq
and #validate
to find frequently asked
questions.
See Contributing for how to suggest a new rule.
Error or bugs can be reported on GitHub Issues.