PHES-ODM Validation Documentation

The validation toolkit for the Population Health Environmental Surveillance Open Data Model (PHES-ODM) helps ensure that your ODM data is complete and interoperable. Users can check whether their data conforms to the format defined in the ODM dictionary.

At a glance

Validation checks data against a set of rules defined in a schema. Load a schema with import_schema, then pass it together with your data to validate_data, which returns the validation errors.

schema = import_schema("schema.yml")
errors = validate_data(schema, data)
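To make the idea of "rules defined in a schema" concrete, the following is a minimal, standard-library-only sketch of what a call like validate_data does conceptually. It is not the toolkit's implementation: the function name validate_data_sketch, the nested table, field, rules layout of the schema, and the shape of each error record are all assumptions made for illustration. Only one rule is handled, required, named after the Cerberus rule of the same name.

```python
def validate_data_sketch(schema, data):
    """Check every row of every table against its per-field rules.

    Hypothetical shapes (not the toolkit's): `schema` maps
    table -> field -> rules, and `data` maps table -> list of row dicts.
    Only the "required" rule is handled.
    """
    errors = []
    for table, rows in data.items():
        field_rules = schema.get(table, {})
        for row_num, row in enumerate(rows, start=1):
            for field, rules in field_rules.items():
                # A required field must be present and non-empty.
                if rules.get("required") and row.get(field) in (None, ""):
                    errors.append(
                        {"table": table, "row": row_num, "field": field, "rule": "required"}
                    )
    return errors


# Hypothetical schema and data for a single table named "samples".
schema = {"samples": {"sampleID": {"required": True}, "siteID": {"required": True}}}
data = {
    "samples": [
        {"sampleID": "s1", "siteID": "site-01"},
        {"sampleID": "s2"},  # siteID is missing
    ]
}
errors = validate_data_sketch(schema, data)
```

On this sample data the sketch reports a single error, for row 2 of samples, which has no siteID. The real validate_data supports many more rules, and its error records may be structured differently.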

Overview

The ODM validation toolkit has the following features:

  • Validate any ODM data table stored as a CSV file.

  • Generate a report of warnings and errors that indicate which data field(s) contain invalid data.

  • Validate any version of the ODM dictionary.

  • Users can add rules for their specific program. For example, you can add a list of valid testing sites for your surveillance program.

  • Users can request additional default validation rules.

There are three parts to the ODM validation toolkit.

  1. Python functions to validate ODM-formatted data. These functions check ODM data for missing, incomplete, or incompatible data. The functions return a list of errors and warnings.

  2. A list of validation rules. The rules define what counts as valid ODM data. Examples of rules include mandatory data fields and valid data types. For example, every measurement must have the date the measurement was performed, and that date must follow the ISO 8601 format. Rules are combined in a validation schema.

  3. A standard schema method. The validation toolkit extends the Cerberus validation method to provide a standard approach to defining validation rules, so that the rule list can be easily extended. For example, a wastewater surveillance program can generate a list of the sample sites within its program, and then use the ODM validation toolkit to ensure that all its data include a siteID identifying one of those sites. See tutorial X for how to extend the ODM schema.
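The idea of small rules being combined into a schema can be sketched in plain Python as a list of check functions, each returning an error message or None. This is an illustrative sketch only: the helpers mandatory, iso_date and apply_rules and the field name measurementDate are hypothetical and do not come from the toolkit or the ODM dictionary, and the toolkit itself builds on Cerberus rather than on Python closures.

```python
from datetime import date
from typing import Callable, Optional

# A rule takes one row (a dict) and returns an error message, or None if it passes.
Rule = Callable[[dict], Optional[str]]


def mandatory(field: str) -> Rule:
    """Hypothetical rule: the field must be present and non-empty."""
    def rule(row: dict) -> Optional[str]:
        if row.get(field) in (None, ""):
            return f"{field} is missing"
        return None
    return rule


def iso_date(field: str) -> Rule:
    """Hypothetical rule: if present, the field must parse as an ISO 8601 date."""
    def rule(row: dict) -> Optional[str]:
        value = row.get(field)
        if value in (None, ""):
            return None  # absence is reported by mandatory(), not here
        try:
            date.fromisoformat(value)
        except (TypeError, ValueError):
            return f"{field} {value!r} is not an ISO 8601 date"
        return None
    return rule


def apply_rules(rules: list, row: dict) -> list:
    """Run every rule on the row and keep only the error messages."""
    messages = (rule(row) for rule in rules)
    return [m for m in messages if m is not None]


# A "schema" in this sketch is just a list of rules; the field name is hypothetical.
measurement_rules = [mandatory("measurementDate"), iso_date("measurementDate")]
```

For example, apply_rules(measurement_rules, {}) reports only that the date is missing, while a value such as "05/01/2023" is reported as not being an ISO 8601 date. Keeping each rule independent means a program-specific rule can be appended to the list without touching the others.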

Installation

Before starting, make sure a supported version of Python is installed on your system. The required version is listed in pyproject.toml.

If you only want to install the package, you can run the following command:

pip install "git+https://github.com/Big-Life-Lab/PHES-ODM-Validation.git@main"

If you want to set up a development environment instead, run the following commands:

git clone https://github.com/Big-Life-Lab/PHES-ODM-Validation.git odm-validation
cd odm-validation
pip install -r ./requirements.txt
pip install -e .

After setting up a development environment, you can run the tests with the following commands:

pip install -r ./tests/requirements.txt
python -m unittest discover ./tests

The package documentation can be generated by following the instructions in docs/README.md.

Usage

Quick how-to guides, each giving the steps to solve a specific problem. These complement our tutorials, which are learning-oriented documentation for newcomers.

Validate a specific ODM version

Validate with your own schema

Generate a schema from the ODM dictionary

You can generate your own schema directly from the ODM parts table.

parts = import_dataset("parts.csv")
schema = generate_cerberus_schema(parts)
errors = validate_data(schema, data)
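As a rough illustration of what generating a schema from a parts table involves, the sketch below turns a simplified, in-memory parts table into a Cerberus-style nested dictionary. The column names partID, table and required, the sample rows, and the function generate_schema_sketch are all hypothetical. The real ODM parts table has its own, richer set of columns, so this does not reproduce what generate_cerberus_schema actually produces.

```python
import csv
import io

# Hypothetical, heavily simplified parts table: one row per (table, field) pair.
PARTS_CSV = """partID,table,required
sampleID,samples,yes
siteID,samples,yes
notes,samples,no
"""


def generate_schema_sketch(parts_rows):
    """Build a table -> field -> rules dict from hypothetical parts rows."""
    schema = {}
    for part in parts_rows:
        # Group fields under their table, creating the table entry on first use.
        field_rules = schema.setdefault(part["table"], {})
        field_rules[part["partID"]] = {"required": part["required"] == "yes"}
    return schema


schema = generate_schema_sketch(csv.DictReader(io.StringIO(PARTS_CSV)))
```

The design choice worth noting is that the schema is derived from the dictionary's own metadata rather than written by hand, so a new dictionary version can yield a new schema without editing rules manually.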

Make your own validation rule

{example of validating a list of siteIDs}
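A program-specific rule of this kind can be sketched with the Python standard library alone: every sample's siteID must appear in the program's own list of sites. The set VALID_SITE_IDS, the sample CSV, and the function check_site_ids are hypothetical stand-ins for a program's real site list and data, and this is a standalone check rather than a rule registered with the toolkit.

```python
import csv
import io

# Hypothetical list of sampling sites registered in a surveillance program.
VALID_SITE_IDS = {"site-01", "site-02", "site-03"}

# Hypothetical sample data, read from an in-memory CSV for the example.
SAMPLES_CSV = "sampleID,siteID\ns1,site-01\ns2,site-99\n"


def check_site_ids(rows, valid_ids):
    """Return one message per row whose siteID is not in valid_ids.

    Rows are numbered from 1, not counting the CSV header line.
    """
    errors = []
    for row_num, row in enumerate(rows, start=1):
        site_id = row.get("siteID", "")
        if site_id not in valid_ids:
            errors.append(f"row {row_num}: siteID {site_id!r} is not a known site")
    return errors


errors = check_site_ids(csv.DictReader(io.StringIO(SAMPLES_CSV)), VALID_SITE_IDS)
```

Here only the second row is flagged, because site-99 is not in the list. In a Cerberus schema the same constraint is normally expressed with the built-in allowed rule, for example {"siteID": {"allowed": ["site-01", "site-02", "site-03"]}}; how such a rule is merged into the schema used by the toolkit is not covered here.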

Frequently asked questions and support

Get support on the ODM Discourse community forum. Tag your question with #validate, and search for #faq and #validate to find frequently asked questions.

See Contributing for how to suggest a new rule.

Errors and bugs can be reported on GitHub Issues.
