Epidemiology Examples
These examples show epidemiological data workflows built around the newer string operations.
Run them from the repo root with:

```bash
wow run examples/epi_line_list_cleanup.yaml --base-dir examples
wow run examples/epi_weekly_incidence_cleanup.yaml --base-dir examples
```
Line List Cleanup
This workflow cleans a simple outbreak line list and enriches it with facility metadata.
Inputs
epi_line_list_raw.csv:
| case_id | patient_name_raw | sex_raw | age_raw | onset_date_raw | district_code_raw | facility_label | symptom_blob | outcome_note | classification_note |
|---|---|---|---|---|---|---|---|---|---|
| CL-001 | ama njoroge | f | 34 | 2025-03-02 | dist-7 | site:HC01\|north wing | fever;cough;headache | admitted-case | confirmed |
| CL-002 | JOHN OTIENO | M | not_reported | 2025-03-03 | DIST-12 | site:HC02\|field tent | fever | home-isolation | probable |
| CL-003 | liya Hassan | Female | 17 | 2025-03-04 | dist-3 | site:HC99\|triage | rash;fever | transferred-out | suspected |
epi_sites.csv:
| facility_code | facility_name | county |
|---|---|---|
| HC01 | Kijiji Health Centre | Kisumu |
| HC02 | River Road Clinic | Nairobi |
| HC99 | Lakeside Triage Post | Homa Bay |
Pipeline
epi_line_list_cleanup.yaml:
```yaml
wowdata: 0
pipeline:
  start:
    uri: epi_line_list_raw.csv
    type: csv
  steps:
    # Patient name: trim, collapse double spaces, title-case
    - transform:
        op: string
        params: {column: patient_name_raw, action: strip}
    - transform:
        op: string
        params: {column: patient_name_raw, action: replace, old: "  ", new_value: " "}
    - transform:
        op: string
        params: {column: patient_name_raw, action: title, new: patient_name}
    # Sex: trim and upper-case
    - transform:
        op: string
        params: {column: sex_raw, action: strip}
    - transform:
        op: string
        params: {column: sex_raw, action: upper, new: sex}
    # District code: normalize case before removeprefix so "dist-7" and "DIST-12" both match
    - transform:
        op: string
        params: {column: district_code_raw, action: strip}
    - transform:
        op: string
        params: {column: district_code_raw, action: upper}
    - transform:
        op: string
        params: {column: district_code_raw, action: removeprefix, prefix: "DIST-", new: district_code}
    - transform:
        op: string
        params: {column: district_code, action: zfill, width: 3}
    # Facility: drop the "site:" prefix, then pull out the leading facility code
    - transform:
        op: string
        params: {column: facility_label, action: removeprefix, prefix: "site:", new: facility_compact}
    - transform:
        op: string
        params: {column: facility_compact, action: partition, sep: "|", new: facility_parts}
    - transform:
        op: string
        params:
          column: facility_compact
          action: regex_extract
          pattern: "^([A-Z0-9]+)"
          group: 1
          new: facility_code
    # Symptoms, outcome, and classification
    - transform:
        op: string
        params: {column: symptom_blob, action: split, sep: ";", new: symptom_tokens}
    - transform:
        op: string
        params: {column: outcome_note, action: replace, old: "-", new_value: " ", new: outcome_clean}
    - transform:
        op: string
        params: {column: outcome_clean, action: title}
    - transform:
        op: string
        params: {column: classification_note, action: capitalize, new: classification}
    # on_error: "null" turns uncastable ages such as "not_reported" into nulls
    - transform:
        op: cast
        params:
          types: {age_raw: integer}
          on_error: "null"
    - transform:
        op: join
        params:
          right: epi_sites.csv
          on: [facility_code]
          how: left
    - transform:
        op: derive
        params:
          new: is_admitted
          expr: "outcome_clean == 'Admitted Case'"
    - transform:
        op: select
        params:
          columns: [case_id, patient_name, sex, age_raw, onset_date_raw, district_code, facility_code, facility_name, county, symptom_tokens, outcome_clean, classification, is_admitted]
    - sink:
        uri: epi_line_list_clean.csv
        type: csv
```
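To see what the string steps do to individual values, here is a minimal Python sketch that applies them to one raw row. It assumes each string action behaves like the Python `str` method of the same name (`strip`, `title`, `removeprefix`, `zfill`, `partition`, `split`, `capitalize`), which the Python-style list and tuple values in the expected output suggest but which is an assumption here. `clean_line_list_strings` is a hypothetical helper, not part of the pipeline tool, and `removeprefix` needs Python 3.9 or later.

```python
import re


def clean_line_list_strings(row: dict) -> dict:
    """Hypothetical helper: apply the line-list string steps with str methods."""
    # Patient name: strip, collapse double spaces, then title-case
    name = row["patient_name_raw"].strip().replace("  ", " ").title()
    # District code: upper-case first, because str.removeprefix is case-sensitive
    district = row["district_code_raw"].strip().upper().removeprefix("DIST-")
    # Facility: drop "site:", then take the leading run of capitals and digits
    compact = row["facility_label"].removeprefix("site:")
    match = re.match(r"^([A-Z0-9]+)", compact)
    return {
        "patient_name": name,
        "sex": row["sex_raw"].strip().upper(),
        "district_code": district.zfill(3),
        "facility_parts": compact.partition("|"),
        # None when nothing matches (assumed to correspond to a null)
        "facility_code": match.group(1) if match else None,
        "symptom_tokens": row["symptom_blob"].split(";"),
        "outcome_clean": row["outcome_note"].replace("-", " ").title(),
        "classification": row["classification_note"].capitalize(),
    }


# Row CL-001 from epi_line_list_raw.csv
cl_001 = {
    "patient_name_raw": "ama njoroge",
    "sex_raw": "f",
    "district_code_raw": "dist-7",
    "facility_label": "site:HC01|north wing",
    "symptom_blob": "fever;cough;headache",
    "outcome_note": "admitted-case",
    "classification_note": "confirmed",
}
print(clean_line_list_strings(cl_001))
```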
Expected Output
epi_line_list_clean.csv:
| case_id | patient_name | sex | age_raw | onset_date_raw | district_code | facility_code | facility_name | county | symptom_tokens | outcome_clean | classification | is_admitted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CL-001 | Ama Njoroge | F | 34 | 2025-03-02 | 007 | HC01 | Kijiji Health Centre | Kisumu | ['fever', 'cough', 'headache'] | Admitted Case | Confirmed | True |
| CL-002 | John Otieno | M |  | 2025-03-03 | 012 | HC02 | River Road Clinic | Nairobi | ['fever'] | Home Isolation | Probable | False |
| CL-003 | Liya Hassan | FEMALE | 17 | 2025-03-04 | 003 | HC99 | Lakeside Triage Post | Homa Bay | ['rash', 'fever'] | Transferred Out | Suspected | False |
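The cast, join, and derive steps can be sketched the same way. This minimal Python sketch assumes that `cast` with `on_error: "null"` turns an unparseable value into a null (`None` here), that `join` with `how: left` keeps every case even when its facility_code is absent from epi_sites.csv, and that `derive` evaluates its expression row by row. `SITES`, `cast_int_or_null`, and `finish_row` are hypothetical names used only for illustration.

```python
SITES = {  # epi_sites.csv keyed by facility_code
    "HC01": {"facility_name": "Kijiji Health Centre", "county": "Kisumu"},
    "HC02": {"facility_name": "River Road Clinic", "county": "Nairobi"},
    "HC99": {"facility_name": "Lakeside Triage Post", "county": "Homa Bay"},
}


def cast_int_or_null(value):
    """Cast to int, returning None instead of raising (like on_error: "null")."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def finish_row(row: dict) -> dict:
    out = dict(row)
    out["age_raw"] = cast_int_or_null(row["age_raw"])
    # Left join: an unmatched facility code keeps the row with empty site fields
    out.update(SITES.get(row["facility_code"], {"facility_name": None, "county": None}))
    out["is_admitted"] = out["outcome_clean"] == "Admitted Case"
    return out


rows = [
    {"case_id": "CL-001", "age_raw": "34", "facility_code": "HC01", "outcome_clean": "Admitted Case"},
    {"case_id": "CL-002", "age_raw": "not_reported", "facility_code": "HC02", "outcome_clean": "Home Isolation"},
    {"case_id": "CL-003", "age_raw": "17", "facility_code": "HC99", "outcome_clean": "Transferred Out"},
]
finished = [finish_row(r) for r in rows]
for r in finished:
    print(r["case_id"], r["age_raw"], r["county"], r["is_admitted"])
```

CL-002's age comes out as `None`, matching the empty age_raw cell for that case in the expected output.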
Weekly Incidence Cleanup
This workflow cleans district-level weekly incidence inputs before dashboarding or bulletin generation.
Inputs
epi_weekly_incidence_raw.csv:
| district_code_raw | district_label_raw | epi_week_raw | cases_raw | report_file | bulletin_path | status_note | investigator_email_raw |
|---|---|---|---|---|---|---|---|
| dist-7 | KISUMU COUNTY | ew07 | 14 | week07.csv | bulletins/weekly/ew07 | draft. | [email protected] |
| dist-12 | nairobi county | EW08 | 27 | week08.csv | bulletins/weekly/ew08 | final. | [email protected] |
| dist-3 | HOMA BAY COUNTY | ew09 | 9 | week09.csv | bulletins/weekly/ew09 | provisional.. | [email protected] |
Pipeline
epi_weekly_incidence_cleanup.yaml:
```yaml
wowdata: 0
pipeline:
  start:
    uri: epi_weekly_incidence_raw.csv
    type: csv
  steps:
    # District label: trim, title-case, and build a casefolded key
    - transform:
        op: string
        params: {column: district_label_raw, action: strip}
    - transform:
        op: string
        params: {column: district_label_raw, action: title, new: district_label}
    - transform:
        op: string
        params: {column: district_label, action: casefold, new: district_key}
    # District code: lower-case before removeprefix so "dist-" matches every row
    - transform:
        op: string
        params: {column: district_code_raw, action: strip}
    - transform:
        op: string
        params: {column: district_code_raw, action: lower}
    - transform:
        op: string
        params: {column: district_code_raw, action: removeprefix, prefix: "dist-", new: district_code}
    - transform:
        op: string
        params: {column: district_code, action: zfill, width: 3}
    # Epi week: "ew07" and "EW08" both become a two-digit week number
    - transform:
        op: string
        params: {column: epi_week_raw, action: upper}
    - transform:
        op: string
        params: {column: epi_week_raw, action: removeprefix, prefix: "EW", new: epi_week_num}
    - transform:
        op: string
        params: {column: epi_week_num, action: zfill, width: 2}
    # File name and path helpers
    - transform:
        op: string
        params: {column: report_file, action: removesuffix, suffix: ".csv", new: report_stub}
    - transform:
        op: string
        params: {column: bulletin_path, action: rpartition, sep: "/", new: bulletin_parts}
    # Status: strip trailing dots and spaces, then capitalize
    - transform:
        op: string
        params: {column: status_note, action: rstrip, chars: ". ", new: status_clean}
    - transform:
        op: string
        params: {column: status_clean, action: capitalize}
    - transform:
        op: string
        params: {column: investigator_email_raw, action: casefold, new: investigator_email}
    - transform:
        op: cast
        params:
          types: {cases_raw: integer}
          on_error: "null"
    - transform:
        op: derive
        params:
          new: incidence_flag
          expr: "cases_raw >= 20"
    - transform:
        op: select
        params:
          columns: [district_code, district_label, district_key, epi_week_num, cases_raw, report_stub, bulletin_parts, status_clean, investigator_email, incidence_flag]
    - sink:
        uri: epi_weekly_incidence_clean.csv
        type: csv
```
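The weekly string steps can be traced the same way. This minimal sketch makes the same assumption that each action mirrors the Python `str` method of the same name (Python 3.9 or later for `removeprefix` and `removesuffix`). `clean_weekly_strings` is a hypothetical helper, and the investigator email column is left out.

```python
def clean_weekly_strings(row: dict) -> dict:
    """Hypothetical helper: apply the weekly string steps with str methods."""
    label = row["district_label_raw"].strip().title()
    # lower() first: removeprefix("dist-") only matches that exact case
    code = row["district_code_raw"].strip().lower().removeprefix("dist-")
    week = row["epi_week_raw"].upper().removeprefix("EW")
    return {
        "district_code": code.zfill(3),
        "district_label": label,
        "district_key": label.casefold(),
        "epi_week_num": week.zfill(2),
        # removesuffix drops exactly one literal ".csv"
        "report_stub": row["report_file"].removesuffix(".csv"),
        "bulletin_parts": row["bulletin_path"].rpartition("/"),
        # rstrip(". ") removes any trailing run of "." and " ", so ".." goes too
        "status_clean": row["status_note"].rstrip(". ").capitalize(),
    }


homa_bay = {
    "district_code_raw": "dist-3",
    "district_label_raw": "HOMA BAY COUNTY",
    "epi_week_raw": "ew09",
    "report_file": "week09.csv",
    "bulletin_path": "bulletins/weekly/ew09",
    "status_note": "provisional..",
}
print(clean_weekly_strings(homa_bay))
# Skipping the lower() step would leave the prefix in place
print("DIST-3".removeprefix("dist-"))  # → DIST-3
```

The choice between `rstrip` and `removesuffix` matters: `rstrip` treats its argument as a set of characters, so `"cases.csv".rstrip(".csv")` gives `"case"`, while `removesuffix(".csv")` gives `"cases"`. That is why a literal suffix such as a file extension calls for `removesuffix`.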
Expected Output
epi_weekly_incidence_clean.csv:
| district_code | district_label | district_key | epi_week_num | cases_raw | report_stub | bulletin_parts | status_clean | investigator_email | incidence_flag |
|---|---|---|---|---|---|---|---|---|---|
| 007 | Kisumu County | kisumu county | 07 | 14 | week07 | ('bulletins/weekly', '/', 'ew07') | Draft | [email protected] | False |
| 012 | Nairobi County | nairobi county | 08 | 27 | week08 | ('bulletins/weekly', '/', 'ew08') | Final | [email protected] | True |
| 003 | Homa Bay County | homa bay county | 09 | 9 | week09 | ('bulletins/weekly', '/', 'ew09') | Provisional | [email protected] | False |
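The district_key column holds the casefolded label, which suits caseless matching, for example against district names from another source. For the ASCII labels in this example, `casefold` and `lower` give the same result; they differ for characters with special case folding such as the German ß. A minimal Python sketch, assuming the action follows `str.casefold`, with `build_key_index` as a hypothetical helper:

```python
def build_key_index(labels):
    """Hypothetical helper: map casefolded labels back to their display form."""
    return {label.casefold(): label for label in labels}


index = build_key_index(["Kisumu County", "Nairobi County", "Homa Bay County"])
# Differently cased inputs resolve to the same entry
print(index["NAIROBI COUNTY".casefold()])  # → Nairobi County
print(index["homa bay county".casefold()])  # → Homa Bay County
# casefold goes further than lower for some characters
print("Straße".lower(), "Straße".casefold())  # → straße strasse
```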
Additional String Snippets For Epidemiological Data
These shorter snippets cover string actions that the two full workflows above use sparingly or not at all.
lstrip
```yaml
- transform:
    op: string
    params:
      column: household_code
      action: lstrip
      chars: " 0"
```
Example: " 00042" becomes "42".
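`lstrip` removes any leading run of the listed characters rather than a literal prefix, so a code made only of zeros ends up empty. A quick Python check, assuming the action follows `str.lstrip`:

```python
codes = [" 00042", "0100", "0000"]
# Every leading " " or "0" is removed, however many there are
stripped = [code.lstrip(" 0") for code in codes]
print(stripped)  # → ['42', '100', '']
```

If an all-zero household code is meaningful, the empty result needs explicit handling downstream.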
rpartition
```yaml
- transform:
    op: string
    params:
      column: specimen_path
      action: rpartition
      sep: "/"
      new: specimen_path_parts
```
Example: "uploads/specimens/S-204.csv" becomes ("uploads/specimens", "/", "S-204.csv").
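`rpartition` splits at the last separator, so the file name is always the third element however deep the path is. When the separator is missing, the whole string lands in the last slot, whereas `partition` puts it in the first. A small Python comparison, assuming the action follows `str.rpartition`:

```python
for path in ["uploads/specimens/S-204.csv", "S-204.csv"]:
    # (head, sep, tail) split at the last "/" versus the first "/"
    print(path.rpartition("/"), path.partition("/"))
```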
format
```yaml
- transform:
    op: string
    params:
      column: bulletin_template
      action: format
      kwargs:
        district: Kisumu County
        week: "07"
      new: bulletin_message
```
Example: "EW {week}: {district} reported elevated incidence" becomes "EW 07: Kisumu County reported elevated incidence".
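`format` fills the `{placeholders}` in each row's template from `kwargs`. The snippet quotes `"07"`, presumably because YAML loaders commonly read an unquoted `07` as the integer 7, which would drop the leading zero. A Python sketch, assuming the action follows `str.format`:

```python
template = "EW {week}: {district} reported elevated incidence"
print(template.format(district="Kisumu County", week="07"))
# → EW 07: Kisumu County reported elevated incidence
print(template.format(district="Kisumu County", week=7))
# → EW 7: Kisumu County reported elevated incidence

# A placeholder with no matching keyword argument raises KeyError
try:
    template.format(district="Kisumu County")
except KeyError as missing:
    print("missing kwarg:", missing)
```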
encode
```yaml
- transform:
    op: string
    params:
      column: payload
      action: encode
      encoding: utf-8
      new: payload_bytes
```
Example: "case-summary" becomes b"case-summary".
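`encode` turns text into bytes. ASCII text takes one byte per character, while other characters, such as accented letters in names, take several bytes in UTF-8; `decode` reverses the step. A short Python check, assuming the action follows `str.encode`:

```python
payload = "case-summary"
encoded = payload.encode("utf-8")
print(encoded)  # → b'case-summary'

# Non-ASCII characters expand to multiple bytes in UTF-8
accented = "café".encode("utf-8")
print(accented, len("café"), len(accented))  # → b'caf\xc3\xa9' 4 5

# decode restores the original text
print(accented.decode("utf-8"))  # → café
```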
swapcase
```yaml
- transform:
    op: string
    params:
      column: qa_marker
      action: swapcase
```
Example: "DrAfT" becomes "dRaFt".
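`swapcase` inverts the case of every letter. It is handy for flipping markers like the one above, but it is not a normalization step, and outside ASCII it does not always undo itself: the German ß becomes SS, which swaps back to ss. A quick Python check, assuming the action follows `str.swapcase`:

```python
print("DrAfT".swapcase())  # → dRaFt
print("dRaFt".swapcase())  # → DrAfT

# Not always reversible outside ASCII
print("ß".swapcase(), "ß".swapcase().swapcase())  # → SS ss
```

For comparisons, `upper` or `casefold` gives a stable form instead.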