YAML / IR Format Reference (v0)
This page documents WowData's serialized pipeline format as used in YAML and internal IR dictionaries.
Use this page when you are:
- authoring pipeline YAML directly
- inspecting `Pipeline.to_ir()` or `Pipeline.to_yaml()` output
- trying to understand how WowData normalizes pipeline definitions when loading them
Index
Top-Level Structure
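A WowData pipeline YAML file contains a single mapping with two top-level keys:
- `wowdata`: the format version
- `pipeline`: the pipeline definition (its `start` source and its list of `steps`)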
Minimal example:
```yaml
wowdata: 0
pipeline:
  start:
    uri: people.csv
    type: csv
  steps:
    - transform:
        op: select
        params:
          columns: [person_id, age]
    - sink:
        uri: out.csv
        type: csv
```
Notes:
- the supported version in v0 is `wowdata: 0`
- `pipeline` must be a mapping
- `steps` must be a list
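To make these rules concrete, here is a minimal sketch that writes the example above as a plain Python dict and checks the three notes. It assumes the IR mirrors the YAML nesting one-to-one. `check_top_level` is a hypothetical helper for illustration, not part of WowData's API, and treating a missing `steps` key as an error is also an assumption.

```python
# Hypothetical sketch: the minimal example as a plain dict, plus checks that
# mirror the v0 notes. This is not WowData's own validator.
from typing import Any

MINIMAL = {
    "wowdata": 0,
    "pipeline": {
        "start": {"uri": "people.csv", "type": "csv"},
        "steps": [
            {"transform": {"op": "select", "params": {"columns": ["person_id", "age"]}}},
            {"sink": {"uri": "out.csv", "type": "csv"}},
        ],
    },
}


def check_top_level(doc: Any) -> list[str]:
    """Return a list of problems; an empty list means the top level looks valid."""
    if not isinstance(doc, dict):
        return ["document must be a mapping"]
    problems: list[str] = []
    # A missing version defaults to 0, so only an explicit other value is rejected.
    if doc.get("wowdata", 0) != 0:
        problems.append("unsupported version: v0 only supports wowdata: 0")
    pipeline = doc.get("pipeline")
    if not isinstance(pipeline, dict):
        problems.append("pipeline must be a mapping")
    elif not isinstance(pipeline.get("steps"), list):
        # Assumption: a missing steps key is reported the same way as a non-list.
        problems.append("steps must be a list")
    return problems


print(check_top_level(MINIMAL))  # []
print(check_top_level({"pipeline": {"steps": {}}}))  # ['steps must be a list']
```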
Source Descriptor
The pipeline start is stored under pipeline.start.
Shape:
```yaml
start:
  uri: people.csv
  type: csv
  schema:
    fields:
      - name: age
        type: integer
  options:
    delimiter: ","
```
Main keys:
- `uri`: required
- `type`: optional if it can be inferred
- `schema`: optional
- `options`: optional
This shape corresponds to the public Source model.
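A source descriptor is just a small mapping, so building one in code is straightforward. The sketch below is illustrative only: `source_descriptor` and `infer_type` are hypothetical helpers, and inferring `type` from the file extension is an assumption about what "can be inferred" means, not a documented WowData rule.

```python
# Hypothetical sketch: build a source descriptor dict, inferring `type` from
# the file extension when it is omitted (an assumption about WowData's rules).
from pathlib import PurePosixPath
from typing import Any, Optional


def infer_type(uri: str) -> Optional[str]:
    """Guess a type from the extension, e.g. 'people.csv' -> 'csv'."""
    suffix = PurePosixPath(uri).suffix.lower().lstrip(".")
    return suffix or None


def source_descriptor(
    uri: str,
    type_: Optional[str] = None,
    schema: Optional[dict] = None,
    options: Optional[dict] = None,
) -> dict[str, Any]:
    resolved = type_ or infer_type(uri)
    if resolved is None:
        raise ValueError(f"cannot infer a type for {uri!r}; pass type_ explicitly")
    desc: dict[str, Any] = {"uri": uri, "type": resolved}
    if schema is not None:
        desc["schema"] = schema
    # Mirrors the normalization rule: missing options become {} rather than None.
    desc["options"] = dict(options or {})
    return desc


start = source_descriptor(
    "people.csv",
    schema={"fields": [{"name": "age", "type": "integer"}]},
    options={"delimiter": ","},
)
print(start["type"])  # csv
```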
Transform Step
Each transform step is wrapped in a transform: mapping.
Shape:
```yaml
- transform:
    op: cast
    params:
      types:
        age: integer
      on_error: "null"
```
Main keys inside the transform descriptor:
- `op`: required transform operation name
- `params`: optional mapping of operation-specific arguments
- `output_schema`: optional explicit schema override
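The same wrapped shape can be produced programmatically. This is a sketch with a hypothetical `transform_step` helper (not a WowData function): it fills in `params` as `{}` when omitted, matching the rule described under Normalization Rules, and only adds `output_schema` when one is given.

```python
# Hypothetical helper (not part of WowData) that builds a transform step in
# the wrapped shape: {"transform": {"op": ..., "params": ..., ...}}.
from typing import Any, Optional


def transform_step(
    op: str,
    params: Optional[dict] = None,
    output_schema: Optional[dict] = None,
) -> dict[str, Any]:
    if not op:
        raise ValueError("op is required")
    desc: dict[str, Any] = {"op": op, "params": dict(params or {})}
    if output_schema is not None:
        desc["output_schema"] = output_schema
    return {"transform": desc}


step = transform_step("cast", {"types": {"age": "integer"}, "on_error": "null"})
print(step)
# {'transform': {'op': 'cast', 'params': {'types': {'age': 'integer'}, 'on_error': 'null'}}}
```

Note that `on_error` carries the string `"null"` (a policy name), which is why the YAML example quotes it.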
Example with a schema-locked transform:
```yaml
- transform:
    op: derive
    params:
      new: is_adult
      expr: "age >= 18"
    output_schema:
      fields:
        - name: age
          type: integer
        - name: is_adult
          type: boolean
```
This shape corresponds to the public Transform Object model.
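When `output_schema` is present, its declared fields can be read straight from the descriptor. The sketch below uses a hypothetical helper, `declared_columns`, that lists those field names and returns `None` when a step has no explicit schema override; it assumes the IR keeps the same nesting as the YAML above.

```python
# Hypothetical helper: list the column names a transform step declares in
# output_schema, or None if the step has no explicit schema override.
from typing import Optional


def declared_columns(step: dict) -> Optional[list[str]]:
    schema = step["transform"].get("output_schema")
    if schema is None:
        return None
    return [field["name"] for field in schema.get("fields", [])]


derive = {
    "transform": {
        "op": "derive",
        "params": {"new": "is_adult", "expr": "age >= 18"},
        "output_schema": {
            "fields": [
                {"name": "age", "type": "integer"},
                {"name": "is_adult", "type": "boolean"},
            ]
        },
    }
}
print(declared_columns(derive))  # ['age', 'is_adult']
# The derived column should appear among the declared fields.
print(derive["transform"]["params"]["new"] in declared_columns(derive))  # True
```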
Sink Step
Each sink step is wrapped in a sink: mapping.
Shape:
```yaml
- sink:
    uri: out.csv
    type: csv
    options:
      delimiter: ","
```
Main keys:
- `uri`: required
- `type`: optional if it can be inferred
- `options`: optional
This shape corresponds to the public Sink model.
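Because every entry in `steps` is wrapped in a single `transform:` or `sink:` key, a step's kind can be read from that key. A minimal sketch with a hypothetical `step_kind` helper, assuming transform and sink are the only step kinds (they are the only ones this page documents):

```python
# Hypothetical sketch: each entry in `steps` is a single-key mapping whose key
# names the step kind. Assumes "transform" and "sink" are the only kinds.
STEP_KINDS = ("transform", "sink")


def step_kind(step: dict) -> str:
    if not isinstance(step, dict) or len(step) != 1:
        raise ValueError(f"expected a single-key mapping, got {step!r}")
    (kind,) = step  # unpack the one key
    if kind not in STEP_KINDS:
        raise ValueError(f"unknown step kind: {kind!r}")
    return kind


sink = {"sink": {"uri": "out.csv", "type": "csv", "options": {"delimiter": ","}}}
print(step_kind(sink))  # sink
print(sink["sink"]["uri"])  # out.csv
```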
Normalization Rules
When WowData loads YAML or IR, it normalizes the structure before building a Pipeline.
Important normalization behaviors:
- if `wowdata` is missing, it defaults to `0`
- relative source and sink paths are normalized relative to the YAML file's location when the pipeline is loaded from disk
- missing transform `params` become `{}` rather than `null`
- missing `options` on sources and sinks become `{}` rather than `null`
Join-specific normalization:
- `join.params.right` is normalized relative to the base directory when it is a path
- if the right-hand descriptor is a mapping, its `uri` is normalized too
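The two join cases differ only in where the path lives. A minimal sketch, reusing the same path-joining assumption as a plain `base_dir` argument; `normalize_join_params` is a hypothetical helper, and values of `right` that are neither a string nor a mapping are left unchanged here.

```python
# Hypothetical sketch of the join rule: `right` may be a path string or a
# mapping with its own `uri`; either way the path is resolved against base_dir.
import copy
from pathlib import Path
from typing import Any


def _resolve(uri: str, base_dir: Path) -> str:
    # Assumption: absolute paths and scheme-qualified URIs are left as-is.
    if "://" in uri or Path(uri).is_absolute():
        return uri
    return str(base_dir / uri)


def normalize_join_params(params: dict[str, Any], base_dir: Path) -> dict[str, Any]:
    out = copy.deepcopy(params)
    right = out.get("right")
    if isinstance(right, str):
        out["right"] = _resolve(right, base_dir)
    elif isinstance(right, dict) and isinstance(right.get("uri"), str):
        right["uri"] = _resolve(right["uri"], base_dir)
    return out


base = Path("/project/pipelines")
print(normalize_join_params({"right": "lookup.csv"}, base))
print(normalize_join_params({"right": {"uri": "lookup.csv", "type": "csv"}}, base))
```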
YAML ergonomics:
- an unquoted YAML `null` for `cast.on_error` is normalized to the `"null"` policy string
- some YAML 1.1 loaders parse an unquoted `on:` key as a boolean rather than the string `on`; WowData normalizes that key in join params
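The sketch below shows what these two fixes might look like on already-parsed data. It assumes a YAML 1.1 loader has turned `on_error: null` into `None` and an `on:` key into the boolean key `True` (PyYAML's `safe_load` behaves this way, for example). The helper names and the `person_id` join value are hypothetical.

```python
# Hypothetical sketch: undoing two YAML 1.1 surprises on parsed data.
# Input mimics what a YAML 1.1 loader returns for:
#   cast params:  on_error: null    -> {"on_error": None}
#   join params:  on: person_id     -> {True: "person_id"}
from typing import Any


def fix_cast_params(params: dict[str, Any]) -> dict[str, Any]:
    out = dict(params)
    if "on_error" in out and out["on_error"] is None:
        out["on_error"] = "null"  # the string policy, not a missing value
    return out


def fix_join_params(params: dict[Any, Any]) -> dict[Any, Any]:
    out = dict(params)
    if True in out and "on" not in out:
        out["on"] = out.pop(True)  # restore the key the author actually wrote
    return out


print(fix_cast_params({"types": {"age": "integer"}, "on_error": None}))
# {'types': {'age': 'integer'}, 'on_error': 'null'}
print(fix_join_params({"right": "lookup.csv", True: "person_id"}))
# {'right': 'lookup.csv', 'on': 'person_id'}
```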
Path/text loading behavior:
- `Pipeline.from_yaml(...)` accepts either a path or raw YAML text
- multiline, document-like strings are treated as YAML text, not as filesystem paths
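One way to picture the path-versus-text decision is the heuristic below. It is a sketch, not WowData's actual logic: the page only states that multiline strings are YAML text, so treating every single-line string as a path is an assumption, and `looks_like_yaml_text` and `read_yaml_source` are hypothetical names.

```python
# Hypothetical heuristic, not WowData's actual logic: decide whether a string
# handed to a from_yaml-style loader is YAML text or a filesystem path.
from pathlib import Path


def looks_like_yaml_text(value: str) -> bool:
    """Multiline strings are YAML text; a single line is treated as a path."""
    # strip() first so a path with a stray trailing newline is still a path.
    return "\n" in value.strip()


def read_yaml_source(value: str) -> str:
    """Return YAML text, reading it from disk only when value looks like a path."""
    if looks_like_yaml_text(value):
        return value
    return Path(value).read_text(encoding="utf-8")


print(looks_like_yaml_text("pipelines/people.yaml"))  # False
print(looks_like_yaml_text("wowdata: 0\npipeline:\n  steps: []\n"))  # True
```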
IR and YAML
In practice:
- YAML is the human-authored surface
- IR is the normalized dict shape used by `Pipeline.to_ir()` and `Pipeline.from_ir(...)`
If you are authoring by hand, write YAML.
If you are inspecting programmatically, use to_ir() or to_yaml().
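Because the IR is an ordinary nested dict, inspecting it needs no special tooling. The sketch below walks an IR-shaped dict and produces one summary line per step. It runs on a literal dict so it works without WowData installed; `summarize_steps` is a hypothetical helper, and the dict assumes the IR follows the shape documented on this page.

```python
# Hypothetical sketch: summarize each step of an IR-shaped dict.
# Uses a literal dict so it runs without WowData installed.
def summarize_steps(ir: dict) -> list[str]:
    summary = []
    for step in ir["pipeline"]["steps"]:
        if "transform" in step:
            summary.append(f"transform:{step['transform']['op']}")
        elif "sink" in step:
            summary.append(f"sink:{step['sink']['uri']}")
        else:
            summary.append(f"unknown:{list(step)}")
    return summary


ir = {
    "wowdata": 0,
    "pipeline": {
        "start": {"uri": "people.csv", "type": "csv", "options": {}},
        "steps": [
            {"transform": {"op": "select", "params": {"columns": ["person_id", "age"]}}},
            {"sink": {"uri": "out.csv", "type": "csv", "options": {}}},
        ],
    },
}
print(summarize_steps(ir))  # ['transform:select', 'sink:out.csv']
```

With WowData installed, the same function could presumably be applied to the result of `Pipeline.to_ir()`, provided the IR has the shape described here; that call is not exercised in this sketch.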