Getting Started
Install
pip install wowdata
For local development:
git clone https://github.com/sci2pro/wowdata.git
cd wowdata
pip install -e ".[dev]"
First Pipeline (Python)
from wowdata import Pipeline, Sink, Source, Transform
pipe = (
    Pipeline(Source("people.csv"))
    .then(Transform("cast", params={"types": {"age": "integer"}, "on_error": "null"}))
    .then(Transform("filter", params={"where": "age >= 18"}))
    .then(Sink("adults.csv"))
)
pipe.run()
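To make the intent of the two transforms concrete, the sketch below reproduces the same cast-then-filter logic using only the standard library. It is not how WowData implements these steps. It assumes that `"on_error": "null"` turns values that cannot be cast into a null value, and that rows with a null `age` do not pass `age >= 18`. The helper names `cast_age` and `adults`, and the sample rows, are hypothetical.

```python
import csv
import io


def cast_age(value):
    """Cast a raw CSV string to int; return None when it cannot be parsed."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None  # assumed meaning of on_error="null"


def adults(rows):
    """Yield rows whose age casts cleanly and is at least 18."""
    for row in rows:
        age = cast_age(row.get("age"))
        if age is not None and age >= 18:
            yield {**row, "age": age}


# Illustrative input only; the real people.csv is not shown in this guide.
sample = io.StringIO("name,age\nAda,36\nBo,17\nCy,unknown\n")
result = list(adults(csv.DictReader(sample)))
print(result)  # [{'name': 'Ada', 'age': 36}]
```

Under these assumptions, only the row with a parseable age of 18 or more survives, which is the behaviour you would expect to see in adults.csv.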
First Pipeline (YAML + CLI)
wowdata: 0
pipeline:
  start:
    uri: people.csv
    type: csv
  steps:
    - transform:
        op: filter
        params:
          where: "age >= 18"
    - sink:
        uri: adults.csv
        type: csv
Run it:
wow run pipeline.yaml
Fallback command:
wowdata run pipeline.yaml
CLI (v0)
WowData™ includes a CLI for running YAML-serialized pipelines.
After installing the package, use:
wow --help
If wow conflicts with another tool in your environment, use the fallback command:
wowdata --help
Commands
- wow run pipeline.yaml (fallback: wowdata run pipeline.yaml)
  - Executes the pipeline end-to-end.
  - Returns non-zero on runtime failures.
- wow validate pipeline.yaml (fallback: wowdata validate pipeline.yaml)
  - Parses YAML + IR and runs preflight checks on source/sink paths.
- wow schema pipeline.yaml (fallback: wowdata schema pipeline.yaml)
  - Infers output schema without full pipeline execution.
- wow lock-schema pipeline.yaml -o pipeline.locked.yaml (fallback: wowdata lock-schema ...)
  - Writes a schema-locked YAML by embedding per-transform output_schema.
Common flags
- --base-dir PATH: resolve relative paths in YAML from a specific directory.
- --json: print machine-readable JSON output.
- --sample-rows N: used by schema and lock-schema for bounded inference.
- --force: recompute schema inference even if cached.
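If you drive the CLI from a script, it can help to assemble the argument list in one place. The following is a minimal sketch, not part of WowData: `build_command` and its keyword names are hypothetical, and it assumes flags are accepted after the pipeline path, as in the examples on this page. It does not check which flags a given command actually supports.

```python
import shlex


def build_command(command, pipeline, *, executable="wow", output=None,
                  base_dir=None, json_output=False, sample_rows=None,
                  force=False):
    """Return the argv list for one CLI invocation (hypothetical helper)."""
    argv = [executable, command, pipeline]
    if output is not None:
        argv += ["-o", output]  # used by lock-schema
    if base_dir is not None:
        argv += ["--base-dir", str(base_dir)]
    if json_output:
        argv.append("--json")
    if sample_rows is not None:
        argv += ["--sample-rows", str(sample_rows)]
    if force:
        argv.append("--force")
    return argv


argv = build_command("schema", "pipeline.yaml", json_output=True, sample_rows=100)
print(shlex.join(argv))  # wow schema pipeline.yaml --json --sample-rows 100
```

Passing `executable="wowdata"` switches to the fallback command without changing the rest of the call.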
CLI examples
# Run a serialized pipeline
wow run pipeline.yaml
# Validate structure and file paths before execution
wow validate pipeline.yaml
# Print inferred output schema as JSON
wow schema pipeline.yaml --json
# Save a locked pipeline snapshot
wow lock-schema pipeline.yaml -o pipeline.locked.yaml
Exit codes
- 0: success
- 2: CLI usage error
- 3: pipeline parse/validation error
- 4: pipeline runtime execution error
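A calling script can translate these exit codes into readable messages. The sketch below is one possible wrapper and is not shipped with WowData: `describe_exit_code` and `run_pipeline` are hypothetical names, and the only facts taken from this page are the four codes and the `wow run` command.

```python
import subprocess

# Exit codes as documented above.
EXIT_CODES = {
    0: "success",
    2: "CLI usage error",
    3: "pipeline parse/validation error",
    4: "pipeline runtime execution error",
}


def describe_exit_code(code):
    """Map a CLI exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")


def run_pipeline(path, executable="wow"):
    """Run `<executable> run <path>`; return (exit code, description)."""
    completed = subprocess.run([executable, "run", path])
    return completed.returncode, describe_exit_code(completed.returncode)


print(describe_exit_code(3))  # pipeline parse/validation error
```

Calling `run_pipeline("pipeline.yaml")` requires the CLI to be installed and on your PATH; use `executable="wowdata"` if `wow` conflicts with another tool.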