Philosophy¶

Why WowData™?¶

Most data tools assume that if you are touching data, you are already an expert.

WowData™ rejects this assumption.

We believe that:

data engineering is a foundational skill,
learning should be fast and durable,
and tools should teach rather than intimidate.

WowData™ is designed so that anyone can build, read, and reason about a data pipeline—even when that pipeline performs non-trivial work.

Core Concepts¶

WowData™ is built on four stable, universal concepts:

Source — where data comes from
Transform — how data changes
Sink — where data goes
Pipeline — how everything fits together

These concepts are deliberately persistent and reused everywhere.

We avoid hidden helpers, magical shortcuts, and proliferating abstractions.

If you understand these four ideas, you understand WowData™.

Example¶

from wowdata import Source, Transform, Sink, Pipeline

pipe = (
    Pipeline(Source("people.csv"))
    .then(Transform("cast", params={"types": {"age": "integer"}, "on_error": "null"}))
    .then(Transform("filter", params={"where": "age >= 30 and country == 'KE'"}))
    .then(Sink("out_filtered.csv"))
)

pipe.run()

This pipeline:

reads a CSV file,
explicitly casts a column,
filters rows using a small, teachable expression language,
and writes the result to disk.

Nothing is hidden. Nothing is inferred without consent.
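
To make those four steps concrete, here is a rough hand-written equivalent using only Python's standard csv module. It is a sketch under stated assumptions, not WowData™'s implementation: it assumes that on_error "null" turns unparseable values into nulls, and that rows with a null age never pass the filter. The helper names (cast_age, keep, run) and the sample rows are made up for illustration.

```python
import csv
import io


def cast_age(row):
    """Cast 'age' to int; use None on failure (assumed meaning of on_error='null')."""
    try:
        age = int(row["age"])
    except (TypeError, ValueError):
        age = None
    return {**row, "age": age}


def keep(row):
    """Assumed meaning of: age >= 30 and country == 'KE' (null ages never match)."""
    return row["age"] is not None and row["age"] >= 30 and row["country"] == "KE"


def run(src, dst):
    """Read CSV rows from src, cast, filter, and write the survivors to dst."""
    reader = csv.DictReader(src)
    rows = [cast_age(r) for r in reader]
    kept = [r for r in rows if keep(r)]
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(kept)
    return kept


# Illustrative in-memory data standing in for people.csv.
sample = io.StringIO(
    "name,age,country\n"
    "Amina,34,KE\n"
    "Ben,n/a,KE\n"
    "Chi,41,NG\n"
    "David,29,KE\n"
)
out = io.StringIO()
kept = run(sample, out)
print(out.getvalue())
```

Every decision the pipeline states in one line (what to do with a bad value, which rows survive) has to be written out and maintained by hand here, which is the work the explicit cast and filter transforms are meant to name.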

Learning-First by Design¶

WowData™ makes deliberate trade-offs to accelerate learning:

A small, closed vocabulary of concepts
A restricted expression language that can be mastered quickly
Explicit transforms instead of silent automation
Deterministic checkpoints for inspection
A serializable pipeline that can be read like a recipe

If a feature makes learning harder—even if it saves time—we reject it.

Example: Learning-First Explicitness¶

from wowdata import Source, Transform, Sink, Pipeline

pipe = (
    Pipeline(Source("people.csv"))
    .then(
        Transform(
            "cast",
            params={
                "types": {"age": "integer"},
                "on_error": "null"   # explicit choice, not hidden behaviour
            }
        )
    )
    .then(
        Transform(
            "filter",
            params={
                "where": "age >= 18"
            }
        )
    )
    .then(Sink("adults.csv"))
)

pipe.run()
pipe.to_yaml("pipeline.yaml")

Serialization That Humans Can Read¶

Every WowData™ pipeline can be serialized into a human-readable form.

Serialization is not configuration for machines—it is a cognitive artifact for people.

Our goal is that an ordinary user can:

read a serialized pipeline,
understand what it does,
explain it to someone else,
and safely modify it.

Example: Human-Readable IR Serialization¶

The same pipeline can be represented as a simple, inspectable Intermediate Representation (IR) saved as pipeline.yaml:

wowdata: 0
pipeline:
  start:
    uri: people.csv
    type: csv
  steps:
    - transform:
        op: cast
        params:
          types:
            age: integer
          on_error: "null"  # quoted: the string "null", not a YAML null value
    - transform:
        op: filter
        params:
          where: "age >= 18"
    - sink:
        uri: adults.csv
        type: csv

This IR is deliberately verbose and stable. It is designed to be read, reviewed, versioned, and edited by humans — not generated once and forgotten.

Because the IR mirrors the core concepts (Source → Transform → Sink), anyone who understands WowData™ can understand what this pipeline does.
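
One way to see how directly the IR maps onto the core concepts is to walk it and print a plain-language recipe. The sketch below is illustrative only: describe is a hypothetical helper, not part of WowData™, and the IR is written as a plain Python dict holding the same steps as the Python pipeline shown earlier, so no YAML library is needed.

```python
# The pipeline IR as a plain dict (a stand-in for the parsed pipeline.yaml).
ir = {
    "wowdata": 0,
    "pipeline": {
        "start": {"uri": "people.csv", "type": "csv"},
        "steps": [
            {"transform": {"op": "cast",
                           "params": {"types": {"age": "integer"}, "on_error": "null"}}},
            {"transform": {"op": "filter", "params": {"where": "age >= 18"}}},
            {"sink": {"uri": "adults.csv", "type": "csv"}},
        ],
    },
}


def describe(ir):
    """Return one plain-language line per stage: Source, then each Transform or Sink."""
    pipeline = ir["pipeline"]
    start = pipeline["start"]
    lines = [f"1. Read {start['type']} data from {start['uri']}"]
    for number, step in enumerate(pipeline["steps"], start=2):
        if "transform" in step:
            transform = step["transform"]
            lines.append(f"{number}. Apply '{transform['op']}' with {transform['params']}")
        elif "sink" in step:
            sink = step["sink"]
            lines.append(f"{number}. Write {sink['type']} data to {sink['uri']}")
        else:
            raise ValueError(f"Unknown step kind: {sorted(step)}")
    return lines


for line in describe(ir):
    print(line)
```

The helper needs only three branches because the IR has only three kinds of node, which is a small illustration of why a closed vocabulary keeps the serialized form readable.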

Example: Loading a Pipeline from IR (YAML)¶

A serialized pipeline can be loaded back into WowData™ and executed:

from wowdata import Pipeline

pipe = Pipeline.from_yaml("pipeline.yaml")
pipe.run()

This allows pipelines to be:

authored or reviewed as YAML,
stored in version control,
shared between users or systems,
and executed without modifying Python code.

The same IR serves the programmatic API and is intended to serve future graphical tools as well.

Errors That Teach¶

In WowData™, error messages are part of the interface.

Every user-facing error:

explains what went wrong,
explains why it happened,
and suggests what to do next.

Errors are designed to teach correct mental models, not to expose internal stack traces.

Example: Errors That Teach¶

from wowdata import Source, Sink, Pipeline

pipe = Pipeline(Source("missing.csv")).then(Sink("out.csv"))
pipe.run()

Running this pipeline produces the following error:

wowdata.errors.WowDataUserError: [E_SOURCE_NOT_FOUND] Source file not found: 'missing.csv'.
Hint: Check the path, working directory, and filename. If the file is elsewhere, pass an absolute path.
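
The message above has three parts: a stable code, a description of what went wrong, and a hint about what to do next. The sketch below shows one way such an error could be structured so that all three parts are always present. It is an assumption-laden illustration: TeachingError and open_source are hypothetical names, and nothing here describes how WowDataUserError is actually implemented.

```python
import os
import tempfile


class TeachingError(Exception):
    """Hypothetical user-facing error carrying a code, a message, and a hint."""

    def __init__(self, code, message, hint):
        self.code = code
        self.message = message
        self.hint = hint
        # All three parts are required, so no error can ship without a hint.
        super().__init__(f"[{code}] {message}\nHint: {hint}")


def open_source(path):
    """Open a source file, or explain what went wrong and what to try next."""
    if not os.path.exists(path):
        raise TeachingError(
            "E_SOURCE_NOT_FOUND",
            f"Source file not found: {path!r}.",
            "Check the path, working directory, and filename. "
            "If the file is elsewhere, pass an absolute path.",
        )
    return open(path, newline="")


caught = None
with tempfile.TemporaryDirectory() as folder:
    # A path inside a fresh temporary directory is guaranteed not to exist.
    missing = os.path.join(folder, "missing.csv")
    try:
        open_source(missing)
    except TeachingError as err:
        caught = err

print(caught)
```

Keeping the code separate from the prose lets callers branch on caught.code while the wording of the message and hint stays free to improve.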

Built on the Best¶

WowData™ does not reinvent proven tools.

Instead, it piggybacks on best-in-class ecosystems:

mature ETL engines,
established data modelling and validation frameworks,
battle-tested execution backends.

Our contribution is an opinionated, human-centred layer that makes these tools usable by more people.

Open Source, Unapologetically¶

WowData™ is open source by principle, not convenience.

We believe that tools shaping how people think about data must be:

transparent,
inspectable,
extensible,
and owned by the community.

What WowData™ Is Not¶

WowData™ is not:

a low-code gimmick,
a black-box automation tool,
a thin wrapper around someone else’s API,
or a system that only experts can use safely.

If forced to choose between power and clarity, we choose clarity.

WowData™ is not trying to make data engineering smaller.

It is trying to make it thinkable.