Dataset plugins

Dataset plugins define canonical metric schemas — the stable tables that transforms write into and dashboards read from. They are the contract between data producers and consumers.

What a dataset is

A dataset is a versioned table schema, declared as a Python dataclass extending DatasetTable. It specifies the column names, types, units, and primary key for a canonical metric — for example, daily_hrv, transaction, or activity.
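Here is a minimal sketch of what such a declaration could look like for daily_hrv. The DatasetTable class below is a hypothetical stand-in, defined inline so the example runs. The real base class's import path, the way it records version, units, and primary key, and the primary key chosen here are all assumptions.

```python
import datetime as dt
from dataclasses import dataclass, field, fields
from typing import ClassVar


class DatasetTable:
    """Hypothetical stand-in for shenas's DatasetTable base class."""

    table_name: ClassVar[str]
    version: ClassVar[str]
    primary_key: ClassVar[tuple[str, ...]]

    @classmethod
    def columns(cls) -> dict[str, str]:
        """Map each dataclass field (column) to its unit, or '' if unitless."""
        return {f.name: f.metadata.get("unit", "") for f in fields(cls)}


@dataclass
class DailyHrv(DatasetTable):
    # Schema-level metadata (assumed names); ClassVar keeps these out of the columns.
    table_name: ClassVar[str] = "daily_hrv"
    version: ClassVar[str] = "1.0.0"
    primary_key: ClassVar[tuple[str, ...]] = ("date", "source")

    # Columns from the community daily_hrv entry; units stored as field metadata.
    date: dt.date
    rmssd_ms: float = field(metadata={"unit": "ms"})
    sdnn_ms: float = field(metadata={"unit": "ms"})
    source: str
```

Calling `DailyHrv.columns()` returns the column-to-unit mapping that a transform or dashboard could check against.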

Multiple source-transform pairs can write into the same dataset table. For example, a fitness-tracker source paired with a geofence transform and a phone-health source paired with a SQL transform can both write rows into activity. Dashboards then read one unified table with a stable schema, regardless of where each row came from.
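The fan-in pattern can be sketched with an in-memory SQLite database standing in for shenas's storage, which is an assumption. The column types, the primary key, the sport_type spelling, the staging table phone_health_raw, and both producers are hypothetical. The point is that two unrelated producers write the same canonical columns, and the dashboard query never needs to know which one produced a row.

```python
import sqlite3

# In-memory stand-in for shenas's storage backend (assumption).
conn = sqlite3.connect(":memory:")

# Canonical activity table; types and primary key are assumed.
conn.execute(
    """CREATE TABLE activity (
        start_at   TEXT NOT NULL,
        device_id  TEXT NOT NULL,
        sport_type TEXT,
        duration_s REAL,
        distance_m REAL,
        PRIMARY KEY (start_at, device_id)
    )"""
)


# Producer 1: hypothetical fitness-tracker source + geofence transform in Python.
def geofence_transform(visits):
    for v in visits:
        yield (v["entered_at"], v["device"], "walk", v["seconds"], v["meters"])


tracker_visits = [
    {"entered_at": "2024-05-01T07:30:00", "device": "tracker-1", "seconds": 1200, "meters": 1500.0},
]
conn.executemany(
    "INSERT INTO activity VALUES (?, ?, ?, ?, ?)", geofence_transform(tracker_visits)
)

# Producer 2: hypothetical phone-health source staged raw, then a SQL transform
# that converts minutes -> seconds and km -> meters on the way in.
conn.execute("CREATE TABLE phone_health_raw (ts TEXT, phone TEXT, kind TEXT, minutes REAL, km REAL)")
conn.execute("INSERT INTO phone_health_raw VALUES ('2024-05-01T18:00:00', 'phone-1', 'run', 30, 5.0)")
conn.execute(
    """INSERT INTO activity
       SELECT ts, phone, kind, minutes * 60, km * 1000 FROM phone_health_raw"""
)


# Consumer: a dashboard query reads one table, whatever produced each row.
def distance_by_device(conn):
    return conn.execute(
        "SELECT device_id, SUM(distance_m) FROM activity GROUP BY device_id ORDER BY device_id"
    ).fetchall()


print(distance_by_device(conn))  # → [('phone-1', 5000.0), ('tracker-1', 1500.0)]
```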

Schema stability: Dataset schemas are versioned. Breaking changes require a new major version, and transforms and dashboards pin to a schema version.
Units: Column names carry unit suffixes (duration_s, distance_m, weight_kg). All units are SI unless the upstream API forces otherwise.
Entry point: Datasets are registered under shenas.datasets. Install with shenasctl source add --local; shenas creates the table on the first transform run.
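Pinning can be pictured as a semantic-versioning compatibility check. This sketch assumes version strings of the form MAJOR.MINOR.PATCH and a caret-style policy: the major versions must match, and the installed schema must be at least as new as the pin. SchemaVersion and is_compatible are hypothetical helpers, not part of shenas.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class SchemaVersion:
    """Hypothetical MAJOR.MINOR.PATCH version; fields compare in order."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "SchemaVersion":
        parts = [int(p) for p in text.split(".")]
        if not 1 <= len(parts) <= 3:
            raise ValueError(f"not a schema version: {text!r}")
        parts += [0] * (3 - len(parts))  # "1.2" -> 1.2.0
        return cls(*parts)


def is_compatible(pinned: str, installed: str) -> bool:
    """True if a consumer pinned to `pinned` can read `installed`.

    Assumed policy: breaking changes bump the major version, so majors
    must match; minor and patch releases only add, so the installed
    schema must be at least as new as the pin.
    """
    want, have = SchemaVersion.parse(pinned), SchemaVersion.parse(installed)
    return want.major == have.major and have >= want
```

Under this policy, a dashboard pinned to 1.2 keeps working against 1.4.1 but refuses 2.0.0, which is the signal that a breaking change landed.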

Community datasets

The canonical community datasets (activity, daily_hrv, transaction, sleep, event, note) are maintained in the main shenas-org/shenas repository under plugins/datasets/. If a dataset you need doesn't exist, open a PR proposing its schema; proposals are judged on schema stability and use of SI units.

activity: Workout or movement session (duration_s, distance_m, sport type, start_at, device_id).
daily_hrv: Daily heart rate variability (date, rmssd_ms, sdnn_ms, source).
transaction: Financial transaction (date, amount, currency, merchant, category, account).
sleep: Sleep session (start_at, end_at, total_s, deep_s, rem_s, device_id).
event: Calendar event (title, start_at, end_at, calendar, location).
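As an illustration of the SI-units rule, a transform feeding activity would convert upstream units before writing. The upstream record shape (started, watch_id, kind, minutes, miles), the sport_type column spelling, and the choice of UTC ISO-8601 strings for start_at are hypothetical.

```python
from datetime import datetime, timezone

METERS_PER_MILE = 1609.344  # exact international mile
SECONDS_PER_MINUTE = 60


def to_activity_row(raw: dict) -> dict:
    """Map a hypothetical upstream workout record into activity columns.

    Upstream distance is in miles and duration in minutes; both are
    converted to SI (meters, seconds) to match the _m and _s suffixes.
    """
    return {
        "start_at": datetime.fromtimestamp(raw["started"], tz=timezone.utc).isoformat(),
        "device_id": raw["watch_id"],
        "sport_type": raw["kind"].lower(),
        "duration_s": raw["minutes"] * SECONDS_PER_MINUTE,
        "distance_m": raw["miles"] * METERS_PER_MILE,
    }


row = to_activity_row({"started": 0, "watch_id": "w1", "kind": "Run", "minutes": 30, "miles": 2})
```

Doing the conversion in the transform keeps every producer's rows comparable once they land in the shared table.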

This list is illustrative; authoritative definitions are in the repo. Dataset schemas are still being finalized — check the roadmap for stability status.