Architecture Rules of the Road

A calm, consistent, and clever approach to data engineering.


🔹 1. Look in the Box First

Before inventing a workaround, check what already exists.
If Microsoft or IFS built it, use it — it’s likely more robust, secure, and supported.


🔹 2. Easy Landings

Data should arrive safely and predictably.
Keep import containers simple, standardised, and transformation-free.
Bronze is where the clean-up begins — not the landing zone.


🔹 3. Everything Translates to Bronze.parquet

Whatever the source — Excel, API, SQL — it ends up as Parquet in Bronze.
Parquet gives us consistency, performance, and future-proofing.


🔹 4. Always Look at Fabric

Fabric is our first port of call.
If it can be done natively in Fabric (Lakehouse, Pipeline, Warehouse, or Notebook), that’s where it should live.


🔹 5. Single Point of Entry

All access flows through get.* views or endpoints.
They’re the single, traceable way to query and publish data.
No shortcuts. No side doors.


🔹 6. Archive Everything

We don’t delete — we archive.
Every version, every load, every file.
Storage is cheaper than regret… and it saves you worrying. 😉


🔹 7. One Way In, Many Ways Out

Data enters once — in a controlled, verified way.
It can then flow out through many curated routes: Gold, CDM, API, or Report.


🔹 8. Domain-First Design

Organise everything by business domain (project, employee, order, etc.).
Within each, maintain consistent structures:

  • core (the stable reference)
  • meta_dates, meta_codes, meta_values (the descriptive layers)
  • item_* (the detailed transactions)

🔹 9. Immutable Bronze

Bronze is a record of fact — we never edit it.
Errors are corrected downstream; history remains untouched.
We fix forward, not backward.


🔹 10. Metadata is Law

Our tables, pipelines, and upserts are metadata-driven.
If it’s not in metadata, it doesn’t exist.
That’s what keeps us consistent, and repeatable.


🔹 11. Provenance Everywhere

Every dataset carries its lineage:
source_system, extract_ts, checksum, and run_id.
It means we can trace any record to its origin.


🔹 12. Quality Without Drama

Quality gates, lineage, and logs are automatic.
If something fails, it tells us where and why — no surprises, no panic.


💡 Bonus Rule: Storage is Cheaper than Regret

Archive first, rationalise later.
Future-you will be grateful.

Leave a Comment