A calm, consistent, and clever approach to data engineering.
🔹 1. Look in the Box First
Before inventing a workaround, check what already exists.
If Microsoft or IFS built it, use it — it’s likely more robust, secure, and supported.
🔹 2. Easy Landings
Data should arrive safely and predictably.
Keep import containers simple, standardised, and transformation-free.
Bronze is where the clean-up begins — not the landing zone.
🔹 3. Everything Translates to Bronze.parquet
Whatever the source — Excel, API, SQL — it ends up as Parquet in Bronze.
Parquet gives us consistency, performance, and future-proofing.
🔹 4. Always Look at Fabric
Fabric is our first port of call.
If it can be done natively in Fabric (Lakehouse, Pipeline, Warehouse, or Notebook), that’s where it should live.
🔹 5. Single Point of Entry
All access flows through get.*
views or endpoints.
They’re the single, traceable way to query and publish data.
No shortcuts. No side doors.
🔹 6. Archive Everything
We don’t delete — we archive.
Every version, every load, every file.
Storage is cheaper than regret… and it saves you worrying. 😉
🔹 7. One Way In, Many Ways Out
Data enters once — in a controlled, verified way.
It can then flow out through many curated routes: Gold, CDM, API, or Report.
🔹 8. Domain-First Design
Organise everything by business domain (project
, employee
, order
, etc.).
Within each, maintain consistent structures:
core
(the stable reference)meta_dates
,meta_codes
,meta_values
(the descriptive layers)item_*
(the detailed transactions)
🔹 9. Immutable Bronze
Bronze is a record of fact — we never edit it.
Errors are corrected downstream; history remains untouched.
We fix forward, not backward.
🔹 10. Metadata is Law
Our tables, pipelines, and upserts are metadata-driven.
If it’s not in metadata, it doesn’t exist.
That’s what keeps us consistent, and repeatable.
🔹 11. Provenance Everywhere
Every dataset carries its lineage:source_system
, extract_ts
, checksum
, and run_id
.
It means we can trace any record to its origin.
🔹 12. Quality Without Drama
Quality gates, lineage, and logs are automatic.
If something fails, it tells us where and why — no surprises, no panic.
💡 Bonus Rule: Storage is Cheaper than Regret
Archive first, rationalise later.
Future-you will be grateful.