Oli Steadman
1 min read · Dec 23, 2021

Wondering where to start with DataHub? Shirshanka’s Nov 2020 presentation to the DataHub Community explores the relationship between DataHub’s two constituent parts: DataHub the app, and DataHub the Generalized Metadata Architecture (GMA). These work in tandem, but the latter is primary, serving as the portal that enables the 4 business goals that originally drove DataHub’s creation within LinkedIn:

  • Reproducibility
  • Audit-ability
  • Visibility
  • Consistency of concepts

And the 2 principles:

  • Integrated with development flow
  • Data-centricity (metadata should live alongside the code)

The metadata we need to co-maintain include:

  • problem statement of the analytics/AI model being supplied
  • pipeline info describing the ETL elements of its lineage
  • run info for each execution of those ETL pipelines
  • associated projects depending on this model
  • associated groups (e.g. project owners, stakeholders)
  • analysis results (e.g. split into high-level executive summaries and fully fledged reports with granular, publication-level detail for analysts)
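
To make that list concrete, here is a minimal sketch of how such a record might be modelled in plain Python. This is not DataHub's API or its actual metadata model: the class names (`ModelMetadata`, `PipelineRun`), every field name, and the `missing_fields` helper are hypothetical, chosen only to mirror the bullets above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PipelineRun:
    """One execution of an ETL pipeline (hypothetical shape)."""
    run_id: str
    status: str


@dataclass
class ModelMetadata:
    """Hypothetical record mirroring the metadata listed above."""
    problem_statement: str
    pipelines: List[str] = field(default_factory=list)           # ETL lineage
    runs: List[PipelineRun] = field(default_factory=list)        # run info
    dependent_projects: List[str] = field(default_factory=list)  # associated projects
    groups: List[str] = field(default_factory=list)              # owners, stakeholders
    exec_summary: str = ""                                       # high-level results
    full_report: str = ""                                        # detailed results

    def missing_fields(self) -> List[str]:
        """Return the names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


# Example values are made up for illustration.
record = ModelMetadata(
    problem_statement="Predict customer churn",
    pipelines=["ingest_events"],
)
print(record.missing_fields())
```

A check like `missing_fields` is one way to keep metadata maintenance inside the development flow, for instance by failing a review when a model ships without owners or run info.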

That all sounds fairly technical; indeed, one comment on that presentation asks whether DataHub can be extended to accommodate business metadata. It absolutely can (arguably this is out-of-the-box functionality and no extension is required): Saxo have documented and presented a fantastic example from their implementation [👈 link to be added here soon].
