Tableau Data Catalog

Oli Steadman
2 min read · Dec 14, 2020

In September 2019 Tableau announced their new Tableau Catalog tool via the company blog. It has been almost a year since they began releasing videos like the one linked above, which showcase a lot of functionality that can be described as “data cataloguing” without it being wrapped up into a formal, versioned, deployed product with the same complexities you might see from other catalogue vendors.

It certainly missed out on a mention in The Forrester Wave: Machine Learning Data Catalogs, Q4 2020, but it’s hard to decide whether that’s due to any lack in feature set or to Tableau’s newness to the cataloguing brandscape (an already crowded arena). Features advertised in the original announcement, and explained to some extent in the showcases, include:

  • External Assets list
  • Lineage and impact analysis
  • Data Quality Warnings
  • Data Details
  • Enhanced search
  • Metadata API

These are standard features across the vast majority of enterprise and even open source catalogs. Of course, there’s the differentiating factor of having a catalog right alongside your Tableau instance, where you presumably already do a bunch of analytics and possibly even governance. But it’s this last point that really leaves a deafening silence: there is no mention of Data Governance. Almost every catalog worth its salt is marketed on governance grounds, with hyped buzzwords like “democratisation”, “access”, and “transparency”. Tableau’s announcement may skip these for no more dramatic reason than that they feel quite obvious to a company whose product is built on shared dashboards & collaboration, but the omission still feels like the smoking gun of a lightweight product offering. I’m open to being proven wrong on that!

Questions left unanswered by the various showcase/demo videos include:

  • what’s the infrastructure/footprint requirement; does it simply run alongside Tableau Server?
  • where do governance actions & requests get handled, inside the UI or some other way?
  • documentation mentions NODE_LIMIT_EXCEEDED as one of the typical error messages to expect; are nodes configurable to allow handling of more significant workloads?
  • where is configuration done, via some admin-only UI view? how is this protected (e.g. by credentials or ADFS)?
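On the NODE_LIMIT_EXCEEDED question, here is a minimal Python sketch of how a client might at least notice that a Metadata API response was cut short. It assumes the API answers with a standard GraphQL-style JSON body in which each entry of `errors` carries its code under `extensions.code`; that layout, the helper name `node_limit_warnings`, and the sample payload are my own illustrative assumptions, not something confirmed by the showcase videos.

```python
from typing import Any, Dict, List

# Error code named in the documentation; everything else here is assumed.
NODE_LIMIT_CODE = "NODE_LIMIT_EXCEEDED"


def node_limit_warnings(response: Dict[str, Any]) -> List[str]:
    """Return the messages of any errors flagged as NODE_LIMIT_EXCEEDED."""
    messages: List[str] = []
    for error in response.get("errors") or []:
        # Assumed layout: the code sits under "extensions",
        # with a top-level "code" key as a fallback.
        extensions = error.get("extensions") or {}
        code = extensions.get("code", error.get("code"))
        if code == NODE_LIMIT_CODE:
            messages.append(error.get("message", ""))
    return messages


# Hypothetical response body, made up for illustration only.
sample_response = {
    "data": {"databases": [{"name": "sales"}]},
    "errors": [
        {
            "message": "Showing partial results.",
            "extensions": {"code": "NODE_LIMIT_EXCEEDED"},
        }
    ],
}

for warning in node_limit_warnings(sample_response):
    print("Partial results:", warning)
```

A caller that gets a non-empty list back could then narrow the query or request the results in smaller pages, though whether the limit itself can be raised is exactly the open question.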

A more detailed product overview was made available 6 months after the initial announcement, but it doesn’t go into much more depth than the blog post, the sales page, and the Getting Started guide. I’m looking forward to exploring further and perhaps posting a follow-up here with my findings.
