Getting metadata into Collibra

Oli Steadman
Dec 17, 2020

What options/pathways are on offer?

JDBC driver-powered Native Integrations

These are available (via Marketplace) for >40 providers including:

  • SAP HANA
  • Kafka
  • Common formats e.g. CSV, XML, XLSX, JSON, Parquet
  • Databases e.g. PostgreSQL, MySQL, MongoDB, Couchbase, Cassandra, MarkLogic, Vertica, SQL Server
  • Engines e.g. Spark, Hive
  • Azure e.g. CosmosDB, Synapse, Table Storage
  • AWS e.g. Athena, DynamoDB, Redshift (the docs list the exact set of technical metadata available for Redshift)
  • GCP e.g. BigQuery
  • Other cloud-hosted platforms e.g. Salesforce, Workday, Oracle, Snowflake, Databricks

They are published by a range of providers, at varying prices (none appear to be free of charge) and with varying depth of documentation (Collibra-provided drivers are discussed here). The license terms typically read something like:

Use of this solution requires a license to Collibra Catalog and the purchase of Metadata Connectors at a volume commensurate with desired use. Please contact your sales representative or Customer Success Manager for more information.

Some popular external systems (such as Tableau and S3) can also be registered without any driver; see the docs for advice on how to do that.

Creating data with the Collibra REST API

Endpoints include auth (required for calls to almost all the other endpoints), assets, communities, domains and attributes (e.g. the definition of an asset); each can be suffixed with bulk to add multiple items in one call. For example, to create a single domain:

curl -X POST \
https://<your_Collibra_Data_Center_env_URL>/rest/2.0/domains \
-H 'Content-Type: application/json' \
-d '{
"name": "Finance Glossary",
"description": "A collection of finance-related assets.",
"communityId": "<Finance_community_id>",
"typeId": "00000000-0000-0000-0000-000000010001"
}'
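
As a rough illustration of the bulk variant mentioned above, here is a minimal Python sketch that builds (without sending) a POST to domains/bulk. It assumes the bulk endpoint accepts a JSON array of the same objects the single-item endpoint takes, and that your environment allows HTTP Basic authentication; neither is confirmed in this post, so check your own instance's API docs. The function name build_bulk_request, the "Risk Glossary" entry and the credentials are hypothetical.

```python
import base64
import json
import urllib.request


def build_bulk_request(base_url, resource, items, username, password):
    """Build (but do not send) a POST to <base_url>/<resource>/bulk.

    Assumptions: the bulk endpoint takes a JSON array, and Basic auth is
    enabled on the target environment.
    """
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        url=f"{base_url}/{resource}/bulk",
        data=json.dumps(items).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


if __name__ == "__main__":
    # Same placeholders as the curl example; "Risk Glossary" is made up.
    domains = [
        {
            "name": "Finance Glossary",
            "description": "A collection of finance-related assets.",
            "communityId": "<Finance_community_id>",
            "typeId": "00000000-0000-0000-0000-000000010001",
        },
        {
            "name": "Risk Glossary",
            "communityId": "<Finance_community_id>",
            "typeId": "00000000-0000-0000-0000-000000010001",
        },
    ]
    request = build_bulk_request(
        "https://<your_Collibra_Data_Center_env_URL>/rest/2.0",
        "domains",
        domains,
        "<username>",
        "<password>",
    )
    # Inspect the request before deciding to send it.
    print(request.get_method(), request.full_url)
    # To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload first, which is handy when a single call can create many items at once.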

Collibra Connect

This tool extends MuleSoft Anypoint Studio and appears to have once been versatile, but as of Dec 2020 it appears to have been deprecated, or at least removed from long-term support. The official Collibra advice is:

We have made the decision to transition away from Collibra Connect so that we can better serve you and ensure you can use future product functionality without re-instrumenting or rebuilding integrations.

Current customers who are using Collibra Connect can still access Connect resources below. For all other customers, we encourage the use of our native integrations and APIs for your integration needs.

There does appear to be one component of Collibra Connect still approved for use: Collibra Connect Hub (M4). This uses Mule ESB 4.2 to provide discoverable, consumable functionality that abstracts the 2.0 API (discussed above), standardising integration and providing a user-friendly interface that “enables business users to configure integrations whilst hiding the technical details”.

Conclusion

Several programmatic means exist to populate & enrich your Collibra environment with metadata from any number of data sources. If none of these float your boat, there’s always manual point-and-click available in the GUI.

Enjoy!
