Today, we’re excited to announce our official partnership with dbt Labs. dbt has been one of Select Star’s major integrations, with more than 15,000 models and 225,000 columns synced to date.
Select Star is designed to enable the data discovery that organizations need so that they can maximize their data’s potential and drive successful outcomes. With that, Select Star and dbt Labs share a common interest in supercharging analytics engineers to better transform data and maintain good documentation – so data analysts and business users can trust their data. Today we’re announcing deeply integrated features for dbt: support for dbt docs, column-level data lineage, and dbt Cloud.
Data discovery for dbt
As a data transformation platform, dbt enables analytics engineers to apply software engineering best practices when designing and validating data models. For analytics engineers, dbt is therefore the main place to document their data models, down to tables and columns.
However, as a company grows and its data models become more complex, it becomes hard for data analysts and other data consumers (e.g., product managers and sales, marketing, or business operations teams) to keep up to date with the core data models and attributes when they want to do their own self-service analytics.
Many data teams share their dbt docs with the rest of the organization, but data consumers often don’t use them because they read as technical documentation that is hard to follow.
Data consumers need to understand data context - what it represents, how it was generated, and how it can or should be used along with other datasets.
With Select Star, an analytics engineering team using dbt can easily share its dbt docs alongside Select Star’s auto-generated data context, which answers questions such as: Where did this data come from? Who is using it in the organization? Which dashboards are built on top of it?
Select Star also allows the analytics engineering team to trace data provenance and guide others toward proper data usage by setting owners, tags, rich-text docs, and metrics.
1. dbt docs integration
By integrating with dbt, Select Star provides a richer data documentation and discovery platform on top of dbt projects. Because it connects natively to the entire analytics stack, including the data warehouse, the transformation layer (dbt), and the BI tool, users can search across multiple platforms and make semantic connections between them in Select Star.
Select Star ingests metadata and query history logs, then automatically generates popularity and data lineage models. Once data is ingested, the platform runs a metadata sync every 24 hours, automatically updating models as the data changes.
For every metadata sync, Select Star will pick up the latest updates on the following:
- New dbt models, columns, tables, and schemas added or removed from databases
- SQL queries that have been run since the last sync
- BI dashboards, reports, or workbooks that have been created, modified, or removed
- Operational information, such as query execution times and last updated timestamps
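To make the idea of a sync picking up additions and removals concrete, here is a minimal sketch that compares two metadata snapshots. It is not Select Star’s implementation: the snapshot shape (a table name mapped to a set of column names) and the `diff_snapshots` helper are assumptions made purely for illustration.

```python
def diff_snapshots(previous, current):
    """Compare two {table: set(columns)} snapshots from consecutive syncs.

    The snapshot format is hypothetical and chosen only for this example.
    """
    prev_tables, curr_tables = set(previous), set(current)
    changes = {
        "tables_added": sorted(curr_tables - prev_tables),
        "tables_removed": sorted(prev_tables - curr_tables),
        "columns_added": {},
        "columns_removed": {},
    }
    # Column changes only make sense for tables present in both snapshots.
    for table in sorted(prev_tables & curr_tables):
        added = sorted(current[table] - previous[table])
        removed = sorted(previous[table] - current[table])
        if added:
            changes["columns_added"][table] = added
        if removed:
            changes["columns_removed"][table] = removed
    return changes


yesterday = {"analytics.orders": {"id", "amount"}, "analytics.legacy": {"id"}}
today = {"analytics.orders": {"id", "amount", "currency"}, "analytics.customers": {"id"}}
changes = diff_snapshots(yesterday, today)
```

In this example, `changes` reports one added table, one removed table, and one new column on `analytics.orders`.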
Once dbt is added as a new data source in Select Star, all dbt model types are picked up and displayed in Select Star.
There are two ways to link a dbt project to Select Star:
- dbt as its own data source: All dbt model types will show up the same way they do in dbt docs. Each model will have the corresponding dbt table and column descriptions. The description will also be shown as a suggested description on its linked tables and views.
- Overlay dbt docs onto tables and views: There won’t be a separate dbt file structure in Select Star, since it can be redundant with the data warehouse tables. In this case, users can choose the overlay docs option and use dbt docs to fill in their table and column descriptions.
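The overlay option can be pictured with a short sketch. It reads model and column descriptions from the `nodes` section of a dbt manifest and copies them onto matching warehouse tables. The `warehouse_tables` structure, the helper names, and the rule that existing descriptions are never overwritten are all assumptions for illustration, not a description of how Select Star does it.

```python
def relation_name(node):
    """Build database.schema.identifier for a manifest node (alias falls back to name)."""
    identifier = node.get("alias") or node["name"]
    return ".".join([node["database"], node["schema"], identifier]).lower()


def overlay_descriptions(manifest, warehouse_tables):
    """Fill empty warehouse descriptions from dbt model docs; existing text is kept."""
    for node in manifest["nodes"].values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, and other node types
        table = warehouse_tables.get(relation_name(node))
        if table is None:
            continue  # model has no matching table in the (hypothetical) catalog
        if not table.get("description"):
            table["description"] = node.get("description", "")
        for col_name, col in node.get("columns", {}).items():
            target = table["columns"].get(col_name.lower())
            if target is not None and not target.get("description"):
                target["description"] = col.get("description", "")
    return warehouse_tables


manifest = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "name": "orders",
            "database": "analytics",
            "schema": "core",
            "alias": "orders",
            "description": "One row per order.",
            "columns": {"order_id": {"name": "order_id", "description": "Primary key."}},
        }
    }
}
warehouse_tables = {
    "analytics.core.orders": {
        "description": "",
        "columns": {
            "order_id": {"description": ""},
            "amount": {"description": "Order total."},
        },
    }
}
result = overlay_descriptions(manifest, warehouse_tables)
```

Keeping existing descriptions untouched is one possible policy; a team that treats dbt as the only source of documentation might prefer to overwrite instead.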
Connecting dbt is as easy as uploading a manifest.json to Select Star, or creating a custom job on dbt Cloud.
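Before uploading, it can be useful to check what a manifest actually contains. The sketch below loads a `manifest.json` and counts models, columns, and documented columns. The `summarize_manifest` helper is hypothetical, and the demo writes a tiny stand-in file; in a real project dbt writes the manifest to `target/manifest.json` by default.

```python
import json
import tempfile
from pathlib import Path


def summarize_manifest(path):
    """Return basic counts for a dbt manifest.json (hypothetical pre-upload check)."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    models = [
        node for node in manifest.get("nodes", {}).values()
        if node.get("resource_type") == "model"
    ]
    columns = [col for model in models for col in model.get("columns", {}).values()]
    return {
        "models": len(models),
        "columns": len(columns),
        "documented_columns": sum(1 for col in columns if col.get("description")),
    }


# Stand-in manifest with one model and one test node, for demonstration only.
sample = {
    "nodes": {
        "model.shop.orders": {
            "resource_type": "model",
            "columns": {
                "order_id": {"description": "Primary key."},
                "amount": {"description": ""},
            },
        },
        "test.shop.not_null_orders_order_id": {"resource_type": "test", "columns": {}},
    }
}
with tempfile.TemporaryDirectory() as tmp:
    manifest_path = Path(tmp) / "manifest.json"
    manifest_path.write_text(json.dumps(sample), encoding="utf-8")
    summary = summarize_manifest(manifest_path)
```

A low `documented_columns` count is a hint that the descriptions surfaced downstream will be sparse.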
2. dbt column-level data lineage
dbt’s data lineage is one of the most important features of dbt docs – allowing analytics engineers to see the complete data transformation flow map. Select Star’s column-level lineage has also been the most important feature for our customers, because you can see the data flow and its dependencies across the data stack – how the data warehouse tables and BI dashboards are connected to each other.
With the dbt integration, Select Star now provides column-level lineage for dbt models, along with the links between the materialized tables and views and BI dashboards from Tableau, Looker, Power BI, Mode, Periscope, and others.
Because each dbt model materializes 1:1 into a table or view, we decided to add a dbt icon to any table materialized by dbt instead of showing duplicate nodes on the lineage graph.
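The 1:1 mapping between a model and its table can be sketched as follows. This example builds table-level lineage only, using the `depends_on` entries in a dbt manifest, and gives each model and its materialized relation a single shared node with a `materialized_by_dbt` flag. True column-level lineage requires parsing SQL and is not shown. The `build_lineage` helper and the output shape are assumptions for illustration.

```python
def build_lineage(manifest):
    """Return {relation: {"materialized_by_dbt": bool, "upstream": [relations]}}.

    Each dbt model and the table/view it materializes share one graph node.
    """
    nodes = {**manifest.get("sources", {}), **manifest.get("nodes", {})}

    def relation(node):
        # Models carry an alias; sources carry an identifier.
        identifier = node.get("alias") or node.get("identifier") or node["name"]
        return f'{node["database"]}.{node["schema"]}.{identifier}'.lower()

    graph = {}
    for node in nodes.values():
        if node.get("resource_type") not in ("model", "source"):
            continue
        upstream_ids = node.get("depends_on", {}).get("nodes", [])
        graph[relation(node)] = {
            "materialized_by_dbt": node["resource_type"] == "model",
            "upstream": sorted(relation(nodes[u]) for u in upstream_ids if u in nodes),
        }
    return graph


manifest = {
    "sources": {
        "source.shop.raw.orders": {
            "resource_type": "source",
            "name": "orders",
            "identifier": "orders",
            "database": "raw",
            "schema": "shop",
        }
    },
    "nodes": {
        "model.shop.stg_orders": {
            "resource_type": "model",
            "name": "stg_orders",
            "alias": "stg_orders",
            "database": "analytics",
            "schema": "staging",
            "depends_on": {"nodes": ["source.shop.raw.orders"]},
        }
    },
}
graph = build_lineage(manifest)
```

Here the graph has two nodes rather than three: `analytics.staging.stg_orders` stands for both the dbt model and its table, and the flag plays the role of the dbt icon.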
Column-level lineage is a must-have feature if you want visibility into your ETL pipeline. Read more about data lineage here.
3. dbt Cloud integration
As more customers started using Select Star’s dbt docs integration, many wanted the docs to sync automatically.
Today you can do this by creating a custom job on dbt Cloud, or by using our API to upload the dbt manifest.json programmatically.
Select Star’s dbt Cloud integration marks full, one-click integration support for all dbt models. As a next step, we plan to bring in additional dbt metadata, including dbt meta tags, dbt tests, and metrics.
Sharing the data knowledge with everyone in the organization
Having a data discovery platform like Select Star allows everyone in the organization to find, understand, and use the data more easily. Select Star’s dbt integration adds a single-source-of-truth documentation layer while keeping full control of data definition and modeling in the hands of the data team.
If you’re interested in learning more about how Select Star and dbt can improve your data management and data knowledge sharing, schedule a demo with us.