Modern Data Governance: Automating Data Documentation with Lineage and AI

Rya Sciban
June 27, 2023

In today's data-driven world, the importance of effective data documentation, especially for your base data assets (tables, columns, dashboards, and charts), cannot be overstated. Documentation provides visibility into what data you have and how it’s defined. It supports knowledge management, curation, standardization, and transfer, and it drives organizational alignment on the state of your data.

Data documentation also plays a vital role in enabling data discovery, allowing organizations to scale their data literacy with self-service capabilities and democratize data internally. When done well, it provides a foundation for understanding what data exists, along with its structure, meaning, and usage, which facilitates data analysis and data-driven decision-making.

Yet data documentation is often an afterthought at best and, let’s be honest, no one’s favorite task. Technical writing takes time because you need to get it right, both semantically and technically. You need to explain the domain or business process alongside the technical details, and account for edge cases and the different conditions the data can be in.

At Select Star, we believe good data documentation means the following:

  • Relevant documentation exists on data objects, so users can easily understand the data they are looking for and are more likely to come back the next time they need to find something.
  • Analysts make fewer mistakes when working with data, because they understand what it means.
  • Search across data objects improves as more items in your data catalog become well documented.

As the quality and availability of artificial intelligence (AI) technologies and solutions continue to advance, there are exciting opportunities to enhance many aspects of data documentation that allow data consumers and data producers to work better together.

Select Star’s Approach to Documentation

We know documentation is important, and we know that documentation is hard to execute on. This is why it’s one of the top challenges that we decided to wrangle at Select Star.

The needs of documentation seem simple at a glance:

  • For consumers - documentation needs to be available, accurate, useful, and in context
  • For creators - documentation needs to be as easy to create as possible and auditable

Simple does not mean easy, though. Who does the documentation need to be useful for? Which assets or types of documentation are the most important to tackle? How do you ensure that documentation stays up to date and is consistent across all the applications that the data goes through?

At Select Star, we've chosen to focus on a few key principles:

Focus on what matters - Not all data assets are created equal, and chances are that 20% of your dashboards and tables get 80% of the consumption. Select Star quickly shows you which of your most popular dashboards and tables are missing documentation, so you can focus on the documentation that has the most impact.

Select Star’s database page shows tables ordered by most used to the least used
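
As a rough illustration of this prioritization idea, the sketch below ranks assets that lack a description by how often they are used. The `Asset` record, its `query_count` field, and the function name are hypothetical and are not Select Star's data model or API.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str
    query_count: int  # hypothetical popularity signal, e.g. recent query volume
    description: str = ""


def undocumented_by_popularity(assets, top_n=10):
    """Return the most-used assets that have no description yet."""
    missing = [a for a in assets if not a.description.strip()]
    missing.sort(key=lambda a: a.query_count, reverse=True)
    return missing[:top_n]


# Made-up example data for illustration only.
assets = [
    Asset("orders", 1200),
    Asset("customers", 950, "One row per customer account."),
    Asset("tmp_backfill", 3),
    Asset("revenue_daily", 400),
]

for asset in undocumented_by_popularity(assets, top_n=2):
    print(asset.name, asset.query_count)
```

Sorting by usage first means a rarely queried scratch table never outranks a heavily used table that is missing documentation.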

Force multipliers - Because the needs of data documentation are so vast, focusing on solutions that make things a tiny bit easier won’t cut it. We look for solutions that can greatly improve efficiency and process. For example, Select Star’s documentation propagation ensures that you write documentation once and use it everywhere.

Select Star’s propagation job looks at upstream and downstream lineage for fields that use the same data without transformation, as well as similar-looking tables, and shows documentation where it makes sense.

In addition to documentation, it propagates field-level tags the same way. If you tag a field or column as sensitive or PII, Select Star propagates that tag upstream and downstream, saving you the hassle of tagging it manually in other places. This ensures that your sensitive data is handled correctly, and you don't have to spend hours tagging it.
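
To make the propagation idea concrete, here is a minimal sketch that walks column-level lineage in both directions and follows only edges where the data passes through without transformation. The lineage format, column names, and `propagate` function are assumptions made for illustration; this is not Select Star's actual implementation.

```python
from collections import deque

# Hypothetical column-level lineage: (source, target, transformed).
# transformed=False means the target carries the source value as-is.
LINEAGE = [
    ("raw.users.email", "staging.users.email", False),
    ("staging.users.email", "mart.customers.email", False),
    ("staging.users.email", "mart.customers.email_domain", True),
]


def propagate(start, lineage):
    """Return every column linked to `start` through untransformed edges only."""
    neighbors = {}
    for source, target, transformed in lineage:
        if transformed:
            continue  # a transformed field may no longer mean the same thing
        # Index both directions so upstream and downstream columns are reached.
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)

    seen = {start}
    queue = deque([start])
    while queue:
        column = queue.popleft()
        for other in neighbors.get(column, ()):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    seen.discard(start)
    return seen


descriptions = {"staging.users.email": "Primary contact email for the user."}
tags = {"staging.users.email": {"PII"}}

for column in propagate("staging.users.email", LINEAGE):
    # setdefault keeps any description a person has already written.
    descriptions.setdefault(column, descriptions["staging.users.email"])
    tags.setdefault(column, set()).update(tags["staging.users.email"])
```

In this sketch the derived `email_domain` column is skipped because it sits behind a transformed edge, so it would still need its own description and tag review.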

We’re also adding new AI-powered capabilities to do the heavy lifting of writing documentation where none exists, so our users don’t have to start from scratch.

Select Star suggests descriptions for a similar or duplicated table

Ownership and auditability - No documentation stands the test of time without ownership and auditability. Select Star attaches owners to tables and authors to documentation, and has an activity log (and API) that tells you who made which change. This level of transparency builds accountability and trust, which we plan to build upon.
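
One way to picture an activity log of this kind is an append-only list of change events, each recording who changed what and when. The `ChangeEvent` and `ActivityLog` classes below are hypothetical names for illustration and do not reflect Select Star's log format or API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChangeEvent:
    asset: str
    field: str
    old_value: str
    new_value: str
    author: str
    at: datetime


class ActivityLog:
    """Append-only record of documentation changes (illustrative only)."""

    def __init__(self):
        self._events = []

    def record(self, asset, field, old_value, new_value, author):
        event = ChangeEvent(asset, field, old_value, new_value, author,
                            datetime.now(timezone.utc))
        self._events.append(event)
        return event

    def history(self, asset):
        """Return all changes to one asset, oldest first."""
        return [e for e in self._events if e.asset == asset]


# Made-up example changes and author names.
log = ActivityLog()
log.record("mart.customers", "description", "", "One row per customer.", "ana")
log.record("mart.customers", "owner", "", "data-platform", "raj")
log.record("mart.orders", "description", "", "One row per order.", "ana")

for event in log.history("mart.customers"):
    print(event.author, "changed", event.field)
```

Because events are immutable and only ever appended, the history of an asset can be replayed to see how its documentation evolved and who was responsible for each step.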

AI Generated Documentation with Select Star

What the data dictionary looks like for most tables, because it takes too much time to document everything.

Our approaches to automated documentation have already helped numerous customers advance their documentation efforts, with 2-5x higher column description fill rates and active contribution from their data team members. However, we sometimes come across customers whose data assets have no documentation at all, and have had none for a while.

Someone may be able to dig through related tables and SQL queries to write up the documentation, but getting started takes significant effort.

With the rapid development of LLMs and ML models like OpenAI’s ChatGPT, we thought this was the next-level challenge we could help our customers with. Select Star already generates its own context about data assets, such as where the data is, where it came from, its related dashboards and reports, and the SQL queries that use the tables and columns, all of which are great ingredients to pass to an AI model for summarization.
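
A minimal sketch of how such context might be assembled into a prompt for a summarization model is shown below. The function name, prompt wording, and example table are assumptions for illustration only; the actual model call is left out, and this is not Select Star's implementation.

```python
def build_description_prompt(table, columns, upstream, dashboards, sample_queries):
    """Assemble catalog context into a plain-text prompt for a summarization model."""
    lines = [
        f"Write a one-paragraph description of the table `{table}`.",
        "Use only the context below; say so if the purpose is unclear.",
        "",
        "Columns: " + ", ".join(columns),
        "Upstream sources: " + (", ".join(upstream) or "none known"),
        "Used by dashboards: " + (", ".join(dashboards) or "none known"),
        "",
        "Example queries:",
    ]
    lines.extend(f"- {query}" for query in sample_queries)
    return "\n".join(lines)


# Made-up example context for a hypothetical table.
prompt = build_description_prompt(
    table="mart.customers",
    columns=["customer_id", "email", "signup_date"],
    upstream=["staging.users"],
    dashboards=[],
    sample_queries=[
        "SELECT COUNT(*) FROM mart.customers WHERE signup_date >= '2023-01-01'"
    ],
)
print(prompt)
```

Keeping lineage, usage, and query context in the prompt gives the model something concrete to summarize, and the generated text can then be reviewed by a person before it is saved.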

Today, I’m excited to announce that we’re releasing AI Generated Documentation as part of Select Star’s Automated Documentation. Now you can easily generate descriptions for tables, columns, dashboards, and more, helping end users and data consumers quickly understand your data. No one, including AI, is “perfect” at documentation, but combining Select Star’s context, user input, and AI can ensure that data documentation meets the desired quality bar.

What documentation can look like in Select Star, with description propagation, suggested descriptions, and AI helping to fill in the blanks.

Conclusion

Data documentation at scale is a hard but valuable problem to tackle. By focusing on what matters, force multipliers, and ownership and auditability, Select Star helps you focus, work smarter, and build trust in your documentation. Our latest features build on these principles, which are the foundation of successful data documentation in a world where we all have too many demands on our time. This is just the beginning, and we look forward to continuing to invest in solutions that deliver substantial value to data teams.

Yes, this blog post was drafted with the assistance of ChatGPT :)
