
Why Data Governance is the backbone of ethical AI

Lauren Maffeo, a senior service designer at Steampunk, a human-centered design firm serving the United States federal government, debunks myths about artificial intelligence. Drawing on her experience and her book “Building Data Governance from the Ground Up,” Maffeo shows how companies often fall into the trap of believing AI is a magical solution for everything. (Catch our review of her book for more, or find it through the publisher or a library.)

She talks to OSNet about how to get started in data governance, why it’s the backbone of ethical AI and how to avoid common pitfalls.

What did your time at Gartner teach you about how companies struggle to implement AI and other big data projects?

My time at Gartner taught me that the volume of data that companies consume and produce is growing exponentially and will keep doing so. Meanwhile, the data maturity at most organizations is still extremely low, to the point that some survey data shows just one in four leaders saying they’re data-driven.

So, as the amount of data increases, orgs’ ability to manage it decreases. This presents a huge range of ethical, security, and quality risks. That’s why there’s no alternative to data governance: It’s not a nice-to-have, it’s a must-have.

What moved you to write your book?

I wrote this book after years of writing for and working with clients who want to use the latest and greatest artificial intelligence to grow their businesses. I covered trends in cloud business intelligence (BI) software for small and midsize businesses as an analyst at Gartner before transitioning to systems and service design work for U.S. government clients. 

This work has shown me that regardless of size, industry, mission, etc., today’s organizations are simply not prepared to build or use AI effectively. While some organizations are more mature than others, most don’t have the tools, talent, or strategy to yield high-quality data, which is crucial for using AI. I wrote this book for leaders who want to get started doing data governance and know they need help getting their strategies off the ground. 

Can you give us a short definition of data governance?

Data governance is the confluence of people, processes and tools used to deploy data at scale. It’s the backbone of ethical AI – and AI writ large – because you can’t produce AI that’s ethical or accurate without data governance. 

Can you identify the top pitfalls organizations stumble over in scaling these projects?

  1. Not having a data strategy that matches their mission. There’s a reason why your organization exists, whether it’s a commercial, Open Source, or nonprofit org. When I once asked a chief data officer (CDO) what his org’s mission was, he started explaining what his data office did and the new machine-learning tools that he wanted to use. He also told me we would “do data governance later” once some models reached production. This proved to me that his office was not ready to use machine learning or invest in data governance, because he didn’t know how his office created business value that helped fulfill the mission.
  2. Not defining data quality standards. At the absolute bare minimum, you need to define Pass and Fail conditions for your data. I’ve spoken to quality assurance (QA) engineers who told me that they basically can’t do their jobs because they don’t know if data flowing through the pipelines they review meets organizational standards. Likewise, I often speak to data product users who tell me they distrust the data that their organizations produce. If colleagues and users can’t be assured that your data has been vetted, they won’t have a good reason to trust it.
  3. Not knowing the source systems for their data. I’ve lost track of how many times I’ve worked on client projects and asked where certain datasets lived, only to be met with shrugged shoulders. This is one of many examples of why data lineage is so essential: If you don’t even know where your data is, how can you start using it to build AI?
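
To make the idea of Pass and Fail conditions concrete, here is a minimal sketch in Python. It is an illustration only, not a method from the interview or the book: the record fields (`employee_id`, `salary`, `source_system`) and the three rules are hypothetical assumptions. Each rule is a named check, and a record fails if any check fails, including a check that its source system is recorded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical record shape: the field names are assumptions for illustration.
Record = Dict[str, object]


@dataclass(frozen=True)
class Rule:
    name: str
    check: Callable[[Record], bool]  # returns True when the record passes


# Illustrative rules only; a real organization would define its own standards.
RULES: List[Rule] = [
    Rule("employee_id is present", lambda r: bool(r.get("employee_id"))),
    Rule(
        "salary is a non-negative number",
        lambda r: isinstance(r.get("salary"), (int, float)) and r["salary"] >= 0,
    ),
    Rule("source_system is recorded", lambda r: bool(r.get("source_system"))),
]


def evaluate(record: Record, rules: List[Rule] = RULES) -> Tuple[str, List[str]]:
    """Return ("PASS" or "FAIL", names of the rules that failed)."""
    failed = [rule.name for rule in rules if not rule.check(record)]
    return ("FAIL" if failed else "PASS", failed)
```

Returning the names of the failed rules, rather than a bare status, gives reviewers and data product users a reason for each failure that they can trace back to a written standard.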

Why is there no effective way to govern data at scale? Is it because the problem has not come up until now?

AI isn’t new, but the amount of data being produced and ingested today makes it easier than ever to access huge datasets that you can use to train large language models (LLMs) and other AI products. That said, I disagree that there’s no effective way to govern data at scale, which is why I wrote the book. I think it’s easier for leaders to say, “This isn’t my problem” and keep ignoring the very real, growing issues they have with their data. 

Once you start assigning ownership of data and building a data-driven culture, you can start automating your data standards throughout your data architecture. This is the disconnect that I see most often: If data governance does exist, it often lives in a random Word document on someone’s local laptop where no one sees or pays attention to it. 

What do you mean by “linking data use back to the business strategy”?

I mean that most data leaders and practitioners don’t do a great job explaining the tangible impact that good data governance has on their colleagues and customers. Ultimately, people aren’t moved by numbers: We respond to storytelling. Do you care that just one in four business leaders say they’re data-driven? Does that statistic have the same impact as saying that a regression analysis of employee benefits data showed that a lack of maternity benefits led to poor retention of women employees, and that providing more benefits in response to this analysis helped reverse the trend?

That’s a real example from one of my clients, and it shows how powerful data governance can be. When you have the right data of sound quality to make business decisions, you can do amazing things to improve your colleagues’ and customers’ lives. I love data and the positive impact it can have. That’s why I’m so passionate about helping leaders use it effectively.

Why is it important to involve colleagues outside the data science team in developing effective scaling plans? Is it because we’ve essentially been asking that a growing amount of data be managed in one warehouse, by one team, away from the rest of the organization?

Exactly. Too many people still think data is “not their problem” and “someone else” (probably a data scientist hired without guidance or direction) will take care of it. The truth is that organizations produce and ingest too much data for one person or team to manage it all. This approach doesn’t scale or make a meaningful impact. It also reinforces the “top-down” data hierarchy where some random IT colleague holds the keys for data access without knowing the business context for that data.

The alternative approach that I share in my book is to find subject matter experts who can serve as stewards of the data in their respective domains. These SMEs are best positioned to write data definitions, advise on which pieces of metadata should be attached to each piece of data, write contextual summaries for datasets, and handle related tasks. Then, they can work with technical experts like data engineers who manage the data environment and can implement those standards.
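
One way to picture that handoff between stewards and engineers is a catalog entry that records who stewards a dataset, where it lives, and what its columns mean. The sketch below is a hypothetical illustration in Python, not a structure taken from the book; `DatasetEntry`, its fields, and `governance_gaps` are assumed names.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DatasetEntry:
    """Hypothetical catalog entry maintained by a data steward."""

    name: str
    steward: str = ""        # subject matter expert responsible for the domain
    source_system: str = ""  # where the dataset lives (its lineage)
    summary: str = ""        # contextual summary written by the steward
    definitions: Dict[str, str] = field(default_factory=dict)  # column -> meaning


def governance_gaps(entry: DatasetEntry, columns: List[str]) -> List[str]:
    """List the governance information still missing for a dataset."""
    gaps: List[str] = []
    if not entry.steward:
        gaps.append("no steward assigned")
    if not entry.source_system:
        gaps.append("source system unknown")
    if not entry.summary:
        gaps.append("no contextual summary")
    for col in columns:
        if col not in entry.definitions:
            gaps.append(f"no definition for column '{col}'")
    return gaps
```

In this sketch the steward fills in the entry, and an engineer could run `governance_gaps` against the real column list to see what is still undocumented.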

That’s how you co-create data governance: Elevating subject matter experts per data domain by giving them ownership and autonomy to help define data quality in their areas of expertise, and rewarding them for this work.

In your research, what have you identified as the most important basics in co-creating data governance programs that will last for the long haul? 

  1. Finding a framework to help your data strategy fulfill your organizational mission.
  2. Selecting data stewards to serve as subject matter experts of the data in their domains.
  3. Creating a data governance council to work and vote on key initiatives and break down team silos.
  4. Writing a roadmap for a data product that can drive the biggest impact on your business.
  5. Practicing governance-driven development, where you automate your data quality standards into your data environment.
  6. Making a plan to monitor data governance post-deployment, because when it comes to AI, getting to production is just the beginning.
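
As a rough illustration of automating a quality standard inside a data environment, the Python sketch below shows a gate that stops a pipeline step when too few records in a batch pass their checks. It is an assumption-laden example rather than the book's approach: `quality_gate`, `QualityGateError`, and the 95% default threshold are all hypothetical.

```python
from typing import List


class QualityGateError(Exception):
    """Raised when a batch does not meet the agreed quality threshold."""


def quality_gate(results: List[bool], min_pass_rate: float = 0.95) -> float:
    """Return the batch pass rate, or raise if it is below the threshold.

    `results` holds one boolean per record (True means the record passed).
    The 0.95 default is an illustrative assumption, not a recommended value.
    """
    if not results:
        raise QualityGateError("empty batch: nothing to validate")
    pass_rate = sum(results) / len(results)
    if pass_rate < min_pass_rate:
        raise QualityGateError(
            f"pass rate {pass_rate:.0%} is below the required {min_pass_rate:.0%}"
        )
    return pass_rate
```

Because the gate is an ordinary function, the same check could run on every new batch after deployment, which is one simple way to keep monitoring once a data product is live.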

What role does open source play in all this?

I don’t think there’s a better group that models good governance than the Open Source community. Who is more effective at innovating for the public good and inspiring people to contribute to something bigger than themselves? That said, I think open source to date is still more focused on code to the exclusion of data, and that there’s a huge opportunity for open source to start leading the charge for effective standards. I’m excited to see how Open Source might champion data governance over the next five to ten years.

On a personal note, I’m so grateful for how the Open Source community has supported my book. I attended my first Open Source conference (Open Source Summit North America in Vancouver) in 2018 to gain more speaking experience as a young analyst. Five years later, I returned to that same conference, in the same location, to give a lightning talk based on my first book, which was born at a separate Open Source conference (All Things Open in Raleigh). From serving as reviewers to hosting book signings at events, Open Source has shown up for me this year and it means the world. I’m excited to keep growing, giving back, and working towards an open future!

Disclaimer: All published articles represent the views of the authors; they don’t represent the official positions of the Open Source Initiative, even if the authors are OSI staff members or board directors.

OpenSource.net is supported by the Open Source Initiative, the non-profit organization that defines Open Source.
