Azure Build Health AI-Generated Risk Assessment
I was the lead designer working alongside two cross-functional teams, made up of 2 Product Managers, 5 developers, and 1 designer (myself). Together, our focus was to leverage AI (Microsoft Copilot) to generate key insights for our Release Managers.
This set of features was the first attempt to integrate AI into an internal engineering system within Microsoft Azure.
Azure Build Health is an internal tool for Azure developers that automatically aggregates quality insights across build, validation, and deployment systems into a single pane of glass within Azure DevOps.
The team strives to empower release managers to shift left and make well-informed deployment decisions.
Release Managers
Our core user persona: usually senior developers whose main goal is to get their team's releases, builds, and pipelines ready for deployment.
Problems
Release Managers (RMs) manually evaluate the intent and risk of every code change contained in a release.
RMs have minimal context on each change and must rely on the PR owner to attest to its quality; tracking down all the details takes 4 to 5 days.
In most cases, R2D reviewers observe that RMs do not complete this detailed payload analysis and lack specifics on the code changes they intend to deploy, allowing defective or malicious code to slip into production.
Research conducted
I conducted 8 discovery interviews with Release Managers.
What we wanted to learn
Understand their end-to-end release process
Capture insights on how long it takes to prepare for a release
Gather key pain points to find areas of opportunity for the team
“Every time we go to R2D, I go commit by commit, looking at the summary to get a sense of the risk, and with that risk then I try to get a measurement of the overall risk of a given release. And as you can imagine, this is a very time consuming, somewhat subjective manual process.”
Goals
Leveraging AI, we aim to shift the burden of evaluating the risk of deploying a new release from individual engineers to purpose-built systems.
Empower release managers to make well-informed deployment decisions by providing accurate and comprehensive risk assessments from purpose-built systems that investigate each code change in a release more deeply.
Reduce manual toil by automatically aggregating and summarizing the release payload using generative AI.
Key Results:
1. Decrease the time it takes a release manager to get to deployment
2. Improve utilization/onboarding of Azure teams onto Build Health
3. Integrate Copilot to help reduce manual toil
First iteration
Unlike many other Copilot integrations, we decided not to include a chatbot.
Since every team must answer the same set of questions before deployment, we used AI to answer those questions on the backend and surface the answers without requiring users to interact with a bot.
How does it work?
Robust data
Copilot now provides detailed risk information for each PR. This reduces manual analysis of each PR and quickly gives release managers the insights they need to investigate possible issues.
Giving control to our users
Because the AI was not always accurate and I wanted to build more trust, I designed options for users to edit the generated content. Release Managers could change the risk generated for each payload and the overall release score. These edits would also give the Copilot team data to help train the AI to produce better content in the future.
Working with the team
The team consisted of myself, the Azure Build Health Product Manager, and the Build Health dev team. We partnered with the VIRA team, which consisted of a Product Manager and 2 developers.
This was the first time the VIRA team had worked with a designer. After handoff, the VIRA team discussed what they had learned from the experience.
Key insights
Learning about our design process and how they too could contribute
Helped keep the focus on solving user problems and needs first
Sticking to the scope and prioritizing
Keep gathering user feedback between iterations
Post deployment
Once the features were released, we started hearing from early adopters that the Copilot data was not very accurate.
A research study was planned to evaluate both the usefulness and usability of this Copilot integration, to be conducted by a user researcher once bandwidth opens up.
However, we still planned to incorporate AI into another tab within Azure Build Health.