Azure Build Health AI-Generated Risk Assessment

I was the lead designer working alongside two cross-functional teams. Together, the teams comprised 2 Product Managers, 5 Developers, and 1 designer. Our focus was to leverage AI (Microsoft’s Copilot) to generate key insights for our Release Managers.

This set of features was the first attempt to integrate AI into an internal engineering system within Microsoft’s Azure.

Azure Build Health is an internal tool for Azure developers that automatically aggregates quality insights across build, validation, and deployment systems into a single pane of glass within Azure DevOps.

The team strives to empower release managers to shift left and make well-informed deployment decisions.

Release Managers

Our core user persona. They are usually senior developers whose main goal is to ready their team's releases, builds, and pipelines for deployment.

Problems

Release Managers (RMs) manually evaluate the intent and risk of every code change contained in a release.

RMs have minimal context on each change and must rely on the PR owner to attest to its quality. Tracking down all the details takes 4-5 days.

In most cases, R2D reviewers observe that RMs do not complete this detailed payload analysis and lack specifics on the code changes they intend to deploy, which allows defective or malicious code to slip into production.

Research conducted

I conducted 8 discovery interviews with Release Managers.

What we wanted to learn

  • Understand their end-to-end release process

  • Capture insights on how long it takes to prepare for a release

  • Gather key pain points to find areas of opportunity for the team

Every time we go to R2D, I go commit by commit, looking at the summary to get a sense of the risk, and with that risk then I try to get a measurement of the overall risk of a given release. And as you can imagine, this is a very time consuming, somewhat subjective manual process.
— MDM Release Manager

Goals

Leveraging AI, we aim to shift the burden of evaluating the risk of deploying a new release from individual engineers to purpose-built systems.  

Empower release managers to make well-informed deployment decisions by providing accurate and comprehensive risk assessments from purpose-built systems that investigate each code change contained in a release more deeply.

Reduce manual toil by automatically aggregating and summarizing the release payload using generative AI.


Key Results:
1. Decrease the time it takes a release manager to get to deployment

2. Improve utilization/onboarding of Azure teams onto Build Health

3. Integrate Copilot to help reduce manual toil

First iteration

Unlike many of the Copilot integrations, we decided that there should not be a chatbot.

Since every team must answer the same set of questions before deployment, we leveraged AI to answer those questions on the backend and surface the answers without requiring users to interact with a bot.

How does it work?

Robust data

Copilot now gives detailed risk information for each PR. This reduces manual analysis of each PR and quickly gives release managers the insights they need to go and investigate possible issues.

Giving control to our users

Since AI was not always accurate and I wanted to build more trust, I designed options for users to change the generated content. Release Managers could change the risk generated for each payload and the overall release score. The Copilot team could also use these changes to help train the AI to generate better content in the future.

Working with the team

The team consisted of me, the Azure Build Health Product Manager, and the dev team. We partnered with the VIRA team, which consisted of a Product Manager and 2 developers.

This was the first time the VIRA team worked with a designer. After handoff, the VIRA team discussed what they had learned from this experience.

Key insights

  1. Learning about our design process and how they too could contribute

  2. Keeping the focus on solving user problems and needs first

  3. Sticking to the scope and prioritizing

  4. Continuing to gather user feedback between iterations

Post deployment

Once the features were released, we started hearing from early adopters that the Copilot data was not all that accurate.

A research study was planned to evaluate both the usefulness and usability of this Copilot integration. It would be conducted by a user researcher once bandwidth opened up.

However, we still had plans to incorporate AI into another tab within Azure Build Health.
