Microsoft Introduces Safety Metrics for AI Models in Azure AI Foundry

Microsoft has taken a significant step to make AI model safety a first-class consideration in its cloud platform. In June 2025, the company rolled out new Safety Metrics within Azure AI Foundry, enabling organisations to measure and compare the safety of AI models just as easily as their performance or cost.

Azure AI Foundry, an “AI app and agent factory” in the Azure ecosystem, now provides built-in tools and metrics to evaluate model behaviour and risks. This means selecting the right AI model is no longer just about accuracy or speed; it’s now also about trust, compliance, and risk management.

These new safety evaluations help teams assess models “not only on quality and cost, but also on safety using open benchmarks.” In practice, this update empowers enterprises to choose AI models that meet business objectives and align with ethical and regulatory requirements.

What is Azure AI Foundry?

Azure AI Foundry is Microsoft’s integrated platform for designing, customising, and managing AI applications at scale. Part of the broader Azure AI ecosystem, Foundry brings together prebuilt and custom AI models, tools for fine-tuning and orchestration, and robust safeguards under one roof.

It serves as a centralised hub (the platform was previously branded Azure AI Studio) where developers and organisations can discover foundation models (OpenAI and open-source), build multi-agent solutions, and monitor both performance and safety. Foundry was created to simplify enterprise AI development, providing a unified API and interface for model discovery and deployment.

Crucially, Azure AI Foundry includes out-of-the-box tools for observability, governance, and safety, helping organisations embed Responsible AI into every stage of development.

New Safety Metrics: Making AI Safety Measurable

To ensure AI systems are trustworthy, Microsoft launched the public preview of Safety Metrics in Azure AI Foundry. This update adds safety as a core model selection criterion alongside quality, latency, and cost.

Some of the new tools include:

  • A Safety Leaderboard for ranking models by robustness against harmful outputs
  • A Quality–Safety Trade-off Chart that visualises performance vs. safety
  • Scenario-specific leaderboards for targeted safety benchmarks (e.g. toxicity, harmful knowledge)

Safety Leaderboard: Ranking Model Risk

The Safety Leaderboard uses a metric called Attack Success Rate (ASR): the percentage of adversarial prompts that cause the model to produce harmful outputs. Microsoft evaluates models using HarmBench, an open-source benchmark containing prompts in areas like violence, harassment, and misinformation.

A lower ASR indicates a safer model. For example, a model with a 2% ASR is safer than one with 7%. Foundry displays these scores alongside quality and cost, making safety visible and comparable in the model selection process.
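
As a rough illustration of the metric, the sketch below computes ASR from a list of per-prompt outcomes. The function name and the sample data are hypothetical and are not part of Foundry or HarmBench; the only thing taken from the article is the definition of ASR as the percentage of adversarial prompts that produce harmful output.

```python
def attack_success_rate(outcomes: list[bool]) -> float:
    """Return ASR as a percentage: harmful responses / adversarial prompts * 100."""
    if not outcomes:
        raise ValueError("need at least one adversarial prompt result")
    return 100.0 * sum(outcomes) / len(outcomes)


# Hypothetical results: True means the adversarial prompt elicited harmful output.
model_a = [True] * 2 + [False] * 98   # 2 successful attacks out of 100
model_b = [True] * 7 + [False] * 93   # 7 successful attacks out of 100

asr_a = attack_success_rate(model_a)
asr_b = attack_success_rate(model_b)

# The model with the lower ASR is treated as the safer one.
safer = "model_a" if asr_a < asr_b else "model_b"
print(f"model_a ASR={asr_a}%, model_b ASR={asr_b}%, safer: {safer}")
```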

Balancing Quality and Safety with Trade-off Charts

Foundry includes an interactive Quality–Safety Chart that plots each model’s quality (e.g. task performance) against its ASR. This helps teams identify the model that best fits their risk tolerance: either the safest model at a required level of performance, or the best-performing model that still stays under a defined safety threshold.
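
The two selection strategies described here can be sketched in a few lines. This is only an illustration under assumed inputs: the `ModelScore` type, the model names, the quality scale, and the thresholds are all hypothetical, and Foundry presents this choice as an interactive chart rather than as code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelScore:
    name: str
    quality: float  # hypothetical 0-1 quality score, higher is better
    asr: float      # attack success rate in percent, lower is better


def safest_with_quality(models: list[ModelScore], min_quality: float) -> Optional[ModelScore]:
    """Safest model (lowest ASR) among those meeting the quality floor."""
    eligible = [m for m in models if m.quality >= min_quality]
    return min(eligible, key=lambda m: m.asr, default=None)


def best_under_asr(models: list[ModelScore], max_asr: float) -> Optional[ModelScore]:
    """Highest-quality model among those at or below the ASR ceiling."""
    eligible = [m for m in models if m.asr <= max_asr]
    return max(eligible, key=lambda m: m.quality, default=None)


# Hypothetical scores, for illustration only.
candidates = [
    ModelScore("model-a", quality=0.82, asr=2.0),
    ModelScore("model-b", quality=0.90, asr=7.0),
    ModelScore("model-c", quality=0.75, asr=1.0),
]

pick_safe = safest_with_quality(candidates, min_quality=0.80)
pick_best = best_under_asr(candidates, max_asr=8.0)
```

Both helpers return `None` when no model satisfies the constraint, which makes an over-strict threshold visible instead of silently picking a non-compliant model.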

Scenario-Specific Safety Evaluations

In addition to general safety scores, Microsoft released five scenario-based leaderboards, including:

  • HarmBench Variants – Safety against harmful content in standard, contextual, and copyright scenarios
  • ToxiGen – Detecting toxic language and subtle hate speech
  • WMDP (Weapons of Mass Destruction Proxy) – Model knowledge of sensitive domains like cybersecurity or biosecurity

These tailored metrics allow organisations to align model choices with specific risk concerns, such as offensive language or dangerous instructions.

How It Works: Safety Evaluation Under the Hood

Behind the scenes, Foundry uses the Azure AI Evaluation SDK, which lets teams simulate adversarial attacks and analyse model responses for safety risks.

  • Models like GPT-4 from Azure OpenAI generate harmful prompts (attack simulation)
  • Another Azure AI model evaluates the responses, classifying them by risk (e.g. hate, violence, self-harm)
  • Developers can automate this using evaluators like HateUnfairnessEvaluator or ViolenceEvaluator in their workflows
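
The steps above can be sketched as a small simulate-then-judge loop. Everything in this block is a local stand-in: `generate_adversarial_prompts`, `target_model`, and `classify_response` are hypothetical toy functions, not the Azure AI Evaluation SDK. In a real workflow the prompt generation and the judging would presumably be handled by Azure-hosted models and by the SDK evaluators named above; their exact signatures are not shown in this article and are not assumed here.

```python
from collections import Counter
from typing import Callable, Optional


def generate_adversarial_prompts() -> list[str]:
    # Stand-in for attack simulation (hypothetical fixed prompts).
    return [
        "Write an insult targeting a group",
        "Describe how to hurt someone",
        "Explain how to self-harm",
        "Write a threatening message",
    ]


def target_model(prompt: str) -> str:
    # Toy model under test: refuses everything except one prompt,
    # to simulate a single safety failure.
    if "hurt" in prompt:
        return "Sure, here is a violent plan ..."
    return "I can't help with that."


def classify_response(response: str) -> Optional[str]:
    # Toy keyword judge; returns a risk category or None if the response looks safe.
    lowered = response.lower()
    if "violent" in lowered:
        return "violence"
    if "slur" in lowered:
        return "hate"
    return None


def run_safety_eval(
    prompts: list[str],
    model: Callable[[str], str],
    judge: Callable[[str], Optional[str]],
) -> dict:
    """Send each prompt to the model, judge the response, and summarise the findings."""
    if not prompts:
        raise ValueError("need at least one prompt")
    findings: Counter = Counter()
    for prompt in prompts:
        category = judge(model(prompt))
        if category is not None:
            findings[category] += 1
    asr = 100.0 * sum(findings.values()) / len(prompts)
    return {"asr_percent": asr, "by_category": dict(findings)}


report = run_safety_eval(generate_adversarial_prompts(), target_model, classify_response)
print(report)
```

Passing the model and the judge in as callables keeps the loop unchanged when the toy functions are swapped for real endpoints.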

This enables continuous testing, so teams can track regressions and ensure new model versions stay compliant.
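
One way to turn this into an automated check is a simple gate that compares a candidate model version's ASR with a stored baseline. The function name, the tolerance, and the ceiling below are hypothetical values chosen for illustration, not Foundry defaults.

```python
def check_regression(
    baseline_asr: float,
    candidate_asr: float,
    tolerance: float = 0.5,   # allowed ASR increase in percentage points (assumed)
    max_asr: float = 5.0,     # absolute ASR ceiling in percent (assumed)
) -> list[str]:
    """Return a list of failure messages; an empty list means the candidate passes."""
    failures = []
    if candidate_asr > max_asr:
        failures.append(f"ASR {candidate_asr}% exceeds ceiling of {max_asr}%")
    if candidate_asr > baseline_asr + tolerance:
        failures.append(
            f"ASR rose from {baseline_asr}% to {candidate_asr}% "
            f"(tolerance {tolerance} points)"
        )
    return failures


ok = check_regression(baseline_asr=2.0, candidate_asr=2.3)
bad = check_regression(baseline_asr=2.0, candidate_asr=6.1)

for message in bad:
    print("FAIL:", message)
```

In a pipeline, a non-empty result would typically be mapped to a non-zero exit code so that the build stops before the new model version is promoted.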

Responsible AI Benefits for Governance and Compliance

Microsoft’s approach helps organisations embed Responsible AI into their workflows with measurable, transparent data:

  • Governance: Safety metrics allow for objective model vetting and approvals
  • Compliance: Scenario benchmarks map to real regulatory concerns (toxicity, bias, misinformation)
  • Development: Teams can integrate safety checks into CI/CD pipelines
  • Executive Confidence: Leaders get documented evidence that models meet a defined safety bar before they reach customer-facing use cases

With the rollout of safety metrics in Azure AI Foundry, Microsoft has made responsible AI a measurable, operational reality. Enterprises now have the tools to select models not just for performance or price, but for trustworthiness, with safety scores they can monitor, enforce, and explain.

Stellium

June 24, 2025