Azure AI Foundry as the Unified Control Plane: Deploying…

Introduction

If you have been hands-on with Azure AI over the past two years, you have felt the whiplash. One quarter it is Azure Cognitive Services, the next it is Azure OpenAI Service, then Azure AI Studio, and now — Microsoft Foundry (formerly Azure AI Foundry). It is tempting to roll your eyes at another rebrand. But this one is different.

Microsoft Foundry is not just a renamed portal. It is Microsoft's bet on something enterprise AI has desperately needed: a single, unified control plane for the entire AI lifecycle. That covers not just training and inference, but also deployment, evaluation, monitoring, governance, and responsible AI guardrails, all in one place. For architects and engineering leads in Malaysian enterprises navigating the AI adoption curve, understanding what Foundry actually is (and is not) can save months of architectural dead ends.

I have spent the last several months helping clients across financial services, government-linked companies, and retail build their AI platforms on Azure. What follows comes straight from those trenches — the good, the messy, and the parts the documentation glosses over.

The Problem: AI Sprawl Is Real

Before we talk about the solution, let us be honest about the pain point. Most organisations I work with in Malaysia are not starting from scratch. They already have:

A handful of OpenAI GPT-4 deployments in Azure OpenAI Service for chatbot use cases.
Custom models on Azure ML compute clusters for niche predictions.
Cognitive Services endpoints (OCR, speech-to-text, translation) integrated into line-of-business apps.
Maybe a GenAI-powered document search built on Azure AI Search with embeddings in Cosmos DB.

Every one of these lives in its own silo. Each has its own deployment pipeline, monitoring dashboard, access control model, and cost centre. Your security team wants a single view of which models are internet-facing. Your compliance team wants evidence that every output passes content safety. Your finance team wants to know why the AI bill jumped 40% last month. And your data science lead wants a staging environment to A/B test a new model before production.

Without a unified control plane, answering any of these requires stitching together data from Azure Monitor, Azure Policy, Azure Cost Management, and custom scripts. That works for one or two models. It crumbles at twenty — or fifty.
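To give a flavour of the custom scripts this stitching involves, here is a minimal sketch that answers only the security team's question: which AI accounts in the current subscription allow public network access. The function name is hypothetical, and the sketch assumes the Azure CLI is installed and signed in.

```shell
# Hypothetical helper: one row per Cognitive Services / Azure OpenAI account
# in the current subscription, with its public network access setting.
# Assumes `az login` has been run and a default subscription is selected.
list_ai_network_exposure() {
  az cognitiveservices account list \
    --query "[].{name:name, kind:kind, resourceGroup:resourceGroup, publicAccess:properties.publicNetworkAccess}" \
    --output table
}

# Usage: list_ai_network_exposure
```

This covers one subscription and one question. Cost, content safety evidence, and staging each need their own script, which is where the approach stops scaling.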

This is the exact problem Microsoft Foundry was built to solve.

The Solution: Microsoft Foundry as the Unified Control Plane

Microsoft Foundry is, at its core, a platform that wraps Azure's existing AI capabilities into a cohesive management layer. Think of it as the orchestration hub that sits above your AI infrastructure, not a replacement for any individual service.

What It Actually Includes

Let me break down the key capabilities because the marketing material tends to bundle things vaguely:

1. The AI Hub (formerly AI Studio workspace). Your logical container. Every project, model endpoint, and evaluation run lives inside a hub. Multiple hubs per subscription (dev, staging, prod) with independent policies, network isolation, and quotas.

2. Model Catalog and Deployment. From a single interface, deploy Azure OpenAI models (GPT-4o, o3-mini, embeddings), open-source models via Model-as-a-Service (Llama, Mistral, Phi-3, Cohere), custom fine-tuned models, or self-managed compute. The workflow is broadly consistent regardless of source, although the CLI commands, quota pools, and pricing differ between Azure OpenAI deployments and Model-as-a-Service (see Pitfall 1).

3. Evaluation and Safety Metrics. Built-in evaluators for groundedness, relevance, coherence, fluency, and similarity. They use GPT-4 as a judge with transparent prompt templates you can inspect and customise. Combined with Content Safety integration, you get automated guardrails for hate speech, self-harm, sexual content, and violence before any output reaches a user.

4. Prompt Flow. A visual designer for prompt engineering and LLM orchestration. Chain models, add retrieval steps, integrate AI Search, and bake in safety evaluation — all from a canvas that generates deployable code. For teams with fewer dedicated ML engineers, this lowers the barrier significantly.

5. Tracing and Monitoring. OpenTelemetry-based instrumentation with built-in dashboards for token usage, latency, error rates, and safety results, with very little custom instrumentation code needed.

The Architecture at a Glance

At the resource level, the hierarchy looks like this:

Subscription
 └── Azure AI Hub (Microsoft.CognitiveServices/accounts)
      ├── Project (ai_project)
      │    ├── Model Deployments (GPT-4o, Llama 3, custom models)
      │    ├── Prompt Flow Runs
      │    ├── Evaluation Resources
      │    └── Connections (AI Search, Storage, Content Safety)
      ├── Shared Compute (optional)
      └── Policies & Network Rules

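The AI Hub resource is technically a Cognitive Services account of kind AIServices. This matters for anyone who has dealt with Azure's resource provider limitations: it means you get the same familiar Azure RBAC, diagnostic settings, private endpoints, and managed identity support you already know. One naming caveat: Azure currently has two project models. Hub-based projects are Azure ML workspaces (Microsoft.MachineLearningServices/workspaces with kind Hub or Project), while the newer Foundry projects are child resources of the AIServices account (Microsoft.CognitiveServices/accounts/projects). The hierarchy above is a simplified view, so check which model a given command or template targets.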
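A quick way to confirm what a resource really is, sketched under the assumption that the Azure CLI is signed in. The function name is hypothetical; the queried fields are standard properties of a Cognitive Services account.

```shell
# Hypothetical helper: summarise kind, SKU, endpoint, network exposure and
# managed identity type for one Cognitive Services account.
show_ai_account_summary() {
  name="$1"
  rg="$2"
  az cognitiveservices account show \
    --name "$name" \
    --resource-group "$rg" \
    --query "{kind:kind, sku:sku.name, endpoint:properties.endpoint, publicAccess:properties.publicNetworkAccess, identity:identity.type}" \
    --output json
}

# Usage: show_ai_account_summary ai-hub-prod-myapp rg-ai-prod
```

If the kind comes back as AIServices, the account-level tooling you already use for Cognitive Services applies to it.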

Practical Examples: Azure CLI and Bicep

Let's get concrete. Here is how you would set up a production-grade Foundry environment using infrastructure-as-code.

1. Deploy an AI Hub with Bicep

param hubName string = 'ai-hub-prod-myapp'
param location string = 'eastus2'
param sku string = 'S0'
param tags object = {
  environment: 'production'
  costCenter: 'ai-platform'
}

resource aiHub 'Microsoft.CognitiveServices/accounts@2024-10-01' = {
  name: hubName
  location: location
  kind: 'AIServices'
  sku: {
    name: sku
  }
  tags: tags
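  properties: {
    // A custom subdomain is required for private endpoints and Microsoft Entra ID authentication
    customSubDomainName: hubName
    // Deny by default; allow specific networks explicitly or connect through a private endpoint
    networkAcls: {
      defaultAction: 'Deny'
      virtualNetworkRules: []
      ipRules: []
    }
    publicNetworkAccess: 'Disabled'
    // apiProperties.statisticsEnabled applies only to Bing Search accounts, so it is omitted here
  }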
}

output hubId string = aiHub.id
output hubEndpoint string = aiHub.properties.endpoint

Note a few deliberate choices here: I set publicNetworkAccess: 'Disabled' and defaultAction: 'Deny' from day one, so the hub is reachable only through a private endpoint or an explicitly allowed network. You can always open access later for development hubs, but locking down production by default saves you from the "oops, I forgot to configure networking" panic.
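With public access disabled, something has to provide the private path in. The following is a minimal sketch of creating a private endpoint for the hub with the Azure CLI. The function name, the endpoint naming scheme, and the VNet and subnet arguments are hypothetical, and private DNS zone configuration is deliberately not shown.

```shell
# Hypothetical helper: create a private endpoint for a Cognitive Services
# account. Assumes the VNet and subnet already exist in the same resource group.
create_hub_private_endpoint() {
  rg="$1"
  hub_name="$2"
  vnet="$3"
  subnet="$4"

  # Look up the full resource ID of the account
  hub_id="$(az cognitiveservices account show \
    --name "$hub_name" \
    --resource-group "$rg" \
    --query id \
    --output tsv)"

  # "account" is the private link group ID for Cognitive Services accounts
  az network private-endpoint create \
    --name "pe-${hub_name}" \
    --resource-group "$rg" \
    --vnet-name "$vnet" \
    --subnet "$subnet" \
    --private-connection-resource-id "$hub_id" \
    --group-id account \
    --connection-name "pe-conn-${hub_name}"
}

# Usage: create_hub_private_endpoint rg-ai-prod ai-hub-prod-myapp vnet-ai snet-private-endpoints
```

In a real environment this would more likely live in the same Bicep template as the hub; the CLI form is shown only to keep the moving parts visible.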

2. Deploy a GPT-4o Model Deployment via Azure CLI

Once your hub is up, deploying a model is straightforward:

# List available models in the catalog
az cognitiveservices account list-models \
  --name ai-hub-prod-myapp \
  --resource-group rg-ai-prod

# Deploy GPT-4o with a specific SKU
az cognitiveservices account deployment create \
  --name ai-hub-prod-myapp \
  --resource-group rg-ai-prod \
  --deployment-name gpt-4o-global \
  --model-name gpt-4o \
  --model-version "2024-11-20" \
  --model-format OpenAI \
  --sku-name "GlobalStandard" \
  --sku-capacity 10

The GlobalStandard SKU is worth explaining. It routes inference traffic across Microsoft's global infrastructure, giving higher throughput limits and better resilience than region-specific Standard. For Malaysian customers, latency depends on where each request is served, so measure it rather than assume. The trade-off? GlobalStandard is not available for all models yet, and prompts may be processed in any Azure region that hosts the model, so it cannot guarantee in-region processing. For workloads with strict data-residency requirements, use Standard in an approved region, with private endpoints.
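Because the SKU choice is made per deployment, it helps to be able to list which deployments on a hub use which SKU. A minimal sketch, with a hypothetical function name, assuming the Azure CLI is signed in:

```shell
# Hypothetical helper: list every model deployment on an account together
# with its model, version, SKU and allocated capacity.
list_model_deployments() {
  name="$1"
  rg="$2"
  az cognitiveservices account deployment list \
    --name "$name" \
    --resource-group "$rg" \
    --query "[].{deployment:name, model:properties.model.name, version:properties.model.version, sku:sku.name, capacity:sku.capacity}" \
    --output table
}

# Usage: list_model_deployments ai-hub-prod-myapp rg-ai-prod
```

Running this per hub gives a quick inventory of which workloads sit on GlobalStandard and which on regional Standard.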

3. Create a Project and Deploy a Prompt Flow

# Create a project within the hub
# --hub-id must be the resource ID of a hub-kind Azure ML workspace
# (<hub-workspace-name> is a placeholder for your own hub workspace)
az ml workspace create \
  --kind project \
  --name proj-doc-summariser \
  --resource-group rg-ai-prod \
  --hub-id /subscriptions/.../resourceGroups/.../providers/Microsoft.MachineLearningServices/workspaces/<hub-workspace-name>

# Create the managed online endpoint that will host the prompt flow
az ml online-endpoint create \
  --name doc-summariser-endpoint \
  --resource-group rg-ai-prod \
  --workspace-name proj-doc-summariser \
  --auth-mode key

# Deploy the prompt flow to the endpoint as the "blue" deployment
az ml online-deployment create \
  --name blue \
  --endpoint-name doc-summariser-endpoint \
  --resource-group rg-ai-prod \
  --workspace-name proj-doc-summariser \
  --file deploy/prompt-flow-deployment.yaml

The az ml workspace create command with --kind project and --hub-id is how you attach a project to a hub. Note that --hub-id expects the resource ID of a hub-kind Azure ML workspace, not the AIServices account from step 1. This is the wiring that enables the unified governance layer: policies applied at the hub level cascade down to all projects.
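One detail worth sketching: a newly created deployment on a managed online endpoint does not necessarily receive traffic until you assign it some. The helper below is a hypothetical wrapper around az ml online-endpoint update, and the "green" deployment in the usage line is assumed to exist already. It is also how you would run the A/B test your data science lead asked for.

```shell
# Hypothetical helper: set the traffic split on a managed online endpoint.
# The split is a space-separated list of deployment=percent pairs.
set_endpoint_traffic() {
  endpoint="$1"
  rg="$2"
  workspace="$3"
  split="$4"
  az ml online-endpoint update \
    --name "$endpoint" \
    --resource-group "$rg" \
    --workspace-name "$workspace" \
    --traffic "$split"
}

# Usage: send everything to blue
#   set_endpoint_traffic doc-summariser-endpoint rg-ai-prod proj-doc-summariser "blue=100"
# Usage: canary 10% of traffic to a second deployment named green
#   set_endpoint_traffic doc-summariser-endpoint rg-ai-prod proj-doc-summariser "blue=90 green=10"
```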

4. Configure Content Safety with Bicep

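// Hypothetical parameter: resource ID of an existing hub-kind Azure ML workspace
param hubWorkspaceId string

resource contentSafety 'Microsoft.CognitiveServices/accounts@2024-10-01' = {
  name: 'csa-${hubName}'
  location: location
  kind: 'ContentSafety'
  sku: {
    name: 'F0'  // Free tier for evaluation
  }
}

resource aiHubProject 'Microsoft.MachineLearningServices/workspaces@2024-10-01' = {
  name: 'proj-customer-support'
  location: location
  kind: 'Project'  // a project workspace, attached to the hub below
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    // Must reference a hub-kind workspace, not the AIServices account
    hubResourceId: hubWorkspaceId
  }
  // The Content Safety connection itself is not shown here
}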

The Content Safety integration is configure-once, enforce-everywhere. Every model deployment in the hub automatically gets content filtering unless you explicitly opt out — and you should have a very good reason before you do.
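To see what the Content Safety resource actually returns, you can call its text analysis REST API directly. The sketch below assumes the 2023-10-01 API version, key-based authentication, and a resource that is reachable from where you run it. The function name is hypothetical and the JSON escaping is deliberately simple.

```shell
# Hypothetical helper: send one piece of text to the Content Safety
# text:analyze operation and print the raw JSON response.
# Escapes backslashes and double quotes only; multi-line input is not handled.
analyze_text() {
  endpoint="$1"
  key="$2"
  text="$3"
  escaped="$(printf '%s' "$text" | sed 's/\\/\\\\/g; s/"/\\"/g')"
  curl --silent --fail --request POST \
    "${endpoint%/}/contentsafety/text:analyze?api-version=2023-10-01" \
    --header "Ocp-Apim-Subscription-Key: ${key}" \
    --header "Content-Type: application/json" \
    --data "{\"text\": \"${escaped}\"}"
}

# Usage: analyze_text "https://<your-resource>.cognitiveservices.azure.com" "$CONTENT_SAFETY_KEY" "text to check"
```

The response lists a severity per harm category, which is the raw signal behind the guardrails described above.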

Pitfalls: What the Documentation Doesn't Tell You

After deploying Microsoft Foundry across multiple production environments, here are the sharp edges I have collected.

Pitfall 1: Model-as-a-Service Quota Trap

Open-source models via MaaS use a different quota pool than Azure OpenAI models. In hub-based projects you cannot manage them through az cognitiveservices account deployment; serverless API deployments are created through the az ml extension instead (az ml serverless-endpoint create). Your IaC templates need two deployment patterns, and your monitoring setup must account for both.

Pitfall 2: Data Residency Limitations with GlobalStandard

If your organisation mandates in-country or in-region data processing (common in Malaysian banking and GLCs), GlobalStandard is usually a non-starter. Private endpoints secure the network path to your resource, but they do not control where a GlobalStandard deployment processes a request. Use the Standard SKU instead, which means lower throughput limits and a regional failover design that you own. Plan your capacity modelling accordingly.

Pitfall 3: Evaluation Is Not Free

The built-in evaluators use GPT-4 as a judge model. Each evaluation run costs tokens. If you evaluate every prompt-response pair in production (which you should for compliance), the cost adds up. Budget roughly 10-15% of your inference cost for evaluation.
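The rule of thumb above is simple arithmetic. A small sketch, using a hypothetical monthly inference spend of 20000 currency units:

```shell
# Hypothetical helper: evaluation budget range (10% to 15%) for a given
# monthly inference cost. Integer arithmetic, whole currency units only.
evaluation_budget_range() {
  inference_cost="$1"
  low=$(( inference_cost * 10 / 100 ))
  high=$(( inference_cost * 15 / 100 ))
  echo "${low}-${high}"
}

evaluation_budget_range 20000   # prints 2000-3000
```

The 10 to 15 percent band is a planning estimate, not a measured figure for your workload, so revisit it once you have real token counts from evaluation runs.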

Pitfall 4: RBAC Complexity

The AI Hub, projects, and underlying Azure ML workspaces each have their own RBAC scope. Hub-level Contributor access does not automatically grant project access. Your identity team will need careful role planning. The built-in roles (such as Azure AI Developer, Azure AI Inference Deployment Operator, and Azure AI Administrator) help, but coverage across scopes is still evolving.
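Since hub-level access does not imply project access, role assignments usually have to be made explicitly at project scope. A minimal sketch, with a hypothetical function name; the assignee and project resource ID are whatever your environment uses.

```shell
# Hypothetical helper: grant a built-in role at the scope of one project.
# Defaults to the Azure AI Developer role when no role is given.
grant_project_role() {
  assignee="$1"
  project_id="$2"
  role="${3:-Azure AI Developer}"
  az role assignment create \
    --assignee "$assignee" \
    --role "$role" \
    --scope "$project_id"
}

# Usage: grant_project_role "<user-or-service-principal-id>" "<project-resource-id>"
```

Scoping each assignment to a single project keeps the blast radius small and makes access reviews easier to reason about.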

Pitfall 5: Region Availability Is Still Rolling Out

Not all features are available in every region. Southeast Asia (Singapore) supports most capabilities, but newer features like the full Prompt Flow designer with custom evaluators remain limited to eastus2 and francecentral at the time of writing. Check the regional availability matrix before committing. For Malaysian customers, Singapore is generally best for latency, but you may need a secondary region for certain evaluation workloads.
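Feature availability has to be checked against the documentation, but model availability per region can be scripted. The sketch below assumes the az cognitiveservices model list command and its output shape (a model object with name, version, and format); treat the query as a starting point and adjust it to what your CLI version returns. The function name is hypothetical.

```shell
# Hypothetical helper: list the versions of one model offered in a region.
list_model_versions_in_region() {
  location="$1"
  model="$2"
  az cognitiveservices model list \
    --location "$location" \
    --query "[?model.name=='${model}'].{kind:kind, format:model.format, version:model.version}" \
    --output table
}

# Usage: list_model_versions_in_region southeastasia gpt-4o
```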

Conclusion: Five Takeaways for Malaysian Enterprise Architects

Microsoft Foundry is not a silver bullet, but it is a genuine step forward in taming AI sprawl. Here is what I want you to take away:

1. Start with governance, not models. The AI Hub is powerful because of the policies, network rules, and safety configurations you bake into it. Deploy an empty hub first, configure guardrails, then add models incrementally. Retrofitting governance never goes well.

2. Invest in IaC from day one. The az cognitiveservices account deployment commands and Bicep templates above should be your foundation. If you have more than three models in production, you need repeatable, auditable infrastructure-as-code.

3. Budget for evaluation and safety. The cost of evaluating every output is real, but the cost of a compliance incident is far higher. Bake content safety and evaluation into your architecture from the start — the Foundry evaluators are production-grade.

4. Plan for hybrid deployment patterns. You will likely end up with some models on GlobalStandard (cost/throughput) and some on regional Standard behind private endpoints (data residency and compliance). Design for both. The Foundry control plane handles this complexity, but only if you model it explicitly.

5. Stay close to the feature rollout map. Microsoft Foundry evolves rapidly. What is unavailable in Southeast Asia today may land next quarter. Test new capabilities in a sandbox subscription before committing in production.

Microsoft Foundry gives us the control plane we have been asking for. It is not perfect: the documentation still lags behind the product, and some sharp edges remain. But for the first time, I can point to a single pane of glass that covers deployment, evaluation, safety, and governance for every model my teams deploy. That is worth taking seriously.

Law Wen Feng is a Principal Solution Architect at Cloud Catalyst, where he helps Malaysian enterprises design and scale their AI platforms on Azure. The views expressed here are his own and do not necessarily reflect those of his employer or Microsoft.