Skip to main content

Documentation Index

Fetch the complete documentation index at: https://mintlify.com/MicrosoftDocs/azure-ai-docs/llms.txt

Use this file to discover all available pages before exploring further.

Model Deployment

Deploy models in Microsoft Foundry using multiple options optimized for different scenarios.

Deployment Methods

Serverless API Deployment

Characteristics:
  • Pay-per-token billing
  • Microsoft-managed infrastructure
  • Automatic scaling
  • No capacity planning
Example:
# Model is accessed via serverless API endpoint
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

Provisioned Throughput

Characteristics:
  • Reserved capacity (PTUs)
  • Predictable cost and performance
  • Dedicated resources
  • Fungible across models
Example:
deployment = client.deployments.create(
    model="gpt-4o",
    sku={
        "name": "ProvisionedManaged",
        "capacity": 100  # Provisioned Throughput Units
    }
)

Managed Compute

Characteristics:
  • Deploy to Azure VMs
  • Billed for VM hours
  • Supports open-source models
  • Full infrastructure control

Deployment Process

1

Select Model

Choose from model catalog based on requirements
2

Choose Deployment Option

Serverless, Provisioned, or Managed Compute
3

Configure Settings

Region, capacity, version settings
4

Deploy

Create deployment via portal, CLI, or SDK
5

Test

Verify deployment with test requests

Regional Considerations

  • Model availability varies by region
  • Check Region Support
  • Consider data residency requirements
  • Evaluate latency for global users

Model Lifecycle

  • GA: Full support and SLA
  • Deprecation Notice: 6-12 months warning
  • Deprecated: No new deployments
  • Retired: Model unavailable
Set up auto-update for seamless transitions. See Model Overview for model catalog details.