Model Deployment

Deploy models in Microsoft Foundry using multiple options optimized for different scenarios.

Deployment Methods

Serverless API Deployment

Characteristics:

Pay-per-token billing
Microsoft-managed infrastructure
Automatic scaling
No capacity planning

Example:

# Model is accessed via serverless API endpoint
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages
)

Provisioned Throughput

Characteristics:

Reserved capacity (PTUs)
Predictable cost and performance
Dedicated resources
Fungible across models

Example:

deployment = client.deployments.create(
    model="gpt-4o",
    sku={
        "name": "ProvisionedManaged",
        "capacity": 100  # Provisioned Throughput Units
    }
)

Managed Compute

Characteristics:

Deploy to Azure VMs
Billed for VM hours
Supports open-source models
Full infrastructure control

Deployment Process

Select Model

Choose from model catalog based on requirements

Choose Deployment Option

Serverless, Provisioned, or Managed Compute

Configure Settings

Region, capacity, version settings

Deploy

Create deployment via portal, CLI, or SDK

Test

Verify deployment with test requests

Regional Considerations

Model availability varies by region
Check Region Support
Consider data residency requirements
Evaluate latency for global users

Model Lifecycle

GA: Full support and SLA
Deprecation Notice: 6-12 months warning
Deprecated: No new deployments
Retired: Model unavailable

Set up auto-update for seamless transitions. See Model Overview for model catalog details.

Foundry Models Overview

Model Region Support

⌘I

Documentation Index

​Model Deployment

​Deployment Methods

​Serverless API Deployment

​Provisioned Throughput

​Managed Compute

​Deployment Process

​Regional Considerations

​Model Lifecycle

Model Deployment

Deployment Methods

Serverless API Deployment

Provisioned Throughput

Managed Compute

Deployment Process

Regional Considerations

Model Lifecycle