# azure-ai-inference

> Microsoft Azure AI Inference Client Library for Python

- **URL**: https://www.freshcrate.ai/projects/azure-ai-inference
- **Author**: Microsoft Corporation
- **Category**: Security
- **Latest version**: `azure-mgmt-computelimit_1.1.0` (2026-06-02)
- **License**: MIT License
- **Source**: https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference
- **Language**: Python
- **GitHub**: 5,526 stars, 3,285 forks
- **Registry**: pypi (`azure-ai-inference`)
- **Tags**: `azure`, `pypi`, `sdk`

## Description

# Azure AI Inference client library for Python

Use the Inference client library (in preview) to:

* Authenticate against the service
* Get information about the AI model
* Do chat completions
* Get text embeddings
* Get image embeddings

The Inference client library supports AI models deployed to the following services:

* [GitHub Models](https://github.com/marketplace/models) - Free-tier endpoint for AI models from different providers
* Serverless API endpoints and Managed Compute endpoints - AI models from different providers deployed from [Azure AI Foundry](https://ai.azure.com). See [Overview: Deploy models, flows, and web apps with Azure AI Foundry](https://learn.microsoft.com/azure/ai-studio/concepts/deployments-overview).
* Azure OpenAI Service - OpenAI models deployed from [Azure AI Foundry](https://oai.azure.com/). See [What is Azure OpenAI Service?](https://learn.microsoft.com/azure/ai-services/openai/overview). Although we recommend you use the official [OpenAI client library](https://pypi.org/project/openai/) in your production code for this service, you can use the Azure AI Inference client library to easily compare the performance of OpenAI models to other models, using the same client library and Python code.

The Inference client library makes services calls using REST API version `2024-05-01-preview`, as documented in [Azure AI Model Inference API](https://aka.ms/azureai/modelinference).

[Product documentation](https://aka.ms/aiservices/inference)
| [Samples](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/samples)
| [API reference documentation](https://aka.ms/azsdk/azure-ai-inference/python/reference)
| [Package (Pypi)](https://aka.ms/azsdk/azure-ai-inference/python/package)
| [SDK source code](https://github.com/Azure/azure-sdk-for-python/tree/main/sdk/ai/azure-ai-inference/azure/ai/inference)

## Reporting issues

To report an issue with the client library, or request additional features, please open a GitHub issue [here](https://github.com/Azure/azure-sdk-for-python/issues). Mention the package name "azure-ai-inference" in the title or content.

## Getting started

### Prerequisites

* [Python 3.8](https://www.python.org/) or later installed, including [pip](https://pip.pypa.io/en/stable/).
* For GitHub models
  * The AI model name, such as "gpt-4o" or "mistral-large"
  * A GitHub personal access token. [Create one here](https://github.com/settings/tokens). You do not need to give any permissions to the token. The token is a string that starts with `github_pat_`.
* For Serverless API endpoints or Managed Compute endpoints
  * An [Azure subscription](https://azure.microsoft.com/free).
  * An [AI Model from the catalog](https://ai.azure.com/explore/models) deployed through Azure AI Foundry.
  * The endpoint URL of your model, in of the form `https://<your-host-name>.<your-azure-region>.models.ai.azure.com`, where `your-host-name` is your unique model deployment host name and `your-azure-region` is the Azure region where the model is deployed (e.g. `eastus2`).
  * Depending on your authentication preference, you either need an API key to authenticate against the service, or Entra ID credentials.
* For Azure OpenAI (AOAI) service
  * An [Azure subscription](https://azure.microsoft.com/free).
  * An [OpenAI Model from the catalog](https://oai.azure.com/resource/models) deployed through Azure AI Foundry.
  * The endpoint URL of your model, in the form `https://<your-resouce-name>.openai.azure.com/openai/deployments/<your-deployment-name>`, where `your-resource-name` is your globally unique AOAI resource name, and `your-deployment-name` is your AI Model deployment name.
  * Depending on your authentication preference, you either need an API key to authenticate against the service, or Entra ID credentials.
  * An api-version. Latest preview or GA version listed in the `Data plane - inference` row in [the API Specs table](https://aka.ms/azsdk/azure-ai-inference/azure-openai-api-versions). At the time of writing, latest GA version was "2024-06-01".

### Install the package

To install the Azure AI Inferencing package use the following command:

```bash
pip install azure-ai-inference
```

To update an existing installation of the package, use:

```bash
pip install --upgrade azure-ai-inference
```

If you want to install Azure AI Inferencing package with support for OpenTelemetry based tracing, use the following command:

```bash
pip install azure-ai-inference[opentelemetry]
```

## Key concepts

### Create and authenticate a client directly, using API key or GitHub token

The package includes two clients `ChatCompletionsClient` and `EmbeddingsClient`<!-- and `ImageGenerationClients`-->. Both can be created in the similar manner. For example, assuming `endpoint`, `key` and `github_token` are strings holding your endpoint URL, API key or GitHub token, this Python code will create and authenticate a synchronous `ChatCompletionsClient`:

```python
from azure.ai.inf

## Recent releases

| Version | Date | Urgency | Changes |
| --- | --- | --- | --- |
| `azure-mgmt-computelimit_1.1.0` | 2026-06-02 | High | ## 1.1.0 (2026-05-26)  ### Features Added    - Client `ComputeLimitMgmtClient` added operation group `vm_families`   - Added model `FeatureEnableRequest`   - Added model `VmFamily`   - Added model `VmFamilyProperties`   - Operation group `FeaturesOperations` added method `begin_disable`   - Added operation group `VmFamiliesOperations` |
| `azure-appconfiguration-provider_2.5.0` | 2026-05-26 | High | ## 2.5.0 (2026-05-22)  ### Features Added  - Added `refresh_enabled` parameter to the `load` method. Defaults to `True` if `refresh_on` is set. When set to `True` without `refresh_on` keys, all selected key-values are monitored for changes. When set to `False`, calling `refresh` will be a no-op. - Added the ability to monitor all selected key-values for refresh with the `refresh_enabled` kwarg. When this kwarg is set to `True`, and `refresh_on` is not specified, changes to any selected key-value |
| `azure-mgmt-storage_25.0.0` | 2026-05-20 | High | ## 25.0.0 (2026-05-19)  ### Features Added    - Client `StorageManagementClient` added method `send_request`   - Client `StorageManagementClient` added operation group `connectors`   - Client `StorageManagementClient` added operation group `data_shares`   - Enum `AccessTier` added member `SMART`   - Enum `AllowedCopyScope` added member `ALL`   - Enum `TriggerType` added member `MOCK_RUN`   - Model `AzureEntityResource` added property `system_data`   - Model `BlobContainer` added property `system |
| `azure-mgmt-storagesync_1.0.1` | 2026-05-14 | High | ## 1.0.1 (2026-05-14)  ### Other Changes    - Regenerated with latest code generator tool |
| `azure-mgmt-attestation_2.0.0` | 2026-05-08 | High | ## 2.0.0 (2026-05-08)  ### Features Added    - Client `AttestationManagementClient` added parameter `cloud_setting` in method `__init__`   - Client `AttestationManagementClient` added method `send_request`   - Client `AttestationManagementClient` added operation group `private_link_resources`   - Model `AttestationServiceCreationSpecificParams` added property `public_network_access`   - Model `AttestationServiceCreationSpecificParams` added property `tpm_attestation_authentication`   - Model `At |
| `azure-batch_15.1.0` | 2026-05-01 | High | ## 15.1.0 (2026-03-06)  ### Other Changes  - This is the GA release of the features introduced in the 15.0.0 and 15.1.0 beta versions, including LRO support, job-level FIFO scheduling, CMK support on pools, IPv6 support, metadata security protocol support, IP tag support, and confidential VM enhancements.  ### Breaking Changes  - Renamed `BatchNodeUserUpdateOptions` to `BatchNodeUserReplaceOptions`. - Renamed `OutputFileUploadConfig` to `OutputFileUploadConfiguration`.  - Removed Models:   - Rem |
| `azure-postgresql-auth_1.0.2` | 2026-04-29 | High | ## 1.0.2 (2026-04-28)  ### Bugs Fixed  - Removed dependency on `DefaultAzureCredential` in source library - Fixed `get_entra_conninfo_async` and `get_entra_token_async` closing the credential by using it as a context manager  ### Other Changes  - Bumped minimum dependency on `azure-core` to `>=1.31.0` |
| `azure-mgmt-hybridkubernetes_1.2.0` | 2026-04-23 | High | ## 1.2.0 (2026-04-23)  ### Other Changes    - Regenerate SDK code with latest code generator tool |
| `azure-template_0.1.0b6187637` | 2026-04-21 | High | ## 0.1.0b6187637 (2026-04-21)  ### Features Added  - Some feature  ### Breaking Changes  - Some breaking change  ### Bugs Fixed  - Some bug fix  ### Other Changes  - Some other change |
| `1.0.0b9` | 2026-04-21 | Low | Imported from PyPI (1.0.0b9) |

## Dependency audit

- **Score**: 98/100
- **Total deps**: 0
- **Resolved**: 0
- **Unresolved**: 0
- **License conflicts**: 0
- **Warnings**: 1
- **Scanned**: 2026-05-25

## Citation

- HTML: https://www.freshcrate.ai/projects/azure-ai-inference
- Markdown: https://www.freshcrate.ai/projects/azure-ai-inference.md
- Dependencies JSON: https://www.freshcrate.ai/api/projects/azure-ai-inference/deps

_Generated by freshcrate.ai. Indexes pypi releases for AI-agent ecosystem packages._