Lock down your Azure OpenAI to private network

Fumihiko Shiroyama
15 min read · Sep 8, 2023

Introduction

In the previous entry, we successfully combined Azure API Management (APIM) and Azure Front Door to make Azure OpenAI Service (AOAI) resources load-balanced and redundant in an easy-to-use and manageable way.

In this entry we take another step forward and learn how to use AOAI more securely by locking it down within a private network.

High Level Architecture

The high level architecture to be constructed in this article is as follows.

High level architecture
  1. The user sends a request to the APIM endpoint.
  2. APIM authenticates with Azure AD and uses that authentication token to communicate with the backend.
  3. This time, we lock down the AOAI resources in the Azure Virtual Network (Vnet) and connect to them via private endpoints.
  4. We use Azure Application Gateway in Vnet to load balance AOAI resources.
  5. We integrate APIM into Vnet and send requests to its external endpoint, but the communication behind that takes place within Vnet.

Prerequisites

We assume that the infrastructure described in the previous entry has been constructed. In particular, it is important for this entry that the AOAI resources were created in each region and the specified models were deployed as follows, so that we can set up the Application Gateway.

  • my-endpoint-canada (Canada East): gpt-35-turbo, text-embedding-ada-002
  • my-endpoint-europe (West Europe): gpt-35-turbo, text-embedding-ada-002
  • my-endpoint-france (France Central): gpt-35-turbo, text-embedding-ada-002
  • my-endpoint-australia (Australia East): gpt-35-turbo, gpt-35-turbo-16k
  • my-endpoint-japan (Japan East): gpt-35-turbo, gpt-35-turbo-16k
  • my-endpoint-us2 (East US 2): gpt-35-turbo, gpt-35-turbo-16k

Setup Vnet

The first step is to create a Vnet to be used as a private network. Go to Azure portal, enter “vnet” and select “Virtual networks”.

Virtual networks

The region of the Vnet must be the same as that of the APIM instance created in the previous entry. Name the Vnet and create it with all other settings left at their defaults.

Create virtual network

Navigate to the created resource and select “Subnets” under “Settings”.

Subnets

Press “+ Subnet” to create subnets.
When a Vnet is created, its IPv4 address space is 10.0.0.0/16 by default, so the subnet address ranges must be carved out of this range.

Add subnets

This time we will create three new subnets:

  • AppGW-Subnet: subnet for Application Gateway (10.0.1.0/24)
  • PE-Subnet: subnet for private endpoints (10.0.2.0/24)
  • APIM-Subnet: subnet for APIM (10.0.3.0/24)

Please refer to each image to create your own. All settings except for the address space may be left as is.

AppGW-Subnet
PE-Subnet
APIM-Subnet

The created subnets look like this.

Created subnets
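
If you prefer the CLI to the portal, the Vnet and these three subnets could be created roughly as follows. This is only a sketch: the resource group name my-rg and the $LOCATION value are assumptions for illustration, and the location must match the region of your APIM instance. The Vnet name OpenAIVnet is the one we will select later during the APIM integration.

# Assumed names: resource group "my-rg"; use the same region as your APIM instance
LOCATION=eastus

# Create the Vnet with the default 10.0.0.0/16 address space
az network vnet create \
  --resource-group my-rg \
  --name OpenAIVnet \
  --location "$LOCATION" \
  --address-prefix 10.0.0.0/16

# Create the three subnets used in this article
az network vnet subnet create --resource-group my-rg --vnet-name OpenAIVnet \
  --name AppGW-Subnet --address-prefixes 10.0.1.0/24
az network vnet subnet create --resource-group my-rg --vnet-name OpenAIVnet \
  --name PE-Subnet --address-prefixes 10.0.2.0/24
az network vnet subnet create --resource-group my-rg --vnet-name OpenAIVnet \
  --name APIM-Subnet --address-prefixes 10.0.3.0/24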

Setup Private Endpoints

A private endpoint is a network interface that lets you connect to an Azure resource over a private network. By enabling one on each AOAI resource, you can connect to AOAI privately from within the Vnet.

First, go to the AOAI resource “my-endpoint-canada” you already have, select “Networking”, switch to the “Private endpoint connections” tab, and click “+ Private endpoint”.

Private endpoint

When you name a private endpoint, the name of its network interface is populated automatically as well.
One point is critically important: the region must be the one where the Vnet is located. It may seem counterintuitive, but private endpoints can be created independently of the region of the AOAI resource. What matters is that nodes in the Vnet can reach this endpoint, so the private endpoint must be created in the Vnet’s region.

Create a private endpoint

The sub-resource is automatically selected on the next screen.

Create a private endpoint

Next, select the Vnet we just created, and select “PE-Subnet” for the subnet.

Create a private endpoint

Check “Yes” for “Integrate with private DNS zone”.
This is the DNS record setting for this resource within the Vnet; it will be explained in detail in a later step.

Create a private endpoint

Once validation passes, complete the creation of the private endpoint.

Create a private endpoint

Do the same for all AOAI resources listed in the “Prerequisites” section at the beginning. I’ll show you one more example, “my-endpoint-europe”, for reference. Again, the region should be the one where the Vnet is located. All other settings are the same as in the previous example.

Create a private endpoint
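
If you’d rather script this repetitive step, the private endpoints could be created with an Azure CLI loop along these lines. This is a sketch under assumptions: the resource group my-rg, the endpoint naming pattern, and the connection names are illustrative; the group-id "account" is the sub-resource name for Cognitive Services / AOAI accounts. Note that the portal flow above also wires up the Private DNS zone for you; with the CLI that part is handled separately, as we do in the next section.

# Assumed resource group "my-rg"; endpoints should land in the Vnet's region
# (pass --location explicitly if your defaults differ)
for NAME in my-endpoint-canada my-endpoint-europe my-endpoint-france \
            my-endpoint-australia my-endpoint-japan my-endpoint-us2; do
  AOAI_ID=$(az cognitiveservices account show \
    --resource-group my-rg --name "$NAME" --query id -o tsv)
  az network private-endpoint create \
    --resource-group my-rg \
    --name "pe-${NAME}" \
    --vnet-name OpenAIVnet \
    --subnet PE-Subnet \
    --private-connection-resource-id "$AOAI_ID" \
    --group-id account \
    --connection-name "pe-${NAME}-conn"
done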

Once you have finished creating all your private endpoints, you can check information such as the private IP from “Private endpoints” in the Azure portal (you won’t use these private IPs directly very often, because the Private DNS zone resolves the hostnames for you).

private endpoints

Link Private DNS zone and Vnet

When a private endpoint is created, a Private DNS zone is created along with it. Its job is to resolve the hostname of the resource behind the private endpoint to the private IP within the Vnet.
For example, my-endpoint-canada.openai.azure.com resolves to a public IP over the Internet, but from within the Vnet the same hostname resolves to the private IP.

Here we need to link the Private DNS zone and Vnet to enable this feature. Go to Azure portal and open “Private DNS zones”.

Private DNS zones

Select “privatelink.openai.azure.com” zone.

privatelink.openai.azure.com

These are the DNS records of the zone.

Private DNS zone

Select “Virtual network links” under “Settings” to link this to the Vnet.

Virtual network links

Select “+ Add”.

Add a link

Select the Vnet to which this Private DNS zone will be attached and enable “auto registration”.

Add virtual network link
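
If you want to script this link instead, something like the following should do it. A sketch: the resource group my-rg and the link name are assumptions, and auto registration is enabled to match the portal setting above.

az network private-dns link vnet create \
  --resource-group my-rg \
  --zone-name privatelink.openai.azure.com \
  --name openai-vnet-link \
  --virtual-network OpenAIVnet \
  --registration-enabled true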

When the Link status becomes “Completed”, the setup is done. From then on, these *.openai.azure.com hostnames will be resolved to private IPs from within the Vnet.

Link status Completed
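
A quick way to verify the link: resolve one of the AOAI hostnames from a machine inside the Vnet and from your laptop, and compare the answers. The commands below are illustrative; the point is that the in-Vnet lookup should return an address from the PE-Subnet range (10.0.2.0/24).

# From a VM (or other compute) inside the Vnet:
# expect a private IP such as 10.0.2.x
nslookup my-endpoint-canada.openai.azure.com

# From your laptop outside the Vnet:
# still resolves to a public IP (public access gets blocked in the next step)
nslookup my-endpoint-canada.openai.azure.com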

Lock down AOAI resources within Vnet

Now that we have created private endpoints on all AOAI resources and linked the private DNS zone to the Vnet, it is no longer necessary for AOAI to be accessible from all networks. Select “Disabled” in the “Firewalls and virtual networks” tab under Networking in the AOAI resource. Make sure to “Save”.

With this setup, there is no way to access the AOAI resources except through their private endpoints. Do the same for all other AOAI resources.

Disable network accesses except from Private Endpoints
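
Disabling public network access can also be scripted. One generic way is to flip the publicNetworkAccess property on each account with az resource update; this is a sketch assuming the resource group my-rg and the standard Cognitive Services ARM schema.

# Disable public access on every AOAI account used in this article
for NAME in my-endpoint-canada my-endpoint-europe my-endpoint-france \
            my-endpoint-australia my-endpoint-japan my-endpoint-us2; do
  az resource update \
    --resource-group my-rg \
    --name "$NAME" \
    --resource-type "Microsoft.CognitiveServices/accounts" \
    --set properties.publicNetworkAccess=Disabled
done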

Now all the AOAI resources are locked down from outside. Let’s try to call the AOAI API directly from our laptop.

curl "https://my-endpoint-canada.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: ${API_KEY}" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'

{"error":{"code":"AccessDenied","message": "Public access is disabled. Please configure private endpoint."}}

As intended, an “AccessDenied” error was returned when we tried to call the API directly. This is a good thing!

Accessing AOAI resources over APIM

As described in the high-level architecture, in the following steps we will deploy a load balancer called Application Gateway in the Vnet and access the AOAI resources through it. Before doing so, let’s see if we can call the AOAI API via APIM with Vnet integration.

Note: Only the “Developer” and “Premium” tiers can do this. See the “API Management pricing” page for details.

Access APIM, select “Virtual Network” from the “Network” menu, then click “External”. This means that APIM is integrated into the Vnet and has access to the private network including private endpoints, but APIM itself remains accessible from the outside.

Vnet integration

Next, select “OpenAIVnet” and “APIM-Subnet”.

Select virtual network

Again, don’t forget to save the changes.

Note: This may take up to 45 minutes if you are using the Developer tier.

Don’t forget to save

The APIM dashboard is also unavailable during this time.

Service is being updated…

Once the Vnet integration is complete, update the policy as follows. As you can see, we simply point <set-backend-service base-url> at one of the AOAI resources.

<policies>
    <inbound>
        <base />
        <set-backend-service base-url="https://my-endpoint-canada.openai.azure.com/" />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
        </set-header>
    </inbound>
    <backend>
        <retry condition="@(context.Response.StatusCode >= 300)" count="5" interval="1" max-interval="10" delta="1">
            <forward-request buffer-request-body="true" buffer-response="false" />
        </retry>
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Now let’s make a request to the APIM endpoint.

curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'

Did you get a response back from the AOAI resource in the Vnet as expected? Congratulations!

Update Network security group for APIM

After integrating APIM into the Vnet, you may find that you can no longer access the APIM dashboard and instead see the following error.

In that case, you may have to update your Network security group according to “Common network configuration issues” and “Virtual network configuration reference: API Management”.

Add inbound security rule
Add inbound security rule
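
For reference, the inbound rule that External-mode APIM typically needs (management traffic on TCP 3443 from the ApiManagement service tag, as described in the configuration reference linked above) could be added with the CLI roughly like this. A sketch: the NSG name apim-subnet-nsg and resource group my-rg are assumptions.

az network nsg rule create \
  --resource-group my-rg \
  --nsg-name apim-subnet-nsg \
  --name AllowApiManagementInbound \
  --priority 100 \
  --direction Inbound \
  --access Allow \
  --protocol Tcp \
  --source-address-prefixes ApiManagement \
  --source-port-ranges '*' \
  --destination-address-prefixes VirtualNetwork \
  --destination-port-ranges 3443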

Load-balancing AOAI resources in Vnet using Application Gateway

Application Gateway is Azure’s high-performance L7 (HTTP/HTTPS) load balancer that supports URL path-based routing of requests to backend pools. It also supports cookie-based session affinity. Most importantly for us, Application Gateway is a regional load balancer and can be integrated into the Vnet.

Private Application Gateway deployment

Application Gateway has v1 and v2 SKUs at the time of writing. v1 can operate as a load balancer with only private IPs, but v2 cannot. The safe choice would be v1, but its deprecation has already been announced. Fortunately, a private-only deployment of Application Gateway v2 is currently available in preview, so for future-proofing I will explain how to use the v2 preview here.

Note: Preview features are not suitable for production use.

First, select “Preview features” from the Azure portal.

Preview features

Next, type “EnableApplicationGatewayNetworkIsolation” in the search box, check the menu that comes up, and press “+ Register”.

Once it shows “Registered”, you are done, but there seems to be a bit of a time lag before it actually becomes available.
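
The same registration can be done from the CLI. A sketch, assuming the feature sits under the Microsoft.Network namespace:

# Register the preview feature for private-only Application Gateway v2
az feature register \
  --namespace Microsoft.Network \
  --name EnableApplicationGatewayNetworkIsolation

# Check the registration state until it shows "Registered"
az feature show \
  --namespace Microsoft.Network \
  --name EnableApplicationGatewayNetworkIsolation \
  --query properties.state

# Propagate the registration to the resource provider
az provider register --namespace Microsoft.Network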

Setup Application Gateway

Now we are ready to set up Application Gateway. Open “Application Gateways” from the Azure portal.

As already explained, choose v2 for Tier. Then in the network settings select our Vnet and “AppGW-Subnet”. You can change other settings such as the instance count later, so don’t worry too much about other parts.

Create application gateway

In the “Frontends” tab, select “Private” as the IP address type and enter any address from the “AppGW-Subnet” range such as 10.0.1.100 for the actual address.

Create application gateway

In the “Backends” tab, create a backend pool from “Add a backend pool”.

Create application gateway

First is the “default-pool”, to which traffic is forwarded by default. Add all AOAI resources here. Each target should contain the exact hostname of the AOAI resource, such as “my-endpoint-canada.openai.azure.com”; it is incorrect to put a private IP address here.

Add a backend pool
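
For reference, once the gateway exists, a backend pool with these FQDN targets could also be created from the CLI. A sketch: the gateway name openai-appgw and resource group my-rg are assumptions.

az network application-gateway address-pool create \
  --resource-group my-rg \
  --gateway-name openai-appgw \
  --name default-pool \
  --servers my-endpoint-canada.openai.azure.com \
            my-endpoint-europe.openai.azure.com \
            my-endpoint-france.openai.azure.com \
            my-endpoint-australia.openai.azure.com \
            my-endpoint-japan.openai.azure.com \
            my-endpoint-us2.openai.azure.com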

The “default-pool” has been added. As in the previous entry, we will also create backend pools for the “text-embedding-ada-002” and “gpt-35-turbo-16k” models.

Create application gateway

This is the backend pool for “text-embedding-ada-002”.

backend pools for the “text-embedding-ada-002”

And this is the backend pool for “gpt-35-turbo-16k”.

backend pools for the “gpt-35-turbo-16k”

Now that all backend pools have been added, let’s proceed.

Create application gateway

Now that Frontends and Backend pools are set up, we need to configure “Routing rules” to tie them together.

Routing rules

There are a few things to explain here.

  1. Set the rule name and priority. We only have one rule this time, so 1 is fine for the priority.
  2. Next, name the listener and select “Private” for the Frontend IP. Since we will connect to this load balancer only from within the Vnet, HTTP on port 80 is fine for the protocol and port.
  3. Next, switch to the “Backend targets” tab.
Add a routing rule

In the “Backend targets” tab, click “Add new” to create a new “Backend settings”.

Backend targets

Again, there is much to explain here.

  1. Since the connection to the backend AOAI is made over HTTPS, select HTTPS and port 443 for the protocol and port.
  2. The Host name setting is extremely important. By default, Application Gateway forwards the HTTP Host header from the incoming request unchanged to the backend. However, AOAI returns an HTTP 404 if the Host header does not match its own endpoint hostname, which breaks things when a load balancer sits in front. Therefore, we must select “Yes” here to override the Host header.
  3. The last setting is also related to the Host header: select “Pick host name from backend target” so that the Host header is overwritten with the target hostname we set in the backend pool.
Add Backend setting

We are back to the previous screen.
Here, select “Backend pool” for Target type and choose the “default-pool” for Backend target. Next, select “Add multiple targets to create a path-based rule” at the bottom of the screen to configure paths and pools for the “text-embedding-ada-002” and “gpt-35-turbo-16k” models.

Add a routing rule

Here we only forward requests to the “pool-text-embedding-ada-002” pool when it matches the /openai/deployments/text-embedding-ada-002/embeddings path.

Add a path for “text-embedding-ada-002”

Likewise, we only forward requests to the “pool-gpt-35-turbo-16k” pool when it matches the /openai/deployments/gpt-35-turbo-16k/chat/completions path.

Add a path for “gpt-35-turbo-16k”
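
For reference, once the gateway, pools, and backend settings exist, an equivalent path-based mapping could be scripted roughly like this. A sketch: the gateway name openai-appgw, the backend-settings name aoai-backend-settings, the map and rule names, and the resource group my-rg are all assumptions, and the path-based routing rule itself must reference this URL path map.

# Path map: everything falls through to default-pool, except the embeddings path
az network application-gateway url-path-map create \
  --resource-group my-rg \
  --gateway-name openai-appgw \
  --name openai-path-map \
  --default-address-pool default-pool \
  --default-http-settings aoai-backend-settings \
  --rule-name embeddings-rule \
  --paths "/openai/deployments/text-embedding-ada-002/embeddings" \
  --address-pool pool-text-embedding-ada-002 \
  --http-settings aoai-backend-settings

# Second path rule for the gpt-35-turbo-16k deployments
az network application-gateway url-path-map rule create \
  --resource-group my-rg \
  --gateway-name openai-appgw \
  --path-map-name openai-path-map \
  --name gpt-35-turbo-16k-rule \
  --paths "/openai/deployments/gpt-35-turbo-16k/chat/completions" \
  --address-pool pool-gpt-35-turbo-16k \
  --http-settings aoai-backend-settings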

Now that the basic configuration is complete, press “Add” to finish creating the routing rule.

Add a routing rule

We are now back at the screen for creating an Application Gateway. Let’s proceed.

Create application gateway

Once all validations have passed, press “Create” to complete the creation.

Finish creating application gateway

There is one more step left. Click on “Health probes” in the Application Gateway we just created.

Health probes

The Application Gateway checks whether the backend nodes are healthy by using health probes. Here are a few things to check (an equivalent CLI sketch follows the list).

  • As already explained, the host name needs to be picked from the backend settings.
  • Since AOAI does not expose a health-probe endpoint, simply enter / for the path.
  • Add HTTP 404 to the HTTP status codes that are considered healthy. This looks a little tricky, but since AOAI has no health-check mechanism out of the box, we treat a node as healthy if accessing the / path returns HTTP 404.
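
Here is the equivalent probe configuration as a CLI sketch, assuming the gateway name openai-appgw, probe name aoai-probe, and resource group my-rg. The key points are the / path, picking the host name from the backend settings, and accepting 404 as healthy.

az network application-gateway probe create \
  --resource-group my-rg \
  --gateway-name openai-appgw \
  --name aoai-probe \
  --protocol Https \
  --path / \
  --host-name-from-http-settings true \
  --match-status-codes 200-399 404 \
  --interval 30 \
  --timeout 30 \
  --threshold 3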

Finally, press “Test” to confirm.

Health probes

Thus, all pools are found to be healthy.

Health probes

Finally, all Application Gateway settings are complete! Take note of the private IP of the Application Gateway; we will use it to connect from APIM.

Front private IP address

Connect APIM to Application Gateway

Configure the Application Gateway IP as the APIM backend.
From the “APIs” section, select the API we are using, select “+ Add operation”, and click </> in “Inbound processing”.

Update policy

Please refer to the following to update the policy. Don’t worry, there is only one change: point <set-backend-service base-url> at the Application Gateway endpoint, http://10.0.1.100/.

<policies>
    <inbound>
        <base />
        <set-backend-service base-url="http://10.0.1.100/" />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
        </set-header>
    </inbound>
    <backend>
        <retry condition="@(context.Response.StatusCode >= 300)" count="5" interval="1" max-interval="10" delta="1">
            <forward-request buffer-request-body="true" buffer-response="false" />
        </retry>
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Now for the moment of truth! Send requests to the respective APIs via APIM.
The responses came back as expected, didn’t they? We did it!

# Chat Completions (gpt-35-turbo) via APIM + Application Gateway in Vnet
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'

# Chat Completions (gpt-35-turbo-16k) via APIM + Application Gateway in Vnet
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo-16k/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'

# Embeddings via APIM + Application Gateway in Vnet
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"input": "Sample Document goes here"}'

Also, please test each API with “Trace” in APIM’s testing feature.

  1. When requesting the “gpt-35-turbo” model, we can see that the response is evenly returned from one of the six nodes in the default pool.
  2. When requesting the “gpt-35-turbo-16k” model, we can see that the response is evenly returned from one of the three nodes in the “gpt-35-turbo-16k” pool.
  3. When requesting the “text-embedding-ada-002” model, we can see that the response is evenly returned from one of the three nodes in the “text-embedding-ada-002” pool.

Was everything as expected? Congratulations!

Conclusion

We have done the following this time:

  • Created a private Vnet and locked the AOAI resources down inside it using private endpoints.
  • Created an Application Gateway in the Vnet and configured it to load-balance across the internal AOAI resources.
  • Integrated APIM into the Vnet so that it could connect to the internal endpoint of the Application Gateway.

Compared to the approach in the previous entry, everything except the external endpoint of APIM is now locked down inside a secure private Vnet.

This approach can be developed into even more secure applications, for example, by setting the APIM endpoint to “Internal” to restrict access only to applications deployed in the same Vnet.

Internal endpoint of APIM

Nothing would make me happier than if this entry gives you some ideas for using AOAI in a more secure and robust manner. Happy hacking!
