Lock down your Azure OpenAI to private network
Introduction
In the previous entry, we successfully combined Azure API Management (APIM) and Azure Front Door to make Azure OpenAI Service (AOAI) resources load-balanced and redundant in an easy-to-use and manageable way.
In this entry we take another step forward and learn how to use AOAI more securely by locking it down within a private network.
High Level Architecture
The high level architecture to be constructed in this article is as follows.
- The user sends a request to the APIM endpoint.
- APIM authenticates with Azure AD and uses that authentication token to communicate with the backend.
- This time, we lock down the AOAI resources in the Azure Virtual Network (Vnet) and connect to them via private endpoints.
- We use Azure Application Gateway in Vnet to load balance AOAI resources.
- We integrate APIM into Vnet and send requests to its external endpoint, but the communication behind that takes place within Vnet.
Prerequisites
We assume that the infrastructure described in the previous entry has been constructed. In particular, it is important for this entry that the AOAI resources are created in each region and the specified models are deployed as follows, since we will use them to set up the Application Gateway.
- my-endpoint-canada (Canada East): gpt-35-turbo, text-embedding-ada-002
- my-endpoint-europe (West Europe): gpt-35-turbo, text-embedding-ada-002
- my-endpoint-france (France Central): gpt-35-turbo, text-embedding-ada-002
- my-endpoint-australia (Australia East): gpt-35-turbo, gpt-35-turbo-16k
- my-endpoint-japan (Japan East): gpt-35-turbo, gpt-35-turbo-16k
- my-endpoint-us2 (East US 2): gpt-35-turbo, gpt-35-turbo-16k
Setup Vnet
The first step is to create a Vnet to be used as a private network. Go to Azure portal, enter “vnet” and select “Virtual networks”.
The region of the Vnet must be the same as that of the APIM created in the previous entry. Name the Vnet and create it with all other settings left as default.
Navigate to the created resource and select “Subnets” under “Settings”.
Press “+ Subnet” to create subnets.
When the Vnet is created, the IPv4 address space is 10.0.0.0/16 by default. Therefore, subnet addresses should be created within this range.
This time we will create three new subnets:
- AppGW-Subnet: subnet for Application Gateway (10.0.1.0/24)
- PE-Subnet: subnet for private endpoints (10.0.2.0/24)
- APIM-Subnet: subnet for APIM (10.0.3.0/24)
Please refer to each image to create your own. All settings except for the address space may be left as is.
The created subnets look like this.
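If you prefer the command line, the same Vnet and subnets can be created with the Azure CLI. This is a minimal sketch; the resource group name my-rg and the region eastus are placeholders (use your own resource group and the region of your APIM), while OpenAIVnet and the subnet names match this walkthrough.

```shell
# Create the Vnet with the default 10.0.0.0/16 address space.
# my-rg and eastus are placeholders for your resource group and APIM region.
az network vnet create \
  --resource-group my-rg \
  --name OpenAIVnet \
  --location eastus \
  --address-prefixes 10.0.0.0/16

# Create the three subnets within that range.
az network vnet subnet create --resource-group my-rg --vnet-name OpenAIVnet \
  --name AppGW-Subnet --address-prefixes 10.0.1.0/24
az network vnet subnet create --resource-group my-rg --vnet-name OpenAIVnet \
  --name PE-Subnet --address-prefixes 10.0.2.0/24
az network vnet subnet create --resource-group my-rg --vnet-name OpenAIVnet \
  --name APIM-Subnet --address-prefixes 10.0.3.0/24
```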
Setup Private Endpoints
A private endpoint is a pathway for connecting to Azure resources via a private network. By enabling this on the AOAI resources, you can connect to AOAI privately from within the Vnet.
First, go to the AOAI resource “my-endpoint-canada” you already have, select “Networking”, switch to the “Private endpoint connections” tab, and click “+ Private endpoint”.
When you name a private endpoint, the name of the network interface is automatically populated as well.
This is critically important: the region must be the one where the Vnet is located. It may seem counterintuitive, but private endpoints can be created independently of the region where the AOAI resource is located. What matters is that nodes in the Vnet can connect to this endpoint, so the private endpoint must be created in the Vnet's region.
The sub-resource is automatically selected on the next screen.
Next, select the Vnet we just created, and select “PE-Subnet” for the subnet.
Check “Yes” for “Integrate with private DNS zone”.
This is the DNS record setting for this resource within the Vnet. It will be explained in detail in a later step.
Complete the creation of a private endpoint if the validation passes.
Do the same for all AOAI resources listed in the “Prerequisites” section at the beginning. Here is one more example, “my-endpoint-europe”, for reference. Again, the region should be the one where the Vnet is located. All other settings are the same as in the previous example.
Once you have finished creating all your private endpoints, you can check the information such as Private IP from the “Private endpoints” in Azure portal (you won’t use these private IPs directly very often because of the Private DNS zone that resolves these hostnames).
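The same private endpoints can also be created from the Azure CLI. A minimal sketch for one resource, assuming the my-rg resource group and eastus region placeholders from before; the group-id for AOAI (Cognitive Services) private endpoints is "account".

```shell
# Look up the resource ID of the AOAI account.
AOAI_ID=$(az cognitiveservices account show \
  --name my-endpoint-canada --resource-group my-rg --query id -o tsv)

# Create the private endpoint in the Vnet's region, inside PE-Subnet.
az network private-endpoint create \
  --resource-group my-rg \
  --name pe-my-endpoint-canada \
  --location eastus \
  --vnet-name OpenAIVnet \
  --subnet PE-Subnet \
  --private-connection-resource-id "$AOAI_ID" \
  --group-id account \
  --connection-name pe-my-endpoint-canada-conn
```

Repeat this for each of the six AOAI resources.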
Link Private DNS zone and Vnet
When a private endpoint is created, a Private DNS zone is created along with it. This zone resolves the hostname of the resource behind the private endpoint to its private IP within the Vnet.
For example, the hostname my-endpoint-canada.openai.azure.com is resolved to a global IP over the Internet, but from within the Vnet the same hostname is resolved to the private IP.
Here we need to link the Private DNS zone and Vnet to enable this feature. Go to Azure portal and open “Private DNS zones”.
Select “privatelink.openai.azure.com” zone.
These are the DNS records of the zone.
Select “Virtual network links” under “Settings” to link this to the Vnet.
Select “+ Add”.
Select the Vnet to which this Private DNS zone will be attached and enable “auto registration”.
When the Link status becomes “Completed”, the process is complete. After this, *.openai.azure.com will be resolved to private IPs within the Vnet.
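The portal creates the zone and can link it for you, but the equivalent steps can be sketched with the Azure CLI as follows (my-rg and the link name openai-vnet-link are assumptions):

```shell
# Create the private DNS zone (the portal does this automatically when
# "Integrate with private DNS zone" is set to Yes).
az network private-dns zone create \
  --resource-group my-rg \
  --name privatelink.openai.azure.com

# Link the zone to the Vnet, with auto registration enabled as in the portal.
az network private-dns link vnet create \
  --resource-group my-rg \
  --zone-name privatelink.openai.azure.com \
  --name openai-vnet-link \
  --virtual-network OpenAIVnet \
  --registration-enabled true
```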
Lock down AOAI resources within Vnet
Now that we have created private endpoints on all AOAI resources and linked the private DNS zone to the Vnet, it is no longer necessary for AOAI to be accessible from all networks. Select “Disabled” in the “Firewalls and virtual networks” tab under Networking in the AOAI resource. Make sure to “Save”.
With this setup, there is no way to access AOAI resources except through private endpoints. Do the same for all other AOAI resources.
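Clicking through six resources is tedious, so here is a hedged CLI sketch that loops over the resources from the “Prerequisites” section and disables public network access on each (my-rg is a placeholder; az resource update sets the underlying publicNetworkAccess property directly):

```shell
# Disable public network access on every AOAI resource.
for NAME in my-endpoint-canada my-endpoint-europe my-endpoint-france \
            my-endpoint-australia my-endpoint-japan my-endpoint-us2; do
  az resource update \
    --ids "$(az cognitiveservices account show -n "$NAME" -g my-rg --query id -o tsv)" \
    --set properties.publicNetworkAccess=Disabled
done
```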
Now all the AOAI resources are locked down from outside. Let’s try to call the AOAI API directly from our laptop.
curl "https://my-endpoint-canada.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: ${API_KEY}" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'
{"error":{"code":"AccessDenied","message": "Public access is disabled. Please configure private endpoint."}}
As intended, an “AccessDenied” error was returned when we tried to call the API directly. This is a good thing!
Accessing AOAI resources over APIM
As described in the high-level architecture, in the following steps we will deploy a load balancer called Application Gateway in the Vnet and access the AOAI resources through it. Before doing so, let’s see if we can call the AOAI API via APIM with Vnet integration.
Note: Only “Developer” and “Premium” tier can do this. See “API Management pricing” page for details.
Access APIM, select “Virtual Network” from the “Network” menu, then click “External”. This means that APIM is integrated into the Vnet and has access to the private network including private endpoints, but APIM itself remains accessible from the outside.
Next, select “OpenAIVnet” and “APIM-Subnet”.
Again, don’t forget to save the changes.
Note: This may take up to 45 minutes if you are using the Developer tier.
The APIM dashboard is also unavailable during this time.
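The portal is the simplest way to do this, but for reference the integration can also be scripted by updating the APIM resource's ARM properties directly. This is a sketch under the assumption that your APIM instance is named my-cool-apim-us1 in resource group my-rg:

```shell
# Resolve the IDs of the APIM instance and the APIM-Subnet.
SUBNET_ID=$(az network vnet subnet show --resource-group my-rg \
  --vnet-name OpenAIVnet --name APIM-Subnet --query id -o tsv)
APIM_ID=$(az apim show --name my-cool-apim-us1 --resource-group my-rg \
  --query id -o tsv)

# Switch APIM to External Vnet integration on APIM-Subnet.
# This can take a long time to apply, just like the portal flow.
az resource update --ids "$APIM_ID" \
  --set properties.virtualNetworkType=External \
  --set properties.virtualNetworkConfiguration.subnetResourceId="$SUBNET_ID"
```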
Once the Vnet integration is complete, we can try updating the policy as follows. As you can see, we just set <set-backend-service base-url> to one of the AOAI resources.
<policies>
<inbound>
<base />
<set-backend-service base-url="https://my-endpoint-canada.openai.azure.com/" />
<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
<set-header name="Authorization" exists-action="override">
<value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
</set-header>
</inbound>
<backend>
<retry condition="@(context.Response.StatusCode >= 300)" count="5" interval="1" max-interval="10" delta="1">
<forward-request buffer-request-body="true" buffer-response="false" />
</retry>
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
Now let’s make a request to the APIM endpoint.
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'
Do you get a response back from the AOAI resource in Vnet this time as expected? Congratulations!
Update Network security group for APIM
After integrating APIM into the Vnet, you may no longer be able to access the APIM dashboard, with the following error.
In that case, you may have to update your Network security group according to “Common network configuration issues” and “Virtual network configuration reference: API Management”.
Load-balancing AOAI resources in Vnet using Application Gateway
Application Gateway is Azure’s high-performance L7 (HTTP/HTTPS) load balancer that supports URL path-based request routing to backend pools. It also supports cookie-based session affinity. Most importantly, Application Gateway is a regional load balancer and can be integrated into the Vnet.
Private Application Gateway deployment
Application Gateway has v1 and v2 SKUs at the time of writing. The v1 has the option of operating a load balancer with only Private IPs, but v2 does not. The safe choice would be to use v1, but v1 has already been announced as deprecated. Fortunately, the private deployment of Application Gateway in v2 is currently available as a preview. For future benefit, here I will explain how to use v2 preview.
Note: Preview is not suitable for production use
First, select “Preview features” from the Azure portal.
Next, type “EnableApplicationGatewayNetworkIsolation” in the search box, check the menu that comes up, and press “+ Register”.
Once it is “Registered”, it is done, but there seems to be a bit of a time lag before it is actually available.
Setup Application Gateway
Now we are ready to setup Application Gateway. Open “Application Gateways” from Azure portal.
As already explained, choose v2 for Tier. Then in the network settings select our Vnet and “AppGW-Subnet”. You can change other settings such as the instance count later, so don’t worry too much about other parts.
In the “Frontends” tab, select “Private” as the IP address type and enter any address from the “AppGW-Subnet” range, such as 10.0.1.100, for the actual address.
In the “Backends” tab, create a backend pool from “Add a backend pool”.
First is the “default-pool” to which traffic is forwarded by default. Add all AOAI resources here. Target should contain the exact hostname of the AOAI resource, such as “my-endpoint-canada.openai.azure.com”. It is incorrect to put a private IP address here.
The “default-pool” has been added. As in the previous entry, we will also create backend pools for the “text-embedding-ada-002” and “gpt-35-turbo-16k” models.
This is the backend pool for “text-embedding-ada-002”.
And this is the backend pool for “gpt-35-turbo-16k”.
Now that all backend pools have been added, let’s proceed.
We were able to set up Frontends and Backend pools. Next, we need to set up “Routing rules” to mediate between them.
There are a few things to explain here.
- Set the rule name and priority. We only have one rule this time, so 1 is fine for the priority.
- Next, name the listener. Then select “Private” for the Frontend IP. Since we will be connecting to this load balancer from within the Vnet, HTTP and port 80 are fine for the protocol and port.
- Next, switch to the “Backend targets” tab.
In the “Backend targets” tab, click “Add new” to create a new “Backend settings”.
Again, there is much to explain here.
- Since the connection to the backend AOAI is made via HTTPS, select HTTPS and port 443 for the protocol and port.
- The Host name setting is extremely important. By default, Application Gateway forwards the HTTP host header from the request to the backend unchanged. However, AOAI returns an HTTP 404 if the host header in the request does not match the requested host, which is a problem when using a load balancer. Therefore, we must select “Yes” here to override the host header.
- The last setting is also related to the HTTP host header. We need to select “Pick host name from backend target” to overwrite the host header with the target name we set in the backend pool.
We are back to the previous screen.
Here, select “Backend pool” for Target type and choose the “default-pool” for Backend target. Next, select “Add multiple targets to create a path-based rule” at the bottom of the screen to configure paths and pools for the “text-embedding-ada-002” and “gpt-35-turbo-16k” models.
Here we only forward requests to the “pool-text-embedding-ada-002” pool when the path matches /openai/deployments/text-embedding-ada-002/embeddings.
Likewise, we only forward requests to the “pool-gpt-35-turbo-16k” pool when the path matches /openai/deployments/gpt-35-turbo-16k/chat/completions.
Now that the basic configuration is complete, press “Add” to finish creating the routing rule.
We are now back at the screen for creating an Application Gateway. Let’s proceed.
Once all validations have passed, press “Create” to complete the creation.
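Once the gateway exists, the same backend pools can also be inspected or adjusted from the Azure CLI. A sketch for one pool, assuming the gateway is named my-appgw in resource group my-rg; note that the servers are the AOAI hostnames, not private IPs:

```shell
# Create (or recreate) the gpt-35-turbo-16k backend pool from the CLI.
az network application-gateway address-pool create \
  --resource-group my-rg \
  --gateway-name my-appgw \
  --name pool-gpt-35-turbo-16k \
  --servers my-endpoint-australia.openai.azure.com \
            my-endpoint-japan.openai.azure.com \
            my-endpoint-us2.openai.azure.com
```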
There is one more step left. Click on “Health probes” in the Application Gateway we just created.
The Application Gateway checks whether the backend nodes are healthy using Health probes. Here are a few things to check.
- As already explained, host names need to be picked from the backend target.
- Since AOAI does not have a dedicated endpoint for health probes, simply enter / for the path.
- Add HTTP 404 to the HTTP status codes that are considered healthy. This looks a little tricky, but since AOAI has no health check mechanism out of the box, we assume a node is healthy if accessing the / path returns HTTP 404.
Finally, press “Test” to confirm.
Thus, all pools are found to be healthy.
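An equivalent probe can also be defined from the CLI. A sketch with the same assumptions as before (gateway my-appgw, resource group my-rg; the probe name and timing values are illustrative):

```shell
# HTTPS probe against "/", taking the host name from the backend
# settings, and treating 404 as healthy since AOAI has no health endpoint.
az network application-gateway probe create \
  --resource-group my-rg \
  --gateway-name my-appgw \
  --name aoai-probe \
  --protocol Https \
  --path / \
  --host-name-from-http-settings true \
  --match-status-codes 200-399 404 \
  --interval 30 --timeout 30 --threshold 3
```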
Finally, all Application Gateway settings are complete! Take a note of the Private IP of the Application Gateway to connect to it from APIM.
Connect APIM to Application Gateway
Configure the Application Gateway IP as the APIM backend.
From “APIs” section, select the API we are using, select “+ Add operation” and click </>
in Inbound processing.
Please refer to the following to update the policy. Don’t worry, there is only one change: set <set-backend-service base-url> to the Application Gateway endpoint, http://10.0.1.100/.
<policies>
<inbound>
<base />
<set-backend-service base-url="http://10.0.1.100/" />
<authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
<set-header name="Authorization" exists-action="override">
<value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
</set-header>
</inbound>
<backend>
<retry condition="@(context.Response.StatusCode >= 300)" count="5" interval="1" max-interval="10" delta="1">
<forward-request buffer-request-body="true" buffer-response="false" />
</retry>
</backend>
<outbound>
<base />
</outbound>
<on-error>
<base />
</on-error>
</policies>
Now for the moment of truth! Send a request to each of the APIs via APIM.
The responses came back as expected, didn’t they? We did it!
# Chat Completions (gpt-35-turbo) via APIM + Application Gateway in Vnet
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'
# Chat Completions (gpt-35-turbo-16k) via APIM + Application Gateway in Vnet
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo-16k/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'
# Embeddings via APIM + Application Gateway in Vnet
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"input": "Sample Document goes here"}'
Also, please test each API with “Trace” in APIM’s testing feature.
- When requesting the “gpt-35-turbo” model, we can see that the response is evenly returned from one of the six nodes in the default pool.
- When requesting the “gpt-35-turbo-16k” model, we can see that the response is evenly returned from one of the three nodes in the “gpt-35-turbo-16k” pool.
- When requesting the “text-embedding-ada-002” model, we can see that the response is evenly returned from one of the three nodes in the “text-embedding-ada-002” pool.
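You can also observe the distribution from the command line by sending repeated requests. This sketch assumes the APIM endpoint from this walkthrough and relies on the x-ms-region response header that AOAI returns to indicate the processing region; depending on your setup that header may not be forwarded, in which case the APIM Trace is the reliable way to check:

```shell
# Send ten requests and print which region answered each one.
for i in $(seq 1 10); do
  curl -s -D - -o /dev/null \
    "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
    -H "Content-Type: application/json" \
    -d '{"messages": [{"role": "user", "content": "ping"}]}' \
    | grep -i "x-ms-region"
done
```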
Was everything as expected? Congratulations!
Conclusion
We have done the following this time:
- Created a private Vnet and locked down AOAI resources there using private endpoints.
- Application Gateway was created in Vnet and configured to load balance to the internal AOAI resources.
- APIM was integrated into Vnet and it was able to connect to the internal endpoint of Application Gateway.
Compared to the approach in the previous entry, we were able to lock everything down within a secure private Vnet except for the external endpoint of APIM.
This approach can be developed into even more secure applications, for example, by setting the APIM endpoint to “Internal” to restrict access only to applications deployed in the same Vnet.
Nothing would make me happier than if this entry gives you some ideas for using AOAI in a more secure and robust manner. Happy hacking!