Utilize API Management to make Azure OpenAI load-balanced and redundant

Fumihiko Shiroyama
Sep 6, 2023

Have you ever had a call to Azure OpenAI Service (AOAI) fail because a provisioned AOAI resource was out of service, or because the API was unavailable due to rate limiting? Fortunately, Azure lets you create multiple AOAI resources in multiple regions. You can load balance traffic across them and automatically remove non-functioning resources from the backend to provide redundancy. But how?
This entry proposes several approaches for load balancing and redundancy of AOAI resources and provides step-by-step procedures.

Prerequisites

The prerequisites for following this entry are as follows:

  • You have already created the AOAI resource named “my-endpoint-us1” in the eastus region and its endpoint is https://my-endpoint-us1.openai.azure.com/.
  • You have already deployed the gpt-35-turbo model with the same name gpt-35-turbo.
  • You have already deployed the text-embedding-ada-002 model with the same name text-embedding-ada-002.
  • You have a valid API key for that resource.
  • You have set up an HTTP client of your choice, such as curl or Postman.

Now you can call the Chat Completions API and the Embeddings API.

YOUR_RESOURCE_NAME=my-endpoint-us1
API_KEY="<your API key here>"

# Chat Completions API
curl "https://${YOUR_RESOURCE_NAME}.openai.azure.com/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: ${API_KEY}" \
-d '{
"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]
}'

# Embeddings API
curl "https://${YOUR_RESOURCE_NAME}.openai.azure.com/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: ${API_KEY}" \
-d '{"input": "Sample Document goes here"}'

And this is what you get:

{
"id": "chatcmpl-7vE2Ql47wYhjtOyntgx0NbdgZsbEV",
"object": "chat.completion",
"created": 1693873014,
"model": "gpt-35-turbo",
"choices": [
{
"index": 0,
"finish_reason": "stop",
"message": {
"role": "assistant",
"content": "Azure OpenAI Service is a cloud-based service provided by Microsoft Azure that allows users to access OpenAI models and technologies to enhance the capabilities of their applications. With this service, developers can integrate powerful AI capabilities such as text generation, language translation, natural language understanding, and more into their own software systems.\n\nAzure OpenAI Service primarily leverages OpenAI's GPT-3 (Generative Pre-trained Transformer 3) model, which is one of the most advanced and versatile language models available. This model has been trained on a large corpus of text from the internet to understand and generate human-like text responses.\n\nBy using Azure OpenAI Service, developers can easily build applications that can generate coherent and contextually relevant text, enable multilingual translations, create conversational agents, assist with writing code and documenting software, enhance chatbots, provide content summaries, and more. The service provides an API-based interface, allowing developers to make requests and receive responses from the OpenAI models.\n\nAzure OpenAI Service is designed to be scalable and reliable, utilizing the infrastructure and resources of Microsoft Azure. It offers flexible pricing options based on the number of tokens consumed during API calls, with different plans available to accommodate various usage scenarios. This enables developers to control costs while benefiting from the power and versatility of OpenAI models.\n\nOverall, Azure OpenAI Service empowers developers to easily leverage state-of-the-art AI models in their applications, opening up possibilities for enhanced natural language processing and generation capabilities to deliver richer and more advanced user experiences."
}
}
],
"usage": {
"completion_tokens": 302,
"prompt_tokens": 15,
"total_tokens": 317
}
}
{
"object": "list",
"data": [
{
"object": "embedding",
"index": 0,
"embedding": [
-0.0017071267, -0.013916413, 0.0017036214, -0.018410329, -0.0071545085, 0.018522501, -0.00961179, -0.029164877, -0.0061309338, -0.014722654, 0.011820188, 0.00716853, 0.0014004048, -0.011764102, -0.0071510035, 0.013047076, 0.046860103, 0.0009350638, 0.017022192, -0.028716186, 0.010663408, 0.0072771977, -0.0032372312, 0.0015327334, -0.030174429, -0.014834827, 0.001798267, -0.037045002, -0.0031075315, 0.0028744228, 0.018915106, -0.028884444, -0.026739143, -0.029865954, -0.023373965, -0.008384902, -0.0039330516, -0.013965488, 0.017877508, -0.000069286296, 0.0104320515, 0.013341528, 0.005492952, -0.014554395, -0.0032372312, -0.008483053, 0.012044533, -0.028365646, -0.022350391, 0.017162409, 0.010915796, 0.0070142928, -0.012570342, -0.010298847, -0.012689525, 0.0009876447, -0.0086653335, 0.0025414105, 0.013783208, -0.014540373, -0.02634654, -0.0071790465, -0.01246518, 0.024103088, 0.011427584, -0.007368338, 0.0056471894, 0.0128087085, 0.008693377, 0.0066111726, 0.023598311, 0.013222345, -0.0012067318, 0.011357476, 0.029417265, -0.0063973437, -0.006018761, -0.012345997, -0.014554395, 0.013881359, 0.025659483, -0.01370609, -0.020163026, 0.0029462834, 0.016152855, 0.00846202, 0.00035798835, 0.019363794, -0.006961712, 0.0010305857, -0.00075015426, 0.013895381, 0.0061554713, 0.007291219, -0.01869076, -0.006148461, 0.0045009255, 0.03471742, -0.019644227, -0.032642227, -0.0057488456, -0.008735442, 0.014792762, -0.018802932, -0.009191142, 0.0049285837, -0.0022416993, -0.0013145227, 0.02403298, 0.00703182, -0.03426873, 0.035867188, -0.0070107877, -0.02278506, 0.0071334764, 0.001544126, -0.0065585915, -0.023696462, 0.0013504529, -0.0012277642, 0.028407712, 0.0011287368, 0.02741218, -0.017540991, 0.012661482, 0.012009479, -0.012998, -0.029893998, -0.00095872517, 0.011182206, 0.031464413, 0.010123577, 0.022925274, -0.0014345824, -0.022013873, 0.008693377, -0.032473966, 0.024117108, -0.039568886, -0.015507862, 0.0028411217, 0.032277666, -0.011820188, -0.0065936456, -0.011715026, 
0.015283517, 0.03250201, 0.005780394, -0.0006116912, -0.023738526, 0.008560171, -0.0104320515, 0.011848231, 0.008980819, -0.008504085, 0.019966723, 0.004434323, -0.01023575, -0.015535905, -0.001931472, 0.007557629, 0.0122268135, 0.0038349007, -0.019896615, 0.004413291, 0.029978128, 0.0003211817, 0.008686366, -0.005412328, -0.018662717, -0.01255632, 0.021382902, -0.040101703, 0.027131747, 0.01032689, 0.017540991, -0.0074734995, 0.013853316, -0.009962329, -0.008861636, -0.016222963, 0.019644227, 0.03401634, 0.02849184, -0.000039271363, -0.018059788, 0.009871189, -0.009015873, 0.0042870967, -0.025631439, 0.019672269, 0.016363177, 0.0046271197, -0.003573749, -0.6932267, 0.009674887, 0.0070423363, -0.0035807597, 0.03233375, 0.03802651, 0.04887921, 0.010908785, -0.03213745, 0.00401017, 0.0051774667, 0.014070651, -0.010123577, -0.014386136, -0.0074805105, 0.006295687, 0.011602853, -0.023275815, 0.0005849626, 0.026907403, -0.01451233, 0.014568416, 0.008532128, 0.012668493, -0.003302081, -0.00014021575, -0.0022750006, -0.01388837, 0.0036596311, -0.01068444, -0.023906786, 0.010838677, 0.014259942, 0.006285171, 0.03410047, -0.015718186, -0.017737292, 0.0071510035, 0.00009037343, 0.03595132, -0.009071959, -0.007936211, 0.03098768, 0.018844998, 0.0029620577, 0.013243377, 0.006292182, -0.00002381203, -0.0018753856, -0.0043747313, 0.0012137425, -0.0011480164, -0.012563331, 0.0055560493, -0.004585055, -0.012065565, 0.013790219, -0.010656397, -0.0006936298, 0.005202004, -0.008090449, -0.0011786886, -0.008412945, -0.008833592, -0.041447777, 0.017723272, -0.017512947, 0.025799697, 0.004111827, -0.004879508, -0.0037262335, -0.003196919, -0.016419264, -0.010642376, 0.017611098, 0.03115594, 0.021831593, -0.0077959956, -0.0032968228, 0.0051984987, -0.0008263966, 0.0005091584, -0.008735442, 0.0059802015, 0.031632673, -0.0048023895, -0.0074805105, -0.007326273, 0.015956553, 0.014175813, 0.025519267, 0.017288603, 0.0038559332, -0.0035492112, 0.016054703, 0.0044027744, 0.0013697327, 
0.0058785453, 0.025042534, -0.011462637, 0.00072605466, -0.00062965637, 0.001141882, -0.004655163, -0.0069301636, 0.012086597, 0.006688291, 0.028968574, 0.024944382, -0.008917722, -0.019069342, -0.015367647, -0.0059802015, -0.008966797, 0.00068223727, -0.040073663, 0.0063167196, 0.014498308, 0.014358093, -0.019335752, 0.013348539, 0.007760942, 0.020149004, 0.002376657, -0.0021014835, 0.009212174, -0.013033054, -0.017050235, -0.01682589, -0.02083606, 0.0061169122, -0.013727122, 0.010165642, -0.031464413, 0.009338369, -0.004395764, 0.024902318, -0.021873657, 0.0134046255, -0.026234366, -0.0037542768, -0.0035895233, 0.009289294, 0.012542299, 0.011238293, -0.020723889, -0.019602163, -0.013692068, -0.011476659, 0.020723889, -0.01797566, -0.0050232294, -0.022714952, -0.0023556247, -0.0027534869, -0.030034214, -0.0021540644, -0.022378433, -0.018718803, -0.009001851, -0.0011164679, 0.027285986, -0.014961021, 0.0011786886, -0.007494532, 0.00099115, -0.020906169, 0.01326441, -0.003712212, -0.025140684, -0.009927276, -0.0029708212, -0.003989138, 0.02413113, 0.004024192, 0.008237676, 0.00066909206, -0.008209632, 0.0038699547, -0.022448542, 0.00855316, 0.011532745, -0.0034265225, 0.006677775, 0.02376657, 0.017498925, 0.010691451, 0.010354933, -0.010516182, 0.008700388, -0.01594253, -0.008707399, -0.008160557, 0.0058364808, -0.0017334172, 0.011967414, 0.006789948, 0.0014652546, 0.008048384, 0.008384902, 0.024341455, -0.0074384455, -0.0050898315, -0.0011357475, 0.021242686, -0.022813102, 0.009695919, -0.0146385245, 0.0070808954, 0.019377816, 0.009303315, -0.042962104, -0.027846849, -0.0063693007, 0.025925891, 0.021144535, -0.003908514, 0.000352073, -0.008258708, 0.006667259, 0.010551236, -0.009366412, -0.003792836, -0.010572268, -0.00351591, 0.003614061, 0.014484287, 0.041055173, 0.021873657, -0.030202473, -0.01817196, 0.01682589, 0.010411019, -0.0001692448, 0.020695845, 0.010733516, 0.010130588, -0.024593843, 0.03909215, 0.019307708, 0.012878817, 0.011490681, 0.033315264, 
-0.003082994, 0.019475969, 0.002289022, 0.032109406, -0.00012904231, -0.009822113, 0.007894147, -0.0021365376, 0.010516182, 0.00086276507, -0.013544842, 0.03348352, -0.011013947, -0.011175196, 0.018059788, 0.02136888, 0.048009872, 0.01710632, 0.009681897, -0.0005761991, -0.0014608728, 0.008195611, -0.005962675, 0.010032437, -0.0014827816, 0.004521958, -0.028547926, -0.010109556, -0.0156621, 0.009934286, -0.007431435, 0.018971192, 0.0128577845, 0.0061028907, -0.0043747313, 0.023317879, -0.011133131, -0.0034703398, -0.033006787, 0.021803549, 0.0020506554, 0.0075646397, -0.020233132, -0.034156557, 0.0033283713, -0.0029918535, -0.00024231034, -0.017470883, -0.022827124, 0.019616183, -0.019812485, 0.013895381, 0.02243452, 0.004974154, 0.0016238737, 0.0003126373, -0.0034195117, 0.0059521585, -0.0021417956, -0.006250117, -0.022939296, 0.008223654, -0.008132514, -0.023009405, -0.021256708, 0.014168802, -0.0021295268, 0.013923424, -0.01006048, -0.017176429, 0.0012663235, 0.00614145, 0.0092682615, -0.019980744, 0.0020331284, 0.022658866, 0.0110419905, -0.027187834, -0.021018341, -0.022112023, 0.0015528895, 0.02804315, 0.028856402, 0.015984595, 0.02048552, -0.009906243, -0.011553777, -0.023752548, -0.020275198, 0.009920265, -0.010873731, 0.0054438766, -0.0029392727, 0.008616257, -0.017498925, 0.0067794314, 0.024635907, -0.0015152064, -0.008609247, 0.013839294, 0.0061449553, -0.011736059, 0.00507581, 0.0134046255, 0.015367647, 0.008251697, -0.00060774764, 0.0017307881, 0.017050235, 0.0105933, -0.013741144, -0.007950233, 0.015563948, -0.008882668, 0.017330667, 0.0011620381, 0.026430668, -0.0033669306, 0.0110419905, 0.009934286, -0.0104881385, 0.025238834, 0.00092016585, 0.000352073, -0.0007597941, -0.0016466588, -0.00097099406, 0.017849466, 0.012317954, -0.0024327433, -0.02493036, 0.009506628, 0.01068444, -0.020808017, -0.015367647, -0.0007992298, 0.030510947, -0.015914489, -0.008181589, -0.017863486, -0.00881256, -0.025294922, -0.041419733, 0.00043598335, -0.010621343, 
-0.014112716, -0.03623175, -0.023345923, -0.012647461, -0.027019575, -0.016335135, -0.0068495395, -0.023976894, -0.01273159, -0.014470265, 0.024299389, 0.007129971, 0.021200622, 0.021915723, 0.003018144, 0.016391221, -0.012920882, 0.00048374434, -0.011785134, -0.013811251, 0.005990718, 0.011595842, 0.0034720926, 0.0008864264, 0.00023267051, -0.0064569353, 0.010754548, 0.008581204, -0.0036070503, -0.02314962, 0.010943839, 0.0045815497, 0.0023380977, 0.012402083, -0.014890913, -0.0067338613, 0.0017088795, -0.017162409, 0.0038068576, 0.0039961487, 0.004991681, 0.008686366, -0.020527586, 0.027748697, -0.024004936, -0.0032302204, 0.0315205, -0.0122618675, 0.013348539, -0.005349231, 0.009057937, 0.0013373077, -0.0014775235, 0.022476586, -0.007894147, -0.011560788, -0.0013986521, -0.031380285, 0.028085215, 0.0079292, -0.010179663, 0.0019787948, -0.018887062, -0.011883285, -0.00012214106, 0.00703182, -0.023892764, 0.021873657, -0.014456244, -0.006870572, -0.039568886, -0.029333135, -0.027706632, 0.008265719, 0.0074805105, 0.0050793155, -0.024383519, 0.010530203, 0.0069862497, -0.036007404, 0.00058058085, -0.036456097, -0.011974425, 0.010088523, -0.008398923, 0.018522501, 0.005626157, -0.013460712, -0.031913105, -0.009773037, -0.012324965, -0.04200864, 0.011343454, -0.019882593, 0.024103088, 0.00899484, 0.032894615, -0.004294107, 0.011988447, 0.009156088, 0.0014670073, -0.002578217, -0.020948233, 0.0023468612, -0.013306474, 0.028085215, 0.025883827, -0.002439754, 0.009121034, 0.0074033914, -0.0007852082, 0.017765336, -0.009282283, -0.006481473, -0.008840603, -0.0061134067, 0.0029795847, 0.0018911599, -0.0039856327, 0.017863486, -0.0146385245, -0.019924657, 0.030230517, -0.0010358439, 0.00070940406, -0.0062185684, 0.010796613, -0.016980127, -0.0044483445, 0.009506628, -0.0071615195, 0.011736059, -0.01744284, -0.033932213, -0.0020453972, 0.014126737, -0.001665062, 0.018452393, -0.0072701867, 0.014056629, -0.010389987, 0.019742377, -0.0354185, -0.014175813, -0.004949616, 
-0.016335135, -0.005457898, -0.0052265422, 0.021607246, -0.010607322, 0.018788911, 0.011294379, -0.027790762, 0.0049531213, -0.013600928, -0.017036214, -0.0065270434, -0.009485596, 0.029220963, 0.022266261, 0.019518033, 0.032782443, -0.0017220246, -0.032978743, -0.0204715, -0.007859093, -0.019335752, 0.032445926, 0.01950401, -0.007915179, -0.004371226, 0.017779358, 0.02403298, -0.0051529286, -0.011462637, -0.0035106519, 0.020597694, 0.0018596114, -0.021509096, -0.0023380977, -0.046411414, 0.019994766, 0.0009937792, -0.01014461, -0.018031746, -0.00204715, -0.00029839666, 0.008532128, -0.029529437, 0.016685674, 0.01905532, -0.031043768, 0.0056296624, -0.0010726505, -0.0206678, 0.0033932212, -0.007943222, 0.03687674, -0.016713718, 0.0049040457, 0.028519884, -0.002381915, -0.016713718, 0.0015169592, -0.015591991, 0.030426817, -0.03233375, 0.004995186, -0.003503641, 0.012367029, 0.006064331, 0.015381668, -0.024229283, -0.017723272, -0.010761559, -0.0006103767, 0.022476586, 0.008854625, 0.0067689153, -0.011806166, -0.010474117, -0.004550001, -0.01735871, -0.03292266, 0.0031408328, -0.0007843319, -0.03168876, 0.002590486, 0.0018017724, 0.0013434421, 0.0021838604, -0.018802932, 0.005724308, 0.012927893, -0.03696087, 0.011476659, -0.0074734995, 0.002371399, -0.011651929, 0.005202004, -0.015591991, -0.01504515, 0.019181514, -0.03348352, -0.020541608, -0.025126662, -0.012745611, 0.017611098, 0.007908168, -0.009008862, 0.0031653706, -0.012331976, -0.0017859981, 0.003989138, -0.013699079, 0.0059766965, -0.01504515, 0.009752005, -0.004024192, -0.014526351, 0.010726505, 0.009036905, 0.016881976, 0.0067759263, -0.017933594, -0.0051143696, -0.005633168, 0.010901774, -0.004995186, -0.010158631, -0.029361177, -0.0028533905, 0.0033704361, 0.015816336, 0.0008837974, -0.011764102, -0.0033178553, -0.013061097, 0.00047804808, 0.01895717, -0.013194302, 0.0030391763, -0.028772272, -0.0005858389, -0.015185365, -0.0116238855, 0.0042625587, 0.026514798, 0.005331704, -0.022112023, -0.026598928, 
0.0075716507, -0.059339307, -0.02135486, -0.002504604, 0.01958814, 0.015648078, 0.0019437409, -0.01727458, 0.021340838, 0.016531438, 0.03037073, -0.014442222, -0.012058554, 0.013748154, -0.020106938, 0.010936828, -0.009752005, -0.03712913, -0.0059521585, 0.019546075, 0.027271964, 0.013579896, 0.00783806, -0.0055174897, -0.010579279, -0.00015138919, 0.011722037, 0.009955319, -0.02634654, -0.008966797, -0.008980819, -0.011231282, 0.012402083, -0.0068880985, -0.013916413, -0.022504628, -0.0024765607, -0.009716951, 0.012409094, -0.0423732, 0.00059547875, -0.011497691, -0.022490606, 0.01139253, 0.008125503, 0.010375965, 0.031099854, 0.010985904, -0.014091683, -0.012317954, 0.0062606335, -0.019686291, -0.006488484, -0.0039856327, -0.0110490015, -0.022238217, -0.0128157195, 0.0066392156, -0.010852699, 0.0023398504, -0.018298155, -0.0056296624, 0.005422844, 0.0192376, 0.017611098, -0.02662697, -0.00006824564, -0.020597694, -0.011091066, 0.0073753484, -0.012044533, 0.011609864, -0.0059241154, 0.010011405, 0.010116566, -0.0026045076, -0.01308213, -0.027580438, 0.0122338245, -0.010740526, 0.013460712, 0.20168634, 0.00009360497, -0.0067268508, 0.024411563, 0.008027351, 0.00007251783, 0.009639833, 0.012598385, 0.0012794688, 0.021747462, 0.02556133, 0.0071159494, 0.004441334, 0.011056012, 0.0036736527, 0.017022192, -0.023317879, -0.027496308, 0.0015432496, -0.020850083, 0.014224888, -0.024537757, -0.0010270803, 0.00034966302, 0.047420967, 0.004844454, -0.009282283, 0.0011366239, 0.02894053, -0.012955936, -0.029809868, -0.0013092646, 0.0022311832, -0.00351591, -0.020709867, -0.008405934, -0.00086276507, 0.019644227, 0.017148387, -0.012367029, 0.0074454565, -0.0058890614, 0.013811251, -0.015451776, 0.00039632857, 0.01629307, 0.004833938, -0.013741144, 0.0010200696, 0.013734133, -0.04122343, -0.0051424126, 0.0128157195, 0.014708633, 0.020457478, 0.0067619043, 0.043747313, -0.017555011, -0.006677775, 0.01557797, -0.024229283, 0.013804241, -0.032193538, 0.021116491, -0.02706164, 
0.0077749635, -0.015549927, -0.0018859018, 0.012934903, 0.0026448197, 0.0060432986, -0.024762101, -0.000049869705, -0.00064849784, -0.02163529, -0.016517416, 0.025308942, 0.019083364, 0.022953318, 0.0037087067, -0.0025589375, -0.00525108, 0.008539139, -0.016811868, -0.003105779, -0.028856402, 0.00440628, 0.0040802783, -0.009829124, 0.0061624823, -0.02314962, -0.021102471, -0.025519267, -0.032025278, 0.0055315113, 0.036652397, -0.0023328396, 0.019027278, 0.004539485, -0.007459478, -0.013600928, 0.010123577, 0.034773506, 0.01770925, -0.003347651, 0.028533906, -0.02038737, -0.001977042, -0.009135056, -0.005275618, -0.013250388, -0.037942383, 0.010270804, -0.007845071, -0.0027569921, -0.0016904761, -0.0017965143, -0.011771113, 0.0013776198, -0.005117875, 0.01950401, -0.0021084943, -0.004679701, 0.009646843, 0.0016983632, 0.0021067415, -0.016573502, -0.010922807, 0.022462564, -0.024103088, -0.0016668148, -0.014400157, 0.014722654, -0.0057558566, -0.013699079, -0.0016168628, -0.0037087067, -0.00022083981, -0.028015107, 0.0005450887, 0.018284135, -0.022378433, -0.02056965, 0.0071369815, 0.001431077, -0.019335752, 0.037970424, -0.011154163, -0.012051544, -0.006730356, -0.01691002, -0.022378433, -0.0152414525, -0.0011164679, 0.025056554, -0.0046656793, -0.028646078, -0.03693283, 0.0122268135, -0.008735442, -0.028996617, -0.0055665653, 0.045710336, -0.005321188, -0.01290686, -0.010425041, -0.18553348, 0.019363794, 0.02020509, -0.017344689, 0.008265719, -0.0030304128, 0.027398158, -0.016110789, -0.009632822, 0.0034826086, 0.01932173, 0.0062010414, -0.023275815, -0.0044203014, -0.0026553357, -0.0009929028, -0.00703182, -0.008945765, 0.015185365, 0.028337603, 0.025575353, -0.024047, 0.005156434, -0.012941914, 0.0031531018, 0.0008487435, -0.006982744, 0.025757633, 0.018270113, -0.0040907944, -0.017344689, -0.0018526006, 0.02083606, -0.008644301, 0.0032810485, -0.0022557208, 0.0146245025, -0.017793378, -0.0010358439, 0.0072771977, -0.0032021771, 0.03471742, -0.0064674513, 
-0.0074664885, -0.0025010984, 0.010460095, 0.008041373, 0.016138833, 0.024888296, -0.026977511, 0.006169493, -0.007459478, 0.014904934, -0.022869188, 0.022771038, 0.0030970154, -0.024467649, 0.0004265626, 0.018452393, 0.009107013, -0.0048760027, -0.041251473, 0.0192376, 0.004655163, -0.022742994, -0.013376582, -0.018410329, -0.004890024, -0.023836678, 0.008868646, 0.0012610654, -0.01076857, 0.040718652, -0.018115874, 0.019518033, 0.0013241625, -0.022069959, 0.004532474, -0.0061449553, 0.0031899083, -0.011806166, 0.0439997, 0.009057937, 0.01843837, 0.007887136, 0.010845688, -0.0012041028, 0.01112612, 0.0024204743, -0.014007553, 0.033259176, 0.0008040497, -0.007186057, -0.009787059, 0.015690142, 0.010747537, -0.0021680861, 0.01085971, 0.0067513883, -0.0076768124, -0.0075365966, -0.0015336098, -0.022911254, 0.017555011, 0.01414777, -0.0061835144, 0.014182823, 0.036792614, 0.015872423, -0.0027727664, -0.032894615, -0.011259325, 0.023177663, 0.010305857, 0.016769804, 0.027804783, 0.0051984987, -0.029389221, -0.002949789, -0.013040065, 0.030202473, -0.003936557, -0.0070037767, 0.016405243, 0.009114024, -0.011701005, -0.10353531, -0.053029597, 0.013607939, 0.058273666, -0.007585672, 0.02829554, 0.017779358, 0.026542842, -0.006046804, 0.02626241, -0.030146386, -0.015956553, -0.016867954, -0.012717568, -0.011343454, 0.0077399095, 0.0069932607, -0.009920265, -0.00587504, 0.043214492, -0.015746228, -0.006712829, 0.005359747, -0.030286603, -0.00454299, -0.008083438, -0.035755016, 0.037633907, 0.01744284, -0.00802034, 0.0010840431, -0.020793995, 0.018143918, -0.04332667, -0.009934286, -0.0039575896, -0.019728357, -0.020976277, 0.052468732, -0.007361327, 0.022911254, 0.013152237, 0.0020629242, -0.016433286, 0.019363794, -0.02732805, 0.0019244612, 0.0066742697, -0.014764719, -0.028337603, -0.021607246, 0.0067759263, 0.0071510035, -0.014217877, 0.0053106714, 0.008679355, 0.021789528, 0.031099854, -0.016769804, 0.00006939584, -0.0045079365, -0.0012303932, 0.0116168745, 0.021971809, 
-0.010263793, -0.00052843813, -0.03037073, -0.0065550865, 0.00712296, -0.015788294, 0.010165642, 0.011897306, -0.0059135994, 0.029669654, -0.026038066, 0.025996, -0.038503245, -0.010046459, 0.03177289, -0.016839912, -0.0122618675, -0.0014117974, 0.008931743, -0.026010022, 0.03157659, -0.006817991, -0.01424592, -0.00025874187, 0.0021645806, -0.020808017, -0.0021663334, 0.027075661, 0.029220963, -0.022196153, -0.040718652, 0.020008788, -0.00996934, -0.00596618, 0.008784517, 0.005769878, -0.0140496185, -0.036540225, -0.063714035, 0.01807381, -0.0058505023, 0.0040802783, 0.0086723445, 0.025336986, -0.0079292, -0.0081886, -0.0009043916, 0.017232515, -0.014316028, 0.019279666, -0.0008110605, -0.010733516, -0.008469031, -0.0067969584, 0.021705398, 0.025308942, 0.015143301, 0.018452393, 0.0035194154, -0.013313485, 0.013103162, 0.00083165464, -0.011595842, 0.0017623367, -0.00898783, 0.012633439, -0.007333284, -0.013895381, 0.007487521, -0.02092019, -0.005117875, 0.01531156, -0.009205164, -0.0023153126, -0.017036214, 0.014414179, 0.015984595, 0.0027377126, -0.011820188, -0.020555628, 0.00685655, -0.03463329, -0.01905532, 0.010572268, -0.020345306, 0.018101854, 0.0122829, -0.0077749635, 0.008532128, 0.0116659505, -0.01674176, -0.017162409, -0.006400849, -0.023444073, -0.0027604976, -0.0029673157, 0.0024502703, -0.007073885, 0.023191685, 0.01602666, 0.041784294, 0.013678047, 0.013909402, 0.0042520426, -0.0140426075, -0.0052440693, 0.013720111, -0.01023575, -0.014526351, -0.010523192, 0.027285986, 0.0012671999, 0.018564565, -0.002884939, -0.0050617885, 0.0036456096, 0.022224197, 0.01950401, -0.0022101507, 0.020822039, -0.04461665, 0.017863486, 0.018746845, 0.014386136, -0.010670419, -0.0055209952, -0.0048654866, -0.0028025622, -0.020681823, 0.0035211681, -0.0018823964, 0.0018070305, -0.0044658715, 0.0110349795, -0.0059416424, -0.007221111, 0.024579821, 0.014610481, 0.011063023, 0.0068039694, 0.0062010414, 0.0016790836, 0.013159248, -0.008272729, -0.02394885, -0.037886295, 
-0.0029077241, 0.021298772, 0.007368338, 0.0023083019, 0.0066392156, 0.008293762, -0.0077048554, 0.013040065, -0.020457478, 0.0016562985, -0.021691376, 0.0163772, -0.0011120861, 0.019644227, 0.030342689, 0.0020664297, 0.013467723, -0.0072000786, 0.010908785, -0.014105705, -0.00093331106, 0.016699696, 0.01433005, -0.0025414105, -0.026655015, -0.026248388, -0.019714335, 0.0016229973, 0.0006559468, 0.008644301, -0.02270093, 0.06135841, 0.032978743, -0.011602853, 0.010411019, 0.011441605, 0.008307783, 0.01246518, 0.023962872, -0.0018841492, -0.0077399095, 0.019896615, 0.007627737, 0.010214717, -0.005587598, -0.019181514, 0.016713718, -0.0031215532, 0.01050216, 0.003070725, 0.006677775, 0.010305857, -0.004788368, 0.025126662, 0.0045745387, -0.015157322, -0.017428817, 0.030931594, -0.008847614, -0.008532128, -0.010130588, 0.017064257, 0.0052721123, -0.028029129, -0.032642227, 0.018718803, -0.013902391, 0.004441334, -0.016349157, -0.0036070503, 0.00020583234, 0.01584438, 0.028884444, -0.047505096, -0.026879359, 0.0076067043, 0.0034125007, -0.0122338245, -0.010193685, -0.000057236506
]
}
],
"model": "ada",
"usage": {
"prompt_tokens": 1,
"total_tokens": 1
}
}

Not a big deal, right? Now, with this in mind, let’s use the various resources in Azure to make AOAI more error-resistant and redundant.

Use API Management as API proxy

What is API Management?

API Management (APIM) is a managed Azure service that places multiple backends behind a single, unified API interface. If it is configured properly, users only need to call the APIM endpoint, and APIM can load balance across multiple AOAI resources or switch backends when errors occur. Let’s take a look.
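To make "switch backends when errors occur" concrete, here is a minimal shell sketch of the idea: try the backends in order and use the first one that answers with HTTP 200. It is only an illustration, not how APIM works internally. The backend names and the `fake_status` stand-in (which pretends the first backend is rate limited) are invented for the example.

```shell
# Hypothetical stand-in for a real HTTP call: prints the status code that
# each backend "returns". backend-a pretends to be rate limited (429).
fake_status() {
    case $1 in
        backend-a) echo 429 ;;
        *)         echo 200 ;;
    esac
}

# first_available BACKEND...
# Prints the first backend that answers 200; returns 1 if none does.
first_available() {
    for backend in "$@"; do
        if [ "$(fake_status "$backend")" = 200 ]; then
            printf '%s\n' "$backend"
            return 0
        fi
    done
    return 1
}

first_available backend-a backend-b backend-c   # prints "backend-b"
```

The caller never needs to know which backend was used, which is exactly the benefit of putting a proxy in front.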

Setting up API Management

First, set up APIM. Go to the Azure portal and type “API Management”, then select “API Management services”.

API Management services

Next, press the “Create” button to create the API Management service. Fill in all required fields; the default values are fine for the others, including the pricing tier at this point. Finally, press “Review + create” to create the APIM resource. This may take up to 30 minutes.

Create API Management service

Once APIM has been created, open it from the Azure portal. The important menu item here is “APIs”, in the section that is also called “APIs”.

APIs

Opening “APIs” brings up an important screen. Here you can define the API and configure how it will be forwarded to the backend and finally returned to the user. There is a pre-defined “Echo API” that is very useful as a reference.

APIs

Add our API from “+ Add API”. There are many ways to do this, but we will choose “HTTP — Manually define an HTTP API” this time.

Create an HTTP API by filling in the required fields.

Once you have your API, select it and switch to the “Settings” tab. The base URL is shared with other APIs (e.g., Echo API), so make this API's endpoint unique by entering a suffix in the “API URL suffix” field.
Next, under “Products”, select the predefined “Unlimited” product. Products will be discussed in more detail another time; for now, note that an API should be tied to a product when it is published. Finally, uncheck “Subscription required”. A subscription key would serve as the API key for this API, but we will not use one this time for simplicity.

Now that the basic setup is complete, let’s briefly explain the basic flow of APIM, which divides the process from API call to response into several stages.

  • Frontend: HTTP Methods and URL paths
  • Inbound processing: Modify the request before it is sent to the backend
  • Backend: Backend’s HTTP(s) endpoint
  • Outbound processing: Modify the response before it is sent to the client

Now that we understand the process, let’s add the API Frontend settings using “+ Add operation” on the “Design” tab. Recall that we called the Chat Completions API and the Embeddings API at the beginning. We need to be able to reach both when we use AOAI resources via APIM. Each has a different URL path, so each needs its own operation mapped to the backend.

First, add POST /openai/deployments/{deployment}/chat/completions for Chat Completions API. The {deployment} part is a template parameter that the caller fills in on each request to match the deployed model name.

Likewise, add POST /openai/deployments/{deployment}/embeddings for Embeddings API.
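As a rough illustration of how the two URL templates tell requests apart, the sketch below matches a path against them and pulls out the {deployment} value. It is plain POSIX shell written for this entry, not APIM's actual routing code, and the function name `match_operation` is hypothetical.

```shell
# match_operation PATH
# Prints "<operation> <deployment>" when PATH fits one of the two templates
#   /openai/deployments/{deployment}/chat/completions
#   /openai/deployments/{deployment}/embeddings
# and returns 1 otherwise.
match_operation() {
    path=$1
    prefix=/openai/deployments/
    case $path in
        "$prefix"*) rest=${path#"$prefix"} ;;
        *) return 1 ;;
    esac
    deployment=${rest%%/*}                    # text before the next slash
    [ "$deployment" != "$rest" ] || return 1  # no slash: nothing after the name
    [ -n "$deployment" ] || return 1          # empty deployment name
    operation=${rest#*/}                      # text after that slash
    case $operation in
        chat/completions|embeddings)
            printf '%s %s\n' "$operation" "$deployment" ;;
        *) return 1 ;;
    esac
}

match_operation /openai/deployments/gpt-35-turbo/chat/completions
# prints "chat/completions gpt-35-turbo"
match_operation /openai/deployments/text-embedding-ada-002/embeddings
# prints "embeddings text-embedding-ada-002"
```

Because the deployment name is a parameter and not part of the fixed path, the same two operations cover any model you deploy later.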

Connect one AOAI resource to APIM

Now, our goal is to achieve load balancing and redundancy with multiple AOAI resources. But don’t rush. Take it one step at a time: tie APIM and AOAI together on a one-to-one basis and make sure APIM is functioning well as a proxy.

From the “Design” tab, make sure to select “All operations”, then click “Backend” and then “HTTP(s) endpoint” to set one backend.

Specify your AOAI resource in “Service URL” and check “Override”.

Now, let’s try accessing the API in exactly the same way as the first time, changing only the URL to the APIM endpoint. Your APIM endpoint should look like this: https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15
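Read left to right, that URL is the APIM gateway host, the API URL suffix, the operation path, and the query string. The small helper below assembles it from those parts. It assumes that `my-cool-apim-us1` is the APIM resource name and `openai-test` is the value entered in the API URL suffix field; the function name `apim_url` is made up for this sketch.

```shell
# apim_url APIM_NAME URL_SUFFIX DEPLOYMENT OPERATION API_VERSION
# Builds the request URL from its parts (illustrative helper only).
apim_url() {
    printf 'https://%s.azure-api.net/%s/openai/deployments/%s/%s?api-version=%s\n' \
        "$1" "$2" "$3" "$4" "$5"
}

apim_url my-cool-apim-us1 openai-test gpt-35-turbo chat/completions 2023-05-15
# prints https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15
```

Substitute your own APIM name and suffix; everything after the suffix is the same path you used against AOAI directly.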

# Chat Completions API via APIM
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: ${API_KEY}" \
-d '{
"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]
}'

If the API key is correct, you should receive exactly the same response as if you had made a direct request to AOAI. The Embeddings API should also work the same way.

# Embeddings API via APIM
curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/text-embedding-ada-002/embeddings?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-H "api-key: ${API_KEY}" \
-d '{"input": "Sample Document goes here"}'

Did it work? Congratulations!

Automatically load values and secrets

In the previous section, APIM and AOAI were linked one-to-one, and the AOAI API key was specified directly in the request header. This raises a question: what if, down the road, we have to handle multiple AOAI resources on the backend? Each AOAI resource has its own API key, so the client can no longer supply the right one in the header.
There are several ways to solve this, starting with “Named values”.

Go to “Named values” under APIs.

Here, you can save a plain-text value or a secret under a name. For example, let’s add the AOAI endpoint specified in the backend URL as “my-endpoint-us1”.

The defined Named values can be used in the {{key-name}} format.
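Conceptually, a {{key-name}} reference is a placeholder that is swapped for the stored value wherever it appears. The sketch below imitates that substitution with sed so you can see the effect. It is an analogy, not APIM's implementation, and the helper name `expand_named_value` is hypothetical.

```shell
# expand_named_value NAME VALUE
# Reads text on stdin and replaces every {{NAME}} with VALUE.
# Sketch only: VALUE must not contain '|', '&' or '\' (sed would treat them
# specially).
expand_named_value() {
    sed "s|{{$1}}|$2|g"
}

printf '%s\n' 'base-url="{{my-endpoint-us1}}"' |
    expand_named_value my-endpoint-us1 https://my-endpoint-us1.openai.azure.com/
# prints base-url="https://my-endpoint-us1.openai.azure.com/"
```

The point of the indirection is that the value is stored and updated in one place, while every policy that refers to it stays unchanged.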

In the same way, save the API key for “my-endpoint-us1” in Named values, for example as “my-endpoint-us1-key”. When doing so, be sure to select “Secret” as its type¹.

1: For those of you who are familiar with Azure security best practices: yes, “Key vault” is better than “Secret”. Don’t worry; later in this entry I will explain Azure AD authentication, which is even better.

Now, how do we set this key in the header? On the “Design” tab, make sure to select “All operations”, and click </> next to “Policies” in “Inbound processing”.

These snippets are called “policies”; they let you insert processing and calculations at each stage of the request and response. Here, the key is set using <set-header name="api-key"> as follows. The value is the secret saved earlier, referenced as {{my-endpoint-us1-key}}.

<policies>
    <inbound>
        <base />
        <set-backend-service base-url="{{my-endpoint-us1}}" />
        <set-header name="api-key" exists-action="override">
            <value>{{my-endpoint-us1-key}}</value>
        </set-header>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

After saving, try calling the API without the api-key header from the curl request. Did that work? Perfect!

curl "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/gpt-35-turbo/chat/completions?api-version=2023-05-15" \
-H "Content-Type: application/json" \
-d '{"messages": [{"role": "user", "content": "Tell me about Azure OpenAI Service."}]}'
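If you prefer a script to curl, the same request can be sketched in Python. This is a minimal sketch using only the standard library; the URL is copied from the curl example above and the helper name build_chat_request is made up for illustration. The point to notice is that the client sends no api-key header, since the policy adds it on the APIM side.

```python
import json
import urllib.request

# Gateway URL copied from the curl example; adjust it to your own APIM instance.
APIM_URL = (
    "https://my-cool-apim-us1.azure-api.net/openai-test/openai/deployments/"
    "gpt-35-turbo/chat/completions?api-version=2023-05-15"
)


def build_chat_request(prompt: str) -> urllib.request.Request:
    """Build the POST request; note that no api-key header is attached."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        APIM_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_chat_request("Tell me about Azure OpenAI Service.")

# Sending is left commented out so the sketch has no network side effects:
# with urllib.request.urlopen(request) as response:
#     print(response.status, json.load(response))
```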

Verify communication details

How can we debug if our API test using the curl command doesn’t work? The good news is that APIM has an excellent testing capability built in.
Select the “Test” tab and the API you wish to test, then set the same parameters that you specify in curl, including the “deployment” template parameter, the “api-version” query parameter, the “Content-Type” header, and finally the JSON of the request body. Then press “Trace”.

If asked, click on “Enable tracing for one hour”.

The result seems to be HTTP 200. Let’s switch to the “Trace” tab to see the details.

In this view, the entire process of the API receiving the request, forwarding it to the backend, and finally returning the response to the user is recorded verbatim. As you can see, the Named values we set in the previous section are used correctly and set in the header through policy expressions.

This Trace feature is super important for debugging. Please take advantage of it!

Load balancing AOAI resources using APIM

We are getting closer and closer to what we want to accomplish. Here we will try to load-balance AOAI resources using the policies we have already described. First, prepare the following.

  • Create “my-endpoint-canada” in the canadaeast region in addition to the existing AOAI resource “my-endpoint-us1”.
  • Likewise, create “my-endpoint-australia” in the australiaeast region.
  • Deploy the gpt-35-turbo and text-embedding-ada-002 models in “my-endpoint-canada”.
  • Deploy the gpt-35-turbo model in “my-endpoint-australia”. text-embedding-ada-002 is not available in this region.
  • Finally, add my-endpoint-canada, my-endpoint-canada-key, my-endpoint-australia, and my-endpoint-australia-key just like we did for my-endpoint-us1 in Named values.

Next, change the Inbound processing policy in the same way as before.

Update the policy as follows:

<policies>
    <inbound>
        <base />
        <set-variable name="rand" value="@(new Random().Next(0, 3))" />
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 0)">
                <set-backend-service base-url="{{my-endpoint-us1}}" />
                <set-header name="api-key" exists-action="override">
                    <value>{{my-endpoint-us1-key}}</value>
                </set-header>
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 1)">
                <set-backend-service base-url="{{my-endpoint-canada}}" />
                <set-header name="api-key" exists-action="override">
                    <value>{{my-endpoint-canada-key}}</value>
                </set-header>
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 2)">
                <set-backend-service base-url="{{my-endpoint-australia}}" />
                <set-header name="api-key" exists-action="override">
                    <value>{{my-endpoint-australia-key}}</value>
                </set-header>
            </when>
            <otherwise />
        </choose>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

This needs an explanation. To begin with, a policy allows C# expressions to be written within @(). In other words, most of the basic things you can do in C# are available here.

First, @(new Random().Next(0, 3)) generates a random integer from 0 to 2 (the upper bound 3 is exclusive) and assigns it to the rand variable. Next, <choose> and <when> are combined to branch according to the value of the variable. This is similar to a switch statement in a general programming language. Depending on the value, the backend and its key are changed dynamically.
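To get a feel for how evenly this spreads traffic, the selection logic can be simulated outside APIM. The following is a rough Python sketch, not the policy itself: random.randrange stands in for the C# Random.Next call, the fixed seed is only there to make the run repeatable, and the helper name pick_backend is made up for illustration.

```python
import random
from collections import Counter

# Backend names mirror the Named values used in the policy.
BACKENDS = ["my-endpoint-us1", "my-endpoint-canada", "my-endpoint-australia"]
SAMPLES = 30_000


def pick_backend(rng: random.Random) -> str:
    """Mimic Random.Next(0, 3): the upper bound is exclusive, so 0, 1 or 2."""
    index = rng.randrange(0, 3)
    return BACKENDS[index]


rng = random.Random(42)  # fixed seed only to make the sketch repeatable
counts = Counter(pick_backend(rng) for _ in range(SAMPLES))

for name in BACKENDS:
    print(f"{name}: {counts[name] / SAMPLES:.1%}")
```

Each backend should end up with roughly a third of the requests. Note that this is purely random distribution: it does not take latency, quota, or backend health into account.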

“Trace” also provides a detailed look at the evaluation of expressions and the subsequent processing branches. Here a random number 2 is selected and the corresponding AOAI resource in Australia is selected for the backend.

Repeat the test many times to make sure that the backend switches correctly according to the random number and that the API returns the correct response no matter which backend it is connected to. How did it go this time?
Now we have achieved load balancing, although it is a bit hacky! Congratulations!

Azure AD Authentication with Managed Identity

Authentication using API keys as described so far has several problems.

  • A compromised key is a serious problem, since anyone who holds it can call the API.
  • If you share keys, you can’t get audit information such as who is using it and how much they are using it.
  • The more keys there are, the more complicated it becomes to manage Named values, and key rotation cannot be automated.

A great way to solve this is Azure AD Authentication with Managed Identity. A Managed Identity is an identity in Azure AD that is assigned to an application, which lets you benefit from Azure RBAC. For example, you can enable Managed Identity in APIM and have the AOAI resource grant it read-only access. Authentication with Managed Identity is done through a secure channel, making it more secure than carrying a key directly. Let’s see how it works!

First, enable System assigned Managed Identity in APIM. Go to “Managed identities” under “Security”.

Then enable System assigned Managed Identity.

Next, assign a role from the AOAI resource to APIM’s Managed Identity. Go to “Access control (IAM)” in the AOAI resource, then select “Add” and “Add role assignment”.

Type “openai” to narrow down the roles, then select the “Cognitive Services OpenAI User” role. This role is well suited to letting the associated Managed Identity get information from the AOAI APIs.

Next, select the members by hitting “Select members”.

Here, select the Managed Identity for APIM that you have just created.

Once correctly assigned, you will find the relationship between Managed Identity and Role in the “Role assignments” screen.

Next, modify the policy so that APIM authenticates using Azure AD. Go to the Inbound processing policy screen. Modify the policy as follows.

<policies>
    <inbound>
        <base />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
        </set-header>
        <set-variable name="rand" value="@(new Random().Next(0, 3))" />
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 0)">
                <set-backend-service base-url="{{my-endpoint-us1}}" />
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 1)">
                <set-backend-service base-url="{{my-endpoint-canada}}" />
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 2)">
                <set-backend-service base-url="{{my-endpoint-australia}}" />
            </when>
            <otherwise />
        </choose>
    </inbound>
    <backend>
        <base />
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

This also needs an explanation. First, APIM authenticates securely with Azure AD using <authentication-managed-identity> and stores the obtained token (which is more secure than the API Key because it is short-lived) in the msi-access-token variable. Next, an Authorization header carrying that value as a Bearer token is added using <set-header>. Finally, all API keys are removed from the policy. Let’s test if we can now communicate with the API with Azure AD authentication.
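The header-building step is simple enough to sketch in a few lines of Python. This is only an illustration of the idea: the AccessToken class, the one-hour lifetime, and the helper name authorization_header are assumptions made for the sketch, and the real token request is handled by APIM and Azure AD, not by code like this.

```python
import time
from dataclasses import dataclass


@dataclass
class AccessToken:
    """Hypothetical stand-in for the token APIM obtains from Azure AD."""

    value: str
    expires_on: float  # Unix timestamp after which the token is rejected


def authorization_header(token: AccessToken, now: float) -> dict:
    """Build the Bearer header, refusing a token that has already expired."""
    if now >= token.expires_on:
        raise ValueError("token expired; a fresh one must be requested")
    return {"Authorization": "Bearer " + token.value}


now = time.time()
# The one-hour lifetime below is an assumption for the sketch.
token = AccessToken(value="<opaque-token>", expires_on=now + 3600)
headers = authorization_header(token, now)
print(headers)
```

Unlike an API key, a leaked token of this kind stops working once it expires, which is why the short lifetime matters.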

If you check the “Trace”, you can see that the authentication is done internally with Azure AD and the token obtained from the authentication is set in the header.

Test it a few times to make sure you can call AOAI APIs without having to use the API key directly, even if the backend changes randomly. This makes authentication between APIM and AOAI much easier and much more secure. Congratulations!

Utilize retry feature of APIM to make AOAI redundant

We come to the last topic of this entry. It is redundancy. Why is redundancy necessary? In other words, what was the problem with the previous approach? The problem is that with simple load balancing, if an error occurs on the backend, it is passed back to the user.

Interestingly, APIM’s policies are sophisticated enough to retry under certain conditions if an error occurs on the backend. Change the policy as follows.

<policies>
    <inbound>
        <base />
        <authentication-managed-identity resource="https://cognitiveservices.azure.com" output-token-variable-name="msi-access-token" ignore-error="false" />
        <set-header name="Authorization" exists-action="override">
            <value>@("Bearer " + (string)context.Variables["msi-access-token"])</value>
        </set-header>
        <set-variable name="rand" value="@(new Random().Next(0, 3))" />
        <choose>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 0)">
                <set-backend-service base-url="{{my-endpoint-us1}}" />
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 1)">
                <set-backend-service base-url="{{my-endpoint-canada}}" />
            </when>
            <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 2)">
                <set-backend-service base-url="{{my-endpoint-australia}}" />
            </when>
            <otherwise />
        </choose>
    </inbound>
    <backend>
        <retry condition="@(context.Response.StatusCode >= 300)" count="3" interval="1" max-interval="10" delta="1">
            <choose>
                <when condition="@(context.Response != null && (context.Response.StatusCode >= 300))">
                    <set-variable name="rand" value="@(new Random().Next(0, 3))" />
                    <choose>
                        <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 0)">
                            <set-backend-service base-url="{{my-endpoint-us1}}" />
                        </when>
                        <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 1)">
                            <set-backend-service base-url="{{my-endpoint-canada}}" />
                        </when>
                        <when condition="@(context.Variables.GetValueOrDefault<int>("rand") == 2)">
                            <set-backend-service base-url="{{my-endpoint-australia}}" />
                        </when>
                        <otherwise />
                    </choose>
                </when>
                <otherwise />
            </choose>
            <forward-request buffer-request-body="true" buffer-response="false" />
        </retry>
    </backend>
    <outbound>
        <base />
    </outbound>
    <on-error>
        <base />
    </on-error>
</policies>

Let’s check it step by step. The first half remains the same. The only change is in the <backend></backend> section.
First, the <retry condition="@(context.Response.StatusCode >= 300)" count="3" interval="1" max-interval="10" delta="1"> part declares that a retry will be performed up to three times if the backend HTTP status code is 300 or higher. Next, only if the backend actually fails, a random number is generated in the same way as above to determine the next backend to try. Because the number is drawn independently each time, the same backend may be picked again.
The <forward-request buffer-request-body="true" buffer-response="false" /> part of the backend policy buffers the request body so that the same request can be forwarded to another backend.
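How much does this help when one backend is broken? Here is a rough Python simulation of the retry logic. It is a sketch under stated assumptions, not a model of APIM internals: one backend is assumed to fail every time, the total number of attempts is assumed to be the first try plus three retries, wait intervals are ignored, and the helper name call_with_retry is made up for illustration.

```python
import random
from fractions import Fraction
from typing import List, Tuple

BACKENDS = ["us1", "canada", "australia"]
FAILING = {"canada"}  # assumption: this backend always answers with an error
RETRY_COUNT = 3       # mirrors count="3" in the <retry> policy
ATTEMPTS = 1 + RETRY_COUNT  # assumed: first attempt plus up to three retries


def call_with_retry(rng: random.Random) -> Tuple[bool, List[str]]:
    """Try a random backend, then re-pick at random on each retry."""
    tried = []
    for _ in range(ATTEMPTS):
        backend = BACKENDS[rng.randrange(0, 3)]
        tried.append(backend)
        if backend not in FAILING:
            return True, tried  # success, stop retrying
    return False, tried  # every attempt hit the failing backend


# Exact chance that every attempt lands on the failing backend.
p_all_fail = Fraction(len(FAILING), len(BACKENDS)) ** ATTEMPTS
print("chance of a user-visible error:", p_all_fail)

rng = random.Random(7)  # fixed seed only to make the sketch repeatable
results = [call_with_retry(rng)[0] for _ in range(100_000)]
success_rate = sum(results) / len(results)
print("simulated success rate:", success_rate)
```

Under these assumptions a request only fails if all four picks hit the broken backend, which is (1/3)^4 = 1/81, or a little over 1%. Excluding the backend that just failed from the next pick would bring that to zero, at the cost of a more complex policy.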

Now let’s actually send requests. In doing so, try removing the “Cognitive Services OpenAI User” role assignment you just added on “my-endpoint-canada” from “Role assignments” in IAM. What do you think would happen?

The request will succeed with HTTP 200 in most cases, no matter how many times you experiment. But results do not tell the whole story. What if we check “Trace”?
If you check Trace, you will notice that the request succeeds without any problems when the AOAI resources in the US and Australia are selected as the backend. However, if Canada is selected as the backend, please look a little more carefully at Trace.

The request was correctly forwarded to the backend in Canada, but…

APIM received an HTTP 401 from the backend because the “Cognitive Services OpenAI User” role was revoked from APIM’s Managed Identity.

This is where the retry policy described earlier comes into play. As you can see, the retry condition becomes true and the process begins.

Once again a random number was generated and this time the AOAI resource in the US was chosen as the backend.

It is correctly forwarded to the backend in the US…

Boom! We got HTTP 200!

The beauty of this retry mechanism is that the end user of this API has no way of knowing about such drama! Our heroes are always at work while people don’t even realize it. Isn’t that amazing?

Conclusion

This entry proposed a method of load balancing and redundancy for AOAI using APIM, and it worked out pretty well. However, problems remain.

  1. APIM does not natively have load balancing capabilities. In this article, this was accomplished with a hack using policies, but the code becomes complex and hard to maintain.
  2. This time we randomly chose one of the AOAI resources for the back end. However, the models available in each region are not the same. For example, of the regions we used, only the australiaeast region does not have an Embeddings model. Therefore, if Australia happens to be selected when the Embeddings API is called, an error will occur. This is not good.
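The second problem can be put into numbers with a small Python sketch. The deployment table below simply restates the setup used in this entry, and the helper name miss_probability is made up for illustration; it only looks at the first random pick and ignores any retry.

```python
from fractions import Fraction

# Deployments per backend, as set up earlier in this entry.
DEPLOYMENTS = {
    "my-endpoint-us1": {"gpt-35-turbo", "text-embedding-ada-002"},
    "my-endpoint-canada": {"gpt-35-turbo", "text-embedding-ada-002"},
    "my-endpoint-australia": {"gpt-35-turbo"},
}


def miss_probability(deployment: str) -> Fraction:
    """Chance that a uniformly random first pick lacks the deployment."""
    missing = [
        name for name, models in DEPLOYMENTS.items() if deployment not in models
    ]
    return Fraction(len(missing), len(DEPLOYMENTS))


print(miss_probability("gpt-35-turbo"))            # 0
print(miss_probability("text-embedding-ada-002"))  # 1/3
```

In other words, one Embeddings call in three is first routed to a backend that cannot serve it, while Chat Completions calls are unaffected.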

The upcoming entries will present an approach using Azure’s load balancer to solve these problems. Stay tuned!

Fumihiko Shiroyama