Azure OpenAI: what are the real costs for prompts and responses?

Stephen Bedford, Product Director

As businesses increasingly leverage AI to enhance their operations, understanding the associated costs becomes crucial. One key consideration for adopting Azure OpenAI is the cost of prompts and responses. This blog delves into what these costs entail and how they can impact your budget.

This blog covers:

What is Azure OpenAI?
How much does Azure OpenAI cost for text generation?
Input / Output Token Costs on Azure OpenAI
Every time you ask Azure OpenAI a question… you get a bill
Cost Calculator for Azure OpenAI
When should I use Gen-AI and which model?

What is Azure OpenAI?

Azure OpenAI leverages the power of Large Language Models (LLMs), particularly OpenAI's GPT models, to generate high-quality and diverse outputs for various scenarios (commonly known as Gen-AI). It enables users to integrate text generation into their applications using simple and flexible APIs. Users provide a prompt or a context, such as a question, a topic, or a sentence, and Azure OpenAI generates a natural language response that is relevant, coherent, and creative. Azure OpenAI can be used for content creation, data augmentation, conversational AI, summarization, question answering, and more.

How much does Azure OpenAI cost for text generation?

The answer is: “It depends”.

Microsoft publishes costs for Azure OpenAI on its website (Azure OpenAI Service - Pricing | Microsoft Azure). Azure OpenAI is a pay-as-you-go service, billed on the actual consumption of tokens.

Using one of the GPT models incurs three cost elements, one for each type of token consumed:

Embedding Tokens - used when converting input text (typically made up from documents) into a vector representation (think of this as the creation of an AI index).
Input Tokens - used when providing text to the model for generation or completion (the prompt).
Output Tokens - used when receiving text from the model as a response or continuation.

This blog will deal with the costs for Input and Output Tokens. For additional detail about Embedding Tokens, refer to Exploring the Differences: Search vs Generative AI (clearpeople.com).

Input / Output Token Costs on Azure OpenAI

Each token corresponds to approximately four characters of text, while the number of tokens consumed depends on the length and complexity of the input and output texts. The cost per token varies depending on the model type and the tier of service. At the time of writing, the costs per GPT model are as follows:
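The four-characters-per-token rule of thumb can be turned into a quick estimator. The sketch below is illustrative only: the function name `estimate_tokens` is hypothetical, and a real tokenizer will produce different counts depending on the language and content of the text.

```python
import math

# Rule of thumb from the text: one token is roughly four characters.
# Real tokenizers vary, so treat the result as an estimate only.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the number of tokens in `text` from its character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


prompt = "How much insurance liability cover do we have?"
print(f"{len(prompt)} characters is roughly {estimate_tokens(prompt)} tokens")
```

An estimate like this is good enough for budgeting; for exact billing figures, rely on the token counts reported back by the service.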

Figure 1 - GPT Model Cost Profiles (May 2024)

Every time you ask Azure OpenAI a question… you get a bill

So, let’s take a look at some Azure OpenAI cost examples with the ClearPeople Atlas Knowledge Assistant (which uses Azure OpenAI). For demonstration purposes, we will use the GPT-4 Turbo model. As described above, different LLMs incur different costs, but the same calculations apply to any of the available models.

Example 1

Let’s assume that we have indexed a range of documents that hold company information. These might be referenced when responding to employee questions, when responding to external queries, or when responding to RFPs.

Prompt	#Chars (approximately)	Token Cost
How much insurance liability cover do we have?	50	$0.17
Response	#Chars (approximately)	Token Cost
The company holds professional indemnity insurance with a limit of liability of £5,000,000, public liability insurance with a limit of £10,000,000, employers' liability insurance also with a limit of £10,000,000, and cyber liability insurance with a limit of £5,000,000, all expiring on 31 July 2024. Citations – (source provided but omitted from here)	500	$0.00

Using the GPT-4 Turbo model, the total cost of asking this question was $0.17 (seventeen cents).

At first glance, this seems odd. Input tokens cost $0.01 per 1000 tokens. Completion (output) tokens cost $0.03 per 1000 tokens, three times as much. But the completion cost was zero, even though the response had approximately 10 times more characters than the prompt! What gives?

What is happening here is that the input characters (and hence tokens) consist of more than just the user prompt. The actual input is made up of:

The user prompt.
In this example, the Atlas AI (optional) system prompt that gets wrapped around every user prompt for governance purposes.
Any immediate previous prompts and responses relating to that topic - also known as “history.”
The content then extracted from the AI index based upon the above – also known as “chunks.”

Together, the prompts, history and chunks of content are then fed into the relevant LLM for processing to create a response. In this example, this totaled approximately 68,000 characters (or roughly 17,000 tokens).
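To make the composition of the input concrete, here is a minimal sketch that adds up the four components listed above. The class and field names are hypothetical, and the individual component sizes are placeholders chosen only so that the total matches the roughly 68,000 characters quoted for this example; the actual split between system prompt, history and chunks was not measured here.

```python
from dataclasses import dataclass, field

CHARS_PER_TOKEN = 4  # rule of thumb: about four characters per token


@dataclass
class PromptPayload:
    """Everything that is sent to the model as input for one question."""
    user_prompt: str
    system_prompt: str = ""
    history: list[str] = field(default_factory=list)  # earlier prompts/responses
    chunks: list[str] = field(default_factory=list)   # content pulled from the AI index

    def total_chars(self) -> int:
        parts = [self.system_prompt, self.user_prompt, *self.history, *self.chunks]
        return sum(len(part) for part in parts)

    def estimated_input_tokens(self) -> int:
        return round(self.total_chars() / CHARS_PER_TOKEN)


# Placeholder sizes (assumptions for illustration, not measured values).
payload = PromptPayload(
    user_prompt="u" * 50,
    system_prompt="s" * 1_950,
    history=[],
    chunks=["c" * 6_600] * 10,
)
print(payload.total_chars(), payload.estimated_input_tokens())
```

The point of the sketch is that the user prompt is a tiny fraction of what is billed: almost all of the input tokens come from the retrieved chunks.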

The response, at roughly 500 characters, equates to approximately 125 tokens (500/4).

The input cost is therefore $0.01/1000*17000 = $0.17.

The output cost is therefore $0.03/1000*125 = $0.00375, which rounds to $0.00.
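The same arithmetic can be written as a short script. This is a sketch using the GPT-4 Turbo rates quoted above ($0.01 and $0.03 per 1000 tokens); the function names are hypothetical, and `Decimal` is used only so that rounding to cents behaves predictably.

```python
from decimal import Decimal, ROUND_HALF_UP

# GPT-4 Turbo rates per 1000 tokens, as quoted in the text.
INPUT_RATE = Decimal("0.01")
OUTPUT_RATE = Decimal("0.03")


def token_cost(tokens: int, rate_per_1k: Decimal) -> Decimal:
    """Cost in dollars for `tokens` tokens at the given rate per 1000 tokens."""
    return rate_per_1k * tokens / 1000


def to_cents(amount: Decimal) -> Decimal:
    """Round a dollar amount to two decimal places."""
    return amount.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


input_cost = token_cost(17_000, INPUT_RATE)  # prompt + system prompt + history + chunks
output_cost = token_cost(125, OUTPUT_RATE)   # the 500-character response

print("input :", to_cents(input_cost))
print("output:", to_cents(output_cost), f"(unrounded {output_cost})")
print("total :", to_cents(input_cost + output_cost))
```

Keeping the unrounded output cost visible shows that the response was not free; it simply cost less than half a cent.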

Whether it was worth spending an additional $0.17 to answer that question, when you might have used a search tool instead, is a subject for debate. But let’s assume that this is a valid question, and that a user might ask these sorts of questions 5 times per day. In this case, you would incur a cost of $0.85 per day (5 x $0.17), or approximately $18.42 per month, a figure consistent with about 21.7 working days per month.
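The daily and monthly projection can be sketched as follows. The working-days figure is an assumption inferred from the numbers in this blog (260 working days spread over 12 months, about 21.67 days per month), and the function name is hypothetical.

```python
# Assumption: 52 weeks x 5 working days = 260 days per year, averaged per month.
WORKING_DAYS_PER_MONTH = 260 / 12


def project_cost(cost_per_question: float,
                 questions_per_day: int = 5,
                 days_per_month: float = WORKING_DAYS_PER_MONTH) -> tuple[float, float]:
    """Return (daily cost, monthly cost) for one user asking similar questions."""
    daily = cost_per_question * questions_per_day
    return daily, daily * days_per_month


daily, monthly = project_cost(0.17)
print(f"${daily:.2f} per day, ${monthly:.2f} per month")
```

Multiply the result by the number of active users to get a first estimate for a team or department.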

Example 2

Let’s assume that we have indexed a range of documents that relate to comments made in respect to proposed SEC regulatory changes. A lawyer might need to query these changes to answer a client’s questions or to create a paper detailing the potential advantages/disadvantages that the regulation might impose on clients. Let’s also assume that previous prompts and responses regarding the topic had also been made. The following table details the prompts and responses, together with the approximate cost.

Prompt	#Chars (approximately)	Token Cost
With regard to "The Enhancement and Standardization of Climate Related Disclosures for Investors" please provide a detailed report listing at least 10 arguments being made in favor and at least 10 arguments being made against the proposed regulation, citing case law where relevant. Please use source data from a minimum of 6 documents and ensure that citations are provided along with the names of the firms or individuals contributing any arguments.	<500	$0.43
Response	#Chars (approximately)	Token Cost
Arguments in Favor of the Proposed Regulation: 1. Investors demand rigorous and comparable climate-related disclosure, as evidenced by 733 global institutional investors signing a statement for mandatory TCFD-aligned climate risk disclosure requirements globally. Citations -[Investor Agenda 2021]. 2. Disclosure of material and systemic risks of climate change helps companies and investors understand, price, and manage climate risks and opportunities, which are essential for efficient securities markets. Citations - [Ceres 2022]. 3. Assurance of GHG emissions disclosure suggests that companies recognize climate-related information as decision-useful for investors, meriting scrutiny for data quality. Citations - [Center for Audit Quality 2022]. (You should note that the actual AI response was much longer; we have shown just enough content to make the point).	<4500	$0.03

Using the GPT-4 Turbo model, the total cost of asking this question was $0.46 (forty-six cents). This is because the prompt has been enriched with history, and the prompt tokens consumed now number approximately 43,000.

If a user were to ask these sorts of questions 5 times per day, you could expect a cost of $2.30 per day, or approximately $49.83 per month.

Example 3

Let’s now show another prompt similar to Example 2, but switch to the GPT-4 model. In this case no history (i.e. previous prompts/responses) was provided. The Azure OpenAI costs were as follows:

Prompt	#Chars (approximately)	Token Cost
What are the main arguments and comments for and against in respect to File Number S7-10-22	<100	$0.67
Response	#Chars (approximately)	Token Cost
The discourse surrounding File Number S7-10-22, which pertains to the Enhancement and Standardization of Climate-Related Disclosures for Investors, is multifaceted and involves a variety of stakeholders with differing perspectives. The main arguments in favor of the proposed rule emphasize the growing investor demand for transparent and standardized climate-related information, which is seen as crucial for making informed investment decisions. Proponents argue that the rule would provide clarity to investors about the climate risks associated with their investments and would promote more responsible corporate behavior regarding environmental impact. Citations – (source provided but omitted from here).	~4500	$0.14

As you can see, the total cost of asking this question with the GPT-4 model was $0.81 (eighty-one cents). In this example only approximately 11,200 prompt tokens were consumed, due to the lack of history.

If a user were to ask these sorts of questions 5 times per day, you could expect a cost of $4.05 per day, or approximately $87.75 per month (scaling the daily cost by the same factor as in Examples 1 and 2).
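To see how much of the difference comes from the model rather than the prompt, the sketch below prices the same token counts on two models. The GPT-4 Turbo rates are those quoted earlier in this blog. The GPT-4 rates ($0.06 input and $0.12 output per 1000 tokens) are an assumption inferred from the Example 3 figures and should be checked against the current pricing page. The dictionary keys are plain labels, not deployment names.

```python
# (input rate, output rate) in dollars per 1000 tokens.
RATES_PER_1K = {
    "gpt-4-turbo": (0.01, 0.03),  # quoted in the text
    "gpt-4": (0.06, 0.12),        # assumption inferred from Example 3's figures
}


def question_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in dollars of one prompt/response pair on the given model."""
    input_rate, output_rate = RATES_PER_1K[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1000


# Example 3: about 11,200 prompt tokens and about 4,500 response characters.
input_tokens = 11_200
output_tokens = 4_500 // 4

for model in RATES_PER_1K:
    cost = question_cost(model, input_tokens, output_tokens)
    print(f"{model}: ${cost:.2f}")
```

Under these assumed rates, the GPT-4 figure reproduces the $0.81 total from the table, which suggests the inferred rates are at least consistent with the example.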

Different models will generate different costs even if you ask the same prompt, as detailed in the GPT Model Cost Profiles table, above.

Cost comparison of the models is a little tricky due to the different capabilities and capacities provided. As a very rough guide, based upon our testing of all three models and using similar prompts and asking for similar response outputs, GPT-4 is the most expensive model to use (but bear in mind it can create twice the amount of output in the response). GPT-4 can be 4-5 times more expensive to use than GPT-4 Turbo, but GPT-4 Turbo can handle much larger prompts and inputs and therefore the potential input costs will rise proportionally. GPT-4 can be up to 100 times more expensive than GPT-3.5 Turbo, which is less capable but offers the fastest response times.

This is why the AI tools that you choose should be able to utilize different models for different purposes, one of the things that the Atlas Knowledge Assistant provides.

Cost Calculator for Azure OpenAI

Ready to calculate your costs using our Azure OpenAI Calculator?

When should I use Gen-AI and which model?

The answer to these questions depends on your specific needs, preferences, and budget. Each model has its own strengths and weaknesses, and you should consider factors such as the quality, speed, scalability, reliability, and overall value that each delivers. Here are some questions that you can ask yourself to help decide if you should use Gen-AI at all… and if you do, which model is the best for your purposes:

What is the purpose and goal of your AI project?
What type of responses do you want to generate, and how complex or creative do they need to be? Complex answers would require GPT-4 or GPT-4 Turbo. GPT-4 can create larger responses, but GPT-4 Turbo can handle much larger prompts.
How much text do you need to generate, and how frequently do you need to generate it?
How much are you willing to pay for the service, and what is your budget limit?
How important is the accuracy, consistency, and quality of the model's generated responses?
How important is the speed, scalability, and reliability of the model? GPT-3.5 Turbo will generate faster responses, but the answers may not be as rich or comprehensive.
How much time are you likely to save and what is that worth?
Are there alternatives – e.g. a powerful, effective search solution?

In very basic terms, if simpler responses are enough for more basic requirements and speed is important, use GPT-3.5 Turbo. If you require more complex, creative and larger responses, choose GPT-4. If you require much larger inputs and the ability to consume much more material from the AI index, choose GPT-4 Turbo.
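That rule of thumb can be expressed as a small decision function. This is only a sketch: the function and parameter names are hypothetical, and the order in which the checks are applied (large inputs first, then response complexity) is an assumption, since the guidance above does not say which requirement wins when both apply.

```python
def suggest_model(needs_large_input: bool, needs_complex_output: bool) -> str:
    """Suggest a model family based on the basic guidance in the text."""
    if needs_large_input:
        # Much larger prompts and more material from the AI index.
        return "GPT-4 Turbo"
    if needs_complex_output:
        # More complex, creative and larger responses.
        return "GPT-4"
    # Simpler responses where speed matters most.
    return "GPT-3.5 Turbo"


print(suggest_model(needs_large_input=False, needs_complex_output=False))
print(suggest_model(needs_large_input=False, needs_complex_output=True))
print(suggest_model(needs_large_input=True, needs_complex_output=True))
```

In practice a rule like this is a starting point, to be refined by trying the models against your own prompts and comparing cost and quality.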

Considering the above, you can narrow down your options and choose the model that suits your needs and expectations. You can also try out the different models and compare their results and performance. Ultimately, the best AI service is the one that can help you achieve your AI goals and deliver the best value for your money.

Conclusion

Understanding the real costs associated with prompts and responses in Azure OpenAI is essential for effectively managing your AI budget. By focusing on efficient prompt engineering, monitoring usage, and selecting the appropriate model, businesses can leverage the powerful capabilities of Azure OpenAI while keeping costs under control.
