A working paper from the National Bureau of Economic Research (NBER) has estimated the price elasticity of demand for large language models (LLMs) at -1.11.
The paper, titled “The Emerging Market for Intelligence: Pricing, Supply, and Demand for LLMs,” uses usage data from OpenRouter to estimate demand price elasticity and draws on Microsoft internal data for a case study of market competition dynamics.
Methodology
A major challenge in estimating demand price elasticity for LLMs is price endogeneity. Higher-quality models command higher prices and also see greater usage, meaning naive regression would yield the absurd conclusion that usage is positively correlated with price—a positive price elasticity. The paper addresses this through a within-model identification strategy, leveraging price variation for the same model across different providers to estimate demand elasticity.
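The endogeneity problem and the within-model fix can be illustrated with a small simulation (entirely made-up data and coefficients; a sketch of the identification logic, not the paper's estimator):

```python
import random
from collections import defaultdict

random.seed(0)

# Hypothetical simulation (not the paper's data): latent model quality drives
# both price and usage, so a naive pooled regression of log-quantity on
# log-price gets the sign wrong, while comparing providers of the SAME model
# (the within-model strategy) recovers the true negative elasticity.
TRUE_ELASTICITY = -1.1

rows = []  # (model_id, log_price, log_quantity)
for model_id, quality in enumerate([1.0, 2.0, 3.0]):
    for _provider in range(5):
        log_p = quality + random.gauss(0, 0.3)        # better models cost more
        log_q = 2.5 * quality + TRUE_ELASTICITY * log_p + random.gauss(0, 0.1)
        rows.append((model_id, log_p, log_q))

def ols_slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sum((x - mx) ** 2 for x in xs)

# Naive pooled regression: quality confounds price, so the slope comes out positive.
naive = ols_slope([lp for _, lp, _ in rows], [lq for _, _, lq in rows])

# Within-model regression: demean price and quantity inside each model first.
groups = defaultdict(list)
for model_id, lp, lq in rows:
    groups[model_id].append((lp, lq))
xs, ys = [], []
for g in groups.values():
    mp = sum(lp for lp, _ in g) / len(g)
    mq = sum(lq for _, lq in g) / len(g)
    xs += [lp - mp for lp, _ in g]
    ys += [lq - mq for _, lq in g]
within = ols_slope(xs, ys)

print(f"naive pooled slope: {naive:+.2f}")   # confounded: comes out positive
print(f"within-model slope: {within:+.2f}")  # close to the true -1.1
```

Demeaning within each model is exactly what the model-level fixed effects accomplish: any price variation driven by quality differences across models is removed before the elasticity is estimated.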
Data Source
OpenRouter is an LLM aggregation platform where the same open-source model is served by multiple cloud inference providers. Pricing differences across providers form the basis of this research.
For example, GPT-OSS-120B is offered by dozens of cloud providers including Fireworks, Together AI, and DeepInfra. The cheapest provider, GMI Cloud, prices input/output at $0.02/$0.10 per million tokens, while the most expensive, Cerebras, charges $0.35/$0.75 per million tokens. Correspondingly, GMI Cloud’s output speed is only 60 tokens/s, whereas Cerebras delivers up to 700 tokens/s.
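The price-speed tradeoff can be made concrete with a quick cost calculation using the prices quoted above (the workload size is a made-up assumption for illustration):

```python
# Cost comparison for GPT-OSS-120B using the quoted list prices
# (USD per million tokens); the workload mix below is hypothetical.
providers = {
    "GMI Cloud": {"in": 0.02, "out": 0.10, "tok_per_s": 60},
    "Cerebras":  {"in": 0.35, "out": 0.75, "tok_per_s": 700},
}
input_tokens, output_tokens = 5_000_000, 1_000_000  # assumed workload

costs = {}
for name, p in providers.items():
    costs[name] = (input_tokens * p["in"] + output_tokens * p["out"]) / 1e6
    hours = output_tokens / p["tok_per_s"] / 3600  # time to generate the output
    print(f"{name}: ${costs[name]:.2f} total, ~{hours:.1f} h of generation")
```

At these list prices, Cerebras is 12.5 times more expensive for this workload mix, in exchange for roughly 11-12 times faster generation.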
The study includes only open-source models and excludes proprietary models. Proprietary models like GPT 5.2 have only one provider on OpenRouter (OpenAI), making identification impossible.
Empirical Model
The authors’ primary empirical model incorporates core variables including price, throughput, latency, and context size. Fixed effects γ_{tm} and θ_{im} control for time-model interaction trends and provider-model level quality differences, respectively.

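From the variables and fixed effects described above, a plausible reconstruction of the specification looks as follows (a sketch inferred from the listed variables, not the paper's exact equation):

```latex
\log q_{imt} = \beta \log p_{imt}
  + \delta_1\,\mathrm{throughput}_{imt}
  + \delta_2\,\mathrm{latency}_{imt}
  + \delta_3\,\mathrm{context}_{imt}
  + \gamma_{tm} + \theta_{im} + \varepsilon_{imt}
```

where q_{imt} is the quantity of tokens demanded from provider i for model m at time t, and β is the demand price elasticity of interest.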
The study identifies demand elasticity through two main sources of variation: (1) entry and exit of providers offering the same model, and (2) price adjustments by existing providers. Since model quality remains constant, this variation can reasonably isolate the causal effect of price on quantity demanded.
Results
After controlling for provider-model fixed effects and date-model fixed effects, the study finds demand elasticity for LLMs of approximately -1.11.

In the estimation, the throughput coefficient is negative, indicating that higher throughput is associated with lower demand, which is counterintuitive at first glance. The authors argue this is consistent with provider capacity constraints: when the number of requested tokens increases, throughput falls. However, this raises a reverse-causality concern, since demand itself depresses throughput, making throughput endogenous.
Implications
The Jevons paradox suggests that despite improved efficiency and lower prices, usage can surge to such an extent that total revenue increases rather than decreases.
In a simplified scenario, this typically requires demand elasticity greater than 1 in absolute value. With a price elasticity of -2, a 10% price reduction yields 20% demand growth, so revenue changes by a factor of 0.9 × 1.2 = 1.08, i.e., 8% revenue growth.
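Under a constant-elasticity demand curve, where quantity scales as price raised to the elasticity, the same arithmetic can be computed exactly rather than via the percentage approximation above (a generic illustration, not from the paper):

```python
# Revenue effect of a price change under constant-elasticity demand:
# q scales as p**e, so revenue R = p * q scales as p**(1 + e).
def revenue_ratio(price_change_pct, elasticity):
    price_factor = 1 + price_change_pct / 100
    return price_factor * price_factor ** elasticity

print(revenue_ratio(-10, -2.00))  # ~1.11: strongly elastic, revenue grows
print(revenue_ratio(-10, -1.11))  # ~1.01: the paper's estimate, weak growth
print(revenue_ratio(-10, -1.00))  # ~1.00: unit elastic, revenue unchanged
```

At the paper's estimate of -1.11, a 10% price cut raises revenue by only about 1%, which is why the provider-level Jevons effect is described as weak.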
The study’s elasticity coefficient of -1.11 indicates only a weak Jevons paradox at the provider level. However, considering the rapid increase in LLM penetration across society, a broader Jevons paradox may still exist.
Limitations
The study has two major limitations.
First, OpenRouter’s platform has its own routing mechanism. When users don’t explicitly specify a provider, the platform automatically allocates based on its internal algorithm. The study does not clearly distinguish between requests where users explicitly specified providers versus those routed automatically. This makes the analysis more akin to reverse-engineering OpenRouter’s internal routing algorithm rather than reflecting genuine user demand.
According to the regression tables, the within R² is only 0.06, indicating that the vast majority of variation is absorbed by fixed effects—which themselves reflect OpenRouter’s built-in scoring for model providers.
Furthermore, from OpenRouter’s perspective, setting the price elasticity to -1 in its internal algorithm has a special property: price times quantity (total expenditure, or revenue) becomes insensitive to price changes. A 10% price decrease leads to 10% more traffic, leaving total spending virtually unchanged (0.9 × 1.1 ≈ 0.99). OpenRouter may well hardcode such a weight parameter in its algorithm to maintain a stable revenue distribution.
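This revenue-invariance property can be sketched numerically (a hypothetical routing rule for illustration; OpenRouter's actual algorithm is not public):

```python
# If routed traffic scales as price**(-1), then price * quantity is constant:
# no matter how a provider sets its price, its revenue stays the same.
# Hypothetical rule, not OpenRouter's documented behavior.
def routed_tokens(price, elasticity=-1.0, scale=1_000_000):
    return scale * price ** elasticity

for price in (0.10, 0.09, 0.20):  # provider cuts or raises its price
    q = routed_tokens(price)
    print(f"price ${price:.2f}/Mtok -> {q:,.0f} units, revenue ${price * q:,.2f}")
```

If the platform's router behaves this way, an estimated elasticity near -1 may partly measure the routing rule itself rather than user demand.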
The authors acknowledge this in a footnote:
An important caveat is that OpenRouter allows users to either select a specific provider or delegate the choice to its routing algorithm. OpenRouter’s algorithm selects providers based on a combination of price and other attributes. Thus, some of the observed price sensitivity may reflect routing decisions made by OpenRouter rather than direct user choices.
The second major issue is that the study assumes model-provider quality does not change over time, using fixed effects to absorb quality differences. This assumes that when a provider lowers prices, output quality remains unchanged. In reality, when providers quantize models from BF16 to FP8 while reducing prices, model quality does change.
Epoch AI’s evaluation shows that cloud providers perform noticeably worse when serving newer models compared to established ones. For instance, the recently released GLM 4.6 shows much greater variation across providers, while the mature Qwen 3 model shows smaller differences.
We find that providers are noticeably worse at serving newer models, in our case GLM-4.6, compared to established models such as Qwen3. This is consistent with other model releases, which are accompanied by bugs that are then fixed over time. —Epoch AI
Cloud providers fix bugs in their inference infrastructure and improve performance over time. For example, Artificial Analysis benchmarks indicated that Microsoft Azure’s hosted GPT-OSS-120B scored significantly lower than other providers. Microsoft technical staff subsequently confirmed and fixed the issue.
Before the fix, GPT-OSS-120B on Azure scored 80%; after the fix, the score improved to 93%.
More rigorous LLM demand estimation would likely require A/B testing with different pricing for different users to observe usage changes, or access to OpenRouter’s internal data identifying which users explicitly specified their provider.
Significance
Despite its limitations, this paper is the first to estimate demand elasticity for LLMs and will almost certainly be widely cited. It currently represents the only reference point for LLM demand elasticity estimation.
Access
In addition to NBER, you can obtain the PDF version of this paper through author Andrey Fradkin’s personal website.
