Every financial data API on the market was built for the same customer: a developer writing formulas in a spreadsheet, or a quant building a backtest in Python. The response format reflects this. You get raw OHLCV bars. Open, high, low, close, volume. Rows and rows of numbers. Maybe an RSI calculation on top.
This worked fine for decades. But the consumer has changed. Today, the fastest-growing use case for financial data is feeding it into LLMs. AI agents, MCP integrations, autonomous trading bots. And raw OHLCV is a terrible format for all of them.
The token math
Here's the cost of asking a basic question with raw data.
"Is AAPL oversold?"
To answer this with a traditional API, your agent needs historical price data to compute RSI. That means pulling daily bars. A single year of daily OHLCV for one ticker is 252 trading days. Each bar carries at minimum 6 fields: date, open, high, low, close, volume.
What one year of daily bars actually costs
A typical OHLCV bar in JSON looks like this:
```json
{
  "date": "2026-03-28",
  "open": 182.15,
  "high": 184.95,
  "low": 181.43,
  "close": 184.25,
  "volume": 62338249
}
```

That single bar is roughly 25 tokens. Multiply that across a full year:
| Metric | Value |
|---|---|
| Trading days per year | 252 |
| Fields per bar | 6 (date, open, high, low, close, volume) |
| Tokens per bar (JSON) | ~25 |
| Total tokens (1 ticker, 1 year) | ~6,300 |
| 10 tickers, 1 year | ~63,000 |
| 10 tickers + 3 indicators each | ~100,000+ |
And that is before the LLM does any reasoning. That is just the raw data sitting in context.
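The arithmetic above is easy to sanity-check. A rough sketch, using the common ~4 characters-per-token rule of thumb (actual tokenizer counts vary by model, so treat these as ballpark figures):

```python
import json

# The example bar from above. The token estimate uses the rough
# ~4 characters-per-token heuristic, not a real tokenizer.
bar = {
    "date": "2026-03-28",
    "open": 182.15,
    "high": 184.95,
    "low": 181.43,
    "close": 184.25,
    "volume": 62338249,
}

def estimate_tokens(obj, chars_per_token=4):
    """Ballpark token count for a JSON payload."""
    return len(json.dumps(obj)) / chars_per_token

per_bar = estimate_tokens(bar)   # roughly 25 tokens
per_year = per_bar * 252         # one ticker, one year of daily bars
```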
LLMs cannot compute indicators
Now your agent needs to actually compute RSI from that data. LLMs are not reliable calculators. GPT-4, Claude, Gemini: none of them can consistently apply a 14-period RSI formula across 252 data points and arrive at the correct answer. They approximate. They hallucinate intermediate steps. They get the math wrong in ways that are difficult to detect because the output looks plausible.
Here is what happens when you ask an LLM to compute RSI from raw close prices:
- It receives 252 bars of OHLCV (6,300 tokens consumed)
- It attempts to calculate 14-period average gains and losses
- It makes arithmetic errors in the running average (common with long sequences)
- It returns a plausible-looking number, say `RSI: 34.2`
- The actual RSI is 22.4. The agent makes a decision based on wrong data.
So you end up needing to compute the indicator yourself before passing it to the LLM. At which point you have built the exact infrastructure the API was supposed to save you from.
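Computing it yourself means code like the following — a minimal sketch of the standard 14-period Wilder RSI (names and structure are my own, not any particular provider's implementation):

```python
def wilder_rsi(closes, period=14):
    """RSI with Wilder's smoothing: seed with simple averages over the
    first `period` changes, then smooth each subsequent gain/loss."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    gains, losses = [], []
    for prev, curr in zip(closes, closes[1:]):
        change = curr - prev
        gains.append(max(change, 0.0))
        losses.append(max(-change, 0.0))
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for gain, loss in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + gain) / period
        avg_loss = (avg_loss * (period - 1) + loss) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: RSI pegs at 100
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

A deterministic twenty-line function, but it is exactly the kind of multi-step running-average arithmetic an LLM fumbles when asked to do it in-context.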
And that was just one indicator, for one ticker.
A typical agent workflow might monitor 10 tickers across RSI, MACD, trend direction, and support/resistance levels. With a raw data provider, you are looking at dozens of API calls and tens of thousands of tokens before the LLM even starts reasoning about what to do.
The menu problem
Some providers have noticed the token problem and started optimizing. Massive, for example, reduced their MCP tool definitions from 25,000 tokens down to 1,500 tokens. That is a real improvement. Fewer tokens describing what the API can do means the LLM spends less context on the menu and more on the task.
Tool definitions are the menu. Response payloads are the meal. You can shrink the menu and still serve a meal that overflows the table.
You can compress the menu to 1,500 tokens and still get back a response that dumps thousands of tokens of raw OHLCV data into context. The agent still has to process dense numerical arrays. It still cannot reliably compute a moving average from those arrays. The fundamental problem remains: the data format is wrong for LLMs.
Optimizing tool definitions is a good start. But the real leverage is in the response format.
Categories, not numbers
Consider the difference between these two responses to the question "What's the RSI situation for INTC?"
Raw numeric response
```json
{
  "indicator": "RSI",
  "period": 14,
  "values": [
    {"date": "2026-03-28", "rsi": 22.41},
    {"date": "2026-03-27", "rsi": 24.87},
    {"date": "2026-03-26", "rsi": 28.33},
    {"date": "2026-03-25", "rsi": 31.02}
  ]
}
```

The LLM receives these numbers and now has to reason about what they mean. Is 22.41 oversold? Very oversold? How does it compare to this ticker's history? Is this rare or common? The model might know that RSI below 30 is generally considered oversold, but it has no context about whether this particular ticker has been at this level before, or how long it typically stays there.
Categorical response
```json
{
  "ticker": "INTC",
  "rsi_zone": "deep_oversold",
  "days_in_oversold": 8,
  "historical_median_oversold_days": 4,
  "historical_max_oversold_days": 14,
  "condition_rarity": "very_rare",
  "condition_percentile": 2.1,
  "volume_context": "spike",
  "trend_context": "downtrend",
  "accumulation_state": "distribution",
  "sector": "Semiconductors",
  "valuation_zone": "deep_value"
}
```

The LLM doesn't need to compute anything. Every field is a pre-computed fact:

- `deep_oversold` is unambiguous. No threshold interpretation needed.
- `condition_rarity: "very_rare"` tells it this situation is unusual for this specific ticker.
- `days_in_oversold: 8` with `historical_median_oversold_days: 4` tells it this has persisted longer than typical.
- `volume_context: "spike"` confirms something is actively happening.
- `accumulation_state: "distribution"` indicates sellers are in control.
- `valuation_zone: "deep_value"` adds a fundamental dimension the agent can weigh against the technical picture.
The model can immediately start reasoning about the implications rather than trying to derive them from raw numbers.
Why categories work better for LLMs
This is the core idea behind categorical data for LLMs. Instead of passing numbers and expecting the model to interpret them, you pass pre-computed, labeled facts. The interpretation has already happened on the server, where it can be done deterministically.
| Property | Raw numbers | Categorical labels |
|---|---|---|
| Interpretation | LLM must derive meaning from values | Meaning is the data |
| Consistency | Different LLMs interpret thresholds differently | Same label, same meaning, every time |
| Historical context | Requires additional data + computation | Built into fields like condition_rarity |
| Multi-indicator synthesis | LLM must cross-reference multiple numeric arrays | All dimensions in one flat object |
| Failure mode | Silent math errors (plausible but wrong) | No math to get wrong |
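Server-side, the bucketing itself is trivial deterministic code. A sketch — the zone names follow the examples in this post, but the exact boundaries here are illustrative assumptions, not TickerDB's actual cutoffs:

```python
def rsi_zone(rsi: float) -> str:
    """Map a raw RSI value to a categorical zone label.
    Boundary values are illustrative assumptions."""
    if rsi < 25:
        return "deep_oversold"
    if rsi < 30:
        return "oversold"
    if rsi < 45:
        return "neutral_low"
    if rsi < 55:
        return "neutral"
    if rsi < 70:
        return "neutral_high"
    if rsi < 80:
        return "overbought"
    return "deep_overbought"
```

Run once on the server, the comparison is correct every time; run implicitly inside an LLM's context window, the same threshold check is unreliable.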
The scan-to-lookback workflow
The difference becomes even more dramatic in multi-step workflows. Consider a common agent pattern: scan the market for opportunities, then look back at historical precedents to validate the thesis.
Step 1: Scan for oversold assets
```shell
$ curl -G https://api.tickerdb.com/v1/search \
  --data-urlencode 'filters=[{"field":"asset_class","op":"eq","value":"stock"},{"field":"momentum_rsi_zone","op":"eq","value":"deep_oversold"}]' \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```json
{
  "timeframe": "daily",
  "date": "2026-03-28",
  "fields": [
    "ticker",
    "asset_class",
    "momentum_rsi_zone",
    "extremes_days_in_condition",
    "extremes_condition_rarity",
    "extremes_condition_percentile",
    "volume_ratio_band",
    "volume_accumulation_state",
    "trend_direction",
    "fundamentals_valuation_zone"
  ],
  "filter_count": 2,
  "result_count": 1,
  "results": [
    {
      "ticker": "BGM",
      "asset_class": "stock",
      "momentum_rsi_zone": "deep_oversold",
      "extremes_days_in_condition": 12,
      "extremes_condition_rarity": "extremely_rare",
      "extremes_condition_percentile": 0.8,
      "volume_ratio_band": "extremely_high",
      "volume_accumulation_state": "strong_distribution",
      "trend_direction": "strong_downtrend",
      "fundamentals_valuation_zone": "deep_value"
    }
  ]
}
```

One API call. The agent immediately sees that BGM is in `deep_oversold` territory and that this condition is `extremely_rare` (0.8th percentile). But it also picks up contradicting context:
- Bullish case: `deep_oversold`, `deep_value`, `extremely_rare` condition
- Bearish case: `strong_downtrend`, `strong_distribution`, `volume_ratio_band: extremely_high` (active selling pressure)
Oversold and cheap, but with sellers in control. A classic potential value trap setup. An LLM can read these labels and immediately reason about the tension between them. With raw numbers, it would need to compute RSI, rank the condition historically, infer whether volume confirms the move, and derive the broader trend state. Each step is a potential point of failure.
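With labels, the agent-side logic stays trivial. A hypothetical triage step over a search result like the one above — the field names come from the example response, but the bull/bear groupings are my own illustration, not an official taxonomy:

```python
# Which categorical fields count as bullish vs. bearish evidence.
# Illustrative groupings only.
BULLISH = {
    "momentum_rsi_zone": {"oversold", "deep_oversold"},
    "fundamentals_valuation_zone": {"value", "deep_value"},
}
BEARISH = {
    "trend_direction": {"downtrend", "strong_downtrend"},
    "volume_accumulation_state": {"distribution", "strong_distribution"},
}

def triage(result: dict) -> tuple[list[str], list[str]]:
    """Split a categorical search result into bullish and bearish signals."""
    bull = [f for f, vals in BULLISH.items() if result.get(f) in vals]
    bear = [f for f, vals in BEARISH.items() if result.get(f) in vals]
    return bull, bear
```

Handing both lists to the LLM makes the value-trap tension explicit without a single numeric comparison.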
Step 2: Check historical precedent
The agent wants to know: what happened the last time BGM was in this condition?
```shell
$ curl "https://api.tickerdb.com/v1/summary/BGM?field=momentum_rsi_zone&band=deep_oversold&limit=5" \
  -H "Authorization: Bearer YOUR_API_KEY"
```

```json
{
  "ticker": "BGM",
  "field": "momentum_rsi_zone",
  "events": [
    {
      "date": "2025-11-14",
      "band": "deep_oversold",
      "prev_band": "oversold",
      "duration_days": 9,
      "aftermath": {
        "5d": {"performance": "slight_decline"},
        "10d": {"performance": "moderate_decline"},
        "20d": {"performance": "slight_decline"},
        "50d": {"performance": "moderate_decline"},
        "100d": {"performance": "sharp_decline"}
      }
    },
    {
      "date": "2025-06-03",
      "band": "deep_oversold",
      "prev_band": "neutral_low",
      "duration_days": 5,
      "aftermath": {
        "5d": {"performance": "slight_gain"},
        "10d": {"performance": "flat"},
        "20d": {"performance": "slight_decline"},
        "50d": {"performance": "moderate_decline"},
        "100d": {"performance": "sharp_decline"}
      }
    }
  ],
  "total_occurrences": 3,
  "query_range": "5y"
}
```

Two API calls total. TickerDB's summary event mode returns pre-computed aftermath data showing what actually happened after each historical occurrence. The pattern is clear:
| Event date | 5d | 10d | 20d | 50d | 100d |
|---|---|---|---|---|---|
| 2025-11-14 | slight decline | moderate decline | slight decline | moderate decline | sharp decline |
| 2025-06-03 | slight gain | flat | slight decline | moderate decline | sharp decline |
Even the brief 5-day bounce in June faded to a `sharp_decline` by 100 days. This is a stock that looks cheap but keeps getting cheaper. The value trap hypothesis is confirmed by actual historical data.
No indicator computation. No pulling years of price bars. No forward return calculations. The agent gets pre-computed facts and immediately starts reasoning about what to do.
What the raw data equivalent looks like
To answer the same question with a traditional provider like Alpha Vantage, your agent (or your code) would need to:
- Pull the full RSI history: `GET /query?function=RSI&symbol=BGM&outputsize=full`
- Scan the array for values below 20 (the deep_oversold threshold)
- Pull daily price bars in a separate call: `GET /query?function=TIME_SERIES_DAILY&symbol=BGM&outputsize=full`
- For each oversold crossing, compute forward returns at 5d, 10d, 20d, 50d, 100d
- Categorize those returns into performance bands
- Pass all of this derived data to the LLM
That is 2+ API calls returning thousands of rows, plus custom code for the entire aftermath pipeline. And you would need to build and maintain that pipeline for every indicator you want to look back on.
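The aftermath step alone looks something like this — a sketch of the forward-return pipeline you would have to own yourself (the performance-band thresholds are illustrative assumptions):

```python
def performance_band(ret: float) -> str:
    """Bucket a forward return into a coarse label (illustrative cutoffs)."""
    if ret <= -0.15:
        return "sharp_decline"
    if ret <= -0.05:
        return "moderate_decline"
    if ret < -0.01:
        return "slight_decline"
    if ret <= 0.01:
        return "flat"
    if ret < 0.05:
        return "slight_gain"
    if ret < 0.15:
        return "moderate_gain"
    return "sharp_gain"

def aftermath(closes: list[float], event_idx: int,
              horizons=(5, 10, 20, 50, 100)) -> dict[str, str]:
    """Forward returns after an event date, bucketed into bands."""
    base = closes[event_idx]
    out = {}
    for h in horizons:
        if event_idx + h < len(closes):  # skip horizons past end of data
            out[f"{h}d"] = performance_band(closes[event_idx + h] / base - 1.0)
    return out
```

And this is only the final step: before it runs, you still need years of bars per ticker, a daily indicator series, and crossing detection.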
Full comparison
Here is the same question answered both ways: "Is there an oversold stock worth looking at today, and what happened last time it was in this condition?"
| | Raw data provider | TickerDB |
|---|---|---|
| API calls | 10+ (scan universe, pull bars, compute RSI per ticker, pull historical bars, compute forward returns) | 2 (search + summary event mode) |
| Developer code required | RSI calculation, percentile ranking, forward return computation, performance bucketing | None |
| Tokens into LLM context | Thousands per ticker (raw bars + indicator arrays) | A fraction of that (categorical labels, pre-computed aftermath) |
| LLM computation | Interpret raw numbers, compare to thresholds, derive meaning | None. Labels are the meaning. |
| Risk of LLM math errors | High. RSI, MACD, and moving averages are multi-step calculations LLMs get wrong | Zero. Computation happens on the server. |
| Aftermath context | You build it. Pull years of bars, compute indicators daily, identify crossings, calculate forward returns, categorize performance. | Built in. One parameter. |
The gap is not just about token count. It is about where the computation happens. Raw data providers outsource the hard part to the LLM (or to you). Categorical data handles it on the server, where deterministic code can do it correctly every time.
What this means for agents
If you are building an AI agent, MCP integration, or any LLM-powered system that needs financial data, the format of that data matters more than the breadth of the API.
An API with 200 endpoints returning raw numbers will underperform an API with a handful of endpoints returning the right abstractions. Your agent does not need 252 daily close prices. It needs to know that the stock is in a `strong_uptrend` with decelerating momentum and an active volatility squeeze.
Consider what a well-structured categorical response gives your agent in a single call:
- Trend state: `direction`, `ma_alignment`, `volume_confirmation`
- Momentum: `rsi_zone`, `macd_state`, `divergence_detected`
- Extremes: `condition_rarity`, `condition_percentile`, `days_in_condition`
- Volatility: `regime`, `squeeze_active`, `regime_trend`
- Volume: `accumulation_state`, `climax_detected`
- Fundamentals: `valuation_zone`, `growth_zone`, `earnings_proximity`
All of that in one flat JSON object. No arrays of 252 bars. No multi-step indicator pipelines. No room for LLM math errors. Here is what that looks like in practice:
```json
{
  "ticker": "AAPL",
  "trend": {
    "direction": "uptrend",
    "duration_days": 18,
    "ma_alignment": "aligned_bullish",
    "volume_confirmation": "confirmed"
  },
  "momentum": {
    "rsi_zone": "neutral_high",
    "macd_state": "contracting_positive",
    "direction": "decelerating",
    "divergence_detected": false
  },
  "volatility": {
    "regime": "normal",
    "squeeze_active": false,
    "regime_trend": "stable"
  },
  "volume": {
    "accumulation_state": "accumulation",
    "climax_detected": false
  },
  "fundamentals": {
    "valuation_zone": "fair_value",
    "growth_zone": "moderate_growth",
    "earnings_proximity": "this_month"
  }
}
```

One call. Every field is a label the LLM can reason about directly. An agent reading this response can immediately synthesize: AAPL is in an uptrend with bullish alignment, but momentum is decelerating. Volatility is stable with no squeeze. Accumulation is healthy. Fair value with moderate growth and earnings coming this month. That is useful analysis from a single HTTP request.
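The synthesis step is then string assembly rather than math. A hypothetical reduction an agent might run before prompting the model (field names taken from the snapshot response above):

```python
def summarize(snapshot: dict) -> str:
    """Collapse a categorical snapshot into one sentence for the prompt."""
    trend, momentum = snapshot["trend"], snapshot["momentum"]
    return (
        f"{snapshot['ticker']}: {trend['direction']} "
        f"({trend['ma_alignment']}, volume {trend['volume_confirmation']}), "
        f"momentum {momentum['direction']} in the {momentum['rsi_zone']} RSI zone"
    )
```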
Categorical data is not a simplification. It is the correct abstraction layer for LLM consumption. The same way you would not pass raw pixel data to a language model and ask it to identify objects, you should not pass raw OHLCV data and ask it to identify market conditions.
The right data format eliminates an entire class of errors, reduces token usage, and lets the model focus on what it is actually good at: reasoning about labeled facts and making decisions.
TickerDB turns raw market data into categorical market intelligence for APIs, MCP clients, and AI agents. Read the docs or try it free.