The Last Mile Problem: Why General LLMs Can’t Run Your Research Workflow

Vikas 2025-11-27

Every serious investor is running the same experiment right now.

Someone at the desk opens ChatGPT, Gemini, Claude, or Perplexity and types a query that sounds trivial:

“Which NBFCs mentioned write-offs in their Q2 FY26 disclosures?”

The answer arrives instantly.
Clean bullets. Confident phrasing. Perfect fluency.
For a moment, it feels like the morning meeting just automated itself.

Then you check the sources:

  • Company A never mentioned write-offs; the model pulled a line from a news article.
  • Company B’s summary came from a broker note, not the filing.
  • Company C, the one with the biggest write-off, is missing because the model couldn’t parse the scanned PDF.

The output isn’t just occasionally wrong; it’s structurally unreliable.

This is the point when teams realize the core mismatch:

General LLMs optimise for answering. Financial research requires verifying.

And forcing these models into a finance workflow exposes every architectural fault line underneath.

1. Why Uploading PDFs Doesn’t Fix the Problem

The natural reaction after a few bad answers is:

“Fine, I’ll upload the Annual Report myself. That will force the model to read the right data.”

And it works for a single company, a single document, and a single question.

But real research isn’t a one-document workflow.
The moment you try to replicate what analysts actually do, three immovable walls appear.

Wall One: The Single-Company Trap (Context Amnesia)

A serious coverage universe isn’t one PDF.

It’s:

  • 4–5 years of annual reports
  • 8–12 concall transcripts
  • Multiple investor presentations
  • Credit rating notes
  • Exchange announcements

Upload everything and start asking questions, and general models begin to blur:

  • FY21 commentary blends with FY24 context
  • Standalone mixes with consolidated
  • A full year disappears
  • Guidance slips into actuals
  • Middle pages effectively vanish (“lost in the middle”)

You’re asking a probabilistic text generator to maintain multi-year continuity across thousands of messy tokens.
It simply wasn’t designed for that.

Wall Two: The Fake Citation Hunt

Even when an answer looks right, verification fails.

You click the citation:

  • It opens the PDF at Page 1
  • Or it highlights a paragraph mentioning “Revenue,” but not the number
  • Or the link points to the wrong document entirely
  • Or it shows only parsed text, not the actual page

If every AI-generated answer still fails verification, nothing has been automated; the verification load has just been reshaped.

Wall Three: The Sector-Scale Collapse

Your actual workflow isn’t about one company — it’s a sector.

To answer the original NBFC question reliably, you need:

20 companies × 10–20 filings each = hundreds of PDFs, thousands of pages.

General LLMs break here for three reasons:

(a) Upload limits

You can upload a handful of documents per prompt or a limited set per workspace — never your full universe.

(b) Memory limits

Even long-context models cannot maintain precise multi-year relationships across hundreds of documents.

(c) No entity-level continuity

General LLMs treat each PDF as an isolated island.
Financial research requires a timeline, not a file list.
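The difference can be sketched in a few lines. This is a minimal illustration, not CompoundingAI’s actual implementation; the `Filing` record and the company/period identifiers are hypothetical, and real period keys would need proper fiscal-calendar ordering:

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class Filing:
    entity: str     # hypothetical entity identifier, e.g. "Company A"
    period: str     # e.g. "FY24-Q2" (sorts correctly as a string here)
    doc_type: str   # "annual_report", "concall", "presentation", ...
    path: str

def build_timelines(filings):
    """Turn a flat file list into per-entity timelines: group documents
    by company, then order each company's documents by period."""
    timelines = defaultdict(list)
    for f in filings:
        timelines[f.entity].append(f)
    for docs in timelines.values():
        docs.sort(key=lambda f: f.period)
    return dict(timelines)

filings = [
    Filing("Company A", "FY24-Q2", "concall", "a_q2.pdf"),
    Filing("Company B", "FY23-Q4", "annual_report", "b_ar.pdf"),
    Filing("Company A", "FY23-Q4", "annual_report", "a_ar.pdf"),
]
timelines = build_timelines(filings)
```

The point is the index shape: queries like “how did Company A’s commentary change between FY23 and FY24?” become a walk along one entity’s timeline rather than a search across isolated files.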

At this scale, “chat with PDF” is a good reading tool but not a research system.

The Real Document Footprint in Financial Research

| Use Case | Companies | Documents (3-Year Window) | General LLM Behaviour | Actual Failure Mode |
| --- | --- | --- | --- | --- |
| Single Stock | ~1 | 50–100: AR (3), Credit Rating (6), Concalls (12), Presentations (12), Announcements (60) | Partially Works | Local inconsistencies: standalone vs consolidated mixing, guidance/actual blending, missing middle pages, weak citations |
| Sector View | 30–50 | 3,000–5,000 | Not Operationally Possible | Structural failures: upload limits, attention decay, timeline confusion, no cross-company continuity |
| Market View | 500+ | 50,000+ | Not Operationally Possible | Architecture ceiling: cannot ingest full universes, maintain multi-year links, or aggregate and compare at scale |

2. The Interface Fallacy: Why “Chat” Doesn’t Map to Finance

Even if hallucinations vanished and citations were perfect, chat would still impose constraints.

Analysts rarely work on one-off questions. They execute workflows:

  • Screening
  • Deep dives
  • Tracking guidance
  • Monitoring disclosures
  • Updating models
  • Comparing peers
  • Preparing desk notes

Chat is linear. Workflows are cyclical, persistent, and stateful.

If you need 20 prompts to recreate a quarterly update, you haven’t automated the work — you’ve turned research into a command line.

Architectural Mismatch: General LLM Stack vs Finance Stack

| Layer | General LLM Behaviour | Finance Requirement | Mismatch |
| --- | --- | --- | --- |
| Training Objective | Predict next token (probabilistic completion) | Retrieve exact fact from filings | Fluency over accuracy |
| Memory / Context | Long, but lossy (compression + attention decay) | Multi-year continuity across filings | “Lost in the middle” problem |
| Citation | File-level links, unreliable grounding | Line-level, page-level evidence mapping | No positional metadata |
| OCR / Parsing | Generic OCR + linear text extraction | Layout-aware parsing for tables, scans, footnotes | Breaks on Indian PDFs & scanned filings |
| Indexing | Each file is isolated (File_ID) | Entity-level timeline, cross-document graph | No cross-document coherence |
| Workflow | Linear chat → one-off Q&A | Cyclical research loops (screen, compare, monitor) | Interface mismatch |

3. Under the Hood: Why General LLMs Break on Filings

Most AI failures in finance have nothing to do with “bad prompts.”
They come from deep architectural constraints that make general LLMs fundamentally misaligned with statutory filings.

Below is a breakdown of the actual failure modes.

1. Probabilistic Text Prediction ≠ Factual Grounding

General LLMs are next-token predictors, not fact retrieval engines.
When a number is missing, they generate the statistically likely value, not the true one.

Consequence:
Hallucinations aren’t anomalies; they’re built-in behaviour of the architecture.

Requirement for finance:
A retrieval-first engine that is hard-coded to return “Not disclosed” when a fact is absent, instead of predicting a replacement.
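A toy sketch of the retrieval-first behaviour, under stated assumptions: the fact store, the `(entity, period, metric)` key shape, and the figures and citations in it are all hypothetical. What matters is the control flow, where the null path is hard-coded and nothing is ever generated:

```python
# Hypothetical extracted-fact store: (entity, period, metric) -> (value, citation).
FACTS = {
    ("Company A", "FY26-Q2", "gross_write_offs"): ("₹412 cr", "Q2 FY26 press release, p. 7"),
}

def answer(entity, period, metric):
    """Retrieval-first answering: return the stored fact with its
    citation, or an explicit 'Not disclosed' -- never a statistically
    plausible substitute."""
    hit = FACTS.get((entity, period, metric))
    if hit is None:
        return {"value": "Not disclosed", "citation": None}
    value, citation = hit
    return {"value": value, "citation": citation}
```

A next-token predictor has no equivalent of the `hit is None` branch; when the fact is absent, it produces the likeliest-looking number instead.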

2. Inability to Represent “Negative Knowledge”

LLMs are trained via RLHF to be helpful, not precise.
They avoid null answers because non-answers are penalised during training.

They cannot differentiate:

  • “The filing is silent on this,”
  • “I couldn’t find it,”
  • “Here’s a generic policy from somewhere else.”

Consequence:
Fabricated positives or irrelevant policies appear where regulatory silence is the actual fact.

Requirement for finance:
A system that performs container-level scans (e.g., Notes on Borrowings) and can return deterministic absence-as-information.
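A container-level scan can be sketched as follows. This is an illustrative simplification (real scans work over parsed layout, not raw strings, and the section names are hypothetical), but it shows the three outcomes an LLM collapses into one:

```python
def scan_container(filing_sections, container, keyword):
    """Deterministic absence handling: distinguish 'the filing is
    verifiably silent' from 'the section itself is missing', instead
    of generating a plausible-sounding answer."""
    if container not in filing_sections:
        return {"status": "container_not_found"}   # "I couldn't find it"
    text = filing_sections[container]
    if keyword.lower() not in text.lower():
        return {"status": "disclosed_silent"}       # "The filing is silent on this"
    return {"status": "found", "evidence": text}

# Hypothetical parsed filing: section name -> section text.
sections = {"Notes on Borrowings": "Borrowings are secured against receivables."}
```

`disclosed_silent` is the “absence-as-information” result: the container was scanned end to end and the topic genuinely is not there, which is itself a reportable fact.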

3. Attention Decay, Token Limits & “Lost in the Middle”

Even 1M-token models exhibit non-linear attention decay.
LLMs overweight the beginning and end of long contexts and underweight the middle.

Dense financial tables are the worst-case input: high precision, low redundancy, zero tolerance for interpolation.

Consequence:
Upload 2,000 pages → the model confuses years, interpolates missing rows, or merges standalone & consolidated data.

Requirement for finance:
Extract → Normalize → Compare
Facts must be extracted and anchored before reasoning.
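A minimal sketch of that pipeline, with hypothetical extracted rows: each figure is normalised to a common unit and tagged with its reporting basis before any comparison, so nothing depends on the model attending to the right page at the right moment:

```python
def normalize(raw):
    """Normalize an extracted figure to crores so later comparisons
    operate on anchored facts, not on long-context attention."""
    value = raw["value"]
    if raw["unit"] == "lakh":   # 1 crore = 100 lakh
        value = value / 100.0
    return {"period": raw["period"], "basis": raw["basis"], "value_cr": value}

def compare(rows, basis="consolidated"):
    """Compare only rows on the same reporting basis, in period order --
    standalone figures never blend into a consolidated series."""
    same_basis = [r for r in rows if r["basis"] == basis]
    same_basis.sort(key=lambda r: r["period"])
    return [(r["period"], r["value_cr"]) for r in same_basis]

raw_rows = [  # hypothetical extraction output, anchored to source pages upstream
    {"period": "FY23", "basis": "consolidated", "unit": "crore", "value": 180.0},
    {"period": "FY24", "basis": "consolidated", "unit": "lakh",  "value": 21000.0},
    {"period": "FY24", "basis": "standalone",   "unit": "crore", "value": 150.0},
]
series = compare([normalize(r) for r in raw_rows])
```

The standalone FY24 row is filtered out rather than silently averaged in, which is exactly the mixing failure described above.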

4. The Ingestion Bottleneck (80% of the Real Work)

Real filings contain:

  • scanned pages
  • misaligned tables
  • multi-column layouts
  • nested footnotes
  • vernacular text
  • rotated or low-resolution images

Generic OCR + linear parsing collapses on a large share of Indian PDFs.

Consequence:
Downstream reasoning is corrupted before the LLM even begins — extracted data is incomplete, mis-ordered, or missing rows.

Requirement for finance:
A layout-aware parser engineered for financial documents, with deterministic reconstruction of tables, footnotes, and headers.
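To see why layout matters, here is a toy contrast between linear text extraction and position-aware reconstruction. The `(x, y, text)` boxes are hypothetical OCR output; real parsers handle skew, merged cells, and multi-line cells, which this sketch does not:

```python
def reconstruct_rows(boxes, y_tolerance=3.0):
    """Group positioned text fragments into table rows by vertical
    position, then order cells left-to-right -- instead of reading
    the PDF's internal text stream in whatever order it was written."""
    rows = []
    for box in sorted(boxes, key=lambda b: b["y"]):
        if rows and abs(rows[-1][0]["y"] - box["y"]) <= y_tolerance:
            rows[-1].append(box)   # same visual line as the current row
        else:
            rows.append([box])     # new visual line -> new row
    return [[b["text"] for b in sorted(row, key=lambda b: b["x"])] for row in rows]

boxes = [  # hypothetical fragments; note the stream order interleaves columns
    {"x": 200, "y": 10, "text": "FY24"},  {"x": 20,  "y": 10, "text": "Metric"},
    {"x": 20,  "y": 30, "text": "Revenue"}, {"x": 200, "y": 30, "text": "1,240"},
]
table = reconstruct_rows(boxes)
```

Linear extraction would emit the fragments in stream order (`FY24, Metric, Revenue, 1,240`), detaching every number from its row label; position-aware grouping restores the table the filing actually shows.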

5. Coverage Gaps in Non-Standard Markets

Global systems optimise for standardized filings (10-K/10-Q).
Indian filings vary massively in naming, formatting, templates, and disclosure density.
Many small- and mid-cap filings are scanned or image-heavy.

Consequence:
Critical data is missing, OCR fails, entities are misclassified, and the long tail becomes invisible.

Requirement for finance:
Direct exchange-level ingestion (BSE/NSE) with India-specific parsing rules and entity resolution.
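Entity resolution at ingestion can be sketched like this. The alias normalisation rules and the canonical name are illustrative assumptions, not the production rule set:

```python
import re

# Hypothetical canonical master, e.g. seeded from exchange security lists.
CANONICAL = {"abc finance limited": "ABC Finance Limited"}

def resolve_entity(raw_name):
    """Normalise a filing's company-name string to a canonical entity
    so documents from different sources land on the same timeline."""
    name = re.sub(r"[.,]", "", raw_name.lower()).strip()
    name = re.sub(r"\bltd\b", "limited", name)   # expand a common suffix variant
    name = re.sub(r"\s+", " ", name)
    return CANONICAL.get(name)   # None => unresolved, flag for review
```

“ABC Finance Ltd.”, “abc finance limited”, and “ABC  Finance Limited” all resolve to one entity; anything unresolved is surfaced rather than guessed, so a misnamed small-cap filing cannot silently attach to the wrong company.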

4. The Shift: From Reading to Indexing to Workflows

The fundamental mistake is assuming the unit of work is a PDF.

In finance, the unit of work is:

  • The Entity (the company), and
  • The Workflow (screening → deep dive → model update → monitoring)

A finance engine shouldn’t start reading when you ask a question.
It should have already parsed, indexed, structured, and linked the entire market before you log in.

The Takeaway

General-purpose LLMs are extraordinary conversational systems.
But they were not built for:

  • statutory precision
  • multi-year continuity
  • sector-scale ingestion
  • auditability
  • null reasoning
  • workflow persistence

Financial research is built on these constraints.
Until those architectural gaps close, chatbots will remain useful assistants but not research infrastructure.

CompoundingAI is an enterprise-grade vertical intelligence engine that transforms unstructured data into decision-grade insights within minutes, with source-level traceability for confident & auditable workflows across research, risk, and investment teams.

We cut the noise and deliver insights directly.

Powered by CompoundingAI — AI-powered Indian equity research