I spent some time with Andrej Karpathy’s excellent “2025 LLM Year in Review” and I think it is one of the clearest windows into where large language models are actually headed.
You can read the original post here:
2025 LLM Year in Review, by Andrej Karpathy
Karpathy is writing for a broad technical audience, not specifically for commercial real estate. In this post I want to translate his key ideas into what they mean for CRE professionals, and especially for clients of CRE Agents.
At a high level, his message is simple:
LLMs in 2025 are extremely useful, very weird, and nowhere near their ceiling.
Below is how I would summarize his article through a CRE lens, followed by the three biggest takeaways for CRE Agents clients.
1. LLMs Learned To “Play For Points,” Not Just “Sound Smart”
Karpathy’s first and most important theme is the rise of Reinforcement Learning from Verifiable Rewards (RLVR).
Historically, LLMs were trained in three main stages:
- Pretraining on internet-scale text
- Supervised finetuning to follow instructions
- RLHF (Reinforcement Learning from Human Feedback) to be more helpful and safe
In 2025, a new stage became central: RLVR, where models are trained against objective, automatically checkable rewards in domains like math and code.
Instead of “please sound like a helpful assistant,” the model is pushed to “get this answer exactly correct” across millions of small environments. Over long runs, the model starts to develop behaviors we would casually call “reasoning”:
- Breaking problems into intermediate steps
- Trying multiple solution paths
- Checking and revising its own work
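The core idea of a verifiable reward can be sketched in a few lines. This is a toy illustration (not any lab's actual RLVR pipeline): a candidate answer is checked programmatically against a known-correct result, so correctness itself, not style, becomes the training signal.

```python
# Toy sketch of a verifiable reward: the answer is checked by code,
# so the reward is objective and automatic. Illustrative only; real
# RLVR pipelines are far more elaborate.

def verifiable_reward(candidate: str, expected: float, tol: float = 1e-6) -> float:
    """Reward 1.0 only if the candidate parses to the expected number."""
    try:
        return 1.0 if abs(float(candidate) - expected) < tol else 0.0
    except ValueError:
        return 0.0  # unparseable answers score zero, no partial credit

# Candidate completions for "What is 17 * 24?"
candidates = ["408", "408.0", "about 400", "420"]
rewards = [verifiable_reward(c, expected=408.0) for c in candidates]
print(rewards)  # [1.0, 1.0, 0.0, 0.0]
```

Note that "about 400" scores zero: sounding plausible earns nothing, which is exactly the pressure that pushes models toward checking their own work.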
Why this matters for CRE Agents clients
RLVR is exactly the kind of training that makes AI useful for work like:
- Building and checking complex underwriting models
- Debugging Excel logic or code for internal tools
- Systematically exploring scenarios and sensitivities
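The last bullet, systematic scenario exploration, is the kind of structured, checkable task this training rewards. A minimal sketch, with placeholder figures rather than real deal data:

```python
# Minimal sketch of scenario exploration: value a property across a
# grid of NOI growth and exit cap rate assumptions. All numbers are
# illustrative placeholders, not real deal data.

base_noi = 1_200_000  # year-1 net operating income, USD

def exit_value(noi: float, growth: float, cap_rate: float, years: int = 5) -> float:
    """Value at exit: NOI grown for `years` years, divided by the exit cap rate."""
    return noi * (1 + growth) ** years / cap_rate

for growth in (0.01, 0.02, 0.03):
    for cap in (0.055, 0.060, 0.065):
        v = exit_value(base_noi, growth, cap)
        print(f"growth {growth:.0%}  cap {cap:.2%}  exit value ${v:,.0f}")
```

A model trained on verifiable tasks can both generate this kind of grid and check each cell's arithmetic, which is why these workflows are among the first to benefit.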
Practically, it means the "digital coworker" you use is not just parroting patterns from text; it has been hardened against tasks where there is a right and a wrong answer.
You still have to review its work, but the baseline quality and consistency keep climbing because the underlying models are being trained to win at games where correctness is rewarded directly.
2. “Ghosts,” Not “Animals” – Why LLMs Feel Brilliant And Broken At The Same Time
Karpathy argues that we should stop thinking about LLMs as “robots getting smarter” and start thinking of them as “ghosts” we summon with text.
Humans and LLMs are optimized for completely different things:
- Humans: survival and social success in the physical world
- LLMs: predicting text, solving verifiable tasks, and getting rewarded in synthetic environments
The result is jagged intelligence:
- In some domains (coding, math puzzles, some forms of writing) they are shockingly capable
- In others (basic common sense, subtle security awareness, avoiding traps) they can be childlike or worse
Karpathy notes that benchmarks are increasingly unreliable signals of “general intelligence.” Labs can overfit models to beat specific tests, which produces impressive scores without solving broad reasoning.
Why this matters for CRE Agents clients
You should treat LLMs as:
- Specialist savants in certain workflows (data extraction, modeling, coding, document drafting)
- Unreliable generalists in others (unguided judgment, security, unsupervised autonomy)
For you, that implies:
- Use AI heavily where the work is structured, checkable, and repeatable: underwriting, data cleaning, pipeline maintenance, memo drafting, process documentation
- Keep a human in the loop where the work is ambiguous, political, or irreversible: investment committee decisions, capital partner communication strategy, key negotiation positions
Karpathy’s “ghosts vs. animals” framing is a good safety rail. You are not raising a junior analyst who will “grow up.” You are operating a non-human intelligence that will always have sharp spikes and deep holes.
3. A New Layer: “Cursor For X” And The Rise Of Vertical AI Apps
Karpathy highlights Cursor as a turning point: not just a wrapper around an LLM, but a new kind of product that:
- Engineers context for a specific domain
- Orchestrates multiple LLM calls in a directed acyclic graph (DAG)
- Provides a domain-specific UI
- Gives the user an “autonomy slider”
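The DAG-orchestration idea can be sketched concretely. Everything below is a hypothetical illustration: `call_model` is a stub standing in for a real LLM API, and the node names are invented CRE examples, not CRE Agents' actual workflow.

```python
# Hedged sketch of DAG-style orchestration: each node's prompt is filled
# with the outputs of its upstream dependencies, resolved recursively.
# `call_model` is a stub, not a real LLM API.

def call_model(prompt: str) -> str:
    return f"<answer to: {prompt}>"  # stub; a real app would call an LLM here

# Node name -> (upstream dependencies, prompt template)
dag = {
    "extract_terms": ([], "Extract lease terms from the OM."),
    "build_model":   (["extract_terms"], "Build a cash flow model from: {extract_terms}"),
    "sanity_check":  (["build_model"], "Check this model for errors: {build_model}"),
    "write_summary": (["build_model", "sanity_check"],
                      "Summarize {build_model} given issues {sanity_check}"),
}

results: dict[str, str] = {}  # memoized node outputs

def run(node: str) -> str:
    if node in results:
        return results[node]
    deps, template = dag[node]
    filled = template.format(**{d: run(d) for d in deps})  # resolve upstream first
    results[node] = call_model(filled)
    return results[node]

print(run("write_summary"))
```

The point is architectural: the product's value lives in how the graph is shaped and what context each node receives, not in any single model call.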
This is exactly the pattern CRE Agents is built on, just pointed at commercial real estate instead of software development.
Karpathy’s view is that:
- Foundation model labs will ship “generally capable college students”
- Vertical apps will organize them into professionals in specific industries by adding data, tools, feedback loops, and workflow logic
Why this matters for CRE Agents clients
You should expect your AI stack to look less like “one chat window” and more like:
- A CRE-native front end that understands cap rates, DSCR, lease-up, expense ratios, tax reassessments
- A tool orchestration layer that chains models with your spreadsheets, data rooms, email, and calendar
- A governed autonomy slider, from "draft this email" up to "run this entire underwriting workflow, then summarize what changed"
In other words, the value will sit in vertical AI applications, not in whichever base model is hottest this quarter.
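Two of the metrics a CRE-native front end would understand natively can be stated as short definitions. The numbers are placeholders, not a real deal:

```python
# Illustrative definitions of two core CRE metrics; figures are
# placeholders, not real deal data.

def cap_rate(noi: float, purchase_price: float) -> float:
    """Net operating income over price: unlevered yield at acquisition."""
    return noi / purchase_price

def dscr(noi: float, annual_debt_service: float) -> float:
    """Debt service coverage ratio: NOI over total annual debt payments."""
    return noi / annual_debt_service

noi = 900_000
print(f"Cap rate: {cap_rate(noi, 15_000_000):.2%}")  # 6.00%
print(f"DSCR: {dscr(noi, 700_000):.2f}x")            # 1.29x
```

A vertical app encodes definitions like these once, so every workflow computes them the same way instead of re-deriving them in each chat.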
4. Agents That “Live On Your Computer” And Know Your World
Karpathy points to Claude Code as the first convincing example of an LLM agent that:
- Runs close to your own environment
- Has access to your files, repos, tools, and configuration
- Chains tools and reasoning in a loop until a problem is actually solved
He contrasts this with cloud-only agents that live in remote containers and never really become “part of your machine.”
Why this matters for CRE Agents clients
For CRE, the analog is clear:
- Your “digital coworker” cannot live only in a browser tab
- It needs controlled access to your models, OM library, drive structure, email threads, and pipeline sheets
- It must be able to call tools and workflows on that environment, not just chat about it in the abstract
Done right, your AI agent starts to feel less like a website and more like a specialist who sits inside your firm’s systems.
5. Vibe Coding And The Coming Flood Of Custom Tools
Karpathy coined the term “vibe coding” for building real software by just describing what you want in natural language and iterating with the AI.
In 2025, the capabilities finally reached the point where:
- Non-expert developers can ship surprisingly strong tools without deep language or framework knowledge
- Experienced developers can spin up one-off utilities, experiments, and internal tools that would never have been economical before
- Code starts to feel cheap, disposable, and abundant
He gives examples of creating custom tokenizers, small applications, and even ephemeral tools just to debug a single issue.
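For CRE, a vibe-coded micro-tool might be as small as the sketch below: a one-off rent roll summarizer. The field names and figures are hypothetical, not a real schema.

```python
# The kind of disposable micro-tool vibe coding makes cheap: a one-off
# rent roll summarizer. Field names and figures are hypothetical.

rent_roll = [
    {"unit": "101", "sf": 1200, "monthly_rent": 2400, "occupied": True},
    {"unit": "102", "sf": 1100, "monthly_rent": 0,    "occupied": False},
    {"unit": "201", "sf": 1300, "monthly_rent": 2900, "occupied": True},
]

occupied_sf = sum(u["sf"] for u in rent_roll if u["occupied"])
total_sf = sum(u["sf"] for u in rent_roll)
annual_rent = sum(u["monthly_rent"] for u in rent_roll) * 12

print(f"Occupancy: {occupied_sf / total_sf:.1%}")   # by square footage
print(f"In-place annual rent: ${annual_rent:,.0f}")
```

Nobody would have commissioned this as "software" two years ago; now it is a thirty-second conversation with an AI, used once and thrown away.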
Why this matters for CRE Agents clients
Over the next few years you should expect:
- Far more firm-specific automations, scripts, and micro-tools around your deals
- Analyst and associate roles that include “describe what you want, vibe code it with the AI, then test and deploy”
- A gradual shift from “buy or build” to “describe, generate, and refine” for internal software
If you are willing to let AI generate and manage more of the glue code and automation logic, your bottleneck shifts to:
- Defining the workflow clearly
- Governing data access and permissions
- Reviewing outputs and failure cases
This is where a platform like CRE Agents sits: opinionated workflows and guardrails for a domain that will be increasingly full of AI-authored code.
6. Beyond Chat: The “LLM GUI” And Visual Interfaces
Finally, Karpathy calls "Nano Banana," Google's Gemini image model, one of the most paradigm-shifting models of 2025 because it hints at a true LLM GUI.
His point:
- Text is the “native format” for models and computers
- Humans prefer visual and spatial information: diagrams, dashboards, slides, whiteboards, small web apps
- Just like traditional computing evolved from command lines to GUIs, LLMs will evolve from plain chat to rich, visual interfaces generated and manipulated by the model itself
Nano Banana is interesting not only because it can generate images, but because it can combine text, images, and world knowledge in one brain.
Why this matters for CRE Agents clients
You should expect your AI coworker to:
- Draw site plans, rent roll visualizations, sensitivity charts, timelines, capital stacks
- Turn conversations into live dashboards and explorable models
- Replace many static PDF outputs with interactive, AI-generated GUIs that sit on top of your data
This is extremely aligned with how CRE people actually think: you want to “see” the deal, not just read another memo.
The Three Biggest Takeaways For CRE Agents Clients
If I had to distill Karpathy’s article into three practical points for CRE Agents clients, they would be:
1. Reasoning is real, but jagged.
   - RLVR has made models far better at structured, checkable tasks like modeling and coding.
   - At the same time, they are still shockingly brittle in other areas.
   - Use them aggressively for underwriting, automation, and document work, but keep humans in charge of ambiguous judgment and governance.
2. Vertical AI apps will own the value, not raw models.
   - Cursor in software is the pattern: a domain-specific UI plus orchestrated LLM workflows plus an autonomy slider.
   - In CRE, that maps directly to platforms like CRE Agents that bundle models, data, and tools into deal workflows, not generic chatbots.
   - Your competitive edge will come from how well your firm plugs into these vertical stacks, not which base model you "picked."
3. Every CRE firm will become a small software factory, whether it wants to or not.
   - Vibe coding and emerging LLM GUIs mean that analysts and principals can co-create workflows, tools, and visual interfaces with AI.
   - Code and UI become cheap; clear process definitions, data quality, and review become the hard part.
   - Firms that embrace this shift will spin up custom automations and decision tools around their strategy, at a pace that was impossible even two years ago.
Karpathy closes by saying that we are still nowhere near 10 percent of the potential of current models, and that the field feels wide open.
If you are a CRE Agents client, that should not feel abstract. It should feel like a very concrete question:
“Given where LLMs really are in 2025, which parts of my investment, asset management, and capital raising workflows should I hand to a digital coworker this quarter, and which parts do I want to keep firmly human for now?”
That is the frontier we are building toward.
Source
- Andrej Karpathy, “2025 LLM Year in Review”