I spent some time with Andrej Karpathy’s excellent “2025 LLM Year in Review” and I think it is one of the clearest windows into where large language models are actually headed.
You can read the original post here:
2025 LLM Year in Review, by Andrej Karpathy
Karpathy is writing for a broad technical audience, not specifically for commercial real estate. In this post I want to translate his key ideas into what they mean for CRE professionals, and especially for clients of CRE Agents.
At a high level, his message is simple:
LLMs in 2025 are extremely useful, very weird, and nowhere near their ceiling.
Below is how I would summarize his article through a CRE lens, followed by the three biggest takeaways for CRE Agents clients.
1. LLMs Learned To “Play For Points,” Not Just “Sound Smart”
Karpathy’s first and most important theme is the rise of Reinforcement Learning from Verifiable Rewards (RLVR).
Historically, LLMs were trained in three main stages:
- Pretraining on internet-scale text
- Supervised finetuning to follow instructions
- RLHF (Reinforcement Learning from Human Feedback) to be more helpful and safe
In 2025, a new stage became central: RLVR, where models are trained against objective, automatically checkable rewards in domains like math and code.
Instead of “please sound like a helpful assistant,” the model is pushed to “get this answer exactly correct” across millions of small environments. Over long runs, the model starts to develop behaviors we would casually call “reasoning”:
- Breaking problems into intermediate steps
- Trying multiple solution paths
- Checking and revising its own work
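The core idea of a verifiable reward can be sketched in a few lines. This is a toy illustration (not any lab's actual RLVR pipeline): a candidate answer is checked programmatically against a known-correct result, so correctness itself, not style, becomes the training signal.

```python
# Toy sketch of a verifiable reward: the answer is checked by code,
# so the reward is objective and automatic. Illustrative only; real
# RLVR pipelines are far more elaborate.

def verifiable_reward(candidate: str, expected: float, tol: float = 1e-6) -> float:
    """Reward 1.0 only if the candidate parses to the expected number."""
    try:
        return 1.0 if abs(float(candidate) - expected) < tol else 0.0
    except ValueError:
        return 0.0  # unparseable answers score zero, no partial credit

# Candidate completions for "What is 17 * 24?"
candidates = ["408", "408.0", "about 400", "420"]
rewards = [verifiable_reward(c, expected=408.0) for c in candidates]
print(rewards)  # [1.0, 1.0, 0.0, 0.0]
```

Note that "about 400" scores zero: sounding plausible earns nothing, which is exactly the pressure that pushes models toward checking their own work.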
Why this matters for CRE Agents clients
RLVR is exactly the kind of training that makes AI useful for work like:
- Building and checking complex underwriting models
- Debugging Excel logic or code for internal tools
- Systematically exploring scenarios and sensitivities
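The last bullet, systematic scenario exploration, is the kind of structured, checkable task this training rewards. A minimal sketch, with placeholder figures rather than real deal data:

```python
# Minimal sketch of scenario exploration: value a property across a
# grid of NOI growth and exit cap rate assumptions. All numbers are
# illustrative placeholders, not real deal data.

base_noi = 1_200_000  # year-1 net operating income, USD

def exit_value(noi: float, growth: float, cap_rate: float, years: int = 5) -> float:
    """Value at exit: NOI grown for `years` years, divided by the exit cap rate."""
    return noi * (1 + growth) ** years / cap_rate

for growth in (0.01, 0.02, 0.03):
    for cap in (0.055, 0.060, 0.065):
        v = exit_value(base_noi, growth, cap)
        print(f"growth {growth:.0%}  cap {cap:.2%}  exit value ${v:,.0f}")
```

A model trained on verifiable tasks can both generate this kind of grid and check each cell's arithmetic, which is why these workflows are among the first to benefit.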
Practically, it means the "digital coworker" you use is not just parroting patterns from text; it has been hardened against tasks where there is a right and a wrong answer.
You still have to review its work, but the baseline quality and consistency keep climbing because the underlying models are being trained to win at games where correctness is rewarded directly.
2. “Ghosts,” Not “Animals” – Why LLMs Feel Brilliant And Broken At The Same Time
Karpathy argues that we should stop thinking about LLMs as “robots getting smarter” and start thinking of them as “ghosts” we summon with text.
Humans and LLMs are optimized for completely different things:
- Humans: survival and social success in the physical world
- LLMs: predicting text, solving verifiable tasks, and getting rewarded in synthetic environments
The result is jagged intelligence:
- In some domains (coding, math puzzles, some forms of writing) they are shockingly capable
- In others (basic common sense, subtle security awareness, avoiding traps) they can be childlike or worse
Karpathy notes that benchmarks are increasingly unreliable signals of “general intelligence.” Labs can overfit models to beat specific tests, which produces impressive scores without solving broad reasoning.
Why this matters for CRE Agents clients
You should treat LLMs as:
- Specialist savants in certain workflows (data extraction, modeling, coding, document drafting)
- Unreliable generalists in others (unguided judgment, security, unsupervised autonomy)
For you, that implies:
- Use AI heavily where the work is structured, checkable, and repeatable: underwriting, data cleaning, pipeline maintenance, memo drafting, process documentation
- Keep a human in the loop where the work is ambiguous, political, or irreversible: investment committee decisions, capital partner communication strategy, key negotiation positions
Karpathy’s “ghosts vs. animals” framing is a good safety rail. You are not raising a junior analyst who will “grow up.” You are operating a non-human intelligence that will always have sharp spikes and deep holes.
3. A New Layer: “Cursor For X” And The Rise Of Vertical AI Apps
Karpathy highlights Cursor as a turning point: not just a wrapper around an LLM, but a new kind of product that:
- Engineers context for a specific domain
- Orchestrates multiple LLM calls in a directed acyclic graph (DAG)
- Provides a domain-specific UI
- Gives the user an “autonomy slider”
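The DAG-orchestration idea can be sketched concretely. Everything below is a hypothetical illustration: `call_model` is a stub standing in for a real LLM API, and the node names are invented CRE examples, not CRE Agents' actual workflow.

```python
# Hedged sketch of DAG-style orchestration: each node's prompt is filled
# with the outputs of its upstream dependencies, resolved recursively.
# `call_model` is a stub, not a real LLM API.

def call_model(prompt: str) -> str:
    return f"<answer to: {prompt}>"  # stub; a real app would call an LLM here

# Node name -> (upstream dependencies, prompt template)
dag = {
    "extract_terms": ([], "Extract lease terms from the OM."),
    "build_model":   (["extract_terms"], "Build a cash flow model from: {extract_terms}"),
    "sanity_check":  (["build_model"], "Check this model for errors: {build_model}"),
    "write_summary": (["build_model", "sanity_check"],
                      "Summarize {build_model} given issues {sanity_check}"),
}

results: dict[str, str] = {}  # memoized node outputs

def run(node: str) -> str:
    if node in results:
        return results[node]
    deps, template = dag[node]
    filled = template.format(**{d: run(d) for d in deps})  # resolve upstream first
    results[node] = call_model(filled)
    return results[node]

print(run("write_summary"))
```

The point is architectural: the product's value lives in how the graph is shaped and what context each node receives, not in any single model call.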
This is exactly the pattern CRE Agents is built on, just pointed at commercial real estate instead of software development.
Karpathy’s view is that:
- Foundation model labs will ship “generally capable college students”
- Vertical apps will organize them into professionals in specific industries by adding data, tools, feedback loops, and workflow logic
Why this matters for CRE Agents clients
You should expect your AI stack to look less like “one chat window” and more like:
- A CRE-native front end that understands cap rates, DSCR, lease-up, expense ratios, tax reassessments
- A tool orchestration layer that chains models with your spreadsheets, data rooms, email, and calendar
- A governed autonomy slider, from "draft this email" up to "run this entire underwriting workflow, then summarize what changed"
In other words, the value will sit in vertical AI applications, not in whichever base model is hottest this quarter.
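Two of the metrics a CRE-native front end would understand natively can be stated as short definitions. The numbers are placeholders, not a real deal:

```python
# Illustrative definitions of two core CRE metrics; figures are
# placeholders, not real deal data.

def cap_rate(noi: float, purchase_price: float) -> float:
    """Net operating income over price: unlevered yield at acquisition."""
    return noi / purchase_price

def dscr(noi: float, annual_debt_service: float) -> float:
    """Debt service coverage ratio: NOI over total annual debt payments."""
    return noi / annual_debt_service

noi = 900_000
print(f"Cap rate: {cap_rate(noi, 15_000_000):.2%}")  # 6.00%
print(f"DSCR: {dscr(noi, 700_000):.2f}x")            # 1.29x
```

A vertical app encodes definitions like these once, so every workflow computes them the same way instead of re-deriving them in each chat.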
4. Agents That “Live On Your Computer” And Know Your World
Karpathy points to Claude Code as the first convincing example of an LLM agent that:
- Runs close to your own environment
- Has access to your files, repos, tools, and configuration
- Chains tools and reasoning in a loop until a problem is actually solved
He contrasts this with cloud-only agents that live in remote containers and never really become “part of your machine.”
Why this matters for CRE Agents clients
For CRE, the analog is clear:
- Your “digital coworker” cannot live only in a browser tab
- It needs controlled access to your models, OM library, drive structure, email threads, and pipeline sheets
- It must be able to call tools and workflows on that environment, not just chat about it in the abstract
Done right, your AI agent starts to feel less like a website and more like a specialist who sits inside your firm’s systems.
5. Vibe Coding And The Coming Flood Of Custom Tools
Karpathy coined the term “vibe coding” for building real software by just describing what you want in natural language and iterating with the AI.
In 2025, the capabilities finally reached the point where:
- Non-expert developers can ship surprisingly strong tools without deep language or framework knowledge
- Experienced developers can spin up one-off utilities, experiments, and internal tools that would never have been economical before
- Code starts to feel cheap, disposable, and abundant
He gives examples of creating custom tokenizers, small applications, and even ephemeral tools just to debug a single issue.
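For CRE, a vibe-coded micro-tool might be as small as the sketch below: a one-off rent roll summarizer. The field names and figures are hypothetical, not a real schema.

```python
# The kind of disposable micro-tool vibe coding makes cheap: a one-off
# rent roll summarizer. Field names and figures are hypothetical.

rent_roll = [
    {"unit": "101", "sf": 1200, "monthly_rent": 2400, "occupied": True},
    {"unit": "102", "sf": 1100, "monthly_rent": 0,    "occupied": False},
    {"unit": "201", "sf": 1300, "monthly_rent": 2900, "occupied": True},
]

occupied_sf = sum(u["sf"] for u in rent_roll if u["occupied"])
total_sf = sum(u["sf"] for u in rent_roll)
annual_rent = sum(u["monthly_rent"] for u in rent_roll) * 12

print(f"Occupancy: {occupied_sf / total_sf:.1%}")   # by square footage
print(f"In-place annual rent: ${annual_rent:,.0f}")
```

Nobody would have commissioned this as "software" two years ago; now it is a thirty-second conversation with an AI, used once and thrown away.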
Why this matters for CRE Agents clients
Over the next few years you should expect:
- Far more firm-specific automations, scripts, and micro-tools around your deals
- Analyst and associate roles that include “describe what you want, vibe code it with the AI, then test and deploy”
- A gradual shift from “buy or build” to “describe, generate, and refine” for internal software
If you are willing to let AI generate and manage more of the glue code and automation logic, your bottleneck shifts to:
- Defining the workflow clearly
- Governing data access and permissions
- Reviewing outputs and failure cases
This is where a platform like CRE Agents sits: opinionated workflows and guardrails for a domain that will be increasingly full of AI-authored code.
6. Beyond Chat: The “LLM GUI” And Visual Interfaces
Finally, Karpathy calls "Nano Banana," Google's Gemini image model, one of the most paradigm-shifting models of 2025 because it hints at a true LLM GUI.
His point:
- Text is the “native format” for models and computers
- Humans prefer visual and spatial information: diagrams, dashboards, slides, whiteboards, small web apps
- Just like traditional computing evolved from command lines to GUIs, LLMs will evolve from plain chat to rich, visual interfaces generated and manipulated by the model itself
Nano Banana is interesting not only because it can generate images, but because it can combine text, images, and world knowledge in one brain.
Why this matters for CRE Agents clients
You should expect your AI coworker to:
- Draw site plans, rent roll visualizations, sensitivity charts, timelines, capital stacks
- Turn conversations into live dashboards and explorable models
- Replace many static PDF outputs with interactive, AI-generated GUIs that sit on top of your data
This is extremely aligned with how CRE people actually think: you want to “see” the deal, not just read another memo.
The Three Biggest Takeaways For CRE Agents Clients
If I had to distill Karpathy’s article into three practical points for CRE Agents clients, they would be:
1. Reasoning is real, but jagged.
   - RLVR has made models far better at structured, checkable tasks like modeling and coding.
   - At the same time, they are still shockingly brittle in other areas.
   - Use them aggressively for underwriting, automation, and document work, but keep humans in charge of ambiguous judgment and governance.
2. Vertical AI apps will own the value, not raw models.
   - Cursor in software is the pattern: a domain-specific UI plus orchestrated LLM workflows plus an autonomy slider.
   - In CRE, that maps directly to platforms like CRE Agents that bundle models, data, and tools into deal workflows, not generic chatbots.
   - Your competitive edge will come from how well your firm plugs into these vertical stacks, not which base model you "picked."
3. Every CRE firm will become a small software factory, whether it wants to or not.
   - Vibe coding and emerging LLM GUIs mean that analysts and principals can co-create workflows, tools, and visual interfaces with AI.
   - Code and UI become cheap; clear process definitions, data quality, and review become the hard part.
   - Firms that embrace this shift will spin up custom automations and decision tools around their strategy, at a pace that was impossible even two years ago.
Karpathy closes by saying that we are still nowhere near 10 percent of the potential of current models, and that the field feels wide open.
If you are a CRE Agents client, that should not feel abstract. It should feel like a very concrete question:
“Given where LLMs really are in 2025, which parts of my investment, asset management, and capital raising workflows should I hand to a digital coworker this quarter, and which parts do I want to keep firmly human for now?”
That is the frontier we are building toward.
Source
- Andrej Karpathy, “2025 LLM Year in Review”