Claude Code review: the agentic CLI that actually finishes the task

After six months of using Claude Code as my primary AI coding interface, here is what it does better than Cursor, where it still trips, and which jobs it should own on a senior engineer’s machine.

Charles Lin · May 12, 2026

Our verdict

Best for: Senior engineers who already think in shells. Multi-file refactors, large-codebase exploration, anything where you want an agent that actually finishes before asking you what to do next.

Not for: Engineers who want a GUI-first experience or live in an IDE that the CLI cannot reach into. If you do not have a terminal open most of the day, the friction adds up.

9.0 / 10

I have been using Claude Code as my primary AI coding interface since late 2025. Before that I rotated between Cursor, Aider, and a hand-rolled setup with the Anthropic API directly. The reason Claude Code stuck is not that it has the best autocomplete — Cursor still beats it at line-by-line tab-tab-tab — but that it is the first agent I have used where I can reasonably hand off a multi-step task and not feel the need to check on it every ninety seconds.

This is a working-engineer review. I am not going to walk through the install — the docs cover it. I want to talk about what changes in how you work, where the rough edges still are, and whether it is worth replacing the IDE-centric workflow you already have.

What Claude Code actually is

Claude Code is a CLI tool. You run claude in a terminal, point it at a directory, and you have a conversational interface to Anthropic’s Claude model with a set of built-in tools — file read and edit, shell execution, web fetch, and a few more — plus whatever MCP servers you have plugged in. That is the entire mental model.

The CLI metaphor matters more than it sounds. Every other AI coding tool I have used embeds itself in an editor. The editor’s text buffer is the unit of work. Claude Code’s unit of work is the conversation — the editor is incidental. If you are used to thinking in terminal sessions, this lands immediately. If you are not, the first few hours feel weird.

The agentic loop, in practice

Most “agentic” coding tools spend a lot of marketing energy on what their agent can do. In daily use the only thing that matters is whether the agent finishes. I would rather have an agent that does five things and stops cleanly than one that does fifteen things and asks me to confirm each one.

Claude Code, with the right model selected (Opus 4 for hard tasks, Sonnet for routine work), finishes. Here are the things I have handed it without babysitting:

“Add a --dry-run flag to this script. It needs to skip the destructive parts but still log what it would have done. Update the tests.”
“I am seeing a flaky test in __tests__/auth.test.ts. Find the race condition and fix it. If it requires changing the implementation, do that too.”
“Read the schema migration in migrations/0042_*.sql and tell me whether the backfill is safe under concurrent writes. If it isn’t, suggest the fix.”

In each of those, Claude Code went through the full loop without me intervening: it read the relevant files, formed a hypothesis, made changes, ran the tests, iterated when they failed, and reported back. This is not always the case. About one task in five, it stalls or heads off in the wrong direction and needs a course correction. But the success rate is dramatically higher than what I was getting from Cursor’s agent mode six months ago, and it has continued to improve through model updates.
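The edit, test, retry cycle described here can be sketched in a few lines. This is a toy model under stated assumptions, not Claude Code’s implementation: propose_fix and run_tests are hypothetical stand-ins for the model call and the test runner, and max_attempts is an assumed retry budget.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopResult:
    succeeded: bool
    attempts: int  # number of edits that were made
    log: List[str] = field(default_factory=list)


def agent_loop(
    propose_fix: Callable[[str, str], str],  # hypothetical: (code, failure) -> new code
    run_tests: Callable[[str], str],         # hypothetical: "" on success, else a message
    initial_code: str,
    max_attempts: int = 5,                   # assumed budget, not a real setting
) -> LoopResult:
    """Run the tests, and while they fail, feed the failure back into an edit."""
    code = initial_code
    log: List[str] = []
    attempts = 0
    failure = run_tests(code)
    while failure and attempts < max_attempts:
        log.append(f"attempt {attempts + 1}: {failure}")
        code = propose_fix(code, failure)
        attempts += 1
        failure = run_tests(code)
    return LoopResult(succeeded=not failure, attempts=attempts, log=log)
```

The point of the structure is the stopping rule: the loop ends either on green tests or on an exhausted budget, so it never ends on a question to the user.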

Where it consistently beats the IDE-based competitors

1. Multi-file refactors

This is the killer use case. Tell Claude Code “rename getUser to loadUser across the codebase and update all the tests” and it actually does it — not because the rename is hard (any IDE can do that) but because it handles the variants. The function with a slightly different name. The string reference in a config file. The comment block that mentions the old name. The mock object in the test file. Claude Code reads the codebase, makes the changes, and verifies they compile.

Cursor in agent mode can do a lot of this too, but in my experience it is noticeably more likely to give up partway, leaving you with a half-renamed codebase. Claude Code’s loop is more stubborn — which is exactly what you want for refactor work.
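To make the “variants” problem concrete, here is a minimal sketch of a scan that looks for more than the exact symbol. The helper names (name_variants, find_rename_candidates) and the sample files are hypothetical, and this is not how Claude Code searches internally; it only illustrates why a plain symbol rename misses config strings, comments, and mocks.

```python
import re
from typing import Dict, List, Tuple


def name_variants(name: str) -> List[str]:
    """Return the given camelCase name plus snake_case and PascalCase spellings."""
    words = re.findall(r"[A-Z]?[a-z0-9]+|[A-Z]+(?![a-z])", name)
    lower = [w.lower() for w in words]
    snake = "_".join(lower)
    pascal = "".join(w.capitalize() for w in lower)
    return list(dict.fromkeys([name, snake, pascal]))  # de-duplicate, keep order


def find_rename_candidates(files: Dict[str, str], name: str) -> List[Tuple[str, int, str]]:
    """Scan {path: text} and return (path, line_number, matched_identifier) hits.

    Word characters on either side of a variant are included in the match, so
    prefixed or suffixed identifiers such as mockGetUser are reported for review.
    """
    alternatives = "|".join(re.escape(v) for v in name_variants(name))
    pattern = re.compile(r"\w*(?:" + alternatives + r")\w*")
    hits: List[Tuple[str, int, str]] = []
    for path in sorted(files):
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            for match in pattern.finditer(line):
                hits.append((path, lineno, match.group(0)))
    return hits


# Hypothetical sample project, held in memory for illustration.
SAMPLE_FILES = {
    "src/user.ts": "export function getUser(id) {}\n// getUser is cached\n",
    "config/routes.yaml": "handler: get_user\n",
    "test/user.test.ts": "const mockGetUser = jest.fn()\n",
}
```

Calling find_rename_candidates(SAMPLE_FILES, "getUser") reports the definition, the comment, the snake_case config string, and the mock, which is the review list an agent or a human would then work through.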

2. Large-codebase exploration

“How does authentication flow through this service?” is the kind of question that costs me thirty minutes the first time I touch a new codebase. With Claude Code, it takes about ninety seconds. The agent uses grep, reads the relevant files, follows the call chain, and gives me a structured answer with file paths and line numbers. The answer is usually right. When it is wrong, it is because the codebase has an unusual pattern that tripped it up, not because the agent gave up.
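“Following the call chain” is easy to picture with a small example. The sketch below swaps in a different technique from the grep-based search the agent uses: it parses Python source with the standard ast module and walks calls from an entry point. The function names and the sample source are hypothetical.

```python
import ast
from typing import Dict, List, Tuple


def build_call_graph(source: str) -> Dict[str, Tuple[int, List[str]]]:
    """Map each top-level function to (definition line, sorted names it calls)."""
    graph: Dict[str, Tuple[int, List[str]]] = {}
    for node in ast.parse(source).body:
        if isinstance(node, ast.FunctionDef):
            calls = sorted(
                {
                    child.func.id
                    for child in ast.walk(node)
                    if isinstance(child, ast.Call) and isinstance(child.func, ast.Name)
                }
            )
            graph[node.name] = (node.lineno, calls)
    return graph


def trace(graph: Dict[str, Tuple[int, List[str]]], entry: str) -> List[str]:
    """Depth-first walk from entry, returning 'name:line' for each known function."""
    seen = set()
    order: List[str] = []

    def visit(name: str) -> None:
        if name in seen or name not in graph:
            return  # skip repeats and calls to functions defined elsewhere
        seen.add(name)
        line, calls = graph[name]
        order.append(f"{name}:{line}")
        for callee in calls:
            visit(callee)

    visit(entry)
    return order


# Hypothetical auth module used only to exercise the sketch.
SOURCE = '''
def handle_login(request):
    token = issue_token(check_password(request))
    return token

def check_password(request):
    return lookup_user(request)

def lookup_user(request):
    return request

def issue_token(user):
    return "token"
'''
```

trace(build_call_graph(SOURCE), "handle_login") yields each function on the path with its definition line, which is the same shape as the structured answer described above.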

3. Long-running tasks

Claude Code has surprisingly robust handling for tasks that take real time. I have left it running multi-step migrations and come back twenty minutes later to find a clean working tree, passing tests, and a coherent summary of what it did. The session state is durable enough that I can Ctrl-C mid-task, come back, and pick up the conversation.

4. Honest “I do not know”

This is small but it matters. When Claude Code does not have enough context to be confident, it says so and asks. Cursor’s agent mode (as of late 2025) was more likely to guess. The cost of a confident-but-wrong agent compounds — you stop reading the diffs as carefully, which is exactly when the bug slips through.

Where it trips

Six months in: what the daily friction actually looks like

Pros

Agent loop reliably finishes multi-step tasks without intervention
Multi-file refactors are dramatically better than IDE-based competitors
Excellent at exploring unfamiliar codebases via grep and structured search
Honest about uncertainty — asks instead of guessing when context is missing
Native MCP support — drop-in tools for Postgres, GitHub, browser, filesystem, etc.
Session durability — Ctrl-C and resume works cleanly
No editor lock-in — works with Vim, Helix, VS Code, JetBrains, or none of them

Cons

Inline completion is non-existent — this is not a Cursor replacement for tab-tab coding
Cost adds up fast on Opus 4 for long sessions; budgeting per task takes discipline
Web UI (Claude.ai) and CLI session state do not share context — you start fresh in each
Terminal-only means diffs are reviewed in a TUI; Cursor wins on visual diff review
Permission system is conservative by default — first hour is heavy on approval prompts
Spinning up MCP servers correctly is a per-machine ritual, not a one-click experience

The single biggest weakness is inline completion. Claude Code is not built for the tab-tab-tab flow that Cursor and Copilot have nailed. If you spend most of your day writing single functions and want the IDE to finish your sentences, Claude Code will feel slow. The right pairing is to keep Copilot or Cursor’s autocomplete on in your editor and use Claude Code for the heavier tasks. That is what I do.

The second is cost. Opus 4 is excellent and expensive. I have had individual long-running sessions hit $4–6 in API cost when I let the agent grind through a complex refactor with lots of test iterations. This is still cheap relative to my hourly rate, but you have to think about it more than you think about a flat-rate Cursor subscription. The Claude Pro / Max plans flatten this for individual use; for team use the bill needs an owner.
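The arithmetic behind a figure like that is simple enough to budget in advance. The sketch below is illustrative only: the per-million-token prices and the token counts are placeholder assumptions, not Anthropic’s price list or measured usage, so substitute current numbers before relying on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_mtok: float   # dollars per million input tokens
    output_per_mtok: float  # dollars per million output tokens


def session_cost(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Return the dollar cost of a session from its total token counts."""
    return (
        input_tokens / 1_000_000 * pricing.input_per_mtok
        + output_tokens / 1_000_000 * pricing.output_per_mtok
    )


# Placeholder prices and token counts, chosen only to illustrate the formula.
EXAMPLE_PRICING = Pricing(input_per_mtok=15.0, output_per_mtok=75.0)
example_cost = session_cost(250_000, 20_000, EXAMPLE_PRICING)
```

Under these assumed numbers the session comes to about $5.25, with most of it from input tokens. That is the typical shape for a long agent session that keeps rereading files and test output.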

The third is the permission model. Claude Code asks before doing anything destructive, which is correct, but the first time you point it at a new project you spend twenty minutes approving file edits, shell commands, and tool calls. You can configure auto-approvals globally in ~/.claude/settings.json or per project; the docs cover this. Plan to invest the time on day one or it will frustrate you.
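As a rough illustration of that day-one investment, here is a small helper that merges approval rules into a settings file. Treat the details as assumptions to check against the current docs: the "permissions" object with "allow" and "deny" lists, and rule strings such as Bash(npm run test:*), reflect my understanding of the settings format, and add_permission_rules is a hypothetical helper, not part of Claude Code.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def add_permission_rules(
    settings_path: Path,
    allow: Iterable[str],
    deny: Iterable[str] = (),
) -> Dict:
    """Merge rules into a JSON settings file, keeping existing keys and order.

    Assumed layout (verify against the docs):
    {"permissions": {"allow": [...], "deny": [...]}}
    """
    if settings_path.exists():
        settings = json.loads(settings_path.read_text())
    else:
        settings = {}
    permissions = settings.setdefault("permissions", {})
    for key, rules in (("allow", allow), ("deny", deny)):
        existing = permissions.setdefault(key, [])
        for rule in rules:
            if rule not in existing:  # avoid duplicates on repeated runs
                existing.append(rule)
    settings_path.parent.mkdir(parents=True, exist_ok=True)
    settings_path.write_text(json.dumps(settings, indent=2) + "\n")
    return settings
```

A per-project call might look like add_permission_rules(Path(".claude/settings.json"), ["Bash(npm run test:*)"], ["Read(./.env)"]). Keeping the allow list narrow preserves the safety the prompts were providing.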

The MCP ecosystem is the real moat

The Model Context Protocol — Anthropic’s open spec for plugging tools into AI assistants — is where Claude Code starts to feel like a different category of product. You can give it access to:

Your Postgres database (read-only queries, schema inspection)
A headless browser (fetch a URL, interact with a page, screenshot it)
GitHub (read PRs, issues, run actions)
A local filesystem outside the project root
Custom tools you write yourself in a few dozen lines

In practice this means Claude Code can do things like “look at this failing PR, read the CI logs, figure out what went wrong, and push a fix” or “query our production read-replica for users who hit the bug, summarise the pattern, and propose the fix.” Once you set this up, the rest of the AI coding ecosystem starts to feel narrow.
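To give a sense of what “a few dozen lines” for a custom tool means, here is a stdlib-only sketch of the request handling at the core of an MCP tool server. MCP messages are JSON-RPC 2.0, and tools/list and tools/call are the method names from the spec. Everything else is a simplifying assumption: the count_lines tool and the tool decorator are hypothetical, and the initialization handshake and transport are omitted. A real server would normally be built on an official SDK.

```python
from typing import Any, Callable, Dict

# Registry of tools: name -> description, JSON Schema for inputs, and the function.
TOOLS: Dict[str, Dict[str, Any]] = {}


def tool(name: str, description: str, input_schema: Dict[str, Any]) -> Callable:
    """Hypothetical decorator that registers a function as a callable tool."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        TOOLS[name] = {"description": description, "inputSchema": input_schema, "fn": fn}
        return fn
    return register


@tool(
    "count_lines",
    "Count the lines in a block of text.",
    {"type": "object", "properties": {"text": {"type": "string"}}, "required": ["text"]},
)
def count_lines(text: str) -> str:
    return str(len(text.splitlines()))


def handle(request: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch one JSON-RPC request and return the response object."""
    def error(code: int, message: str) -> Dict[str, Any]:
        return {"jsonrpc": "2.0", "id": request.get("id"),
                "error": {"code": code, "message": message}}

    method = request.get("method")
    if method == "tools/list":
        result: Dict[str, Any] = {"tools": [
            {"name": n, "description": t["description"], "inputSchema": t["inputSchema"]}
            for n, t in TOOLS.items()
        ]}
    elif method == "tools/call":
        params = request.get("params", {})
        entry = TOOLS.get(params.get("name"))
        if entry is None:
            return error(-32602, f"Unknown tool: {params.get('name')}")
        text = entry["fn"](**params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return error(-32601, f"Method not found: {method}")
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}
```

The schema in the registry is what lets the assistant decide when and how to call the tool. The function body is ordinary code, which is why wrapping an internal script this way is cheap.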

Cursor has been adding MCP support too, but as of late 2025 / early 2026 the Claude Code ergonomics are better. The configuration lives in ~/.claude/ and is portable across machines. Spinning up a new MCP server for a new tool is a config-file change, not a re-install.

Specs that actually matter

Claude Code at a glance

Pricing (individual): Pay-as-you-go via API; Claude Pro $20/mo; Claude Max ~$100/mo with significantly higher usage caps
Pricing (team): API usage-based; team plans via Anthropic enterprise
Models: Claude Opus 4.x, Sonnet 4.x, Haiku 4.x — switchable mid-session
Platforms: macOS, Linux, Windows (via WSL); also runs in remote dev environments
IDE integrations: VS Code extension surfaces inline diffs; works with any editor via terminal
MCP support: Native — server config in ~/.claude/settings.json or per-project
Open source: CLI is closed source; SDK and MCP spec are open
Privacy posture: Code never used for training under standard ToS; enterprise options available

How I actually use it day-to-day

My typical day has two modes:

Editor mode (60% of the day): Helix open in a terminal, Copilot-style autocomplete on. I am writing new code, my keystroke rate is high, and the AI completes my sentences. Claude Code is closed.

Agent mode (40% of the day): Claude Code open in a second terminal pane. I hand it tasks that span multiple files or need exploration. “Refactor this module to use the new logger.” “Find every place we still call the deprecated endpoint.” “The build is failing on main — figure out why.”

The split changes by week. On a debugging-heavy week, Claude Code dominates. On a feature-writing week with a clear spec, editor mode dominates. The point is that the two are complementary, not competing — anyone telling you that you have to pick one or the other has never used both seriously.

Who should switch to it

You should try Claude Code if: you live in a terminal already, you do a lot of cross-file work, you maintain large legacy codebases, you spend significant time exploring unfamiliar code, or you are tired of agent modes that quit halfway. The learning curve is short — you will know in a week whether the workflow fits.

You should not switch if: you are deeply IDE-native, your primary work is greenfield single-file coding, you cannot expense API costs, or you are not comfortable with the agent making file edits without per-line approval. Cursor remains the better choice for the IDE-centric workflow, and we cover that in our dedicated Cursor review.

For most senior engineers I know, the right answer is to use both — Cursor or Copilot for inline completion in the editor, Claude Code in a second terminal for the heavier lifts. That hybrid is what I do, and the productivity gains over running either tool alone are real and consistent.

The bottom line

Claude Code is the first AI coding tool I have used where the agent is reliable enough that I trust it with multi-step tasks. That is a high bar and very few products clear it. The CLI-first design is a feature, not a limitation — it forces a clean separation between “AI does the heavy task” and “I do the keystrokes,” which makes the productivity gains legible. Twelve months from now, I expect every serious coding tool to look more like this.
