AI Coding Tools Work Better Than Expected. But They Quietly Change How We Trust.

The productivity gains from AI coding tools are real. But the most interesting thing we learned wasn't about productivity at all. It was about how repeated competence quietly changes the way you trust.

Lessons from weeks of real AI-assisted development at Cyborg using Claude Code, Codex, and similar tools.

At Cyborg, we ship real production software for clients across finance, healthcare, and operations. So when we started pushing AI coding tools harder into our day-to-day engineering work over the past few weeks, the goal was simple: find out where they actually help, where they quietly hurt, and how a serious engineering team should think about working with them.

Not in the "generate a toy app" sense. Actual engineering work:

  • debugging production issues
  • implementing features
  • navigating repositories
  • updating documentation
  • handling iterative changes
  • managing multi-step implementation flows

And honestly, the overall experience was surprisingly positive.

The Productivity Gains Are Real

In supervised workflows, these tools are genuinely good at:

  • accelerating implementation
  • reducing repetitive work
  • helping investigate issues
  • navigating unfamiliar code
  • maintaining momentum during long debugging sessions

This was never a fully autonomous setup where specifications were dumped into an AI and left unattended. We still handled:

  • architecture direction
  • Git strategy
  • commit management
  • implementation planning
  • backups
  • operational oversight
  • release thinking
  • documentation discipline

The AI acted more like a fast implementation partner than an autonomous engineer. And with that structure in place, it worked much better than we initially expected.

The Interesting Problems Were Rarely Pure Coding Problems

Most implementation tasks actually went reasonably well. The more interesting failures happened around:

  • operational context
  • environment awareness
  • repository state
  • confidence calibration

For example, during one debugging session, a production widget stopped loading correctly. The AI initially diagnosed the issue as a frontend caching problem and modified cache-busting logic in the loader.

Technically, the change was valid. The problem was that caching had nothing to do with the actual issue.

Later in the same session, the AI replaced an internal API abstraction with a more generic implementation because the generic pattern looked more familiar. But the abstraction was there deliberately and likely handled concerns such as:

  • authentication
  • retries
  • logging
  • environment-specific behavior

Eventually, after several rounds of investigation, the real root cause turned out to be something entirely outside the repository logic — a required database migration had not been applied in production.

Individually, these were manageable mistakes. But collectively, they revealed an interesting pattern: the AI was often very strong at local implementation reasoning while still weaker at broader operational context.

And honestly, that distinction matters much more in real systems than most AI demos suggest.
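The missed migration is the kind of operational fact that is cheap to check before anyone edits code. Here is a minimal sketch of such a check, assuming you can list the migration names the repository expects and the names the production database reports as applied. The function name and the migration names are hypothetical, chosen only for illustration; this is not the tooling used in the session described above.

```python
from typing import Iterable, List


def pending_migrations(expected: Iterable[str], applied: Iterable[str]) -> List[str]:
    """Return migrations the repository expects that the database has not applied.

    Order follows `expected`, so the result can be read as "apply these next".
    """
    applied_set = set(applied)
    return [name for name in expected if name not in applied_set]


# Hypothetical migration names, for illustration only.
in_repo = ["0001_init", "0002_add_widget_config", "0003_widget_index"]
in_production = ["0001_init"]

missing = pending_migrations(in_repo, in_production)
if missing:
    print("Unapplied migrations:", ", ".join(missing))
```

Running a comparison like this at the start of a debugging session puts environment state in front of both the human and the assistant before either starts forming theories about caching or API layers.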

The Moment That Made Us Pause

One particular workflow made this even clearer. The repository already had:

  • ongoing unrelated work
  • multiple modified files
  • uncommitted changes on dev

The task itself sounded simple:

  • move one newly created commit onto another branch
  • keep the remaining uncommitted work untouched

As part of these workflow experiments, we had intentionally allowed the AI assistant to perform certain repository operations directly, so we could better understand how far supervised AI-assisted workflows could practically go.

At one point, the assistant ran a hard reset on the branch — git reset --hard.

The command technically solved one objective: it removed the commit from the branch. But it also wiped the uncommitted working tree changes.

Fortunately, this did not turn into catastrophic loss. Important workflows were already being handled carefully — backups existed, Git flow was still supervised, implementation structure remained controlled. So practically, it became more of a learning moment than a disaster.

Still, the incident exposed something important. The AI had enough information available to avoid the mistake. The working tree was dirty. Modified files were visible. The instruction was to preserve existing work.

Yet it optimized primarily for "move the commit off the branch" without fully protecting "preserve unrelated working tree state."

That distinction stayed with us long after the debugging session ended.
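The information needed to avoid the reset was already in the output of git status. As a rough sketch, a wrapper around assistant-issued commands could refuse a hard reset whenever tracked files have uncommitted changes. The function names below are hypothetical, and the sketch assumes the caller passes in the text produced by git status --porcelain. It treats untracked files as not at risk, which is a simplification.

```python
from typing import List, Tuple


def tracked_changes(porcelain_output: str) -> List[str]:
    """Parse `git status --porcelain` output into paths with tracked changes.

    Each line is two status characters, a space, then the path.
    Untracked entries ("??") are skipped (a simplification, see lead-in).
    """
    changes = []
    for line in porcelain_output.splitlines():
        if len(line) < 4 or line.startswith("??"):
            continue
        changes.append(line[3:])
    return changes


def review_reset(argv: List[str], porcelain_output: str) -> Tuple[bool, str]:
    """Decide whether a proposed git command may run given the current status."""
    is_hard_reset = argv[:2] == ["git", "reset"] and "--hard" in argv
    if not is_hard_reset:
        return True, "not a hard reset"
    at_risk = tracked_changes(porcelain_output)
    if not at_risk:
        return True, "working tree has no tracked changes"
    return False, (
        "refusing: would discard uncommitted changes in "
        + ", ".join(at_risk)
        + "; consider `git reset --keep`"
    )


# Hypothetical status output resembling the dirty dev branch in the story.
status = " M src/loader.js\n?? notes.txt\nM  api/client.py\n"
allowed, reason = review_reset(["git", "reset", "--hard", "HEAD~1"], status)
print(allowed, reason)
```

For the task in this story, git reset --keep HEAD~1 is the gentler option: it moves the branch back one commit while keeping local edits, and it aborts if a file touched by that commit also has uncommitted changes. The commit itself can be cherry-picked onto the other branch first so it is not lost.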

The Most Interesting Shift Was Psychological

The biggest thing we learned was not that AI makes mistakes. Humans make mistakes constantly too.

The more interesting realization was how repeated successful interactions gradually changed how we reviewed the AI's work. After enough correct implementations, useful debugging assistance, and productive iterations, we noticed ourselves moving from "verify every operational step carefully" toward "this is probably fine."

Not because we stopped caring. And not because the AI became perfect. But because repeated competence naturally builds trust.

That is where AI-assisted development becomes fundamentally different from traditional tooling.

The risk is not blind automation. The risk is gradual trust transfer.

Why AI-Assisted Development Feels Different

Traditional tools do not simulate reasoning. AI systems do.

They explain themselves. Justify decisions. Navigate repositories fluently. Sound confident. And often succeed repeatedly.

After enough successful interactions, humans naturally compress their review process. That is not irrational behavior — it is how humans interact with capable systems everywhere: senior engineers, CI pipelines, deployment systems, cloud infrastructure, automation tooling.

AI simply accelerates this effect because it combines speed, fluency, confidence, and partial correctness extremely well.

The Real Boundary Became Clear

After working this way for weeks, our conclusion became surprisingly simple:

AI is becoming very good at driving implementation. Humans still need to own system state.

That includes:

  • Git safety
  • destructive operations
  • migration awareness
  • deployment context
  • production reasoning
  • operational boundaries

In other words — AI can increasingly help execute engineering work, but humans still need to remain custodians of system integrity. At least for now.

How Our Workflow Changed

Interestingly, these experiences did not make us stop using AI coding tools. If anything, we probably use them more now. But the workflow evolved.

We trust AI heavily for acceleration, exploration, iterative implementation, debugging assistance, and repository navigation.

But we now treat operationally sensitive actions differently — branch manipulation, resets, migrations, infrastructure changes, production-impacting decisions. Those areas still deserve slower human supervision.

Not because AI is incapable. But because operational awareness is still very different from implementation capability.
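One way to make "slower human supervision" concrete is a small approval gate in front of whatever executes the assistant's commands. The sketch below is illustrative only: the command lists are assumptions about what a team might consider sensitive, not a description of our actual policy, and the matching is deliberately naive (for example, it would miss git invoked with global options before the subcommand).

```python
from typing import List

# Assumed examples of operationally sensitive commands; adjust per team.
SENSITIVE_GIT = {"reset", "rebase", "clean", "cherry-pick", "push"}
SENSITIVE_TOOLS = {"terraform", "kubectl"}


def needs_human_approval(argv: List[str]) -> bool:
    """Return True when a command should pause for explicit human sign-off."""
    if not argv:
        return False
    tool = argv[0]
    if tool in SENSITIVE_TOOLS:
        return True
    if tool == "git" and len(argv) > 1:
        return argv[1] in SENSITIVE_GIT
    # Catch migration runners such as `python manage.py migrate`.
    return any("migrate" in arg for arg in argv)


for command in (["git", "status"], ["git", "reset", "--hard"], ["python", "manage.py", "migrate"]):
    label = "ask a human" if needs_human_approval(command) else "run"
    print(" ".join(command), "->", label)
```

The design choice is an allow-by-default gate with a short, explicit list of exceptions, which keeps the assistant fast for everyday work while forcing a pause exactly where state can be lost.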

And honestly, we think that is the real lesson many engineering teams are currently learning as AI-assisted development becomes normal.

The future probably is not "AI replaces engineers." It is more likely "AI increasingly drives execution while humans remain responsible for state, boundaries, and operational judgment."

So far, that combination has been surprisingly powerful.

Working through similar questions in your own engineering function? We help teams adopt AI tools without giving up operational rigor. Learn more about our AI Implementation Services →
