I gave an AI agent full control over a clinical research data science task in a development environment. After accepting all edits for about an hour, it produced a model with AUC 0.67. When I asked why it chose certain features over others, it hallucinated evidence.

As AI agents become more autonomous (via tools like Weco, Cursor, or GitHub Copilot in "agent mode"), we are drifting into a space of untraceable decision-making. An AI agent's ability to reason, iterate, and act is, in many ways, also what makes it opaque. I am excited that coding barriers (Python, R, FHIR, etc.) are being broken down with the help of LLMs so that more clinicians can be involved in developing the technology that improves care, but I am also concerned that decision-making is being offloaded to AI during development.

In healthcare, who is responsible when the developed model fails? The user? The AI? The AI provider?

The danger isn't that AI agents are bad at reasoning. It's that they are too good at simulating it, merely reflecting the rational thinking seen in their training data.

We need a new standard: decision logs.

Just as we require clinical trials to document every step, every hypothesis, and every deviation, we must demand the same from AI agents. Every choice, from feature selection to model architecture to hyperparameter tuning, should be logged with its rationale, the alternatives considered, and the evidence behind it.
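A decision log entry does not need to be elaborate; it just needs to capture the choice, the reasoning, the paths not chosen, and the evidence. Here is a minimal sketch in Python. All names here (`Decision`, `DecisionLog`, `record`) are hypothetical illustrations of the idea, not an existing library:

```python
# Sketch of a decision log for an AI agent's development choices.
# Every name and field below is illustrative, not a real API.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class Decision:
    step: str                # e.g. "feature selection"
    choice: str              # what the agent decided
    rationale: str           # why, in the agent's own words
    alternatives: list[str]  # paths not chosen
    evidence: list[str]      # metrics, reports, or data checks cited
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class DecisionLog:
    def __init__(self) -> None:
        self.entries: list[Decision] = []

    def record(self, decision: Decision) -> None:
        self.entries.append(decision)

    def to_json(self) -> str:
        # Serialize for audit: every choice stays reviewable after the fact.
        return json.dumps([asdict(d) for d in self.entries], indent=2)


# Example entry for a hypothetical feature-selection decision.
log = DecisionLog()
log.record(Decision(
    step="feature selection",
    choice="dropped 'age_at_admission' due to high missingness",
    rationale="imputation would bias toward the majority cohort",
    alternatives=["MICE imputation", "add a missingness indicator"],
    evidence=["missingness report, 2024-05-01 data pull"],
))
print(log.to_json())
```

The point is that the agent (or the tooling around it) writes this record at the moment the choice is made, so that "why did you pick these features?" is answered from a log, not from a post-hoc hallucination.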

This isn't about slowing down AI. It's about making it responsible.

If you develop a model that impacts a patient, you should be able to show that patient how it works, the steps taken to eliminate bias and harm, and the paths not chosen, with clearly defined evidence.