Every engineering team I talk to is adding AI agents to their workflow. Almost none of them are updating the practices around those agents. The DevOps practices we built over the last two decades apply directly, but the failure modes have changed. If you don’t adapt to a world where some of your developers aren’t human, you’ll ship bugs faster than you ever could before. The biggest shift is that the bottleneck moved from shipping code to learning from what you shipped, and most teams haven’t built the rituals to close that gap. Gene Kim’s Three Ways from the DevOps handbook…
Author: drweb
Configuration drift is the gap between the infrastructure state declared in code and the state actually running in your environment. It occurs when resources are changed outside of your infrastructure as code (IaC) workflow, so the live system no longer matches its definition. In a single cloud, drift is usually straightforward to find and correct. Across multiple providers, it is harder to detect and more costly to leave unaddressed. Why Does Multicloud Make Drift Worse? Each provider has its own API, resource model, console, and defaults. A change made directly in one cloud does not resemble the equivalent change in another, so the…
Threat actors are exploiting a known security flaw in the SimpleHelp remote monitoring and management (RMM) software to drop two previously unknown pieces of malware that can compromise a broad range of systems and steal massive amounts of sensitive data. Researchers with Blackpoint Cyber’s Adversary Pursuit Group said they detected an intrusion in which the adversaries abused a critical authentication bypass vulnerability, tracked as CVE-2026-48558, to obtain an authenticated technician session without valid credentials on an internet-facing SimpleHelp server. “The compromised RMM platform provided the operator with a trusted administrative channel capable of transferring files and executing commands on systems…
It’s officially summer, and I am bringing you some HOT Python deals today! Get 33% off almost all my books and courses on Gumroad today using the following code: H5N5F7K. You can start learning the basics of Python with Python 101, or get more targeted learning with my book, Python Logging. If you want to create a user interface, then you might enjoy Creating TUI Applications with Textual and Python. I have over a DOZEN Python books to choose from! Check them out today: https://driscollis.gumroad.com/ Plus even more that aren’t pictured here!
A survey of 406 IT decision makers at organizations with more than 250 employees in North America finds 93% have experienced at least one infrastructure incident caused by reliance on artificial intelligence (AI) tooling. Conducted by Panterra Group on behalf of Spacelift, a provider of a platform for automating the management of infrastructure-as-code (IaC), the survey also finds 86% reporting that AI has increased demands on infrastructure teams, citing security vulnerabilities appearing faster (40%), governance becoming harder (40%), change rates increasing (37%), more strain on pipelines (35%) and growing infrastructure drift (35%). In general, more than two thirds (67%)…
Many years ago, before I joined Oracle, I was working on a major modernisation project. We were replacing an existing non-Oracle system with an entirely new Oracle database application written from scratch. Not long after deploying a new version into our test environment, the results came back and a large number of tests had failed. I sat down with one of the subject matter experts, a long-serving employee who had helped build the original system. As we worked through the failures, he looked at me and said: “The problem here is blue sky programming.” I’d never heard the expression before. “What do…
One of the things I’ve been requesting for a number of years is cost information. I could see this coming in 2015 with the move to the cloud and the need to justify the resources provisioned, along with their sizes. Doing that effectively needs cost information. Redgate Monitor has added a bit of cost information, and the virtual machine section in the Estate tab contains this. This post looks at what is available (as of June 2026). This is part of a series of posts on Redgate Monitor. Virtual Machines: When I first started managing VMs and moving database loads…
You ran dnf update, and now something has stopped working. Instead of spending hours troubleshooting, you just want to go back to the package version that was working before. This happens more often than most Linux administrators would like to admit. Maybe a new Nginx release introduced a default configuration change that broke your virtual hosts. A Python library update changed an API that your internal scripts rely on. Or perhaps a kernel update no longer works with a third-party driver. From DNF’s point of view, everything installed successfully, but that doesn’t always mean your applications will continue to work…
Traditional software deployments are high-risk, all-or-nothing events. A single faulty release configuration can cascade into outages, increased error rates, customer impact and costly rollbacks. Progressive delivery changes this paradigm by introducing controlled, observable and reversible releases. The traditional ‘big bang’ release, where code is merged and deployed at 2:00 a.m., is increasingly a relic of the past. ‘Progressive delivery’ is the modern evolution of continuous delivery, designed to reduce the blast radius of new features and decouple ‘deployment’ (moving code to production) from ‘release’ (exposing features to users). In a progressive-delivery model, the goal is to move from a binary…
Last month, one of our autonomous coding agents (not a copilot suggesting inline completions, but a system that reads a ticket, plans a multi-file implementation and opens a PR without a human touching the keyboard) analyzed a ticket, touched 37 files, updated two database migrations and opened a PR in 11 minutes flat. The diff looked clean. Tests passed. The reviewer approved it. We found the problem at 2:47 a.m. on a Thursday, three days later, during an unrelated log audit. One of our SREs was tailing canary logs trying to trace an intermittent 401, and there it was: A staging…
