

Ship It Conversations: Human-in-the-Loop Fixer Bots and AI Guardrails in CI/CD (with Gracious James)
1/12/2026 | 22 mins.
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
In this Ship It: Conversations episode I talk with Gracious James Eluvathingal about TARS, his “human-in-the-loop” fixer bot wired into CI/CD.
We get into why he built it in the first place, how he stitches together n8n, GitHub, SSH, and guardrailed commands, and what it actually looks like when an AI agent helps with incident response without being allowed to nuke prod. We also dig into rollback phases, where humans stay in the loop, and why validating every LLM output before acting on it is the single most important guardrail.
If you’re curious about AI agents in pipelines but hate the idea of a fully autonomous “ops bot,” this one is very much about the middle ground: segmenting workflows, limiting blast radius, and using agents to reduce toil instead of replacing engineers.
Gracious also walks through where he’d like to take TARS next (Terraform, infra-level decisions, more tools) and gives some solid advice for teams who want to experiment with agents in CI/CD without starting with “let’s give it root and see what happens.”
Links from the episode:
Gracious on LinkedIn: https://www.linkedin.com/in/gracious-james-eluvathingal
TARS overview post: https://www.linkedin.com/posts/gracious-james-eluvathingal_aiagents-devops-automation-activity-7391064503892987904-psQ4
If you found this useful, share it with the person on your team who’s poking at AI automation and worrying about guardrails.
More information on our website: https://shipitweekly.fm

n8n Critical CVE (CVE-2026-21858), AWS GPU Capacity Blocks Price Hike, Netflix Temporal
1/09/2026 | 16 mins.
This week on Ship It Weekly, Brian’s theme is basically: the “automation layer” is not a side tool anymore. It’s part of your perimeter, part of your reliability story, and sometimes part of your budget problem too.
We start with the n8n security issue. A lot of teams use n8n as glue for ops workflows, which means it tends to collect credentials and touch real systems. When something like this drops, the right move is to treat it like production-adjacent infra: patch fast, restrict exposure, and assume anything stored in the tool is high value.
Next is AWS quietly raising prices on EC2 Capacity Blocks for ML. Even if you’re not a GPU-heavy shop, it’s a useful signal: scarce compute behaves like a market. If you do rely on scheduled GPU capacity, it’s time to revisit forecasts and make sure your FinOps tripwires catch rate changes before the end-of-month surprise.
Third is Netflix’s write-up on using Temporal for reliable cloud operations. The best takeaway is not “go adopt Temporal tomorrow.” It’s the pattern: long-running operational workflows should be resumable, observable, and safe to retry. If your critical ops are still bash scripts and brittle pipelines, you’re one transient failure away from a very dumb day.
In the lightning round: Kubernetes Dashboard getting archived (and the “ops dependencies die” reality check), Docker pushing hardened images as a safer baseline, and Pipedash.
Links
SRE Weekly issue 504 (source roundup) https://sreweekly.com/sre-weekly-issue-504/
n8n CVE (NVD) https://nvd.nist.gov/vuln/detail/CVE-2026-21858
n8n community advisory https://community.n8n.io/t/security-advisory-security-vulnerability-in-n8n-versions-1-65-1-120-4/247305
AWS price increase coverage (The Register) https://www.theregister.com/2026/01/05/aws_price_increase/
Netflix: Temporal powering reliable cloud operations https://netflixtechblog.com/how-temporal-powers-reliable-cloud-operations-at-netflix-73c69ccb5953
Kubernetes SIG-UI thread (Dashboard archiving) https://groups.google.com/g/kubernetes-sig-ui/c/vpYIRDMysek/m/wd2iedUKDwAJ
Kubernetes Dashboard repo (archived) https://github.com/kubernetes/dashboard
Pipedash https://github.com/hcavarsan/pipedash
Docker Hardened Images https://www.docker.com/blog/docker-hardened-images-for-every-developer/
More episodes and more details on this episode can be found on our website: https://shipitweekly.fm

Ship It Conversations: Backstage vs Internal IDPs, and Why DevEx Muscle Matters (with Danny Teller)
1/06/2026 | 26 mins.
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
I sat down with Danny Teller, a DevOps Architect and Tech Lead Manager at Tipalti, to talk about internal developer platforms and the reality behind “just set up a developer portal.” We get into Backstage versus internal IDPs, why adoption is the real battle, and why platform/DevEx maturity matters more than whatever tool you pick.
What we covered
Backstage vs internal IDPs
Backstage is a solid starting point for a developer portal, but it doesn’t magically create standards, ownership, or platform maturity. We talk about when Backstage fits, and when teams end up building internal tooling anyway.
DevEx muscle (the make-or-break)
Danny’s take: the portal UI is the easy part. The hard part is the ongoing work that makes it useful: paved roads, sane defaults, support, and keeping the catalog/data accurate so engineers trust it.
Where teams get burned
Common failure mode: teams ship a portal first, then realize they don’t have the resourcing, ownership, or workflows behind it. Adoption fades fast if the portal doesn’t remove real friction.
A build vs buy gut check
We walk through practical signals that push you toward open source Backstage, a managed Backstage offering, or a commercial portal. We also hit the maintenance trap: if you build too much, you’ve created a second product.
Links and resources
Danny Teller's LinkedIn: https://www.linkedin.com/in/danny-teller/
matlas (one CLI for Atlas and MongoDB): https://github.com/teabranch/matlas-cli
Backstage: https://backstage.io/
Roadie (managed Backstage): https://roadie.io/
Port: https://www.port.io/
Cortex: https://www.cortex.io/
OpsLevel: https://www.opslevel.com/
Atlassian Compass: https://www.atlassian.com/software/compass
Humanitec Platform Orchestrator: https://humanitec.com/products/platform-orchestrator
Northflank: https://northflank.com/
If you enjoyed this episode
Ship It Weekly is still the weekly news recap, and I’m dropping these guest convos in between. Follow/subscribe so you catch both, and if this was useful, share it with a platform/DevEx friend and leave a quick rating or review. It helps more than it should.
Visit our website at https://www.shipitweekly.fm

Fail Small, IaC Control Planes, and Automated RCA
1/03/2026 | 17 mins.
This week on Ship It Weekly, Brian kicks off the new year with one theme: automation is getting faster, and that makes blast radius and oversight matter more than ever.
We start with Cloudflare’s “fail small” mindset. The core idea is simple: big outages usually come from correlated failure, not one box dying. If a bad change lands everywhere at once, you’re toast. “Fail small” is about forcing problems to stay local so you can stop the bleeding before it becomes global.
Next is Pulumi’s push to be the control plane for all your IaC, including Terraform and HCL. The interesting part isn’t syntax wars. It’s the workflow layer: approvals, policy enforcement, audit trails, drift, and how teams standardize without signing up for a multi-year rewrite.
Third is Meta’s DrP, a root cause analysis platform that turns repeated incident investigation steps into software. Even if you’re not Meta, the pattern is worth stealing: automate the first 10–15 minutes of your most common incident types so on-call is consistent no matter who’s holding the pager.
In the lightning round: a follow-up on GitHub Actions direction (and a quick callback to Episode 6’s runner pricing pause), AWS ECR creating repos on push, a smarter take on incident metrics, Terraform drift visibility, and parallel “coding agent” workflows.
We wrap with a human reminder about the ironies of automation: automation doesn’t remove responsibility, it moves it. Faster systems require better brakes, better observability, and easier rollback.
Links from this episode
SRE Weekly issue 503 (source roundup - Cloudflare) https://sreweekly.com/sre-weekly-issue-503/
Pulumi: all IaC, including Terraform and HCL https://www.pulumi.com/blog/all-iac-including-terraform-and-hcl/
Meta DrP: https://engineering.fb.com/2025/12/19/data-infrastructure/drp-metas-root-cause-analysis-platform-at-scale/
GitHub Actions: “Let’s talk about GitHub Actions” https://github.blog/news-insights/product-news/lets-talk-about-github-actions/
Episode 6 (GitHub runner pricing pause, Terraform Cloud limits, AI in CI) https://www.tellerstech.com/ship-it-weekly/github-runner-pricing-pause-terraform-cloud-limits-and-ai-in-ci/
AWS ECR: create repositories on push https://aws.amazon.com/about-aws/whats-new/2025/12/amazon-ecr-creating-repositories-on-push/
DriftHound https://drifthound.io/
Superset https://superset.sh/
More episodes, contact info, and more details on this episode can be found on our website: https://shipitweekly.fm

Ship It Conversations: From Full-Stack to Cloud/DevOps, One Project at a Time (with Eric Paatey)
12/30/2025 | 23 mins.
This is a guest conversation episode of Ship It Weekly (separate from the weekly news recaps).
I sat down with Eric Paatey, a Cloud & DevOps Engineer who’s been transitioning from full-stack web development into cloud/DevOps, building real skills through hands-on projects instead of just collecting tools and buzzwords.
We talk about what that transition actually feels like, what’s helped most, and why you don’t need a rack of servers to learn DevOps.
What we covered
Eric’s path into DevOps
How he moved from building web apps to caring about pipelines, infra, scalability, reliability, and automation. The “oh… code is only part of the job” moment that pushes a lot of people toward DevOps.
The WHY behind DevOps
Eric’s take: DevOps is mainly about breaking down silos and improving communication between dev, ops, security, and the business. We also hit the idea from The DevOps Handbook: small batches win. The bigger the release, the harder it is to recover when something breaks.
Leveling up without drowning in tools
DevOps has an endless tool list, so we talked about how to stay current without burning out. Eric’s recommendation: stay connected to the industry. Meet people, join user groups, go to events, and don’t silo yourself.
The homelab mindset (and why simple is fine)
Eric shared his “homelab on the go” setup and why the hardware isn’t the point. It’s about using a safe environment to build habits: automation, debugging, systems thinking, monitoring, breaking things, recovering, and improving the design.
A practical first project for aspiring DevOps engineers
We talked through a starter project you can actually show in interviews: Dockerize a simple app, deploy it behind an ALB, and learn basic networking/security along the way. You don’t need to understand everything on day one, but you do need to build things and learn what breaks.
Agentic AI and guardrails
We also touched on AI agents and MCPs, what they could mean for ops teams, and why you should not give agents full access to anything. Least privilege and policy guardrails matter, because “non-deterministic” and “prod permissions” is a scary combo.
Links and resources
Eric Paatey on LinkedIn: https://www.linkedin.com/in/eric-paatey-72a87799/
Eric’s website/portfolio: https://ericpaatey.com/
If you enjoyed this episode
Ship It Weekly is still the weekly news recap, and I’m dropping these guest convos in between. Follow/subscribe so you catch both, and if this was useful, share it with a coworker or your on-call buddy and leave a quick rating or review. It helps more than it should.
Visit our website at https://www.shipitweekly.fm



Ship It Weekly - DevOps, SRE, and Platform Engineering News