Inside AI – Ep 3: Swiss Cheese and Bow Ties - Novel Models of AI Risk

2 June

Jan Esman (CTO Consulting Head of Enterprise Strategy) and Dale Rogers (Atturra Strategy and Service Design Senior Consultant) explore how organisations should approach AI risk management, drawing parallels from high-hazard industries such as petrochemicals. The central metaphor is the Swiss Cheese Model: individual layers of defence (policy, alignment, training, human-in-the-loop review, output filtering) each have gaps, and when those gaps align, failures slip through.

Dale argues that while organisations are increasingly focused on ethical alignment and responsible AI frameworks, they largely neglect operational and mitigating controls — the real-time mechanisms that contain failures once they occur. Unlike traditional IT deployments, AI systems are non-deterministic and require continuous monitoring, testing, and recalibration — more like tending a garden than flipping a switch.

Key takeaways for leaders include establishing clear incident response plans, designating an accountable individual authorised to "pull the plug," and embedding a genuine risk-aware culture — not just policy documents. The concept of ALARP (As Low As Reasonably Practicable) offers a pragmatic framework for calibrating controls without chasing perfection.

Runtime [00:24:40]

Our Speakers

Jan Esman

Jan Esman is a seasoned digital advisory leader with extensive experience in IT strategy, enterprise architecture, and digital transformation. His expertise in aligning technology solutions with business objectives ensures that CTO Consulting clients benefit from strategic insights and effective digital initiatives.

Dale Rogers

Dale Rogers is a Service Design Leader specialising in digital transformation across utilities, transport, and government. With expertise in strategic planning, agile delivery, and human-centred design, Dale translates complex systems into actionable change with a keen eye on emerging technologies like AI, blockchain, and IoT.

Jan - [0:06] Welcome to Inside AI, the podcast where we bring you unfiltered insights from people who are shaping AI agendas that matter. My name is Jan Esman. I'm the head of strategy for CTO Consulting Group. And our guest today is Dale Rogers. He is a service designer and a business liaison working across digital transformation and ICT governance at the Australian Government Department of Climate, Energy and the Environment and Water. Almost got that right. I'd like to open by saying that the views that you're going to hear today are expressly our own and do not reflect agencies or organizations. So this is very much Jan and Dale. Um, but I think we're going to have a good conversation because we are working at the coalface and I think there's some really great ideas we can bring into this conversation. So, what is it about? It's about Swiss cheese. And I think this is a fabulous metaphor because the idea of Swiss cheese is that each slice has got holes in it. Um, and when you set them up, you can either get a solid piece of cheese or, if the holes line up, you have a direct pathway all the way through the cheese, and that becomes that risk failure mode which creates the unexpected catastrophes. And we all know that large scale infrastructure projects, resource extraction projects have been dealing with large scale risk for a long time in ways that IT hasn't always had to deal with. We've never really managed risk on the scale of AI before, but we can learn a lot from what those kinds of organizations have done to cope with some of the major disasters. You know, the Deepwater Horizon, the Piper Alpha, they've had big failures and we really don't want big failures in AI. So, that's really the concept behind why we're having this conversation.
Jan - [2:03] So let's do a little bit of an opening starter. Dale, what do you see — how organizations are approaching their AI governance right now and what kind of patterns are emerging around this risk management?
Dale - [2:20] I think the most common patterns around risk management are around alignment and guardrails, and you know there's a real need to sort of align, so ethics reviews and responsible AI frameworks, model time fine tuning. But I think, you know, that's only one slice of the Swiss cheese. It's not a governance system. And so I think what's really absent in the conversation about AI risk is the operational controls, the things that actually stop or contain a bad outcome in real time regardless of whether the model is aligned or not. Um, and I think the maturity just isn't there. But the confidence is. I think people feel like, oh, we're starting to get this alignment thing solved. So it provides a false sense of security.
Jan - [3:23] Is part of that that transition from — we historically think of an ICT project: get the requirements, build it, test it, deploy it, run and operate it. Whereas with an AI, what we're really doing is engaging in an ongoing development, learning, training, exception management, continuing governance. Is that part of the change in mindset around these risks?
Dale - [3:46] Yeah, I think so. Um I think that change in the sort of mode of continuous operation, and also in large part the non-determinacy of AI means that sometimes the outcomes aren't exactly what we expect and also the failure modes and the testing — it's not as easy to script.
Jan - [4:14] Now, is it as simple as writing a policy? I mean, is that the answer? We just need to have a policy for AI.
Dale - [4:27] Oh, look, I wish it was. Um, no, it's not as simple as writing a policy. You know, there's a lot of moving parts that need to be put together. The policies are important.
Jan - [4:38] Do you think a policy risks putting the brakes on too early? Is there a risk that before we even really understand how we're going to apply AI into government scenarios, we're throwing controls and policy at it too early? Or should we be doing something around that framing?
Dale - [5:01] I think we definitely should be doing, you know, we should have policies and we should have alignment. I guess what's missing though is the operational context — the controls that actually enact the policy. You know, a policy is only as good as the controls that are in place.
Jan - [5:28] Yeah. Really important, isn't it? It's almost like the policy is the reflection of a mature set of governance and controls rather than: I'm going to write a policy and hope that the maturity comes out of the policy. Let's talk about the Swiss cheese because I love the metaphor. What are the layers in the Swiss cheese model for AI?
Dale - [5:58] We discussed already — policy, we talked about alignment. They're two separate layers. We then need to start to think about training and human in the loop reviews. We want to have in place output filtering and hooks that detect aberrant output. And so at each layer of defense you need to be able to catch different aspects of an error that might be propagating through the system.
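As a rough illustration of the layered defences Dale lists, the sketch below passes a model output through several independent checks and reports which ones caught a problem. The layer names, the banned phrases and the "FINAL DECISION" rule are all hypothetical, chosen only to show the shape of the idea; they are not from the episode or any particular framework.

```python
from typing import Callable, List, Optional

# A layer returns a reason string when it catches a problem, otherwise None.
Layer = Callable[[str], Optional[str]]


def policy_layer(output: str) -> Optional[str]:
    # Assumed policy: the model may recommend, but never issue a final decision.
    return "policy: final decision issued" if "FINAL DECISION" in output else None


def output_filter_layer(output: str) -> Optional[str]:
    # Assumed list of phrases the organisation never wants in an output.
    banned = ["always approve", "guaranteed"]
    for phrase in banned:
        if phrase in output.lower():
            return f"filter: banned phrase '{phrase}'"
    return None


def sanity_layer(output: str) -> Optional[str]:
    # Catches an obviously aberrant (empty) output.
    return "sanity: empty output" if not output.strip() else None


LAYERS: List[Layer] = [policy_layer, output_filter_layer, sanity_layer]


def run_layers(output: str, layers: List[Layer]) -> List[str]:
    """Collect the reasons from every layer that caught something.

    An empty list means the output passed through every slice of cheese.
    """
    reasons = []
    for layer in layers:
        reason = layer(output)
        if reason is not None:
            reasons.append(reason)
    return reasons
```

Each layer here is deliberately narrow, so an error missed by one check can still be caught by another; an output that returns an empty list has found a hole in every slice.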
Jan - [6:41] Can you give an example of when the holes line up and things go wrong? How does it happen? Give us an example of one you've seen.
Dale - [6:55] Yeah, look, I think in AI when the holes line up it's not always catastrophic. It's not a really obvious disastrous failure and it's more common that small errors go missed. As parts of the human in the loop aspect of the layers of defense occur, as people become more confident and comfortable that the systems are working correctly, there's less of an eye on the detail and so small errors can propagate through, be missed by the human. And ultimately those errors — I think the largest effect is around bias. It's often these systems we're using around decision support tools, so it's AI helping assessment officers make decisions based on, you know, licensing of a doctor or something like that. And the errors that propagate through the system mean that the output might completely ignore a cohort of people or completely bias or affect a cohort of people on the output.
Jan - [8:39] There's a really interesting point because I love the way you got down to an assessment for a license — something really concrete — with the idea that AI can help accelerate or provide a more 360 view of the decision to allow that human in the loop to make the decision. But there's a really interesting insight in that — AI is not going to necessarily be perfect. It might have a much lower error rate than a human being, but still not perfect. And we know with self-driving cars, the standard of expectation for an electronically driven car is so much higher than it is for a human. Have you seen that kind of struggle — like what's an acceptable error rate for a system as opposed to a human?
Dale - [9:30] I have. Yeah. So you know we do expect much greater accuracy from our automated systems including AI, and I think it's actually reasonable to expect much greater accuracy. I think these systems can do a much better job at the expert roles that they have. But funnily enough — it's not 100% AI but it is machine learning related — we had some work at the Royal Australian Mint and there were some vision systems that would count coins. And again those small minor errors meant that after you process 100,000 tons of dollar coins, the accounting system is up or down by a percentage. And so it's really important again with these layers of defense to put in place those tolerances around — well, how much plus or minus 1% is okay, or 2%? And also have a variety of different checking mechanisms so you're not just applying one approach to counting coins, but you're also passing it through a different test.
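The tolerance and cross-checking idea in the coin-counting example can be sketched as follows. This is a minimal illustration, assuming a second, independent estimate of the count is available from total mass; the function names, the 1% default tolerance and the coin mass used in the usage note are assumptions, not details of the Mint's actual system.

```python
def within_tolerance(measured: float, reference: float, tolerance: float) -> bool:
    """True if `measured` is within +/- `tolerance` (a fraction) of `reference`."""
    if reference == 0:
        return measured == 0
    return abs(measured - reference) / abs(reference) <= tolerance


def cross_check_count(
    vision_count: int,
    total_mass_g: float,
    coin_mass_g: float,
    tolerance: float = 0.01,  # assumed acceptable band of plus or minus 1%
) -> bool:
    """Compare a vision-system count against an independent weight-based count."""
    if coin_mass_g <= 0:
        raise ValueError("coin_mass_g must be positive")
    weight_count = total_mass_g / coin_mass_g
    return within_tolerance(vision_count, weight_count, tolerance)
```

For example, with an assumed coin mass of 9.0 g, `cross_check_count(10_050, 90_000.0, 9.0)` accepts a vision count that is 0.5% above the weight-based estimate, while `cross_check_count(10_300, 90_000.0, 9.0)` rejects one that is 3% above it.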
Jan - [11:17] Yeah, good example. I love it. Hey, we started out by talking a little bit about how we could learn from some of those big resource infrastructure type projects. And one of those concepts you brought up is bow tie analysis. Can you talk a bit about bow tie analysis and how it might apply to AI thinking?
Dale - [11:39] Yeah. So bow tie analysis is really common in petrochemical industries and high-risk industries. Essentially it's the idea that we have preventative controls — we might have a pressure relief valve on our boiler — but we also have on the other side remediation controls. So coming out of the central point of the event — something went wrong — on the other side, well, okay, we didn't mean it to go wrong, but it did. And so what controls have we got in place right now to address that? And if you put that into an AI context, you know, we do have the alignment of our systems. We are designing and training our language models to apply our morals and values and ethics. But at times AIs do go wrong — I'm thinking of Twitter and X but you know Grok started to sort of call itself Hitler or something, I can't remember.
Jan - [12:58] MechaHitler, right. So when that error happens, okay, we did all of the things we could to train to prevent that error, that alignment. But after it happened, how do we control it? How do we put the genie back in the bottle? How do we stop the error from continuing?
Jan - [13:22] Yeah, and I like that risk mitigation. We often think about trying to understand the risks, take the threat and the consequence and look at how we can mitigate that, but also how can we avoid those risks as well. How does that come together into a systems view of risk? Because it's almost like the holes in the cheese — if I can make a list of things that can go wrong, great, I've just listed out a bunch of holes and I can write mitigations or avoidance strategies against that. But because this is a learning ongoing system, you need a system of risk management. So how do you evolve it from the Swiss cheese up to a system?
Dale - [14:06] Oh, look, I mean, I think we sort of come full loop into the policy space where, you know, you have a framework, you set expectations, you design — you identify all of the risks that you imagine might be involved in the system or involved in the delivery of the large language model. And yeah, it's not very sexy, but it is about documentation. It's also about testing. And I think you've really hit the point for me there around being able to have an ongoing test. You need to be able to sample those results consistently to determine whether the model is actually veering off course or is not coping with real life scenarios as opposed to the theoretical scenarios that it was created with and trained on. And that ongoing test evidence — reinforce and go back to the learning model, set the guardrails up, carry on and keep that process. That's a new mechanism. We haven't actually used that traditionally on technology projects.
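One way to picture the ongoing sampling described here is a rolling window of human-reviewed samples whose error rate is compared against a baseline. The sketch below is an assumption-laden illustration: the class name, the window size and the margin are hypothetical, and a real programme would choose them from its own test evidence.

```python
from collections import deque


class DriftMonitor:
    """Tracks the error rate over the most recent reviewed samples."""

    def __init__(self, baseline_error_rate: float, margin: float, window: int = 100):
        # Flag drift only when the rate exceeds the baseline by more than `margin`.
        self.threshold = baseline_error_rate + margin
        self.results = deque(maxlen=window)

    def record(self, is_error: bool) -> None:
        """Record the outcome of one sampled, human-reviewed output."""
        self.results.append(is_error)

    def error_rate(self) -> float:
        if not self.results:
            return 0.0
        return sum(self.results) / len(self.results)

    def is_drifting(self) -> bool:
        # Wait for a full window so a single early error does not raise an alarm.
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.error_rate() > self.threshold
```

A drifting result would be the trigger to go back to retraining or to tighten the guardrails, then carry on sampling.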
Dale - [15:32] I mean, yeah, monitoring and telemetry is a thing. I think we have used it: we use it in industrial controls, we even use it in the web, and like Google Analytics, marketing people in particular are constantly running A/B tests. So there is a bit of knowledge out there in marketing and other disciplines to bring monitoring and telemetry to the table.
Jan - [16:06] Yeah. And I like that A/B thinking as well because that is real life: you're testing a new scenario, a new process pattern, and then you're able to measure results and upgrade and evolve by constantly having a series of tests in the market. And I think that's really applicable to AI thinking, isn't it?
Dale - [16:22] Yeah. Yeah. I agree.
Jan - [16:28] So the high hazard industries went through this a lot. And those high hazard industries — I think there's a cultural aspect to that as well. They are very consciously discussing safety, risks, and operational standards in every meeting. Is that where we're going in terms of AI capable business units that are supported with AI specialists?
Dale - [17:11] I very much hope so. You know, those high-risk industries were forced to take safety really seriously, forced to take risk really seriously because of those Piper Alpha etc type incidents. And you know it wasn't that the types of errors were invisible — it was just that people weren't looking. And certainly changing culture in ICT, in AI, developing a culture of risk management and really looking under the bed — not being afraid to look for those errors that could be there.
Jan - [18:08] Yeah. The other concept around high risk industries is ALARP. Can you talk us through what ALARP is and how that might be a relevant concept for AI?
Dale - [18:22] Yeah. So ALARP — it stands for As Low As Reasonably Practicable. It's really this concept that we don't want to spend inordinate amounts to control the problem. It's about doing enough — good enough, not perfection. It's about setting the bar right: I'm going to put in place all of these controls, I'm going to give myself a bar, and I'm not going to spend more money chasing smaller and smaller benefits.
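The "set a bar and stop chasing smaller and smaller benefits" idea can be sketched as a simple selection rule. This is only an illustration of the diminishing-returns reasoning in the conversation, not a formal ALARP assessment; the control names, costs, risk-reduction figures and the bar itself are made-up assumptions.

```python
from typing import List, Tuple

# (name, cost in dollars, estimated risk reduction in arbitrary units)
Control = Tuple[str, float, float]


def select_controls(controls: List[Control], min_benefit_per_dollar: float) -> List[str]:
    """Keep controls, best value first, until the benefit per dollar drops below the bar.

    Assumes every control has a positive cost.
    """
    ranked = sorted(controls, key=lambda c: c[2] / c[1], reverse=True)
    chosen = []
    for name, cost, reduction in ranked:
        if reduction / cost < min_benefit_per_dollar:
            break  # everything after this point is worse value still
        chosen.append(name)
    return chosen
```

With hypothetical candidates such as a kill switch (10,000 for 500 units), an output filter (20,000 for 400 units) and a third independent audit (200,000 for 20 units), a bar of 0.01 units per dollar keeps the first two and drops the audit.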
Jan - [19:23] Yeah. No, that's a good position to take. And again, it's that pragmatism built into the culture, allowing you to manage to a business outcome, an organisational outcome. Dale, what can leaders do? What should they take away from this conversation? If you were a senior manager thinking about an AI program, what would you want to do to deal with some of the things we've talked about today?
Dale - [19:56] Look, I think the best thing I'd ask a senior manager to do is to think about what happens if things go wrong and what plans have they put in place. Do they have an incident controller? Do they have pre-approved legal comms to go out to the media? Do they have a way to pull the plug? Have they authorised somebody to pull the plug? And does that person understand the threshold at which they need to pull the plug? And that person needs to be an accountable individual — when things go wrong, you don't have time to put a committee together and decide whether or not you're going to pull the plug.
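The questions about pulling the plug can be made concrete with a small sketch: one named accountable individual, a pre-agreed threshold, and a switch that only that person can pull. The class, field names and the use of a count of confirmed harm reports as the threshold are hypothetical choices for illustration, not a prescribed design.

```python
from dataclasses import dataclass


@dataclass
class KillSwitch:
    owner: str            # the single accountable individual, agreed in advance
    harm_threshold: int   # assumed trigger: number of confirmed harm reports
    enabled: bool = True  # whether the AI service is currently running

    def should_pull(self, confirmed_harm_reports: int) -> bool:
        """True once the pre-agreed threshold has been reached."""
        return confirmed_harm_reports >= self.harm_threshold

    def pull(self, requested_by: str) -> None:
        """Disable the service; only the accountable owner is authorised."""
        if requested_by != self.owner:
            raise PermissionError(f"{requested_by} is not authorised to pull the plug")
        self.enabled = False
```

Writing the owner and the threshold down before an incident is the point: when something goes wrong, the decision has already been made and there is no committee to convene.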
Jan - [20:54] Yeah. And is that about establishing critical controls?
Dale - [20:58] Yes. Yes.
Jan - [21:01] And what sort of critical controls do you think are really high priority for AI?
Dale - [21:09] I think user safety. It's about understanding the potential for harm in the communities that our solutions serve. You know, is our medical doctor licensing system giving licenses to people that might harm our community? And if we can identify those specific types of harm that can occur, then that's where we start to put in those preventative controls and those mitigating controls.
Jan - [21:58] Yeah. And that creates a need for an ethical overlay. Have you seen organisations do that well?
Dale - [22:07] Look, I think in the AI space — and again, this is just our personal opinion — I think we do the preventative controls really well. I just don't think we do the mitigating controls. I don't think we apply them.
Jan - [22:28] Yeah, and maybe there's a little bit of that mentality: I traditionally deployed a finance system, an HR system, and it just runs. But you deploy an AI system and you need to keep fertilizing it, watering it, and trimming the bits that have grown off in the wrong direction. So it's an ongoing commitment. It's a cultural measurement environment. It's got controls that need to be actively maintained. I think that's such a good metaphor. I've personally worked in a refinery situation and I've seen what happens when they've gone for zero injuries. That's a laudable goal, but practically you won't get there. But the cultural change to think about what you can do systemically to avoid things going wrong is not a once-off. It's not a set and forget. You don't just give everybody some safety equipment and good luck with that. No, it's a complete cultural commitment to ongoing attention to risk, corrections, and managing that ongoing service delivery. So I think it's a great metaphor. I love bringing a little bit of that alternative and repeated monitoring and training and refresher training and understanding the human element of working alongside AI.
Dale - [23:50] Yeah. And I think that's what we've got to get right. We talk about human in the loop, we talk about the threat that AI poses to traditional jobs, but we've got to develop a new relationship with this technology and it will change a lot of things and a lot of expectations. But we've got to evolve quite quickly as well to take advantage of that.
Jan - [24:18] Good. Thank you Dale. I've really enjoyed this conversation and yeah, look forward to having many more.
Dale - [24:25] Great. Look, thanks for inviting me along and yeah, really enjoyed sharing my insights.
Jan - [24:34] Brilliant.

AI, Jan Esman

Kerry Carroll
