Kyaw Min Thu

System-Level Controls Engineer — Multi-Domain Industrial Systems

Focused on solving real system failures across control, mechanical, and process domains

Expertise: PLC · BMS · SCADA
Background: Oil & Gas · Fire Systems · IPC
Approach: Isolate · Validate · Fix
01

Fire Training Simulator

Burner management · SCADA integration · Safety-critical commissioning

Fire simulator commissioning
Fire Systems

Fire Training Simulator

  • Burner management system (BMS) design and programming
  • PLC-controlled gas, air, and ignition sequencing
  • SCADA integration for operator visibility
  • Commissioning alongside fire service teams
  • Safety-critical interlocks and shutdown logic
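
The gas, air, and ignition sequencing listed above can be pictured as a small state machine. The sketch below is illustrative only and is not the commissioned PLC logic: the state names, the `Inputs` fields, and the purge and trial timings are all assumptions, and real burner management logic runs on suitable safety-rated hardware rather than in Python.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    PURGE = auto()
    IGNITION_TRIAL = auto()
    RUNNING = auto()
    LOCKOUT = auto()


@dataclass
class Inputs:
    start_request: bool = False
    estop_ok: bool = True         # emergency stop chain healthy
    air_proven: bool = False      # combustion air confirmed
    flame_detected: bool = False


PURGE_SECONDS = 10.0  # assumed value, for illustration only
TRIAL_SECONDS = 5.0   # assumed value, for illustration only


def step(state: State, elapsed: float, inp: Inputs) -> State:
    """Return the next state, given the time already spent in `state`."""
    # Interlock: a broken E-stop chain forces lockout from any state.
    if not inp.estop_ok:
        return State.LOCKOUT
    if state is State.IDLE:
        return State.PURGE if inp.start_request else State.IDLE
    if state is State.PURGE:
        # Simplification: air must be proven for the whole purge.
        if not inp.air_proven:
            return State.LOCKOUT
        return State.IGNITION_TRIAL if elapsed >= PURGE_SECONDS else State.PURGE
    if state is State.IGNITION_TRIAL:
        if inp.flame_detected:
            return State.RUNNING
        # No flame within the trial time: shut down rather than keep feeding gas.
        return State.LOCKOUT if elapsed >= TRIAL_SECONDS else State.IGNITION_TRIAL
    if state is State.RUNNING:
        healthy = inp.flame_detected and inp.air_proven
        return State.RUNNING if healthy else State.LOCKOUT
    return State.LOCKOUT  # LOCKOUT holds; manual reset is not modelled here
```

The point of the sketch is the shape of the logic: every state has an explicit exit to a safe shutdown, and the interlock check runs before any sequencing decision.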
Field Work

Hands-On System Integration

  • On-site instrumentation installation and calibration
  • Sensor loop testing and signal verification
  • Valve and actuator commissioning
  • Integration testing across control, mechanical, and process layers
Field pipe installation
02

Field Experience

Offshore operations · Hydraulic top drive systems · Multi-domain integration

Offshore platform
Field Commissioning

Offshore Systems Experience

  • Hydraulic top drive systems on offshore drilling rigs
  • Installation, commissioning, and hands-on troubleshooting
  • Mechanical, hydraulic, and control system integration
  • Operating in high-risk, safety-critical environments
  • Working with multinational field teams under pressure
Key insight from field: This is where I learned that failures are most often caused by the physical system — not control logic.
System Integration

Multi-Domain System Thinking

Real systems require coordination across multiple engineering domains simultaneously.

Mechanical · Hydraulic · Electrical · Control / PLC · Human Interface
"Failures are rarely isolated to one domain. Identifying which system layer is actually responsible — before making any changes — is the first step."
Hydraulic top drive system
03

Root Cause Analysis

Structured isolation · Data-driven thinking · System-level reasoning

RCA Approach

How I Approach Complex Failures

Principles

  • Avoid assumptions — verify the actual condition
  • Separate symptom from root cause
  • Validate with data, not intuition
  • Isolate domains before changing anything
  • Do not replace components blindly

Why This Matters

  • Control system can behave correctly while the process fails
  • Intermittent issues often indicate process instability
  • Random fixes waste time and mask the real cause
  • Instrumentation gaps hide root causes entirely
7-Step Framework

Structured Investigation Process

1. Define the symptom clearly: when, how often, under what conditions
2. Identify the affected domain: control, mechanical, fluid, electrical, or process
3. Collect data: logs, signals, pressure readings, valve timing
4. Isolate variables: verify each domain independently
5. Test hypothesis: confirm the suspected cause is repeatable
6. Confirm root cause: validate with evidence, not assumption
7. Implement fix + validate: correct, test, then prevent recurrence
RCA Case Study

Intermittent Fire Ignition Failure

Problem Statement

  • Fire ignition failed intermittently
  • System required multiple attempts to start
  • No clear pattern — appeared random
  • No alarms triggered on the control side

Initial Observations

  • Electrical trigger signal present and correct
  • BMS responded as expected
  • Gas valve and air valve actuated normally
  • No pressure instrumentation available — no visibility
Key insight: Everything in the control layer appeared correct. The issue had to originate in the physical process layer.
Failure Path

Why Ignition Was Failing

Compressor (air source) → Air Filter (heavily clogged ⚠) → Air Valve (opens correctly) → Burner (ignition phase)
Result: insufficient air pressure → ignition fails

Why It Was Intermittent

With the filter restricting airflow, the compressor needed time to rebuild pressure after each attempt. Once pressure was sufficient, ignition succeeded, so the system appeared to "self-recover," which masked the root cause.
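
The retry behaviour described here can be reproduced with a toy model. This is a sketch under stated assumptions, not measured data: the function name, the linear rebuild rate, and every number in the usage note are hypothetical, chosen only to show why slow pressure recovery makes a fault look random.

```python
from typing import Optional


def attempts_until_ignition(
    start_bar: float,
    min_bar: float,
    rebuild_bar_per_s: float,
    retry_s: float,
    used_per_attempt_bar: float,
    max_attempts: int = 20,
) -> Optional[int]:
    """Count ignition attempts until air pressure reaches `min_bar`.

    Each failed attempt consumes some air, then pressure rebuilds
    linearly during the wait before the next attempt. Returns None
    if ignition is never reached within `max_attempts`.
    """
    pressure = start_bar
    for attempt in range(1, max_attempts + 1):
        if pressure >= min_bar:
            return attempt
        # Failed attempt: air is used, then the compressor rebuilds pressure.
        pressure = max(0.0, pressure - used_per_attempt_bar)
        pressure += rebuild_bar_per_s * retry_s
    return None
```

With a slow rebuild rate standing in for the restricted filter, `attempts_until_ignition(1.0, 2.0, 0.05, 10.0, 0.2)` returns 5: four failures, then a success with no change to the control logic. From the operator's side, that looks like a fault that fixes itself.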

Why It Was Invisible

No pressure transmitter was installed. The control system had no way to see the pressure drop. All signals looked correct.

Steps 2–4 · Investigation

Domain Isolation Approach

Domains Verified Independently

  • Control / BMS ✓
  • Valves / Mechanical ✓
  • Ignition Electrical ✓
  • Air Supply: no data

Control, mechanical, and electrical layers all verified correct. The uninstrumented air supply became the sole remaining domain.

Pattern That Confirmed Direction

  • Failures correlated with delayed ignition cycles
  • System recovered on its own after multiple attempts
  • Recovery pattern indicated process-side instability
  • Logic errors do not self-recover — process conditions do
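
The elimination reasoning above can be written down as a small helper. This is a minimal sketch; the function name and the three-valued status convention (`True` verified, `False` faulty, `None` no data) are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def remaining_suspects(checks: Dict[str, Optional[bool]]) -> List[str]:
    """Return the domains that still need attention.

    A domain marked False is a confirmed fault and takes priority.
    If nothing is confirmed faulty, any domain with no data (None)
    remains a suspect, because it was never actually verified.
    """
    faulty = [name for name, ok in checks.items() if ok is False]
    if faulty:
        return faulty
    return [name for name, ok in checks.items() if ok is None]


checks = {
    "control_bms": True,
    "valves_mechanical": True,
    "ignition_electrical": True,
    "air_supply": None,  # no pressure instrumentation installed
}
print(remaining_suspects(checks))
```

The useful distinction is between "verified correct" and "not measured": an unmeasured domain is never cleared by default.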
Steps 5–7 · Resolution

Root Cause, Fix & Outcome

Root Cause

  • Air compressor needed time to build sufficient pressure
  • Air filter heavily contaminated — restricting airflow
  • Insufficient air pressure at ignition phase
  • No sensor → failure invisible to the system

Corrective Actions

  • Replaced contaminated air filter
  • Added air pressure transmitter
  • Integrated pressure monitoring into SCADA
  • Implemented preventive maintenance schedule
Before: Multiple attempts required · Intermittent failure
After: Ignition immediate and stable · Zero recurrence
Engineering Lessons

Key Takeaways

  • Lack of instrumentation hides root causes — add visibility first
  • Control system may behave correctly while the process fails
  • Intermittent issues often indicate process-side instability
  • Preventive maintenance is critical for reliable operation
"Many failures originate from the physical process, not the control system. Without proper instrumentation, these issues are invisible until they become critical."

One-Line Summary

"Root cause: insufficient air pressure due to a clogged filter, resolved by replacing the filter and adding pressure monitoring and a preventive maintenance schedule."

Field team at drilling site
Offshore Reinforcement

Real-Field Debugging — Same Discipline

  • Limited documentation in offshore field environments
  • Real-time pressure to restore operation quickly
  • Rapid isolation: mechanical, hydraulic, or control?
  • Cross-domain failure sources with no initial pattern
  • Decision-making with incomplete information
The same domain-isolation discipline applied offshore — before the fire systems, before the formal framework. The approach came from real field necessity.
RCA Case 2

Intermittent System Shutdown — No Observable Cause

Problem Statement

  • System stopped unexpectedly with no alarm
  • Occurred intermittently — daily or a few times per week
  • Not reproducible on demand
  • Standard diagnostics returned no fault condition

Key Challenges

  • No logging or historical event data available
  • PLC trending insufficient — event rarity made capture impossible
  • Could not observe the trigger condition in real time
  • Unknown whether source was control, process, or human
Key decision: PLC trending was insufficient due to event rarity — I needed to implement continuous external monitoring before any diagnosis was possible.
Investigation Strategy

Building Visibility to Find the Fault

Monitoring Scope

  • All stop conditions and stop commands
  • All permissives and run conditions
  • All state transitions — continuously
  • Any event that could prevent or interrupt operation
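
Capturing every state transition comes down to comparing consecutive snapshots of the monitored signals. The following is a minimal sketch of that idea; the signal names and the dictionary-based snapshot format are assumptions, since the deck does not describe how signals were read.

```python
from typing import Any, Dict, List, Tuple


def detect_transitions(
    previous: Dict[str, Any],
    current: Dict[str, Any],
) -> List[Tuple[str, Any, Any]]:
    """Return (signal, old_value, new_value) for every signal that changed.

    Signals missing from `previous` are reported with an old value of None,
    so newly appearing signals are not silently ignored.
    """
    changes = []
    for name in sorted(current):
        old = previous.get(name)
        new = current[name]
        if old != new:
            changes.append((name, old, new))
    return changes


before = {"run_permissive": True, "stop_button": False}
after = {"run_permissive": True, "stop_button": True}
print(detect_transitions(before, after))
```

Recording changes rather than raw values keeps the log small enough to run continuously while still preserving the exact moment each condition flipped.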

Implementation

  • Implemented a continuous monitoring system in Python
  • 24/7 signal capture — no event missed regardless of timing
  • Real-time alerting via Slack on any stop event
  • Correlated multi-source logs to identify timing patterns
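
The alerting and correlation steps above might look roughly like this. It is a sketch under assumptions, not the deployed tool: the event dictionary layout (`t`, `signal`, `new`, `source`) and all function names are hypothetical. The only external interface relied on is a Slack incoming webhook, which accepts a JSON body with a `text` field; the webhook URL is supplied by the caller.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, Iterable, List, Set

Event = Dict[str, Any]


def slack_notifier(webhook_url: str) -> Callable[[str], None]:
    """Build a function that posts a text message to a Slack incoming webhook."""
    def notify(text: str) -> None:
        body = json.dumps({"text": text}).encode("utf-8")
        request = urllib.request.Request(
            webhook_url,
            data=body,
            headers={"Content-Type": "application/json"},
        )
        urllib.request.urlopen(request, timeout=5).close()
    return notify


def alert_stop_events(
    events: Iterable[Event],
    stop_signals: Set[str],
    notify: Callable[[str], None],
) -> int:
    """Send one alert per event where a stop signal became active."""
    sent = 0
    for event in events:
        if event["signal"] in stop_signals and event["new"]:
            notify(f"Stop event: {event['signal']} active at t={event['t']:.3f}")
            sent += 1
    return sent


def events_before(log: List[Event], stop_time: float, window_s: float) -> List[Event]:
    """Events from any source in the `window_s` seconds up to and including a stop."""
    return [e for e in log if stop_time - window_s <= e["t"] <= stop_time]


log = [
    {"t": 95.0, "signal": "door_switch", "new": True, "source": "plc"},
    {"t": 99.8, "signal": "stop_button", "new": True, "source": "hmi"},
    {"t": 100.0, "signal": "machine_stop", "new": True, "source": "plc"},
]
print([e["signal"] for e in events_before(log, stop_time=100.0, window_s=1.0)])
```

Passing `notify` in as a parameter keeps the alert logic testable without network access; in use it would be `slack_notifier(url)`. Looking at what changed in the short window before each stop is what turns a pile of logs into a timing pattern.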
"When the existing tools are insufficient, build the visibility layer first — then diagnose."
Steps 5–7 · Resolution

Root Cause, Fix & Outcome

Root Cause

  • Intermittent manual input: an unintentional button press by an associate
  • Not previously capturable due to complete absence of event logging
  • Human interaction as a hidden failure source — not suspected initially
  • Only visible through continuous real-time monitoring

Corrective Actions

  • Identified exact triggering condition through monitoring data
  • Monitoring and alerting system permanently retained
  • Improved system observability — all stop events now visible
  • Enabled immediate response and verification on any future event
Before: Random unexplained stops · No data · No diagnosis possible
After: Root cause confirmed · Full event visibility · Zero recurrence
RCA Summary

Two Types of Failures — Two Different Approaches

Type | Case | Approach | Key Tool
Process failure | Fire ignition failure | Physical system analysis: domain isolation across control, mechanical, and process layers | Add instrumentation · Pressure transmitter · PM schedule
Intermittent / human-system | Random system shutdown | Continuous monitoring: capture what existing tools could not see | Python monitoring system · Real-time Slack alerting · 24/7 logging
"The root cause of the second case was an intermittent manual input, identified by implementing a 24/7 monitoring system to capture and correlate stop conditions in real time."
04

Relevance & Fit

How this background maps to complex multi-physics systems

Why This Transfers

Multi-Physics System Parallels

My Experience

  • Hydraulic + mechanical + control integration (offshore)
  • Gas + air + thermal + ignition sequencing (fire systems)
  • Failures from the process layer, not control logic
  • Structured domain isolation under real operating pressure

Complex Multi-Physics Systems

  • Motion + gas + thermal system coupling
  • Failures may originate outside control logic
  • Requires domain isolation and data validation
  • Instrumentation gaps are risks — visibility is critical
"Where gas flow, thermal conditions, and control logic interact — identifying the correct domain is the first and most critical step."

I focus on systems where control decisions directly impact physical outcomes.

Real systems  ·  Real failures  ·  Real consequences.

Built: Complex Systems
Operated in: Harsh Environments
Solved: Root Causes