AI Water Operations Research & Pilots
Cognitive Architectures for the Modern Utility: A Technical Review of AI Integration, Autonomous Control Frameworks, and Micro-App Deployment in Water and Wastewater Operations
Executive Overview
The global water and wastewater sector stands at a precarious but transformative inflection point. For decades, the industry has relied on Supervisory Control and Data Acquisition (SCADA) systems as the centralized nervous system for facility management. These systems, while robust and deterministic, were designed for an era of static setpoints, manual intervention, and abundant veteran expertise. Today, utilities face a convergence of escalating pressures that legacy OT (Operational Technology) architectures are ill-equipped to handle alone. The "Silver Tsunami" of workforce retirements is draining institutional knowledge at an unprecedented rate.1 Simultaneously, regulators are tightening effluent limits and lowering tolerances for Disinfection Byproducts (DBPs). This demands a level of process precision and responsiveness that manual adjustment alone struggles to deliver.2 Furthermore, utilities face economic pressure to reduce energy consumption (often the second-largest operating expense for a utility) and to minimize Non-Revenue Water (NRW). Together, these pressures have turned operational inefficiency into a financial liability.1
This technical review posits that the solution lies not in replacing SCADA, but in augmenting it with a cognitive architectural layer—a "System of Intelligence" that sits above the "System of Control." This report, written from the perspective of a Senior Water Systems Engineer and AI Architect, serves as a comprehensive guide for Technical Operations Directors and Plant Managers navigating this transition. It moves beyond the hype of "Digital Twins" to provide a rigorous engineering analysis of how Artificial Intelligence (AI) and Machine Learning (ML) can be securely integrated into the utility stack.
We define the concept of "Intelligent Handoffs," where Natural Language Processing (NLP) synthesizes unstructured operator logs with structured time-series telemetry to contextualize plant behavior.1 We analyze the control theory tension between Open-Loop and Closed-Loop systems, advocating for a "Human-in-the-Loop" (HITL) transition state that builds operator trust through explainable AI.4 We critically evaluate market-leading platforms like Xylem Vue powered by GoAigua and Aquasight, assessing their technical underpinnings and Return on Investment (ROI).6
Crucially, this document offers a practical roadmap for the internal development of "Micro-Apps": lightweight, purpose-built tools that bridge the gap between legacy OT systems and modern IT capabilities. We propose six detailed pilot architectures, ranging from intelligent shift turnover assistants to soft sensors for chemical dosing. All of them are designed with strict adherence to industrial cybersecurity standards such as IEC 62443 and NIST SP 800-82.8 By bridging the "Air Gap" with data diodes and secure gateways, utilities can harness cloud-based Large Language Models (LLMs) without compromising the integrity of critical infrastructure.
1. The Operational Landscape: From Deterministic SCADA to Probabilistic Intelligence
1.1 The Data Deluge and the Silo Problem
Modern water utilities are paradoxical environments: they are data-rich but often information-poor. A typical mid-sized treatment facility generates millions of data points daily. Programmable Logic Controllers (PLCs) poll sensors for flow, pressure, turbidity, and amperage every few seconds. Laboratory Information Management Systems (LIMS) archive daily compliance samples. Advanced Metering Infrastructure (AMI) floods billing systems with consumption data. Yet this vast ocean of data remains fragmented across rigid, isolated silos. SCADA systems manage real-time process control; GIS manages spatial assets; CMMS tracks work orders; and ERP systems handle financials. This combination of sheer volume and fragmentation is often termed the "Data Deluge." It creates cognitive overload for operators, who must mentally stitch together disparate signals to make critical decisions.10
The core challenge is interoperability and context. Legacy SCADA historians, which often store values in proprietary flat-file formats, act as "data jails": the values they hold are difficult to query or correlate with external datasets. Furthermore, the most critical operational data (context) is often trapped in unstructured formats: handwritten paper logbooks, shift turnover emails, and the "tribal knowledge" of senior operators.1 When a veteran operator retires, the nuanced understanding of how "Pump 2 vibrates when the wet well is low" often leaves with them. AI agents embedded within the operational architecture offer a solution by continuously interpreting telemetry, predicting issues, and synthesizing diverse data sources into a unified operational picture.1
1.2 Control Theory in the Age of AI: The Open vs. Closed Loop Tension
To architect effective AI solutions, one must understand the fundamental control philosophies governing water treatment and how AI challenges traditional paradigms.
Open-Loop Systems: The Realm of Variability
In open-loop configurations, the control action is independent of the process output. Operators manually adjust chemical dosing or pump speeds based on periodic lab tests, visual inspections, or intuition. These systems are inherently reactive and highly variable. Processes such as cooling towers and open-air clarifiers are often left under this kind of manual control. They are exposed to environmental variables (temperature, sunlight, biological load) that a single-variable PID controller struggles to manage.4 The operator functions as the feedback mechanism, testing water quality and adjusting dosing pumps by hand. This introduces latency and human error, as the process can drift significantly between checks.4
Closed-Loop Systems: The Standard for Stability
Closed-loop systems utilize automated feedback mechanisms where the system adjusts based on real-time sensor data. A classic example is a PID controller maintaining a tank level or a dissolved oxygen setpoint. Closed loops require internal stability and high-fidelity sensor data to function correctly. While common for hydraulic control (levels, pressures), they are less frequently applied to complex biological or chemical processes due to the risk of sensor drift, fouling, or process upset.4 A purely closed loop on a biological process can be dangerous if the sensor fails, leading to "runaway" control actions.
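To make the closed-loop mechanism concrete, here is a minimal sketch of a discrete PID controller, for example one holding a dissolved oxygen setpoint by adjusting blower output. The gains, the 0-100% output range, and the conditional-integration anti-windup scheme are illustrative assumptions, not tuned values for any real basin.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Minimal discrete PID controller (positional form) with output clamping."""
    kp: float
    ki: float
    kd: float
    out_min: float = 0.0    # assumed actuator range, e.g. blower output in %
    out_max: float = 100.0
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, setpoint: float, measurement: float, dt: float) -> float:
        error = setpoint - measurement
        # Derivative on error; zero on the first call to avoid a startup spike.
        derivative = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        candidate_integral = self.integral + error * dt
        output = self.kp * error + self.ki * candidate_integral + self.kd * derivative
        clamped = min(self.out_max, max(self.out_min, output))
        # Anti-windup: keep the new integral only when the output is not saturated.
        if clamped == output:
            self.integral = candidate_integral
        return clamped
```

The clamp and anti-windup branch matter in exactly the failure mode described above. If a fouled sensor reads far from setpoint, the output saturates instead of growing without bound. The integral also stops accumulating, so the controller does not "run away" further once the sensor recovers.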
The AI Opportunity: Supervisory Control
AI offers a third path in one of two forms: "Open-Loop Advisory" or "Supervisory Closed Loop." Unlike a PID controller that reacts to a single variable, an AI agent can analyze multivariate correlations (e.g., influent temperature, ammonia load, time of day, and biochemical oxygen demand). It uses these correlations to predict future states and recommend optimized setpoints.5
Open-Loop Advisory: The AI predicts the optimal setpoint and presents it to the operator for validation. This keeps the human in the loop (HITL), building trust and providing a safety layer against model hallucination or sensor failure.
Supervisory Closed Loop: The AI adjusts the setpoints of the PID controllers, but the PID controllers still manage the immediate actuation. This hierarchical control ensures that if the AI fails, the PID controller falls back to a safe baseline.12
This distinction is vital for safety. As noted in the analysis of closed-loop systems, treating an open system (like a biological reactor) with the rigid logic of a closed loop can lead to instability.4 AI provides the adaptive, probabilistic logic necessary to manage the inherent variability of open systems while maintaining the safety of closed-loop constraints.
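The Supervisory Closed Loop hierarchy can be sketched as a thin layer between the AI model and the PID loop. In this minimal sketch, all names and limits are hypothetical. An AI-proposed setpoint is used only if it is fresh; it is then clamped to engineering limits and rate-limited. The layer reverts to a fixed baseline setpoint when the AI output is missing or stale.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SupervisoryLimits:
    baseline: float   # safe fallback setpoint (hypothetical value chosen by engineering)
    low: float        # engineering minimum for the setpoint
    high: float       # engineering maximum for the setpoint
    max_step: float   # largest setpoint change allowed per supervisory cycle
    max_age_s: float  # AI recommendations older than this are treated as a failure


def next_setpoint(current: float, ai_value: Optional[float], ai_age_s: float,
                  lim: SupervisoryLimits) -> float:
    """Return the setpoint to hand to the PID loop for this cycle."""
    if ai_value is None or ai_age_s > lim.max_age_s:
        # AI missing or stale: fall back to the deterministic baseline.
        target = lim.baseline
    else:
        # Never pass through a value outside the engineered envelope.
        target = min(lim.high, max(lim.low, ai_value))
    # Rate-limit every change, including the return to baseline, for bumpless transfer.
    step = max(-lim.max_step, min(lim.max_step, target - current))
    return current + step
```

Whether the fallback should be rate-limited or immediate is a design choice. This sketch ramps back to baseline so the PID loop never sees a step change.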
1.3 The "Intelligent Handoff": Synthesizing NLP and SCADA
The "Intelligent Handoff" represents the convergence of Natural Language Processing (NLP) and industrial telemetry. In a traditional shift handover, an outgoing operator might scribble a note: "Pump 2 was acting up during the storm." The incoming operator must then manually interpret "acting up"—does it mean vibration? Noise? Low flow?—and then manually query the SCADA historian for Pump 2's data during the storm event.
An AI-driven Intelligent Handoff automates this correlation. Using Entity Extraction and NLP, the system parses the log entry "Pump 2 acting up," identifies "Pump 2" as Asset_ID: P-201, and automatically retrieves the vibration, amperage, and flow data for the relevant timeframe.1 It identifies the "storm" context by correlating with external weather data or influent flow spikes. The system then presents the incoming operator with a synthesized insight: "Operator reported anomalies on Pump 2. Analysis confirms a 15% spike in vibration coincident with the storm surge at 14:00. Recommended Action: Schedule impeller inspection for cavitation damage."
This capability directly addresses the workforce challenge. By embedding AI in the handover process, utilities accelerate the onboarding of new staff. Junior operators are not just given a logbook; they are given a data-enriched diagnostic context that teaches them to associate qualitative observations ("noisy pump") with quantitative data signatures (vibration spikes).1 This transforms the SCADA system from a passive display into an active learning tool.
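The correlation step of the Intelligent Handoff can be sketched in a few lines. This is a minimal sketch under three assumptions:

- Entity extraction reduces to matching a hypothetical alias table.
- Vibration samples for the storm window and a preceding baseline window have already been retrieved from the historian.
- The message template is illustrative only; it is not any product's output.

```python
import re
from statistics import mean
from typing import Optional

# Hypothetical alias table: phrases seen in operator logs -> asset IDs.
ASSET_ALIASES = {"pump 2": "P-201", "pump #2": "P-201", "clarifier 2": "CL-202"}


def resolve_asset(log_text: str) -> Optional[str]:
    """Return the asset ID of the first alias found as a whole phrase in the entry."""
    text = log_text.lower()
    for alias, asset_id in ASSET_ALIASES.items():
        # Word boundaries stop "pump 2" from matching "pump 20".
        if re.search(rf"\b{re.escape(alias)}\b", text):
            return asset_id
    return None


def handoff_insight(log_text: str, baseline: list[float], event: list[float]) -> str:
    """Compare event-window vibration with a baseline window and phrase the result."""
    asset = resolve_asset(log_text)
    if asset is None:
        return "No known asset referenced; manual review required."
    base = mean(baseline)
    if base == 0:
        return f"{asset}: baseline vibration is zero; check the sensor."
    change_pct = 100.0 * (mean(event) - base) / base
    return (f"Operator reported anomalies on {asset}. Vibration averaged "
            f"{change_pct:+.0f}% versus the preceding baseline window.")
```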
2. Evidence-Based Analysis of Market Platforms
To inform the design of custom Micro-Apps, it is essential to evaluate how commercial platforms currently tackle these challenges. The market is bifurcated between holistic "Digital Twin" platforms that aim to be the single source of truth, and specialized "Micro-Service" analytics that plug into existing stacks.
2.1 Xylem Vue powered by GoAigua: The "System of Systems"
Architecture and Deployment
Xylem Vue powered by GoAigua functions as a comprehensive "System of Systems." It utilizes a "Smart Water Engine" to integrate data from SCADA, GIS, ERP, CMMS, and IoT sensors into a unified, standardized data model.6 The platform's pedigree is significant; it was born out of Global Omnium, a Spanish utility that digitized its own operations, lending the platform operational credibility.14
Key Capabilities and ROI
Digital Twins (Water & Sewer): The platform creates detailed hydraulic models that run in near-real-time. The Water Twin and Sewer Twin allow operators to simulate scenarios—such as a valve closure or a massive storm surge—to predict network behavior before it happens.3 This moves operations from reactive to predictive.
Leak Detection Efficiency: By integrating Smart Metering (AMI) with pressure sensors and hydraulic models, the platform uses algorithms to detect spectral anomalies indicative of leaks. A case study in Hot Springs, Arkansas, demonstrated a nearly 50% reduction in non-revenue water (NRW) by utilizing these virtual district metering areas (vDMAs).15
Integrated Operations: The platform excels at breaking down silos. It visualizes commercial data (billing) alongside operational data (flow/pressure) on a single pane of glass. This holistic view enables decisions that consider both hydraulic performance and customer impact/revenue.10
Global Omnium Case Study: Implementation at Global Omnium allowed for the centralized management of 400 different services. The "Smart Water Engine" facilitated the integration of disparate legacy sensors and protocols, creating a "single version of the truth" that improved operational efficiency by approximately 15%.10
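The leak detection idea behind virtual DMAs rests on two generic calculations:

- A zone water balance: metered inflow minus the sum of customer consumption over the same period approximates losses.
- Minimum night flow (MNF): flow in excess of expected legitimate night use is a common leak indicator.

The sketch below illustrates these two calculations only. It is not Xylem's algorithm, and the legitimate night-use allowance is an assumed input.

```python
def zone_balance(inflow_m3: float, metered_m3: list[float]) -> tuple[float, float]:
    """Return (unaccounted volume in m3, unaccounted share of inflow) for one zone."""
    unaccounted = inflow_m3 - sum(metered_m3)
    return unaccounted, unaccounted / inflow_m3


def night_flow_excess(hourly_flow_m3h: list[float], night_hours: range,
                      legit_night_use_m3h: float) -> float:
    """Minimum night flow above the assumed legitimate night use, in m3/h."""
    mnf = min(hourly_flow_m3h[h] for h in night_hours)
    return max(0.0, mnf - legit_night_use_m3h)
```

A night-flow excess that persists across several nights is usually more telling than a single day's balance. AMI reads and inflow meters are rarely perfectly time-aligned, which makes single-day balances noisy.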
Assessment for Pilots: Xylem Vue represents the "Heavy Lift" approach. It is best suited for large utilities ready for a full enterprise transformation. Its strength lies in its deep hydraulic modeling capabilities. However, for smaller utilities or specific operational pain points, the complexity of full integration may be a barrier, highlighting the need for lighter "Micro-Apps."
2.2 Aquasight: Real-Time AIoT and "Digital Workforce"
Architecture and Philosophy
Aquasight positions itself closer to a "Real-Time Intelligence" layer that sits on top of existing SCADA and LIMS systems without requiring a full "rip and replace" or massive integration project. It emphasizes "Artificial Intelligence of Things" (AIoT) and plug-and-play modules.7
Key Modules and Capabilities
APOLLO (Wastewater Intelligence): This module focuses on process optimization and compliance. It analyzes real-time data to optimize energy and chemical usage. The Central San case study highlights how the APOLLO digital twin was used to cut energy costs, improve chemical efficiency, and flag maintenance issues early.18 It provides a "Check Engine Light" for the plant.
AMP (Asset Management): Uses AI to predict asset failure and optimize capital planning. It shifts utilities from reactive repairs to predictive replacement strategies.19
AVA (AI Assistant): Perhaps most relevant to the "Intelligent Handoff" concept, Aquasight recently introduced AVA, an LLM-based assistant. AVA allows operators to query system status using natural language (e.g., "Why is the pH high in Basin 2?"). AVA interprets complex sensor data and delivers plain-language explanations of causes and recommended actions.20
Assessment for Pilots: Aquasight offers a modular, "app-store" like approach. Its introduction of AVA validates the industry trend toward conversational interfaces and LLM integration. The "digital workforce" concept aligns with the goal of augmenting human operators rather than replacing them.
2.3 Emerging Players: Ainwater and Specialized Optimization
Ainwater: This player focuses heavily on energy efficiency and process stabilization in wastewater treatment. Their approach involves creating AI-based Digital Twins specifically for aeration optimization. By dynamically adjusting setpoints based on real-time biological load rather than fixed timers, they have demonstrated significant ROI.
Case Studies: In a pilot at a Chilean facility (4,150 m³/day), Ainwater's AI models stabilized the secondary biological treatment process and reduced energy consumption. Another deployment optimized pump and aeration schedules to align with tariff windows, directly cutting electricity costs.21
Assessment: Ainwater demonstrates that high-value optimization does not strictly require a massive platform. Targeted algorithms focusing on the most energy-intensive control loops (aeration) can yield rapid ROI and are prime candidates for "Micro-App" pilots.
2.4 ROI Synthesis and The Trust Equation
The evidence consistently points to three primary sources of ROI for AI in water treatment:
Energy Optimization: Moving from static/timer-based control to dynamic, load-based control typically yields 10-20% reductions in energy costs (aeration and pumping).1
Chemical Savings: Feed-forward control based on influent quality prediction allows for precise dosing of coagulants and polymers, reducing waste.2
Asset Life Extension: Predictive maintenance (vibration analysis, performance degradation tracking) prevents catastrophic failure, extending the useful life of expensive assets and reducing emergency "truck rolls".1
The Trust Deficit: Despite these benefits, a significant barrier remains: operator skepticism. When an AI system operates as a "Black Box," recommending actions without context, operators are hesitant to adopt it. Research indicates that trust is significantly higher when the system provides explainability—showing the reasoning behind a recommendation (e.g., "Recommend increasing airflow because ammonia load increased by 10% upstream").24 This reinforces the design requirement for our Micro-Apps: they must be transparent, providing "Just-in-Time" context to support the operator's decision-making process.
3. Technical Architecture for Secure AI Integration
Integrating cloud-based AI with on-premise Operational Technology (OT) requires a rigorous security architecture that respects the sanctity of the Industrial Control System (ICS). The architecture must bridge the "Air Gap" without creating vulnerabilities, adhering to the Defense-in-Depth principles of IEC 62443 and NIST SP 800-82.
3.1 OT/IT Convergence and Network Segmentation
The guiding framework for this architecture is the Purdue Model (ISA-95), which segments the network into distinct levels to isolate critical control systems from enterprise networks.26
Level 0-2 (OT Zone): This includes the physical processes, sensors, PLCs, and local HMIs. This zone is critical and must be isolated. No direct connection to the internet is permitted.
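Level 3 (Site Operations) and Level 3.5 (Industrial DMZ): This layer acts as the buffer between the control and enterprise networks. Level 3 hosts the Site Historian; the Level 3.5 DMZ hosts gateway servers and patch management systems. Together they form the "Staging Ground" for data.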
Level 4/5 (Enterprise / Cloud): The corporate network and the cloud environment where the heavy AI workloads (LLMs, complex analytics) reside.
Zones and Conduits: Following IEC 62443, we define the OT network as a high-security Zone. Any connection leaving this zone is a Conduit that must be strictly monitored and controlled. We employ a "Deny by Default" policy on all conduits.28
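The "Deny by Default" conduit policy can be expressed as an explicit allowlist: any flow that does not match a rule is rejected. The sketch below models only that decision logic. The zone names, protocols, and ports are hypothetical examples (e.g., HTTPS on 443 for a historian-to-gateway conduit). Real enforcement belongs in firewalls, diodes, and gateways, not in application code.

```python
from typing import NamedTuple


class Flow(NamedTuple):
    src_zone: str
    dst_zone: str
    protocol: str
    port: int


# Hypothetical allowlist: the only conduits permitted out of the OT zone.
ALLOWED = {
    Flow("L3_historian", "DMZ_gateway", "https", 443),
    Flow("DMZ_gateway", "L4_enterprise", "https", 443),
}


def is_permitted(flow: Flow) -> bool:
    """Deny by default: a flow passes only if it exactly matches an allowlist entry."""
    return flow in ALLOWED
```

Note that direction is part of the rule. The reverse of an allowed outbound conduit is a different flow and is therefore denied.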
3.2 The Data Diode and Unidirectional Gateways
For water utilities, which are critical infrastructure, a standard software firewall is often considered insufficient for the OT perimeter. Data Diodes (hardware-enforced unidirectional gateways) are the gold standard.8
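Mechanism: In a typical fiber-optic data diode, the OT side contains only a light transmitter and the IT side only a receiver. Light, and therefore data, can physically travel in only one direction: from OT to IT. Because the transmit and receive functions are separated in hardware, no return path exists.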
Security Value: This allows telemetry to flow out to the AI models in the cloud, but physically prevents any external actor (hacker, malware, or rogue AI) from sending commands back into the PLCs or SCADA network. It effectively creates an "Air Gap" for inbound traffic while allowing outbound monitoring.8
The Feedback Challenge: This architecture creates a challenge for "Closed-Loop" AI control. If the AI is in the cloud, it cannot write back to the SCADA system through a diode.
Architectural Solutions:
Open-Loop Advisory (Recommended for Pilots): The AI pushes recommendations to a Level 4 dashboard (the Micro-App). The operator views the recommendation on a corporate tablet and manually inputs the setpoint change into the SCADA HMI. This maintains the "Air Gap" logic and keeps the Human-in-the-Loop.
Verified Write-Back (Advanced): For trusted automated loops, a specialized "Replicator" or "Proxy" in the DMZ can be used. This proxy verifies the digital signature of a command from the cloud before passing it to the OT network. However, this reintroduces a bidirectional path and requires strict authentication and "Jump Host" protocols.31
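The verification step in a write-back proxy can be sketched as follows, under these assumptions:

- The sketch uses an HMAC with a shared secret, from Python's standard library, as a stand-in for the digital signature described above. A production design would more likely use asymmetric signatures with keys held in secure hardware.
- The message format, the tag name, the setpoint bounds, and the 60-second freshness window are hypothetical.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

SECRET = b"replace-with-managed-key"   # hypothetical; never hard-code keys in production
BOUNDS = {"DO_Basin1_SP": (0.5, 3.0)}  # hypothetical writable tags and limits (mg/L)
MAX_AGE_S = 60.0                       # hypothetical freshness window


def sign(command: dict, key: bytes = SECRET) -> str:
    """Compute an HMAC-SHA256 tag over a canonical JSON encoding of the command."""
    payload = json.dumps(command, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_command(command: dict, signature: str, now: Optional[float] = None) -> bool:
    """Accept a setpoint command only if it is authentic, fresh, and within bounds."""
    now = time.time() if now is None else now
    if not hmac.compare_digest(sign(command), signature):
        return False  # tampered or signed with the wrong key
    issued = command.get("issued_at")
    if not isinstance(issued, (int, float)) or abs(now - issued) > MAX_AGE_S:
        return False  # stale or future-dated: possible replay
    limits = BOUNDS.get(command.get("tag"))
    if limits is None:
        return False  # tag is not on the writable allowlist
    value = command.get("value")
    return isinstance(value, (int, float)) and limits[0] <= value <= limits[1]
```

Even with these checks, a captured command could be replayed within the freshness window. A real proxy would also track nonces or sequence numbers, which illustrates why this path "requires strict authentication."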
3.3 Data Ingestion Patterns and APIs
To fuel the AI, data must be liberated from the proprietary historian formats.
Pattern A: OSIsoft PI Web API
The PI System is the industry standard. The PI Web API provides a RESTful interface to access time-series data.33
Integration: A Python script running on a secure Gateway in the DMZ queries the PI Web API.
Authentication: The script uses Kerberos or Bearer Token authentication to securely access the API.33
Data Flow: PLC -> PI Interface -> PI Server (Level 3) -> PI Web API -> Python Script -> Data Diode -> Cloud Data Lake.
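Pattern A can be sketched with the Python standard library. The sketch assumes a PI Web API instance at a hypothetical host, a hypothetical Data Archive named PISERVER, and a bearer token obtained elsewhere. It follows the usual two-step pattern: first resolve a PI Point path to a WebId, then read recorded values from that stream. URL building and response parsing are kept separate from the network call so they can be tested offline.

```python
import json
import urllib.parse
import urllib.request

BASE = "https://piwebapi.example-utility.local/piwebapi"  # hypothetical host


def _get(url: str, token: str) -> dict:
    """GET a PI Web API resource with bearer-token authentication."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def point_url(server: str, tag: str) -> str:
    # PI Point paths take the form \\SERVER\TAG and must be URL-encoded.
    return f"{BASE}/points?" + urllib.parse.urlencode({"path": f"\\\\{server}\\{tag}"})


def recorded_url(web_id: str, start: str = "*-8h", end: str = "*") -> str:
    # PI time syntax: "*-8h" means eight hours before now, "*" means now.
    return f"{BASE}/streams/{web_id}/recorded?" + urllib.parse.urlencode(
        {"startTime": start, "endTime": end})


def parse_recorded(body: dict) -> list[tuple[str, float]]:
    """Keep only good-quality numeric samples as (timestamp, value) pairs."""
    return [(item["Timestamp"], float(item["Value"]))
            for item in body.get("Items", [])
            if item.get("Good", False) and isinstance(item.get("Value"), (int, float))]


def read_tag(server: str, tag: str, token: str) -> list[tuple[str, float]]:
    """Resolve the tag to a WebId, then fetch its recorded values."""
    web_id = _get(point_url(server, tag), token)["WebId"]
    return parse_recorded(_get(recorded_url(web_id), token))
```

Digital-state values arrive as JSON objects rather than numbers, so `parse_recorded` drops them. A real pipeline would map them explicitly instead.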
Pattern B: SQL Historians (Wonderware/AVEVA)
Many legacy systems store data in SQL Server.
Tool: The On-Premises Data Gateway (Microsoft) allows PowerApps and Power BI to securely query on-prem SQL data without opening inbound firewall ports. It uses an outbound Azure Service Bus relay to establish a secure tunnel.36
Security: This is a preferred method for Low-Code Micro-Apps as it abstracts the complexity of VPNs while encrypting credentials and data in transit.
3.4 Entity Linking: The Semantic Bridge
Raw SCADA tags (e.g., 345-FIT-201-A) are semantically meaningless to an LLM. Entity Linking (EL) is the process of mapping these tags to a Knowledge Graph that defines their physical reality.39
Technique: We use Named Entity Recognition (NER) to extract asset names from maintenance logs. We then map these entities to the SCADA tag database using a dictionary or fuzzy matching algorithm.13
Application: When an operator types "Show me the flow on the north pump," the NLP engine resolves "north pump" -> Entity: Pump_North -> Tag: 345-FIT-201-A. It then queries the PI Web API for that specific tag's data. This semantic layer is crucial for enabling the "Intelligent Handoff".42
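The dictionary-plus-fuzzy-matching step can be sketched with Python's standard `difflib`. The alias table, the two extra tags, and the 0.6 similarity cutoff are hypothetical. A production system would draw aliases from the asset registry or knowledge graph rather than from a hard-coded dictionary.

```python
import difflib
from typing import Optional

# Hypothetical knowledge-graph fragment: alias -> (entity, SCADA tag).
ALIASES = {
    "north pump": ("Pump_North", "345-FIT-201-A"),
    "south pump": ("Pump_South", "345-FIT-202-A"),  # hypothetical tag
    "ras pump": ("Pump_RAS_1", "410-FIT-101-A"),    # hypothetical tag
}


def link_entity(mention: str, cutoff: float = 0.6) -> Optional[tuple[str, str]]:
    """Resolve a free-text asset mention to (entity, tag), tolerating typos."""
    key = mention.lower().strip()
    if key in ALIASES:  # exact dictionary hit
        return ALIASES[key]
    # Fall back to the closest alias by difflib's similarity ratio.
    match = difflib.get_close_matches(key, ALIASES.keys(), n=1, cutoff=cutoff)
    return ALIASES[match[0]] if match else None
```

Returning `None` below the cutoff, rather than the best weak match, keeps the system from silently querying the wrong tag.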
4. Practical Pilot Proposals: The Micro-App Ecosystem
The "Micro-App" strategy involves deploying lightweight, purpose-built applications that solve specific operational friction points. Unlike monolithic ERP upgrades, these can be built and deployed rapidly using Low-Code (Microsoft PowerApps) or Python (Streamlit) frameworks. We propose six pilots designed for immediate impact and high operator engagement.
4.1 Pilot 1: The Intelligent Shift Turnover Assistant
Objective: Transform the shift turnover log from a static text file into a dynamic, queryable knowledge base that proactively correlates operator observations with SCADA reality.
Technical Specifications:
Frontend: Microsoft PowerApps (Canvas App) accessible on ruggedized tablets.
Backend: SQL Server (for storing logs) + OSIsoft PI Web API (for fetching telemetry).
AI Engine: A Retrieval Augmented Generation (RAG) pipeline using a secured LLM (e.g., Azure OpenAI GPT-4).
Data Flow:
Ingest: Operator dictates or types: "Clarifier 2 torque was running high during the storm event."
Contextualize: The RAG system identifies "Clarifier 2" and "High Torque." It triggers a query to the PI System for the Clarifier_2_Torque tag over the last 8 hours.
Synthesize: The app appends a "Data Card" to the log entry automatically. This card displays the max torque value (e.g., "Peak: 85%") and a sparkline image of the trend.
Suggest: The LLM analyzes the context ("storm event") and prompts: "Torque exceeded 80% during high flow. Did you verify the shear pin status?".43
Cybersecurity: The app resides in the Business Network (Level 4). It reads SCADA data from the historian replica on the IT side of the unidirectional gateway and does not write back to SCADA.
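The Synthesize and Suggest steps can be sketched as two functions. The first turns retrieved telemetry into a "Data Card"; the second builds a grounded prompt for the LLM. The 80% torque threshold mirrors the example above. The function names, prompt wording, and assumption of evenly spaced samples are hypothetical.

```python
def build_data_card(tag: str, samples: list[float], minutes_per_sample: float,
                    threshold: float = 80.0) -> dict:
    """Summarize a telemetry window for display beside the operator's log entry."""
    above = sum(1 for v in samples if v > threshold)
    return {
        "tag": tag,
        "peak": max(samples),
        "minutes_above_threshold": above * minutes_per_sample,
        "threshold": threshold,
    }


def grounded_prompt(log_entry: str, card: dict) -> str:
    """Give the LLM only the retrieved facts, to reduce hallucinated values."""
    return (
        "You are assisting a water treatment operator at shift turnover.\n"
        f"Operator note: {log_entry}\n"
        f"Telemetry for {card['tag']}: peak {card['peak']:.0f}%, "
        f"{card['minutes_above_threshold']:.0f} min above {card['threshold']:.0f}%.\n"
        "Use only these values. Suggest one verification step for the incoming operator."
    )
```

The card is computed deterministically in code, so the numbers shown to the operator never depend on the model. The LLM only phrases the suggestion.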
4.2 Pilot 2: Interactive SOP Chatbot (SOP-GPT)
Objective: Reduce "Time to Resolution" for alarms and improve safety compliance by providing instant, conversational access to Standard Operating Procedures (SOPs) and O&M manuals.
Technical Specifications:
Data Source: Digitized PDF versions of manufacturer O&M manuals, internal safety protocols, and regulatory compliance docs.
Architecture:
Vector Database: (e.g., Pinecone or Chroma) stores the SOPs which have been "chunked" and embedded into vectors.45
Orchestrator: LangChain or Semantic Kernel to manage the query flow.
Interface: A chat widget embedded in the Operator Dashboard (Streamlit).
Logic:
Operator queries: "How do I reset the VFD fault on the Return Activated Sludge (RAS) pump?"
Retrieval: The system performs a semantic search on the Vector DB to find the specific manufacturer procedure for that VFD model.46
Generation: The LLM summarizes the steps: "1. Lock out power at MCC. 2. Wait 5 minutes for capacitors to discharge. 3. Press Reset on the faceplate..."
Safety Layer: The output includes a mandatory warning banner: "WARNING: Verify Lock-Out/Tag-Out (LOTO) procedure before proceeding. Confirm absence of voltage." The banner is a hardcoded guardrail added by the application rather than generated by the model, so it appears on every answer. It reduces, but does not eliminate, the risk of the AI giving unsafe advice.47
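The retrieve-then-guard flow can be sketched without external services, under these simplifications:

- A bag-of-words cosine similarity stands in for the embedding model and vector database. This is deliberately simpler than how Pinecone or Chroma score results.
- The SOP chunks are hypothetical.
- The LOTO banner is prepended by application code, so it appears regardless of what the model generates.

```python
import math
import re
from collections import Counter

LOTO_BANNER = ("WARNING: Verify Lock-Out/Tag-Out (LOTO) procedure before proceeding. "
               "Confirm absence of voltage.")

# Hypothetical pre-chunked SOP text.
CHUNKS = [
    "RAS pump VFD fault reset: lock out power at the MCC, wait for capacitors "
    "to discharge, press Reset on the faceplate.",
    "Chlorine analyzer calibration: isolate the sample line, apply the standard, record the offset.",
]


def _vec(text: str) -> Counter:
    """Term-frequency vector (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = _vec(query)
    return sorted(CHUNKS, key=lambda c: _cosine(q, _vec(c)), reverse=True)[:k]


def answer_with_guardrail(llm_answer: str) -> str:
    # The banner is added in code, never requested from (or left to) the model.
    return f"{LOTO_BANNER}\n\n{llm_answer}"
```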
4.3 Pilot 3: Visual Anomaly "Morning Coffee" Dashboard
Objective: Provide Plant Managers with a simplified "at-a-glance" view of system health that highlights only statistically significant deviations, filtering out noise.
Technical Specifications:
Framework: Python Streamlit. This framework is ideal for rapid data visualization.48
Visualization: "Data Cards" layout. Each card represents a critical process unit (e.g., Headworks, Filtration, Disinfection).
Logic (The "Sparkline" Concept):
Instead of just showing current values, each card displays a Sparkline (a miniature trend chart without axes) covering the last 24 hours.
Anomaly Detection: The background color of the card changes based on a Z-Score calculation. If the current value deviates by more than 2 standard deviations ($2\sigma$) from the historical rolling mean, the card turns Amber. If $>3\sigma$, it turns Red.50
Drill-down: Clicking a card opens the detailed Power BI or PI Vision report for that asset.
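Code Concept:

```python
# Conceptual Streamlit code for one "Data Card"
import pandas as pd
import streamlit as st


def status_color(z_score: float) -> str:
    """Map a Z-score to the card status: >2 sigma amber, >3 sigma red."""
    if abs(z_score) > 3:
        return "red"
    if abs(z_score) > 2:
        return "orange"  # Streamlit's built-in color closest to amber
    return "green"


def display_card(label: str, value: float, trend_data: pd.Series, z_score: float) -> None:
    color = status_color(z_score)
    with st.container():
        st.markdown(f"### {label}")
        st.metric("Value", value)
        st.line_chart(trend_data)  # last 24 hours, standing in for a sparkline
        # Colored status text so the deviation is visible at a glance.
        st.markdown(f"Status: :{color}[{color.upper()}]")
```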
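The Z-score that drives the card color, $z = (x - \mu)/\sigma$, can be computed against a trailing baseline. This is a minimal sketch, assuming evenly spaced samples and a hypothetical 96-sample window (24 hours at 15-minute resolution). The current value is excluded from its own baseline so that a spike does not dilute its own score.

```python
import math
from statistics import mean, stdev


def rolling_z(history: list[float], current: float, window: int = 96) -> float:
    """Z-score of `current` against the trailing `window` samples of `history`."""
    baseline = history[-window:]
    if len(baseline) < 2:
        return 0.0  # too little history to judge; report as normal
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        # Perfectly flat baseline: any change is infinitely many sigmas away.
        return 0.0 if current == mu else math.copysign(math.inf, current - mu)
    return (current - mu) / sigma


def card_status(z: float) -> str:
    """Thresholds from the dashboard logic: >2 sigma amber (orange), >3 sigma red."""
    if abs(z) > 3:
        return "red"
    if abs(z) > 2:
        return "orange"
    return "green"
```

A flat-lined baseline is itself worth attention on a treatment plant, since it often indicates a frozen or disconnected sensor. Returning infinity surfaces that case instead of hiding it.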
4.4 Pilot 4: Chemical Dosing Copilot (Soft Sensor)
Objective: Optimize coagulant (e.g., Alum or Ferric) dosing by predicting the required dosage based on raw water parameters, replacing reactive manual jar tests with proactive recommendations.
Technical Specifications:
Model: A Machine Learning Regressor (e.g., XGBoost or Random Forest) trained on historical data sets containing: Raw Water Turbidity, pH, Temperature, Alkalinity, Flow Rate, and historical successful Dosing Rates.51
Inputs: Real-time streams of raw water parameters from SCADA.
Output: Predicted optimal Alum dosage (mg/L).
Workflow (Human-in-the-Loop):
The model runs every 15 minutes on an Edge Gateway or Secure Cloud instance.
It predicts the optimal dose.
Alerting: If the predicted dose differs from the current setpoint by more than a threshold (e.g., 5%), it triggers an alert in the Micro-App: "Advisory: Incoming turbidity spike detected. Model suggests increasing Alum to 25 mg/L to maintain effluent quality."
Action: The operator reviews the suggestion and manually adjusts the feed pump.
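The alerting rule, and the conversion from a recommended dose to a chemical feed rate, can be sketched as follows. The 5% threshold comes from the workflow above. The conversion uses the standard US "pounds formula" (lb/day = mg/L × MGD × 8.34), which gives the mass of active chemical per day. That figure would still need adjusting for solution strength and density before setting a metering pump. The trained regressor is represented here only by its predicted dose.

```python
from typing import Optional

LBS_PER_GAL_WATER = 8.34  # conversion factor in the US "pounds formula"


def feed_rate_lb_per_day(dose_mg_l: float, flow_mgd: float) -> float:
    """Mass of active chemical per day for a given dose and plant flow."""
    return dose_mg_l * flow_mgd * LBS_PER_GAL_WATER


def dosing_advisory(predicted_mg_l: float, current_mg_l: float, flow_mgd: float,
                    threshold: float = 0.05) -> Optional[str]:
    """Return an advisory only when the model disagrees by more than `threshold`."""
    if current_mg_l <= 0:
        return None  # no current dose to compare against
    diff = (predicted_mg_l - current_mg_l) / current_mg_l
    if abs(diff) <= threshold:
        return None  # within tolerance: stay quiet to avoid alarm fatigue
    direction = "increasing" if diff > 0 else "decreasing"
    return (f"Advisory: model suggests {direction} alum to {predicted_mg_l:.0f} mg/L "
            f"(~{feed_rate_lb_per_day(predicted_mg_l, flow_mgd):,.0f} lb/day at {flow_mgd} MGD).")
```

Keeping the threshold explicit lets operators tune how chatty the copilot is. The advisory remains a suggestion that the operator applies manually.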
4.5 Pilot 5: Predictive Maintenance Work Order Generator
Objective: Automate the creation of maintenance tickets based on asset health signatures, moving from schedule-based to condition-based maintenance.
Technical Specifications:
Inputs: Vibration sensors (accelerometers) and Amperage readings on large pumps/blowers.
Logic:
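Detection: An algorithm (e.g., Fast Fourier Transform analysis on the edge, or simple thresholding) detects a developing fault pattern. Examples include rising vibration at 2x running speed, which is commonly associated with misalignment or looseness, or rising energy at bearing defect frequencies.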
Correlation: The system checks if the pump is actually running (to avoid false positives from sensor noise).
Action: It interacts with the CMMS API (e.g., Maximo or SAP).
Output: It drafts a Work Order populated with the asset data, the vibration chart, and a "High Priority" flag.
Notification: The Maintenance Supervisor receives a push notification on their mobile device via the Micro-App to approve the Work Order.1
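A minimal sketch of the detection and correlation steps follows. Rather than computing a full FFT, it evaluates a single discrete Fourier transform bin at 1x and another at 2x running speed, which is enough to track those two orders. The 2x/1x ratio limit, the "running" current threshold, and the work-order fields are hypothetical. Real condition monitoring would also examine bearing defect frequencies.

```python
import cmath
import math
from typing import Optional


def amplitude_at(signal: list[float], fs_hz: float, freq_hz: float) -> float:
    """Amplitude of one frequency component via a single-bin DFT."""
    n = len(signal)
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * i / fs_hz)
              for i, x in enumerate(signal))
    return 2 * abs(acc) / n


def check_pump(asset: str, vib: list[float], fs_hz: float, run_hz: float,
               motor_amps: float, min_running_amps: float = 5.0,
               ratio_limit: float = 0.5) -> Optional[dict]:
    """Return a draft work order if the 2x component is high relative to 1x."""
    if motor_amps < min_running_amps:
        return None  # pump is off: ignore readings that are just sensor noise
    one_x = amplitude_at(vib, fs_hz, run_hz)
    two_x = amplitude_at(vib, fs_hz, 2 * run_hz)
    if one_x == 0 or two_x / one_x < ratio_limit:
        return None
    return {"asset": asset, "priority": "High",
            "description": f"2x/1x vibration ratio {two_x / one_x:.2f} exceeds {ratio_limit}"}
```

The amplitude estimate is exact only when the window holds a whole number of cycles of each frequency. With arbitrary window lengths, spectral leakage spreads energy into neighboring frequencies, which is one reason production systems apply windowing before an FFT.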
4.6 Pilot 6: Disinfection Byproduct (DBP) Formation Predictor
Objective: Predict the formation of regulated DBPs (THMs/HAAs) in the distribution system days before they form, allowing for proactive treatment adjustments.
Technical Specifications:
Model: A "Soft Sensor" model trained on water age, temperature, chlorine residual, and TOC (Total Organic Carbon).2
Application:
The model calculates the predicted TTHM (Total Trihalomethanes) levels at critical remote points in the distribution network.
Scenario Analysis: Operators can use sliders in the App to simulate: "If I reduce Chlorine at the plant by 0.2 mg/L, what is the impact on TTHMs at the farthest node?"
Regulatory Value: This helps utilities avoid violations and public health notices by managing the precursor conditions rather than reacting to a failed lab test a week later.2
5. Horizon Scanning: The Future of the Cognitive Utility
As we look toward 2030, emerging technologies will mature from research concepts to operational necessities, further blurring the line between physical and digital operations.
5.1 Multimodal AI: Fusion of Vision and Telemetry
Current AI in water focuses largely on tabular, time-series data. The future is Multimodal, combining numbers with video, sound, and text.
Concept: A camera monitors the surface of a secondary clarifier. The AI uses Computer Vision to detect visual anomalies like "pinpoint floc," "ash," or "carryover" events that a submerged turbidity sensor might miss or misinterpret.
Data Fusion: The AI fuses this visual data with the SCADA flow rates and sludge blanket depth sensors.
Inference: "Visual confirmation of sludge blanket rising (Vision) + Flow surge detected (SCADA) = High Confidence Process Upset." This cross-validation significantly reduces false alarms.53
5.2 Advanced Computer Vision for Flocculation Analysis
Optimizing coagulation is currently an art form relying on visual inspection of "jar tests." Computer vision turns it into a rigorous science.
Technique: High-speed cameras and microscopic imaging (in-situ) capture images of floc particles in the flocculation basin.
Algorithms: Deep Learning models, specifically Convolutional Neural Networks (CNNs) such as U-Net or Mask R-CNN, segment the images to calculate precise metrics: Mean Floc Size, Fractal Dimension, and Settling Velocity.56
Control Loop: This visual feedback provides a real-time signal for polymer dosing. If floc size decreases below a target, the system advises an increase in polymer, optimizing the physical chemistry based on direct observation.57
5.3 Agentic AI and Autonomous Negotiation
Agentic AI refers to AI systems that can plan, reason, and execute multi-step goals, rather than just predicting a value.59
Future Scenario: An "Energy Management Agent" negotiates with a "Hydraulic Control Agent."
Energy Agent: "Electricity prices are peaking between 4 PM and 8 PM. I want to shut down High Service Pump 3."
Hydraulic Agent: "Simulating impact... Tank levels allow for a 1.5-hour shutdown, but pressure will drop below 40 PSI in Zone 4 after that. Counter-proposal: Shut down Pump 3 for 1.5 hours, but pre-fill Tank A now to buffer the pressure."
Outcome: The agents agree on an optimized schedule that saves cost without violating the "Service Level Agreement" (minimum pressure), all without human intervention.1
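At its simplest, the Hydraulic Agent's feasibility check reduces to a tank mass balance. With the pump off, storage drains at the net demand rate. The allowable shutdown is the usable volume above the level that protects minimum pressure, divided by that rate. The sketch below assumes a cylindrical tank and constant demand, and all numbers in the usage note are hypothetical. A real agent would run the hydraulic model instead.

```python
import math


def shutdown_hours(level_m: float, min_level_m: float, area_m2: float,
                   demand_m3h: float, other_inflow_m3h: float = 0.0) -> float:
    """Hours storage can cover demand with the pump off before reaching min_level_m."""
    net_draw = demand_m3h - other_inflow_m3h
    if net_draw <= 0:
        return math.inf  # other sources cover demand; the tank does not drain
    usable_m3 = max(0.0, level_m - min_level_m) * area_m2
    return usable_m3 / net_draw


def prefill_level(target_hours: float, min_level_m: float, area_m2: float,
                  demand_m3h: float, other_inflow_m3h: float = 0.0) -> float:
    """Starting level needed to sustain a shutdown lasting target_hours."""
    net_draw = max(0.0, demand_m3h - other_inflow_m3h)
    return min_level_m + target_hours * net_draw / area_m2
```

Take a 6.0 m starting level, a 4.5 m minimum for Zone 4 pressure, a 400 m² footprint, and 400 m³/h demand. Under these assumptions the tank covers 1.5 hours. Covering the full 4-hour peak window would require pre-filling to 8.5 m, if the tank is tall enough.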
Conclusion and Strategic Roadmap
The transition to a cognitive utility is not a "big bang" replacement of legacy systems, but a thoughtful, layered augmentation. It begins with Data Hygiene—liberating data from historians and creating semantic links between assets and tags. It proceeds to Augmentation—deploying Micro-Apps like the Shift Turnover Assistant that capture human intelligence and build trust. It matures into Optimization—using soft sensors and digital twins to drive efficiency.
Strategic Recommendations for Directors:
Prioritize the "Intelligent Handoff": Start with the Shift Turnover Micro-App. It solves an immediate pain point (workforce training) and has low technical risk, building momentum for AI adoption.
Bridge the Air Gap Responsibly: Do not compromise security for convenience. Use Data Diodes for telemetry export and maintain HITL protocols for control actions until "Verified Write-Back" technologies are fully mature.
Invest in Explainability: Ensure every AI pilot includes an "Explain" feature. Operators must understand why the AI is making a recommendation. Trust is the currency of the control room.
Embrace Hybrid Intelligence: Position AI as a "Co-Pilot" (like Aquasight's AVA). The goal is to make the operator superhuman, not to make the plant unmanned.
By following this architectural blueprint, water utilities can navigate the complexities of digital transformation, ensuring a resilient, efficient, and sustainable future for the critical infrastructure that sustains life.
Appendix: Pilot Implementation Tables
Table 1: Comparative Analysis of OT Data Connectivity Methods
Table 2: Pilot 1 (Shift Turnover) Technical Stack
Table 3: Near-Future Technology Roadmap (2025-2030)
Works cited
AI Agents in SCADA & Remote Monitoring for Water Utilities | Digiqt ..., accessed December 26, 2025, https://digiqt.com/blog/ai-agents-in-scada-&-remote-monitoring-for-water-utilities/
Machine learning and the future of water quality monitoring - Carollo Engineers, accessed December 26, 2025, https://carollo.com/publications/machine-learning-water-quality-monitoring/
Case studies - IDRICA, accessed December 26, 2025, https://www.idrica.com/case-studies/
Closed Loop vs Open Loop: How to Treat Each System - R2J Chemical Services, accessed December 26, 2025, https://www.r2j.com/blog/closed-loop-vs-open-loop/
Safe and Trustful AI for Closed-Loop Control Systems - MDPI, accessed December 26, 2025, https://www.mdpi.com/2079-9292/12/16/3489
How AI and digital twins are changing the paradigm in treatment plants - Idrica, accessed December 26, 2025, https://www.idrica.com/blog/how-ai-and-digital-twins-are-changing-the-paradigm-in-treatment-plants/
Aquasight | AI-Powered Water & Wastewater Intelligence for Utilities, accessed December 26, 2025, https://www.aquasight.io/
Data Diode vs Firewall: Understanding the Key Differences in OT Security, accessed December 26, 2025, https://waterfall-security.com/ot-insights-center/ot-cybersecurity-insights-center/data-diode-vs-firewall-understanding-the-key-differences-in-ot-security/
Guide to Industrial Control Systems (ICS) Security - NIST Technical Series Publications, accessed December 26, 2025, https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.SP.800-82r1.pdf
Data, Amplified: Empowering Water Utilities with Integrated Solutions, accessed December 26, 2025, https://www.iwa-network.org/blogs/data-amplified-empowering-water-utilities-with-integrated-solutions
Natural Language Processing of Maintenance Records Data - DiVA portal, accessed December 26, 2025, https://www.diva-portal.org/smash/get/diva2:975548/FULLTEXT01.pdf
How edge computing and AI can revolutionize SCADA systems—use cases in the water & wastewater industry - Schneider Electric Blog, accessed December 26, 2025, https://blog.se.com/industry/2024/09/10/how-edge-computing-and-ai-can-revolutionize-scada-systems-use-cases-in-the-water-wastewater-industry/
[2108.05454] Extracting Semantics from Maintenance Records - arXiv, accessed December 26, 2025, https://arxiv.org/abs/2108.05454
Case Study: GoAigua digitally transforms Global Omnium - Idrica, accessed December 26, 2025, https://www.idrica.com/case-studies/digital-transformation-global-omnium/
Partnership in action: Driving digital transformation for water utilities | Xylem Ireland, accessed December 26, 2025, https://www.xylem.com/en-ie/making-waves/water-utilities-news/partnership-in-action-driving-digital-transformation-for-water-utilities/
How a digital twin enhanced daily operations in Global Omnium's drinking water network - Xylem, accessed December 26, 2025, https://www.xylem.com/siteassets/brand/xylem-vue/resources/case-studies/digital-twin-case-study-paper_en.pdf
AI-Driven Wastewater Treatment Intelligence with APOLLO™ | Aquasight, accessed December 26, 2025, https://www.aquasight.io/solutions/wastewater-treatment?hsLang=en
Digital Twins in Action: How Central San is Reimagining Wastewater Process Optimization, accessed December 26, 2025, https://www.aquasight.io/blog/central-san-digital-twin-efficiency
Water Utilities Insights for Smarter Operations - Aquasight, accessed December 26, 2025, https://www.aquasight.io/industry-insights
A New Era of Utility Intelligence: Aquasight Unveils Three Breakthrough Innovations at WEFTEC 2025, accessed December 26, 2025, https://www.aquasight.io/blog/aquasight-weftec-2025-ai-data-asset-management-innovations?hsLang=en
Ainwater: the intelligence of water - UpLink - Contribution, accessed December 26, 2025, https://uplink.weforum.org/uplink/s/uplink-contribution/a01TE000009zOBgYAM/ainwater-the-intelligence-of-water
Machine Learning And The Future Of Water Quality Monitoring, accessed December 26, 2025, https://www.wateronline.com/doc/machine-learning-and-the-future-of-water-quality-monitoring-0001
AI in the Water Treatment Industry: Applications, Trends, and Digital Transformation, accessed December 26, 2025, https://ips-ai.com/resource-centre/blogs/ai-in-the-water-treatment-industry-applications-trends-and-digital-transformation/
Analyzing Operator States and the Impact of AI-Enhanced Decision Support in Control Rooms: A Human-in-the-Loop Specialized Reinforcement Learning Framework for Intervention Strategies, accessed December 26, 2025, https://www.tandfonline.com/doi/full/10.1080/10447318.2024.2391605
Smart water systems perform better with smart people: Water operators in the age of AI, accessed December 26, 2025, https://smartwatermagazine.com/blogs/marcello-michael-serrao/smart-water-systems-perform-better-smart-people-water-operators-age-ai
What Is the Purdue Model for ICS Security? | A Guide to PERA - Palo Alto Networks, accessed December 26, 2025, https://www.paloaltonetworks.com/cyberpedia/what-is-the-purdue-model-for-ics-security
Understanding the Purdue Model for ICS & OT Security - Sangfor Technologies, accessed December 26, 2025, https://www.sangfor.com/glossary/cybersecurity/what-is-purdue-model-ics-security
Understanding ISA/IEC 62443: A Guide for OT Security Teams - Dragos, accessed December 26, 2025, https://www.dragos.com/blog/isa-iec-62443-concepts
What Is IEC 62443? Definition, Breakdown & Methodology - Zscaler, accessed December 26, 2025, https://www.zscaler.com/zpedia/what-is-iec-62443
Data Diode Use for Water and Wastewater Infrastructure - Fend Incorporated, accessed December 26, 2025, https://www.fend.tech/water-infrastructure
Internet-Exposed HMIs Pose Cybersecurity Risks to Water and Wastewater Systems | EPA, accessed December 26, 2025, https://www.epa.gov/system/files/documents/2024-12/joint-factsheet-epa-cisa-internet-exposed-human-machine-interfaces-508c.pdf
New CISA and EPA guidelines aim to shield water and wastewater systems from cyber threats, accessed December 26, 2025, https://industrialcyber.co/utilities-energy-power-water-waste/new-cisa-and-epa-guidelines-aim-to-shield-water-and-wastewater-systems-from-cyber-threats/
Using PI Web API with Python - AVEVA Community, accessed December 26, 2025, https://community.aveva.com/pi-square-community/b/aveva-blog/posts/using-pi-web-api-with-python
Mastering the PI Web API: From Beginner to Advanced | OSI AVEVA - PI System - PiSolved, accessed December 26, 2025, https://www.pisolved.in/article/52/mastering-the-pi-web-api-from-beginner-to-advanced
requests-kerberos - PyPI, accessed December 26, 2025, https://pypi.org/project/requests-kerberos/
PowerApps with SQL server on-premises – Using the Gateway, Part One, accessed December 26, 2025, https://community.powerplatform.com/blogs/post/?postid=1a933286-ca02-4d11-bd55-05845ac30b36
Connect to your on-premises data sources from PowerApps using on-premises data gateway - Microsoft Power Platform Blog, accessed December 26, 2025, https://www.microsoft.com/en-us/power-platform/blog/power-apps/connect-to-your-on-premises-data-sources-using-on-premises-data-gateway-from-powerapps/
About on-premises gateways - Power Platform - Microsoft Learn, accessed December 26, 2025, https://learn.microsoft.com/en-us/power-platform/admin/wp-onpremises-gateway
What is Entity Linking | Ontotext Fundamentals, accessed December 26, 2025, https://www.ontotext.com/knowledgehub/fundamentals/what-is-entity-linking/
Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions - Database Group, accessed December 26, 2025, https://dbgroup.cs.tsinghua.edu.cn/wangjy/papers/TKDE14-entitylinking.pdf
End-to-End Structured Extraction with LLM — Part 1: Batch Entity Extraction | by AI on Databricks | Medium, accessed December 26, 2025, https://medium.com/@AI-on-Databricks/end-to-end-structured-extraction-with-llm-part-1-batch-entity-extraction-876ce17b290f
Improving “entity linking” between texts and knowledge bases - Amazon Science, accessed December 26, 2025, https://www.amazon.science/blog/improving-entity-linking-between-texts-and-knowledge-bases
RAGOps: Operating and Managing Retrieval-Augmented Generation Pipelines - arXiv, accessed December 26, 2025, https://arxiv.org/html/2506.03401v1
Applying Generative AI to Create SOP, Reducing API Costs Through Prompt Compression and Evaluating LLM Responses with Tonic Validate RAG Metrics - IEEE Xplore, accessed December 26, 2025, https://ieeexplore.ieee.org/document/10867024/
AI-Powered Standard Operating Procedure Generation and Optimization Using Large Language Models and Chroma Databases in Chemistry - ResearchGate, accessed December 26, 2025, https://www.researchgate.net/publication/392676440_AI-Powered_Standard_Operating_Procedure_Generation_and_Optimization_Using_Large_Language_Models_and_Chroma_Databases_in_Chemistry
Transforming Industries with Retrieval-Augmented Generation (RAG): A Comprehensive Exploration of Recent Advances | by Jesus Ruiz | Medium, accessed December 26, 2025, https://medium.com/@jesus.ruiz_76380/transforming-industries-with-retrieval-augmented-generation-rag-a-comprehensive-exploration-of-9eba21e7c75d
RAGuard: A Novel Approach for in-context Safe Retrieval Augmented Generation for LLMs, accessed December 26, 2025, https://arxiv.org/html/2509.03768v1
Building a dashboard in Python using Streamlit - Show the Community!, accessed December 26, 2025, https://discuss.streamlit.io/t/building-a-dashboard-in-python-using-streamlit/60621
Streamlit Real-time Design Patterns: Creating Interactive and Dynamic Data Visualizations, accessed December 26, 2025, https://dev-kit.io/blog/python/streamlit-real-time-design-patterns-creating-interactive-and-dynamic-data-visualizations
Optimizing Performance of Equipment Fleets Under Dynamic Operating Conditions: Generalizable Shift Detection and Multimodal LLM-Assisted State Labeling - MDPI, accessed December 26, 2025, https://www.mdpi.com/2071-1050/18/1/132
Development of a Soft Sensor Using Machine Learning Algorithms for Predicting the Water Quality of an Onsite Wastewater Treatment System - PMC - PubMed Central, accessed December 26, 2025, https://pmc.ncbi.nlm.nih.gov/articles/PMC10515708/
Development of water quality prediction model for water treatment plant using artificial intelligence algorithms - Environmental Engineering Research, accessed December 26, 2025, https://www.eeer.org/journal/view.php?number=1512
Multimodal AI – 10 Innovative Applications and Real-World Examples - Appinventiv, accessed December 26, 2025, https://appinventiv.com/blog/multimodal-ai-applications/
MFGAN: Multimodal Fusion for Industrial Anomaly Detection Using Attention-Based Autoencoder and Generative Adversarial Network - MDPI, accessed December 26, 2025, https://www.mdpi.com/1424-8220/24/2/637
Multi-modal Time Series Analysis: A Tutorial and Survey - arXiv, accessed December 26, 2025, https://arxiv.org/html/2503.13709v1
Machine Learning in Flocculant Research and Application: Toward ..., accessed December 26, 2025, https://www.mdpi.com/2297-8739/12/8/203
Low-cost methodology for the characterization of floc size in low turbidity and low alkalinity waters using image analysis | Water Practice & Technology | IWA Publishing, accessed December 26, 2025, https://iwaponline.com/wpt/article/17/4/887/87789/Low-cost-methodology-for-the-characterization-of
Using image texture to monitor the growth and settling of flocs - ResearchGate, accessed December 26, 2025, https://www.researchgate.net/publication/373857777_Using_image_texture_to_monitor_the_growth_and_settling_of_flocs
AI Agents in Your Water Utility: A Powerful Co-Pilot or a Risky Takeover? - Qatium, accessed December 26, 2025, https://qatium.com/blog/ai-agents-in-your-water-utility-a-powerful-co-pilot-or-a-risky-takeover/
Building autonomous water utility operations with agentic AI on AWS, accessed December 26, 2025, https://aws.amazon.com/blogs/industries/building-autonomous-water-utility-operations-with-agentic-ai-on-aws/
99 of the Best Power Apps Examples Across 11 Industries - Bespoke XYZ, accessed December 26, 2025, https://www.bespoke.xyz/best-power-apps-examples/
How AVEVA™ PI Data Infrastructure enables a next-generation AVEVA™ PI System™ for utilities, accessed December 26, 2025, https://cdn.osisoft.com/osi/presentations/2023-AVEVA-San-Francisco/UC23NA-1PAU01-AVEVA-Moore-How-hybrid-data-infrastructure-based-on--PI-System-empowers-utilities-for-the-future.pdf