AI and GNN-Based Streamflow Prediction in Data-Sparse and Ungauged Basins

March 28, 2026

As climate change intensifies hydrological extremes such as floods, droughts, and irregular seasonal runoff, reliable streamflow prediction has become increasingly important for water resources management, disaster preparedness, reservoir operation, and climate adaptation. However, many river basins around the world remain data-sparse, with limited or deteriorating gauging networks. In these regions, it is often difficult to monitor streamflow continuously or to calibrate hydrological models using sufficient observations.

This challenge is especially severe in ungauged basins, where streamflow observations are unavailable or highly incomplete. Streamflow prediction in such regions has traditionally relied on process-based hydrological models, which simulate rainfall-runoff processes using physical equations and watershed characteristics. While these models are physically grounded, they usually require extensive calibration and expert knowledge and incur substantial computational cost, and their performance can degrade when they are transferred to locations with few or no observations.

This is where AI can add substantial value. Instead of relying solely on manually calibrated parameters or local observations, AI models can learn complex nonlinear spatial and temporal relationships directly from hydrological states, meteorological forcing, and river-network structure. When guided by physically meaningful outputs from hydrological models, AI can support more scalable, transferable, and efficient streamflow prediction, especially in regions where conventional approaches struggle most.

Our Goal

We aim to develop a robust and scalable framework for streamflow prediction in ungauged basins by integrating process-based hydrological understanding with modern AI architectures. Our research introduces a Process-Guided Graph-Transformer framework that improves streamflow prediction across gauged and ungauged river basins without explicitly calibrating the internal parameters of a process-based hydrological model.

Rather than replacing hydrological knowledge, our goal is to use it more effectively. By leveraging uncalibrated outputs from a physically based model and combining them with graph neural networks and transformers, we seek to build a framework that remains physically consistent while achieving better predictive skill, stronger generalization, and lower computational burden.

Specifically, we aim to provide reliable streamflow estimates in data-sparse regions where local calibration is impractical, and to support hydrological prediction across diverse climate regimes with a framework that is both transferable and efficient.

Our Approach

Our approach combines hydrological modeling, spatial graph learning, and temporal sequence learning into a unified framework.

First, we use a process-based hydrological model to generate physically meaningful features such as soil moisture, groundwater flow, evapotranspiration, runoff, and water yield. Importantly, these model outputs are used without calibrating the internal hydrological parameters for each basin. This allows us to preserve physical structure while avoiding the heavy burden of basin-specific calibration.
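The feature-preparation step can be sketched as follows, assuming each process-based output is available as a daily time series per subbasin. The variable names in `PROCESS_VARS` and the helper `build_features` are hypothetical; the page does not name the hydrological model or its output variables. The sketch standardizes each uncalibrated output with statistics from the training period only, which is one common way to put states with different units on a comparable scale.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical output names; the actual model variables are not specified.
PROCESS_VARS = ["soil_moisture", "groundwater_flow", "evapotranspiration",
                "runoff", "water_yield"]


def zscore_stats(series: List[float]) -> Tuple[float, float]:
    """Return the mean and (population) standard deviation of a series."""
    mean = sum(series) / len(series)
    var = sum((x - mean) ** 2 for x in series) / len(series)
    std = math.sqrt(var)
    # Fall back to 1.0 so a constant series does not cause division by zero.
    return mean, (std if std > 0 else 1.0)


def build_features(outputs: Dict[str, List[float]], train_len: int) -> List[List[float]]:
    """Stack uncalibrated model outputs into a [time][feature] matrix.

    Normalisation statistics come from the first `train_len` steps only,
    so the test period does not leak into the scaling.
    """
    n_steps = len(outputs[PROCESS_VARS[0]])
    columns = []
    for name in PROCESS_VARS:
        series = outputs[name]
        mean, std = zscore_stats(series[:train_len])
        columns.append([(x - mean) / std for x in series])
    # Transpose so each row holds all features for one time step.
    return [[col[t] for col in columns] for t in range(n_steps)]
```

Because no parameter of the process-based model is tuned here, the only fitted quantities are the scaling statistics, which are cheap to compute for any basin.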

Next, we represent river systems as connected graphs, where subbasins and stream nodes are linked through upstream-downstream relationships. A Graph Neural Network is then used to learn how hydrological information propagates across the river network. This enables the model to capture spatial dependencies and transfer information from gauged locations to ungauged ones.
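A minimal sketch of the river-network graph and one round of message passing, written in plain Python for illustration. It assumes each subbasin drains to exactly one downstream node (a tree-shaped network), that messages flow only from upstream to downstream, and that neighbours are combined by a mean followed by a `tanh` activation. These are assumptions chosen for clarity, not the framework's documented layer, and `upstream_lists` and `message_pass` are hypothetical names.

```python
import math
from typing import Dict, List, Optional

Vector = List[float]
Matrix = List[List[float]]


def upstream_lists(downstream: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Invert a node -> downstream-node map into node -> direct upstream nodes."""
    ups: Dict[str, List[str]] = {node: [] for node in downstream}
    for node, down in downstream.items():
        if down is not None:  # None marks the basin outlet
            ups[down].append(node)
    return ups


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply a weight matrix by a feature vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def message_pass(h: Dict[str, Vector], ups: Dict[str, List[str]],
                 w_self: Matrix, w_up: Matrix) -> Dict[str, Vector]:
    """One round of upstream-to-downstream message passing.

    Each node combines a transform of its own state with a transform of the
    mean state of its direct upstream neighbours, then applies tanh.
    """
    dim = len(next(iter(h.values())))
    new_h: Dict[str, Vector] = {}
    for node, feats in h.items():
        if ups[node]:
            agg = [sum(h[u][k] for u in ups[node]) / len(ups[node]) for k in range(dim)]
        else:
            agg = [0.0] * dim  # headwater node: nothing flows in
        self_term = matvec(w_self, feats)
        up_term = matvec(w_up, agg)
        new_h[node] = [math.tanh(a + b) for a, b in zip(self_term, up_term)]
    return new_h
```

For example, with `downstream = {"A": "C", "B": "C", "C": None}`, node C receives the averaged state of A and B. Stacking k rounds lets information from up to k hops upstream reach a node; a variant that also passes messages from downstream to upstream would be needed for a gauged outlet to inform ungauged headwaters.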

At the same time, a Transformer-based temporal encoder learns how hydrological conditions evolve over time, including antecedent rainfall, runoff memory, and delayed flow response. By integrating graph learning and temporal attention, the framework captures both the spatial structure of the watershed and the temporal dynamics of streamflow generation.
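The temporal attention mechanism can be illustrated with single-head scaled dot-product self-attention over a window of daily feature vectors. This sketch assumes a causal mask, so each time step attends only to itself and earlier steps; the page does not state whether the encoder is causal, how many heads or layers it uses, or how it is combined with the graph module. The projection matrices `wq`, `wk`, and `wv` would be learned in practice and are passed in here as plain lists.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]


def causal_self_attention(x: Matrix, wq: Matrix, wk: Matrix,
                          wv: Matrix) -> Tuple[Matrix, Matrix]:
    """x is [time][d_model]; returns (outputs, attention weights per step)."""
    q, k, v = matmul(x, wq), matmul(x, wk), matmul(x, wv)
    d_k = len(k[0])
    outputs, weights = [], []
    for t in range(len(x)):
        # Causal mask: step t scores only steps 0..t, never future days.
        scores = [sum(q[t][j] * k[s][j] for j in range(d_k)) / math.sqrt(d_k)
                  for s in range(t + 1)]
        a = softmax(scores)
        weights.append(a)
        outputs.append([sum(a[s] * v[s][j] for s in range(t + 1))
                        for j in range(len(v[0]))])
    return outputs, weights
```

The returned weights show how strongly each day draws on earlier days, which is one way to inspect learned runoff memory and delayed flow response.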

This process-guided design allows the model to function as a surrogate for traditional hydrological model calibration while improving predictive performance. It also supports a realistic evaluation of prediction in ungauged basins: observations at selected transfer nodes are withheld during training, and the model is then tested at those nodes under strict prediction-in-ungauged-basin conditions.
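The withholding protocol can be expressed as a node split plus a masked loss. In this sketch (hypothetical names `split_nodes` and `masked_mse`), observations at transfer nodes never enter the training loss and are used only for evaluation, and missing observations encoded as NaN are skipped. Mean squared error is an assumption, since the page does not specify the training objective.

```python
import math
from typing import Dict, List, Set, Tuple


def split_nodes(gauged: List[str], transfer: Set[str]) -> Tuple[List[str], List[str]]:
    """Separate gauged nodes into training nodes and withheld transfer nodes."""
    train_nodes = [n for n in gauged if n not in transfer]
    test_nodes = [n for n in gauged if n in transfer]
    return train_nodes, test_nodes


def masked_mse(pred: Dict[str, List[float]], obs: Dict[str, List[float]],
               nodes: List[str]) -> float:
    """Mean squared error over the listed nodes only, skipping NaN observations."""
    total, count = 0.0, 0
    for n in nodes:
        for p, o in zip(pred[n], obs[n]):
            if not math.isnan(o):
                total += (p - o) ** 2
                count += 1
    return total / count if count else float("nan")
```

Training would minimise `masked_mse(pred, obs, train_nodes)` and report skill on `test_nodes`. For a strict test, the same transfer-node observations must also be kept out of any input features, such as lagged observed flow.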

Our results show that this framework improves predictive skill over both uncalibrated and calibrated baseline hydrological models, enhances extreme-event detection, reduces computational cost, and maintains robust transferability across diverse hydro-climatic regions.
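The page does not name the skill metrics behind these comparisons. Nash-Sutcliffe efficiency (NSE) and Kling-Gupta efficiency (KGE) are widely used for streamflow, so the sketch below computes both under that assumption, using their standard definitions.

```python
import math
from typing import List


def nse(sim: List[float], obs: List[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 - SSE / variance of observations about their mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((s - o) ** 2 for s, o in zip(sim, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def kge(sim: List[float], obs: List[float]) -> float:
    """Kling-Gupta efficiency: 1 - sqrt((r-1)^2 + (alpha-1)^2 + (beta-1)^2).

    r is the Pearson correlation, alpha the ratio of standard deviations
    (sim / obs), and beta the ratio of means (sim / obs).
    """
    n = len(obs)
    mu_s, mu_o = sum(sim) / n, sum(obs) / n
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    cov = sum((s - mu_s) * (o - mu_o) for s, o in zip(sim, obs)) / n
    r = cov / (sd_s * sd_o)
    alpha = sd_s / sd_o
    beta = mu_s / mu_o
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```

NSE equals 1 for a perfect fit and 0 for a model no better than the observed mean. KGE also equals 1 for a perfect fit, and its three terms separate correlation, variability, and bias errors.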

Why This Matters

This research is important because it addresses one of the most persistent challenges in hydrology: how to make reliable streamflow predictions where observations are sparse or unavailable.

The proposed framework is meaningful in several ways. It reduces dependence on local calibration, which is often the biggest obstacle in applying hydrological models in practice. It uses globally accessible inputs, making the approach more scalable across regions and countries. It also strengthens physical consistency by incorporating hydrological process information instead of relying on purely black-box learning.

Beyond improving overall streamflow prediction, the framework also helps with event-based applications such as flood detection and extreme-flow analysis. Because it combines process-guided inputs, river connectivity, and temporal learning, it can better identify hydrologically meaningful events while maintaining reasonable uncertainty estimates.
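Extreme-event detection is often scored with a contingency table of threshold exceedances. The sketch below assumes a fixed flow threshold (for example, a high percentile of the observed record) and counts exceedances per time step, from which it derives the probability of detection (POD), false alarm ratio (FAR), and critical success index (CSI). The page does not state which event definition or scores were used, so treat these choices and the name `event_scores` as illustrative.

```python
from typing import Dict, List


def event_scores(sim: List[float], obs: List[float], threshold: float) -> Dict[str, float]:
    """Per-time-step contingency scores for flows at or above `threshold`."""
    hits = misses = false_alarms = 0
    for s, o in zip(sim, obs):
        obs_event, sim_event = o >= threshold, s >= threshold
        if obs_event and sim_event:
            hits += 1
        elif obs_event:
            misses += 1          # observed event the model missed
        elif sim_event:
            false_alarms += 1    # predicted event that did not occur
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    csi = hits / (hits + misses + false_alarms) if hits + misses + false_alarms else nan
    return {"POD": pod, "FAR": far, "CSI": csi}
```

Scoring per time step is a simplification; an event-based variant would first group consecutive exceedances into discrete flood events before counting hits and misses.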

In this sense, the framework offers a promising pathway toward next-generation hydrological prediction systems that are physically informed, computationally efficient, and operationally useful in regions with limited monitoring infrastructure.

Tools and Expertise

Our research builds on advanced hydrological modeling, AI, and geospatial analysis capabilities:

Programming: Python and deep learning frameworks for model development and large-scale hydrological learning

Hydrological Modeling: Process-based simulation using physically meaningful watershed states and runoff components

AI Modeling: Graph Neural Networks and Transformer architectures for spatial-temporal streamflow prediction

Spatial Analysis: River-network topology, subbasin connectivity, and watershed-level feature extraction

Computing: Efficient model training and surrogate modeling to reduce traditional calibration cost

Domain Knowledge: Expertise in hydrology, ungauged basin prediction, flood processes, and hybrid physics-AI modeling

Key Areas of Research Interest

In addition to streamflow prediction, this project explores the following topics:

(1) Developing process-guided AI frameworks for prediction in ungauged basins

(2) Learning river-network connectivity through graph-based hydrological modeling

(3) Improving temporal prediction of runoff and streamflow using transformer architectures

(4) Reducing dependence on calibration-intensive process-based hydrological models

(5) Enhancing extreme-event detection for flood forecasting and risk assessment

(6) Quantifying predictive uncertainty in data-sparse hydrological environments

(7) Building scalable surrogate frameworks for multi-basin and multi-site hydrological applications

Broader Impact

By advancing process-guided AI for hydrology, our research aims to strengthen streamflow prediction in regions where conventional calibration-based modeling is difficult or impossible. The framework provides a practical and scientifically grounded solution for transferring hydrological information from gauged to ungauged locations, while maintaining computational efficiency and physical plausibility.

Ultimately, this work supports more reliable flood forecasting, drought preparedness, water allocation planning, and climate-resilient water management. It also demonstrates how AI and hydrological science can be combined to build transferable, scalable, and next-generation prediction systems for real-world environmental challenges.

