AI and GNN-Based Streamflow Prediction in Data-Sparse and Ungauged Basins

March 20, 2026

As climate change intensifies hydrological extremes such as floods, droughts, and irregular seasonal runoff, reliable streamflow prediction has become increasingly important for water resources management, disaster preparedness, reservoir operation, and climate adaptation. However, many river basins around the world remain data-sparse, with limited or deteriorating gauging networks. In these regions, it is often difficult to monitor streamflow continuously or to calibrate hydrological models using sufficient observations.

This challenge is especially severe in ungauged basins, where streamflow observations are unavailable or highly incomplete. Prediction in such regions has traditionally relied on process-based hydrological models, which simulate rainfall-runoff processes using physical equations and watershed characteristics. While these models are scientifically meaningful, they usually require extensive calibration and expert knowledge, and they are computationally expensive. Their performance may also degrade when transferred to locations with limited observations.

This is where AI can add substantial value. Instead of relying solely on manually calibrated parameters or local observations, AI models can learn complex nonlinear spatial and temporal relationships directly from hydrological states, meteorological forcing, and river-network structure. When combined with physically meaningful guidance from hydrological models, they can support more scalable, transferable, and efficient streamflow prediction, especially in regions where conventional approaches struggle most.

Our Goal

We aim to develop a robust and scalable framework for streamflow prediction in ungauged basins by integrating process-based hydrological understanding with modern AI architectures. Our research introduces a Process-Guided Graph-Transformer framework that improves streamflow prediction across gauged and ungauged river basins without explicitly calibrating the internal parameters of a process-based hydrological model.

Rather than replacing hydrological knowledge, our goal is to use it more effectively. By leveraging uncalibrated outputs from a physically based model and combining them with graph neural networks and transformers, we seek to build a framework that remains physically consistent while achieving better predictive skill, stronger generalization, and lower computational burden.

Ultimately, we want to provide reliable streamflow estimates in data-sparse regions where local calibration is impractical, and to support hydrological prediction across diverse climate regimes with a framework that is both transferable and efficient.

Our Approach

Our approach combines hydrological modeling, spatial graph learning, and temporal sequence learning into a unified framework.

First, we use a process-based hydrological model to generate physically meaningful features such as soil moisture, groundwater flow, evapotranspiration, runoff, and water yield. Importantly, these model outputs are used without calibrating the internal hydrological parameters for each basin. This allows us to preserve physical structure while avoiding the heavy burden of basin-specific calibration.
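As a rough illustration of this first step, the sketch below stacks uncalibrated process-model outputs into one feature vector per time step. The variable names mirror the outputs listed above, but the dictionary layout, the z-score standardization, and the function names are assumptions made for the example, not details taken from the project.

```python
import statistics

# Names follow the outputs listed in the text; the data layout is an assumption.
FEATURES = ["soil_moisture", "groundwater_flow", "evapotranspiration",
            "runoff", "water_yield"]


def standardize(series):
    """Z-score a series; a constant series maps to zeros to avoid dividing by zero."""
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)
    if std == 0:
        return [0.0] * len(series)
    return [(x - mean) / std for x in series]


def build_feature_matrix(outputs):
    """Turn {variable name: daily series} into T rows of len(FEATURES) values."""
    missing = [name for name in FEATURES if name not in outputs]
    if missing:
        raise KeyError(f"missing model outputs: {missing}")
    columns = [standardize(outputs[name]) for name in FEATURES]
    return [list(row) for row in zip(*columns)]


# Hypothetical three-day output of an uncalibrated model run for one subbasin.
outputs_demo = {name: [1.0, 2.0, 3.0] for name in FEATURES}
outputs_demo["runoff"] = [4.0, 4.0, 4.0]  # constant series, standardized to zeros
matrix = build_feature_matrix(outputs_demo)
print(len(matrix), len(matrix[0]))
```

Standardizing each variable is one common way to put states with different units on a comparable scale before learning; other scalings would work equally well here.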

Next, we represent river systems as connected graphs, where subbasins and stream nodes are linked through upstream-downstream relationships. A Graph Neural Network is then used to learn how hydrological information propagates across the river network. This enables the model to capture spatial dependencies and transfer information from gauged locations to ungauged ones.
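The graph idea can be sketched in a few lines. The toy network, the node names, and the fixed mixing weights below are hypothetical; a trained Graph Neural Network would learn its weights and would typically use a dedicated library, so this is only a minimal illustration of how information moves downstream through upstream-downstream links.

```python
from collections import defaultdict

# Hypothetical toy network: each subbasin points to its downstream neighbor
# (None marks the outlet). A and B are headwaters that drain into C, then D.
downstream = {"A": "C", "B": "C", "C": "D", "D": None}


def upstream_neighbors(downstream):
    """Invert the downstream pointers into a map: node -> list of upstream nodes."""
    ups = defaultdict(list)
    for node, down in downstream.items():
        if down is not None:
            ups[down].append(node)
    return {node: sorted(ups[node]) for node in downstream}


def message_passing_step(features, downstream, w_self=0.5, w_up=0.5):
    """One round of mean aggregation from upstream neighbors.

    features maps node -> list of floats. The fixed weights w_self and w_up
    stand in for the learned parameters of a real GNN layer.
    """
    ups = upstream_neighbors(downstream)
    updated = {}
    for node, own in features.items():
        neighbors = ups[node]
        if neighbors:
            mean_up = [sum(features[n][k] for n in neighbors) / len(neighbors)
                       for k in range(len(own))]
        else:
            mean_up = [0.0] * len(own)  # headwater: no upstream message
        updated[node] = [w_self * o + w_up * m for o, m in zip(own, mean_up)]
    return updated


features = {"A": [2.0], "B": [4.0], "C": [1.0], "D": [0.0]}
h1 = message_passing_step(features, downstream)
h2 = message_passing_step(h1, downstream)
print(h1["D"], h2["D"])
```

After one round the outlet D only sees its direct neighbor C; after two rounds it also carries information that originated at the headwaters A and B. Stacking layers in this way is what lets a graph model pass information along the network toward nodes that have no observations of their own.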

At the same time, a Transformer-based temporal encoder learns how hydrological conditions evolve over time, including antecedent rainfall, runoff memory, and delayed flow response. By integrating graph learning and temporal attention, the framework captures both the spatial structure of the watershed and the temporal dynamics of streamflow generation.
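To make the temporal attention step concrete, here is a minimal sketch of scaled dot-product self-attention with a causal mask over a short window of daily features. It is a simplification under stated assumptions: queries, keys, and values are the raw inputs (a real Transformer layer applies learned projections, multiple heads, and positional encodings), and the four-day window of values is made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(sequence):
    """Scaled dot-product self-attention where step t attends to steps 0..t only.

    sequence is a list of T feature vectors, one per time step.
    Returns (outputs, weights), where weights[t] has t + 1 entries.
    """
    dim = len(sequence[0])
    outputs, weights = [], []
    for t, query in enumerate(sequence):
        # Causal mask: only the current and earlier time steps are scored.
        scores = [sum(q * k for q, k in zip(query, sequence[s])) / math.sqrt(dim)
                  for s in range(t + 1)]
        w = softmax(scores)
        out = [sum(w[s] * sequence[s][k] for s in range(t + 1))
               for k in range(dim)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Hypothetical 4-day window of [rainfall, simulated runoff] features.
window = [[0.0, 0.1], [5.0, 0.3], [1.0, 0.9], [0.0, 0.6]]
outputs, weights = causal_self_attention(window)
print([round(w, 3) for w in weights[3]])
```

The attention weights for the last day show how much each antecedent day contributes to its representation, which is the mechanism that lets the encoder pick up delayed responses to earlier rainfall.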

This process-guided design allows the model to function as a surrogate for traditional hydrological model calibration while improving predictive performance. To evaluate it under strict prediction-in-ungauged-basin conditions, we withhold observations at selected transfer nodes during training and then test the model at those nodes.
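The withholding scheme described above can be sketched as a node split plus a masked loss. The function names, the node labels, and the use of mean squared error are assumptions for illustration only; the sketch simply shows that held-out nodes contribute nothing to the training loss and are scored separately.

```python
def split_nodes(gauged_nodes, held_out):
    """Separate gauged nodes into training nodes and pseudo-ungauged test nodes."""
    held = set(held_out)
    unknown = held - set(gauged_nodes)
    if unknown:
        raise ValueError(f"held-out nodes lack observations: {sorted(unknown)}")
    train = [n for n in gauged_nodes if n not in held]
    test = [n for n in gauged_nodes if n in held]
    return train, test


def masked_mse(predicted, observed, nodes):
    """Mean squared error restricted to the given nodes."""
    errors = [(predicted[n][t] - observed[n][t]) ** 2
              for n in nodes
              for t in range(len(observed[n]))]
    return sum(errors) / len(errors)


# Hypothetical two-day series at three gauged nodes; D is treated as ungauged.
observed = {"A": [1.0, 2.0], "C": [3.0, 5.0], "D": [4.0, 6.0]}
predicted = {"A": [1.0, 3.0], "C": [3.0, 4.0], "D": [0.0, 0.0]}

train_nodes, test_nodes = split_nodes(["A", "C", "D"], held_out=["D"])
train_loss = masked_mse(predicted, observed, train_nodes)  # drives training
test_error = masked_mse(predicted, observed, test_nodes)   # reported only
print(train_loss, test_error)
```

Because the observations at D never enter the training loss, the error measured there reflects how well information transfers from gauged neighbors rather than how well the model memorized local data.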

Our results show that this framework improves predictive skill over both uncalibrated and calibrated baseline hydrological models, enhances extreme-event detection, reduces computational cost, and maintains robust transferability across diverse hydro-climatic regions.
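The text does not state which scores were used to measure predictive skill or extreme-event detection. As an assumption, the sketch below uses two common choices: the Nash-Sutcliffe efficiency for overall skill, and the probability of detection and false-alarm ratio for flows above a threshold. The sample series and the threshold are made up.

```python
def nash_sutcliffe(simulated, observed):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((s - o) ** 2 for s, o in zip(simulated, observed))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def event_detection_scores(simulated, observed, threshold):
    """Probability of detection and false-alarm ratio for threshold exceedance."""
    hits = misses = false_alarms = 0
    for s, o in zip(simulated, observed):
        sim_event = s >= threshold
        obs_event = o >= threshold
        if sim_event and obs_event:
            hits += 1
        elif obs_event:
            misses += 1
        elif sim_event:
            false_alarms += 1
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return pod, far


# Hypothetical five-day flows; 6.0 is an arbitrary high-flow threshold.
observed = [1.0, 2.0, 9.0, 3.0, 8.0]
simulated = [1.0, 3.0, 8.0, 7.0, 4.0]
nse = nash_sutcliffe(simulated, observed)
pod, far = event_detection_scores(simulated, observed, threshold=6.0)
print(round(nse, 3), pod, far)
```

Reporting an overall score alongside event-based scores matters because a model can fit average flows well while still missing the peaks that drive flood risk.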

Why This Matters

This research is important because it addresses one of the most persistent challenges in hydrology: how to make reliable streamflow predictions where observations are sparse or unavailable.

The proposed framework is meaningful in several ways. It reduces dependence on local calibration, which is often the biggest obstacle in applying hydrological models in practice. It uses globally accessible inputs, making the approach more scalable across regions and countries. It also strengthens physical consistency by incorporating hydrological process information instead of relying on purely black-box learning.

Beyond improving overall streamflow prediction, the framework also helps with event-based applications such as flood detection and extreme-flow analysis. Because it combines process-guided inputs, river connectivity, and temporal learning, it can better identify hydrologically meaningful events while maintaining reasonable uncertainty estimates.

In this sense, the framework offers a promising pathway toward next-generation hydrological prediction systems that are physically informed, computationally efficient, and operationally useful in regions with limited monitoring infrastructure.

Tools and Expertise

Our research builds on advanced hydrological modeling, AI, and geospatial analysis capabilities:

Programming: Python and deep learning frameworks for model development and large-scale hydrological learning

Hydrological Modeling: Process-based simulation using physically meaningful watershed states and runoff components

AI Modeling: Graph Neural Networks and Transformer architectures for spatial-temporal streamflow prediction

Spatial Analysis: River-network topology, subbasin connectivity, and watershed-level feature extraction

Computing: Efficient model training and surrogate modeling to reduce traditional calibration cost

Domain Knowledge: Expertise in hydrology, ungauged basin prediction, flood processes, and hybrid physics-AI modeling

Key Areas of Research Interest

This project explores the following topics:

(1) Developing process-guided AI frameworks for prediction in ungauged basins

(2) Learning river-network connectivity through graph-based hydrological modeling

(3) Improving temporal prediction of runoff and streamflow using transformer architectures

(4) Reducing dependence on calibration-intensive process-based hydrological models

(5) Enhancing extreme-event detection for flood forecasting and risk assessment

(6) Quantifying predictive uncertainty in data-sparse hydrological environments

(7) Building scalable surrogate frameworks for multi-basin and multi-site hydrological applications

Broader Impact

By advancing process-guided AI for hydrology, our research aims to strengthen streamflow prediction in regions where conventional calibration-based modeling is difficult or impossible. The framework provides a practical and scientifically grounded solution for transferring hydrological information from gauged to ungauged locations, while maintaining computational efficiency and physical plausibility.

Ultimately, this work supports more reliable flood forecasting, drought preparedness, water allocation planning, and climate-resilient water management. It also demonstrates how AI and hydrological science can be combined to build transferable and scalable next-generation prediction systems for real-world environmental challenges.

Read other projects

Multi-Sensor Satellite Data Assimilation for Numerical Weather Prediction

This project focuses on improving weather prediction by integrating multiple satellite-based soil moisture observations into a coupled land-atmosphere modeling system. By combining radar and radiometer data, it aims to better capture land surface conditions and their influence on atmospheric processes. The project seeks to enhance predictions of temperature, humidity, and precipitation through more accurate and physically consistent representation of soil moisture dynamics.

AI-Driven Water Body Detection Using Satellite Constellations and SAR: A GNSS-R and Sentinel Fusion Approach

We are developing an AI-based water body detection system that combines CYGNSS satellite constellation data with Sentinel optical and SAR observations. CYGNSS provides all-weather, high-frequency measurements, while Sentinel offers detailed spatial information. By applying AI-based fusion and classification, our approach identifies both permanent and temporary surface water even under clouds, vegetation, or limited ground monitoring. This research supports flood response, drought assessment, and improved water resource management in data-denied regions.

Building Korea's First NASA Core Validation Site for Advanced Earth Observation Satellites

WRF-Hydro and AI: Advancing Hydrological Modeling with Next-Generation Tools with AI

Physics-Informed Large AI Models for Hydrology and the Water Cycle

Multi-Scale Soil Moisture Monitoring Using ELBARA-III and Drone Radiometer

The Portable L-Band Radiometer, mounted on a drone, enables high-resolution soil moisture retrieval across diverse terrains. Operating at 1.4 GHz (L-band), it collects brightness temperature (TB) data over rice paddies and varied vegetation zones, complementing satellite missions like SMAP and SMOS. By integrating airborne observations with ground-based sensors, the system enhances spatial coverage and improves soil moisture retrieval models for climate and agricultural applications. This drone-based approach significantly increases data accuracy and efficiency compared to traditional methods.

Developing the Long-Term Brightness Temperature Measurement Site in South Korea

Soil moisture (SM) is essential for agriculture and hydrometeorology but difficult to measure. To improve validation, Korea’s first Core Validation Site is being built in Hampyeong-gun and Naju-si with TEROS sensors, ESA’s ELBARA-III radiometer, and drone-based PoLRa. The site will provide continuous SM and temperature data, advancing satellite validation, retrieval models, and climate resilience research.

AI-Based Water Quality Prediction for Inland Water using Satellite and Land Surface Model Data

Climate change-driven heatwaves and water cycle disruptions threaten inland water quality (WQ), necessitating efficient monitoring. Traditional methods are labor-intensive with limited coverage, prompting us to develop an AI-based model for predicting chlorophyll-a concentrations in lakes and rivers. By integrating high-resolution satellite imagery (Landsat-8/9, Sentinel-2/3) with land surface models (ERA5-Land, GLDAS, MERRA-2), our model tailors predictions to different water bodies. Future plans include incorporating socio-statistical data (e.g., population, livestock) and climate scenarios (RCP, SSP). Using AI, GIS, and high-performance computing, we explore low-concentration chlorophyll-a prediction, precipitation and flow speed impacts on WQ, transfer learning, multi-sensor data fusion, and uncertainty quantification. Beyond chlorophyll-a, we aim to extend predictions to turbidity and dissolved oxygen, providing a comprehensive AI-driven approach to monitoring and mitigating climate change effects on inland water quality.

Harnessing Deep Learning to Predict and Decode the Mysteries of Flash Droughts (GAN/SHAP/3D-CNN with Transfer Learning)

The application of deep learning in predicting flash droughts offers a transformative approach to understanding and anticipating these rapid-onset events, significantly enhancing preparedness and response strategies. By unraveling the complex mechanisms behind flash droughts, this project aims to provide precise, timely forecasts, thereby mitigating the severe agricultural, ecological, and socioeconomic impacts associated with these phenomena.

Streamflow and Drought Predictions over Ungaged Regions using Deep and Transfer Learning Approaches

Streamflow and flash drought predictions are essential for managing water resources and mitigating potential disasters in ungaged regions. With remotely-sensed data, deep and transfer learning approaches provide powerful tools to analyze complex hydrological data, enabling more accurate predictions and better decision-making in these areas.

Applications of Bayesian Machine Learning in Big Data in Earth Science

Bayesian methods help us improve our guesses by using new information. In Earth science, these methods are applied to big data to better understand our planet. This approach is useful for predicting things like natural disaster patterns and climate changes. By continuously updating our knowledge with new data, we can make more accurate predictions and decisions in Earth science.

Water Balance Budgeting with Bayesian Machine Learning

The water balance equation in Earth science, P = E + R + ΔS, describes the relationship between precipitation (P), evaporation (E), runoff (R), and changes in storage (ΔS; e.g., soil moisture and groundwater) in a given area. Bayesian inference can be applied to this equation by incorporating prior knowledge and updating the probability distributions of the variables as new data arrive, ultimately improving water resource management and prediction.

Integrating Earth Science and Engineering for Climate Resilience: Innovative Approaches to Infrastructure and Societal Justice

Earth science informs infrastructure development by providing insights into site suitability, resource management, and sustainable design, enhancing the resilience and long-term viability of projects. It also plays a crucial role in addressing societal justice related to climate change by helping identify vulnerable communities and develop mitigation strategies, ensuring equitable access to resources and protection from environmental hazards.

Enhancing Earth Science Predictions through Advanced Data Assimilation Techniques

Data assimilation is vital in earth science as it integrates diverse observations and model simulations, improving the accuracy of forecasts and predictions. This process enhances our understanding of complex Earth systems, enabling better decision-making for environmental management and climate adaptation.

Floods and Droughts Predictions using Machine Learning Approaches

Satellite data and machine learning have transformed how Earth science predicts and monitors natural disasters. This combination delivers precise and timely predictions, which are crucial for mitigating the impacts of events like floods and droughts.

Data Error Characterizations

Characterizing the error of satellite data and land surface models is vital in Earth science, as it ensures the accuracy and reliability of information used for monitoring and predicting environmental phenomena. By understanding these errors, scientists can refine data interpretation, enhance models, and ultimately make better-informed decisions about the Earth's complex systems.

Developing Algorithms to Improve the Temporal Sampling of Satellite Data

Increasing the temporal sampling frequency of satellite soil moisture data is a vital research area because of its implications for agriculture, water resource management, climate change research, and ecosystem health. More frequent observations support informed decisions, higher productivity, and reduced impacts from natural disasters, and they contribute to our understanding of the global climate system.

Exploring the Impact of Human Activities on the Subdaily Global Terrestrial Water Cycle

Humans have been modifying the Earth's surface for thousands of years, with practices like clearing forests for agriculture and creating uniform land covers. But how do these changes impact the subdaily global terrestrial water cycle? That is the question this project aims to answer.

Satellite Image Disaggregation with Machine Learning

Microwave soil moisture data is critical for agriculture, weather, and climate modeling, but has low spatial resolution. Disaggregation via machine learning can improve resolution, offering detailed local soil moisture data. Machine learning can handle complex relationships between microwave signals and soil moisture.
