
Working on reducing the errors in the model

Updated: Sep 26, 2019

Going into this project, my interest is to understand the relationships between streamflow and other variables in hydrologic systems (precipitation, temperature shifts, snowmelt, basin characteristics, etc.) and how we can make better streamflow predictions based on our understanding of these interactions.


A recent implementation of the Random Forests (RF) model has allowed me to take a close look at these relationships.


I built a basic RF model that currently takes 4 predictors:


1. Mean daily streamflow from the previous day (a common predictor variable) at the same gage

2. Daily total precipitation from the previous day (a common predictor variable), drawn from the closest GHCN station*

3. The month of the predicted streamflow (to somewhat account for seasonality)

4. The sum of precipitation over the n previous days (I vary n and observe the performance of the model)


Output is the predicted daily average streamflow.
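
To make the setup above concrete, here is a minimal sketch of how the four predictors and the target could be assembled from daily records. It is not my actual code: the function name `build_features`, the toy numbers, and the choice to include the previous day in the n-day precipitation window are all assumptions for illustration.

```python
from datetime import date, timedelta


def build_features(dates, flow, precip, n_days):
    """Build predictor rows X and targets y from aligned daily series.

    dates  : list of datetime.date, one entry per consecutive day
    flow   : mean daily streamflow at the gage, aligned with dates
    precip : daily total precipitation, aligned with dates
    n_days : length of the precipitation window (assumed to end on
             the previous day, so it includes the previous day)
    """
    X, y = [], []
    # Start late enough that every row has a full precipitation window.
    for t in range(max(n_days, 1), len(dates)):
        X.append([
            flow[t - 1],                # 1. streamflow from the previous day
            precip[t - 1],              # 2. precipitation from the previous day
            dates[t].month,             # 3. month of the predicted day
            sum(precip[t - n_days:t]),  # 4. precipitation sum over n previous days
        ])
        y.append(flow[t])               # target: streamflow on day t
    return X, y


# Toy data, made up purely to show the shapes involved.
start = date(2019, 1, 29)
dates = [start + timedelta(days=i) for i in range(6)]
flow = [10.0, 12.0, 30.0, 22.0, 15.0, 13.0]
precip = [0.0, 5.0, 20.0, 0.0, 1.0, 0.0]

X, y = build_features(dates, flow, precip, n_days=3)
for row, target in zip(X, y):
    print(row, "->", target)
```

Rows in this form can be passed to any random forest regressor (for example, scikit-learn's RandomForestRegressor via fit(X, y)), and changing n_days is the knob I vary in predictor 4.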


I ran the model on 2 randomly chosen USGS gages (No. 14145500 and No. 14137000) from 2 sub-watersheds within HUC 17 Pacific Northwest (17-8 and 17-9). The sub-watersheds differ in size and geometry but are located in the same general region, which facilitates comparison.

HUC 17 (in light blue) and two USGS gauges from 2 sub-watersheds (red)

Some preliminary observations

At both gages, streamflow from the previous day has the highest predictive power. The selection of input variables seems to have a considerable impact on the accuracy of the output.




The error tends to increase as the magnitude of streamflow increases (plot below). The next step is to look into how to reduce both the error and its correlation with streamflow magnitude (ideally, the errors would occur at random).
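
One simple way to put a number on this pattern is the Pearson correlation between observed streamflow and the absolute prediction error: a value near 0 would suggest the errors are unrelated to flow magnitude, while a value near 1 means errors grow with flow. The sketch below is only an illustration. The helper names and the observed and predicted values are made up, not results from my model.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def error_vs_magnitude(observed, predicted):
    """Correlation between observed flow and absolute prediction error."""
    abs_err = [abs(p - o) for o, p in zip(observed, predicted)]
    return pearson(observed, abs_err)


# Hypothetical values in which the error is always 10% of the observed flow,
# so the absolute error scales exactly with flow magnitude.
observed = [10.0, 20.0, 40.0, 80.0]
predicted = [11.0, 18.0, 36.0, 72.0]

r = error_vs_magnitude(observed, predicted)
print(round(r, 3))
```

Tracking this value alongside the overall error, as predictors are added, would show whether a change reduces the dependence of error on flow magnitude or only the average error.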


The model underestimates the extreme values, which is consistent with previous studies.

Moving forward

These are just very preliminary results, and I think the model will improve with the addition of other predictors (snowmelt, temperature, streamflow from nearby gauges, and possibly climate indices). To model a big watershed like HUC 17, one approach would be to optimize the input variables by growing a different RF for each sub-watershed. I'm also exploring some data-decomposition techniques to improve the quality of the input data.
