Snow Modeling¶

Contributors: Melissa Wrzesien^1,2, Brendan McAndrew^1,3, Jian Li^4,5, Caleb Spradlin^4,6

¹ Hydrological Sciences Lab, NASA Goddard Space Flight Center, ² University of Maryland, ³ Science Systems and Applications, Inc., ⁴ InuTeq, LLC, ⁵ Computational and Information Sciences and Technology Office (CISTO), NASA Goddard Space Flight Center, ⁶ High Performance Computing, NASA Goddard Space Flight Center

Learning Objectives

Learn about the role of modeling in a satellite mission
Open, explore, and visualize gridded model output
Compare model output to raster- and point-based observation datasets

The goal of a future snow satellite is to provide improved understanding of global snow mass, as compared to current estimates. However, a single sensor likely won’t be able to accurately observe all types of snow in all conditions for the entire globe. Instead, we need to combine future snow observations with other tools - including currently available satellite data and modeling.

Models can also help to extend snow observations to areas where observations are not available. Data may be missing due to orbit gaps or masked out in regions of higher uncertainty. Remote sensing observations and models can be combined through data assimilation, a process where observations are used to constrain model estimates.

The figure above shows a conceptual example of how satellite observations with orbit gaps could be combined with a model to produce results with no missing data.

What is LIS?¶

Since models will likely play a role in processing observations from a snow satellite mission, it is important to become comfortable working with gridded data products. Today we will use sample model output from NASA’s Land Information System (LIS).

The Land Information System is a software framework for land surface modeling and data assimilation developed with the goal of integrating satellite and ground-based observational data products with advanced modeling techniques to produce estimates of land surface states and fluxes.

TL;DR LIS = land surface models + data assimilation

One key feature LIS provides is flexibility to meet user needs. LIS consists of a collection of plug-ins, or modules, that allow users to design simulations with their choice of land surface model, meteorological forcing, data assimilation, hydrological routing, land surface parameters, and more. The plug-in based design also provides extensibility, making it easier to add new functionality to the LIS framework.

Current efforts to expand support for snow modeling include implementation of snow modules, such as SnowModel and Crocus, into the LIS framework. These models, when run at the scale of ~100 meters, will enable simulation of wind redistribution, snow grain size, and other important processes for more accurate snow modeling.

Development of LIS is led by the Hydrological Sciences Laboratory at NASA’s Goddard Space Flight Center.

Working with Modeled Output¶

Here we will use sample model output from a LIS-SnowModel simulation over a western Colorado domain. Daily estimates of SWE, snow depth, and snow density are written to output. The SnowModel simulation has a spatial resolution of 100 m. We’ve provided four years of output, though here we’ll mostly use output from water year 2020.

Below, we’ll walk through how to open and interact with the LIS-SnowModel output. Our main objectives are to demonstrate how to do the following with a gridded dataset like LIS-SnowModel:

Understand attributes of model data file and variables available within
Create a spatial map of model output
Create time series at a point and averaged across domain
Compare LIS-SnowModel to raster and point data
Create a visualization for quick evaluation of model estimates

Import Libraries¶

# interface to Amazon S3 filesystem
import s3fs

# interact with n-d arrays
import numpy as np
import xarray as xr

# interact with tabular data (incl. spatial)
import pandas as pd
import geopandas as gpd

# interactive plots
import holoviews as hv
import geoviews as gv
import hvplot.pandas
import hvplot.xarray
# set bokeh as the holoviews plotting backend
hv.extension('bokeh')
# set holoviews backend to Bokeh
gv.extension('bokeh')

# used to find nearest grid cell to a given location
from scipy.spatial import distance

import fsspec
from datetime import datetime as dt

from geoviews import opts
from geoviews import tile_sources as gvts
from datashader.colors import viridis
import datashader
from holoviews.operation.datashader import datashade, shade, dynspread, spread, rasterize
from holoviews.streams import Selection1D, Params
import panel as pn
import param as pm
import warnings
import holoviews.operation.datashader as hd
warnings.filterwarnings("ignore")

pn.extension()
pn.param.ParamMethod.loading_indicator =True

Load the LIS-SnowModel Output¶

The xarray library makes working with labelled n-dimensional arrays easy and efficient. If you’re familiar with the pandas library it should feel pretty familiar.

Here we load the LIS-SnowModel output into an xarray.Dataset object:

# create S3 filesystem object
s3 = s3fs.S3FileSystem(anon=False)

# define the name of our S3 bucket
bucket_name = 'eis-dh-hydro/SNOWEX-HACKWEEK'

# define path to store on S3
lis_output_s3_path_time_chunk = f's3://{bucket_name}/2022/ZARR/SURFACEMODEL/LIS_HIST_rechunkedV4.d01.zarr'
lis_output_s3_path = f's3://{bucket_name}/2022/ZARR/SURFACEMODEL/LIS_HIST_default_chunks.d01.zarr/'

# create key-value mapper for S3 object (required to read data stored on S3)
lis_output_mapper = s3.get_mapper(lis_output_s3_path)
lis_output_mapper_tc = s3.get_mapper(lis_output_s3_path_time_chunk)

# open the dataset
lis_output_ds = xr.open_zarr(lis_output_mapper, consolidated=True)
lis_output_ds_tc = xr.open_zarr(lis_output_mapper_tc, consolidated=True)

Explore the Data¶

Display an interactive widget for inspecting the dataset by running a cell containing the variable name. Expand the dropdown menus and click on the document and database icons to inspect the variables and attributes.

lis_output_ds

Accessing Attributes¶

Dataset attributes (metadata) are accessible via the attrs attribute:

lis_output_ds.attrs

Accessing Variables¶

Variables can be accessed using either dot notation or square bracket notation. Here we will stick with square bracket notation:

# square bracket notation
lis_output_ds['SM_SnowDepth_inst']

Watch out for large datasets!

Note that the 4-ish years of model output (1694 daily time steps for a domain of size 6650 x 4800) has a size of over 200 gb! As we’ll see below, this is just for an area over western Colorado. If we’re interested in modeling the western United States or CONUS or even global at high resolution, we need to be prepared to work with some large datasets.

Dimensions and Coordinate Variables¶

The dimensions and coordinate variable fields put the “labelled” in “labelled n-dimensional arrays”:

Dimensions: labels for each dimension in the dataset (e.g., time)
Coordinates: labels for indexing along dimensions (e.g., '2020-01-01')

We can use these labels to select, slice, and aggregate the dataset.

Subsetting in Space or Time¶

xarray provides two methods for selecting or subsetting along coordinate variables:

index selection: ds.isel(time=0)
value selection ds.sel(time='2020-01-01')

For example, we can use value selection to select based on the the coorindates of a given dimension:

lis_output_ds.sel(time='2020-01-01')

The .sel() approach also allows the use of shortcuts in some cases. For example, here we select all timesteps in the month of January 2020. Note that length of the time dimension is now only 31.

lis_output_ds.sel(time='2020-01')

Latitude and Longitude¶

You may have noticed that latitude (lat) and longitude (lon) are not listed as dimensions. This dataset would be easier to work with if lat and lon were coordinate variables and dimensions. Here we define a helper function that reads the spatial information from the dataset attributes, generates arrays containing the lat and lon values, and appends them to the dataset:

def add_latlon_coords(dataset: xr.Dataset)->xr.Dataset:
    """Adds lat/lon as dimensions and coordinates to an xarray.Dataset object."""
    
    # get attributes from dataset
    attrs = dataset.attrs
    
    # get x, y resolutions
    dx = .001 #round(float(attrs['DX']), 3)
    dy = .001 #round(float(attrs['DY']), 3)
       
    # get grid cells in x, y dimensions
    ew_len = len(dataset['east_west'])
    ns_len = len(dataset['north_south'])
    
    # get lower-left lat and lon
    ll_lat = round(float(attrs['SOUTH_WEST_CORNER_LAT']), 3)
    ll_lon = round(float(attrs['SOUTH_WEST_CORNER_LON']), 3)
    
    # calculate upper-right lat and lon
    ur_lat =  41.5122 #ll_lat + (dy * ns_len)
    ur_lon = -103.9831 #ll_lon + (dx * ew_len)
    
    # define the new coordinates
    coords = {
        # create an arrays containing the lat/lon at each gridcell
        'lat': np.linspace(ll_lat, ur_lat, ns_len, dtype=np.float32, endpoint=False).round(4),
        'lon': np.linspace(ll_lon, ur_lon, ew_len, dtype=np.float32, endpoint=False).round(4)
    }
    
    lon_attrs = dataset.lon.attrs
    lat_attrs = dataset.lat.attrs
    
    # rename the original lat and lon variables
    dataset = dataset.rename({'lon':'orig_lon', 'lat':'orig_lat'})
    # rename the grid dimensions to lat and lon
    dataset = dataset.rename({'north_south': 'lat', 'east_west': 'lon'})
    # assign the coords above as coordinates
    dataset = dataset.assign_coords(coords)
    dataset.lon.attrs = lon_attrs
    dataset.lat.attrs = lat_attrs
    
    
    return dataset

Now that the function is defined, let’s use it to append lat and lon coordinates to the LIS output:

lis_output_ds = add_latlon_coords(lis_output_ds)
lis_output_ds_tc = add_latlon_coords(lis_output_ds_tc)

Inspect the dataset:

lis_output_ds

Now lat and lon are listed as coordinate variables and have replaced the north_south and east_west dimensions. This will make it easier to spatially subset the dataset!

Basic Spatial Subsetting¶

We can use the slice() function on the lat and lon dimensions to select data between a range of latitudes and longitudes:

# uncomment line below to work with the full domain -> this will be much slower!
#lis_output_ds.sel(lat=slice(35.5, 41), lon=slice(-109, -104))

# just Grand Mesa -> smaller domain for faster run times
# note the smaller lat/lon extents in the dimensions
lis_output_ds.sel(lat=slice(38.6, 39.4), lon=slice(-108.6, -107.1))

Notice how the sizes of the lat and lon dimensions have decreased.

Subset Across Multiple Dimensions¶

Now we will combine the above examples for subsetting both spatially and temporally:

# define a range of dates to select
wy_2020_slice = slice('2019-10-01', '2020-09-30')

# define lat/lon for Grand Mesa area
lat_slice = slice(38.6, 39.4)
lon_slice = slice(-108.6, -107.1)

# select the snow depth and subset to wy_2020_slice
snd_GM_wy2020_ds = lis_output_ds['SM_SnowDepth_inst'].sel(time=wy_2020_slice, lat=lat_slice, lon=lon_slice)
snd_GM_wy2020_ds_tc = lis_output_ds_tc['SM_SnowDepth_inst'].sel(time=wy_2020_slice, lat=lat_slice, lon=lon_slice)

# inspect resulting dataset
snd_GM_wy2020_ds

Subset for more manageable sizes!

We’ve now subsetted both spatially (down to just Grand Mesa domain) and temporally (water year 2020). Note the smaller size of the data array -> a decrease from over 200 gb to 1.7 gb! This smaller dataset will be much easier to work with, but feel free to try some of the commands with the full dataset, just give it a few minutes to process!

Plotting¶

We’ve imported two plotting libraries:

matplotlib: static plots
hvplot: interactive plots

We can make a quick matplotlib-based plot for the subsetted data using the .plot() function supplied by xarray.Dataset objects. For this example, we’ll select one day and plot it:

# simple matplotlilb plot
snd_GM_wy2020_ds.sel(time='2020-01-01').plot()

Similarly we can make an interactive plot using the hvplot accessor and specifying a quadmesh plot type:

# hvplot based map
snd_GM_20200101_plot = snd_GM_wy2020_ds.sel(time='2020-01-01').hvplot.quadmesh(geo=True, rasterize=True, project=True,
                                                                               xlabel='lon', ylabel='lat', cmap='viridis',
                                                                               tiles='EsriImagery')

snd_GM_20200101_plot

Pan, zoom, and scroll around the map. Hover over the LIS-SnowModel data to see the data values.

If we try to plot more than one time-step hvplot will also provide a time-slider we can use to scrub back and forth in time:

snd_GM_wy2020_ds.sel(time='2020-01').hvplot.quadmesh(geo=True, rasterize=True, project=True,
                             xlabel='lon', ylabel='lat', cmap='viridis',
                             tiles='EsriImagery')

From here on out we will stick with hvplot for plotting.

Timeseries Plots¶

We can generate a timeseries for a given grid cell by selecting and calling the plot function:

# define point to take timeseries (note: must be present in coordinates of dataset)
ts_lon, ts_lat = (-107.8702, 39.0504)

# plot timeseries
snd_GM_wy2020_ds_tc.sel(lat=ts_lat, lon=ts_lon).hvplot.line(title=f'Snow Depth Timeseries @ Lon: {ts_lon}, Lat: {ts_lat}',
                                                   xlabel='Date', ylabel='Snow Depth (m)') + \
    snd_GM_20200101_plot * gv.Points([(ts_lon, ts_lat)]).opts(size=10, color='red')
    

In the next section we’ll learn how to create a timeseries over a broader area.

Aggregation¶

We can perform aggregation operations on the dataset such as min(), max(), mean(), and sum() by specifying the dimensions along which to perform the calculation.

For example we can calculate the daily mean snow depth for the region around Grand Mesa for water year 2020:

Area Average¶

# take area-averaged mean at each timestep
mean_snd_GM_wy2020_ds = snd_GM_wy2020_ds_tc.mean(['lat', 'lon'])

# inspect the dataset
mean_snd_GM_wy2020_ds

# plot timeseries (hvplot knows how to plot based on dataset's dimensionality!)
mean_snd_GM_wy2020_ds.hvplot(title='Mean LIS-SnowModel Snow Depth for Grand Mesa area', xlabel='Date', ylabel='Snow Depth (m)')

Comparing Model Output¶

Now that we’re familiar with the model output, let’s compare it to two other datasets: SNODAS (raster) and SNOTEL (point).

LIS-SnowModel (raster) vs. SNODAS (raster)¶

First, we’ll load the SNODAS dataset which we also have hosted on S3 as a Zarr store. This command will take a minute or two to run.

# load SNODAS dataset

#snodas depth
key = "SNODAS/snodas_snowdepth_20161001_20200930.zarr"    
snodas_depth_ds = xr.open_zarr(s3.get_mapper(f"{bucket_name}/{key}"), consolidated=True)

# apply scale factor to convert to meters (0.001 per SNODAS user guide)
snodas_depth_ds = snodas_depth_ds * 0.001

Next we define a helper function to extract the (lon, lat) of the nearest grid cell to a given point:

def nearest_grid(ds, pt):
    
    """
    Returns the nearest lon and lat to pt in a given Dataset (ds).
    
    pt : input point, tuple (longitude, latitude)
    output:
        lon, lat
    """
    
    if all(coord in list(ds.coords) for coord in ['lat', 'lon']):
        df_loc = ds[['lon', 'lat']].to_dataframe().reset_index()
    else:
        df_loc = ds[['orig_lon', 'orig_lat']].isel(time=0).to_dataframe().reset_index()
    
    loc_valid = df_loc.dropna()
    pts = loc_valid[['lon', 'lat']].to_numpy()
    idx = distance.cdist([pt], pts).argmin()
    
    return loc_valid['lon'].iloc[idx], loc_valid['lat'].iloc[idx]

The next cell will look pretty similar to what we did earlier to plot a timeseries of a single point in the LIS-SnowModel data. The general steps are:

Extract the coordinates of the SNODAS grid cell nearest to our LIS-SnowModel grid cell (ts_lon and ts_lat from earlier)
Subset the SNODAS and LIS-SnowModel data to the grid cells and date ranges of interest
Create the plots!

# drop off irrelevant variables
drop_vars = ['orig_lat', 'orig_lon']
lis_output_ds = lis_output_ds.drop(drop_vars)

# get lon, lat of snodas grid cell nearest to the LIS coordinates we used earlier
snodas_ts_lon, snodas_ts_lat = nearest_grid(snodas_depth_ds, (ts_lon, ts_lat))

# define a date range to plot (shorter = quicker for demo)
start_date, end_date = ('2020-01-01', '2020-06-01')
plot_daterange = slice(start_date, end_date)

# select SNODAS grid cell and subset to plot_daterange
snodas_snd_subset_ds = snodas_depth_ds.sel(lon=snodas_ts_lon,
                                             lat=snodas_ts_lat,
                                             time=plot_daterange)

# select LIS grid cell and subset to plot_daterange
lis_snd_subset_ds = lis_output_ds['SM_SnowDepth_inst'].sel(lat=ts_lat,
                                                        lon=ts_lon,
                                                        time=plot_daterange)

# create SNODAS snow depth plot
snodas_snd_plot = snodas_snd_subset_ds.hvplot(label='SNODAS')

# create LIS snow depth plot
lis_snd_plot = lis_snd_subset_ds.hvplot(label='LIS-SnowModel')

# create SNODAS vs LIS snow depth plot
lis_vs_snodas_snd_plot = (lis_snd_plot * snodas_snd_plot)

# display the plot
lis_vs_snodas_snd_plot.opts(title=f'Snow Depth @ Lon: {ts_lon}, Lat: {ts_lat}',
                            legend_position='right',
                            xlabel='Date',
                            ylabel='Snow Depth (m)')

LIS-SnowModel (raster) vs. SNODAS (raster) vs. SNOTEL (point)¶

Now let’s add SNOTEL point data to our plot.

First, we’re going to define some helper functions to load the SNOTEL data:

# load csv containing metadata for SNOTEL sites in a given state (e.g,. 'colorado')
def load_site(state):
    
    # define the path to the file
    key = f"SNOTEL/snotel_{state}.csv"
    
    # load the csv into a pandas DataFrame
    df = pd.read_csv(s3.open(f's3://{bucket_name}/{key}', mode='r'))
    
    return df

# load SNOTEL data for a specific site
def load_snotel_txt(state, var):
    
    # define the path to the file
    key = f"SNOTEL/snotel_{state}{var}_20162020.txt"
    
    # determine how many lines to skip in the file (they start with #)
    fh = s3.open(f"{bucket_name}/{key}")
    lines = fh.readlines()
    skips = sum(1 for ln in lines if ln.decode('ascii').startswith('#'))
    
    # load the data into a pandas DataFrame
    df = pd.read_csv(s3.open(f"s3://{bucket_name}/{key}"), skiprows=skips)
    
    # convert the Date column from strings to datetime objects
    df['Date'] = pd.to_datetime(df['Date'])
    return df

For the purposes of this tutorial let’s load the SNOTEL data for sites in Colorado. We’ll pick one site to plot in a few cells.

# load SNOTEL snow depth for Colorado into a dictionary
snotel_depth = {'CO': load_snotel_txt('CO', 'depth')}

We’ll need another helper function to load the depth data:

# get snotel depth
def get_depth(state, site, start_date, end_date):
    
    # grab the depth for the given state (e.g., CO)
    df = snotel_depth[state]
    
    # define a date range mask
    mask = (df['Date'] >= start_date) & (df['Date'] <= end_date)
    
    # use mask to subset between time range
    df = df.loc[mask]
    
    # extract timeseries for the given site
    return pd.concat([df.Date, df.filter(like=site)], axis=1).set_index('Date')

Load the site metadata for Colorado:

co_sites = load_site('colorado')

# peek at the first 5 rows
co_sites.head()

The point we’ve been using so far in the tutorial actually corresponds to the coordinates for the Park Reservoir SNOTEL on Grand Mesa! Let’s extract the site data for that point:

# get the depth data by passing the site name to the get_depth() function
park_res_snd_df = get_depth('CO', 'Park Reservoir (682)', start_date, end_date)

# convert from cm to m
park_res_snd_df = park_res_snd_df / 100

Now we’re ready to plot:

# create SNOTEL plot
park_res_plot = park_res_snd_df.hvplot(label='SNOTEL')

# combine the SNOTEl plot with the LIS vs SNODAS plot
(lis_vs_snodas_snd_plot * park_res_plot).opts(title=f'Snow Depth @ Lon: {ts_lon}, Lat: {ts_lat}', legend_position='right')

SnowEx Hackweek 2022

Snow Modeling

Contents

Snow Modeling¶

What is LIS?¶

Working with Modeled Output¶

Import Libraries¶

Load the LIS-SnowModel Output¶

Explore the Data¶

Accessing Attributes¶

Accessing Variables¶

Dimensions and Coordinate Variables¶

Subsetting in Space or Time¶

Latitude and Longitude¶

Basic Spatial Subsetting¶

Subset Across Multiple Dimensions¶

Plotting¶

Timeseries Plots¶

Aggregation¶

Area Average¶

Comparing Model Output¶

LIS-SnowModel (raster) vs. SNODAS (raster)¶

LIS-SnowModel (raster) vs. SNODAS (raster) vs. SNOTEL (point)¶

Interactive Data Exploration¶

Load Data¶

SNOTEL Sites info¶

SNOTEL Depth & SWE¶

SNODAS SWE¶

LIS-SnowModel Outputs¶

Interactive Widget¶

SNOTEL Site Map and Timeseries¶

Exercises¶

Exercise 1¶

Exercise 2¶

Exercise 3 (for an extra challenge!)¶

Conclusion¶

Resources¶