Chapter 5 Species Distribution Model Fitting and Projecting

About:

This stage of the workflow is where we fit the seasonal VAST model. After fitting the model, we make statistical inferences and use the fitted model to project changes in species distribution and abundance under the future environmental conditions expected from the CMIP6 global climate models. The code for this stage is found in the vast_functions.R script within the TargetsSDM GitHub repository. Importantly, to leverage these functions, you will want to have defined a few core objects, including: a habitat formula governing the relationship between the species and the environmental covariates, the field configuration settings that determine whether spatial or spatio-temporal variability will be turned on, and the rho configuration settings that define whether there is an autoregressive structure on the intercepts or the spatio-temporal variability component. You can see how we have done this at the beginning of our code to implement the workflow using the R Targets functionality.

5.1 Steps

This is one of the more complicated stages of the workflow and has a variety of different steps. This complexity arises for two reasons. First, there are the complexities related to integrating data from two surveys and fitting a seasonal VAST model that includes habitat covariates, catchability covariates, persistent spatial variability, ephemeral spatio-temporal variability, and potential temporal correlations in species occurrence across the study domain in successive seasons. Second, we tried to break down the entire modeling cycle into incremental steps.

  1. VAST objects. There are a variety of objects that we create which define how the VAST model is constructed.

    1a. Make VAST extrapolation grid. We create a user-specified extrapolation grid that encompasses the survey domain for both the NOAA NEFSC bottom trawl survey and the DFO bottom trawl survey. We do this by passing a shapefile that covers this domain to our vast_make_extrap_grid function.

    1b. Make VAST settings. We use our vast_make_settings function, which leverages FishStatsUtils::make_settings and has a bit more flexibility to accommodate our own extrapolation grid, while also requiring us to be more explicit about the settings passed to the VAST modeling engine. For example, we could specify purpose = "index2" within a FishStatsUtils::make_settings call or a FishStatsUtils::fit_model call, and this would trigger a specific model configuration for spatial (omega) variability, spatio-temporal (epsilon) variability, and other model parameters. Rather than using those defaults, we define these settings beforehand and then pass them into our vast_make_settings function.

    1c. Make VAST spatial lists. After creating the extrapolation grid and then the settings, we create a “spatial list” object with the vast_make_spatial_lists function. This function is a wrapper around the FishStatsUtils::make_extrapolation_info and FishStatsUtils::make_spatial_info functions. We use it to generate the extrapolation information and the spatial information, which includes information about the INLA mesh and its relationship to the observations and extrapolation grid cells.

    1d. Make VAST covariate effect list. With the VAST seasonal model, we need to make some adjustments to how “season” (and potentially “year”) covariates are modeled. In particular, we specify that these are going to be estimated as spatially-varying coefficients with corner constraints to ensure estimability. We use the vast_make_coveff function to do this.

  2. VAST dataframes. Before fitting the seasonal VAST model, we generate a few dataframes that are then passed to the main VAST model fitting function.

    2a. Make VAST seasonal dataframe. By default, a single-species VAST species distribution model assumes that observations occur annually. As a result, before fitting the VAST seasonal model, we need to do some reformatting. Specifically, we need to create a new vector that we can use as the “year” vector but that actually represents the season-year increments. We also want to make sure that there is a dummy observation for every season-year of interest, even those where we may not have survey data. To accomplish both of these goals, we use the vast_make_seasonal_data function. We also encourage interested readers to look at the Wiki example in the VAST GitHub repository for additional details.

    2b. Make VAST sample dataframe. After creating the VAST seasonal dataframe, we then subset it into three different dataframes. The first is a sample dataframe, which includes the biological catch data information. This is accomplished using the make_vast_sample_data function.

    2c. Make VAST covariate dataframe. This is the second dataframe we create from the VAST seasonal dataframe, and includes the information for habitat covariates at each of the tow locations. We make this dataframe using the make_vast_covariate_data function.

    2d. Make VAST catchability dataframe. The final dataframe we create is a catchability dataframe, which identifies the survey in which each observation was collected.
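The reformatting described in steps 2a to 2d can be sketched in a few lines. This is a minimal, language-neutral illustration written in Python, not the R code in vast_functions.R: the column names, the season order, the survey labels, the toy tow values, and the helper functions are all assumptions made for the example. It shows how a (year, season) pair can be turned into a single consecutive time step, how placeholder rows can be added for season-years without survey data, and how the result can be split into sample, covariate, and catchability tables.

```python
from itertools import product

# Assumed season order within a year (hypothetical labels).
SEASONS = ["SPRING", "SUMMER", "FALL"]


def season_year_index(year, season, first_year):
    """Map a (year, season) pair to a consecutive integer time step starting at 0."""
    return (year - first_year) * len(SEASONS) + SEASONS.index(season)


def add_dummy_rows(tows, first_year, last_year):
    """Return the tows plus one placeholder row for every season-year without data."""
    rows = []
    for tow in tows:
        row = dict(tow)
        row["time_step"] = season_year_index(row["year"], row["season"], first_year)
        row["dummy"] = False
        rows.append(row)

    observed_steps = {row["time_step"] for row in rows}
    for year, season in product(range(first_year, last_year + 1), SEASONS):
        step = season_year_index(year, season, first_year)
        if step not in observed_steps:
            # Placeholder row: flagged so it can be kept out of the likelihood later.
            rows.append({"year": year, "season": season, "time_step": step,
                         "survey": None, "biomass": None, "depth": None,
                         "dummy": True})
    return sorted(rows, key=lambda row: row["time_step"])


def split_frames(rows):
    """Split the seasonal table into sample, covariate, and catchability tables."""
    sample = [{k: r[k] for k in ("time_step", "biomass", "dummy")} for r in rows]
    covariate = [{k: r[k] for k in ("time_step", "depth")} for r in rows]
    catchability = [{k: r[k] for k in ("time_step", "survey")} for r in rows]
    return sample, covariate, catchability


# Made-up tows, for illustration only.
tows = [
    {"year": 1985, "season": "SPRING", "survey": "NMFS", "biomass": 2.4, "depth": 80.0},
    {"year": 1985, "season": "FALL", "survey": "NMFS", "biomass": 0.0, "depth": 120.0},
    {"year": 1986, "season": "SUMMER", "survey": "DFO", "biomass": 5.1, "depth": 65.0},
]
rows = add_dummy_rows(tows, first_year=1985, last_year=1986)
sample, covariate, catchability = split_frames(rows)
```

Because all three tables are cut from the same ordered rows, they stay aligned row by row, which is what allows them to be passed together to the model fitting step.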

  3. Fitting the VAST seasonal model. To fit the VAST seasonal model, we first fit a base model with run_model = FALSE, so that the model is built but not yet optimized, and then we make some adjustments to accommodate the seasonal structure.

    3a. Fitting VAST base model. With the dataframes, the extrapolation grid, and the model settings created and defined, we fit a base VAST model. While you could certainly do this with the FishStatsUtils::fit_model function, we use a function we wrote, vast_build_sdm. Along with fitting nicely within our workflow and the objects we have created, this function also allows us to use sf multipolygon shapefiles to specify different regions/strata.

    3b. Making adjustments to accommodate the seasonal model. After fitting the base model, we make some modifications to facilitate fitting the seasonal VAST model. Specifically, this requires adjusting the mapping of the variance components for the season and year terms so that they are pooled and we are not estimating an individual variance for each level of season and year. We do this using the vast_make_adjustments function.

    3c. Fitting VAST seasonal model. With the adjustments made, we then use the vast_fit_sdm function to fit the model. When doing this, we pass in the adjusted fit_model object with the correct parameter mapping.
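The pooling in step 3b relies on the parameter "map" mechanism of TMB, the engine underneath VAST: parameters that share a map label are estimated as a single value, and parameters with a missing label are held at their starting value. The following is a minimal Python sketch of that idea only. It is not the vast_make_adjustments function, and the layout (three season levels, two year levels, one fixed term) and the parameter values are hypothetical; it assumes one shared variance across season levels and another across year levels.

```python
def expand_parameters(free_values, mapping, initial):
    """Expand a reduced vector of free parameters into the full parameter vector.

    mapping[i] is a label: entries sharing a label share one estimated value,
    and None means the parameter is held fixed at its initial value.
    """
    labels = []
    for label in mapping:
        if label is not None and label not in labels:
            labels.append(label)
    if len(free_values) != len(labels):
        raise ValueError("one free value is needed per distinct label")
    lookup = dict(zip(labels, free_values))
    return [initial[i] if label is None else lookup[label]
            for i, label in enumerate(mapping)]


# Hypothetical layout: three season-level log-SDs, two year-level log-SDs,
# and one term that is not estimated at all.
mapping = ["season", "season", "season", "year", "year", None]
initial = [0.0] * len(mapping)

# Only two values are estimated, one per pooled group (made-up numbers).
full = expand_parameters([-0.5, 0.2], mapping, initial)
n_free = len({label for label in mapping if label is not None})
```

With this mapping the optimizer sees two free variance parameters instead of five, which is the sense in which the season and year variances are pooled.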

  4. Making inferences from the fitted VAST seasonal model

    4a. Validating predictive skill of the model. One of the first things we want to do after fitting the model is assess its reliability. This can include evaluating the model fit to the data, and with more recent versions of VAST we can get the deviance explained by the model, which is particularly helpful when comparing different candidate model structures. Along with evaluating the model, we are especially interested in validating its predictive skill using a hold-out testing dataset. We open up this opportunity at the beginning of the modeling process by toggling the “Pred_TF” indicator for specific observations, so that they are used in the predictive component of the model and not in the maximum likelihood estimation. We then use our vast_get_point_preds function to extract the model predictions for these “hold-out” observations. We can then calculate any number of prediction skill statistics (e.g., AUC, RMSE). To summarize multiple components of model prediction skill, we use our taylor_diagram_func function to generate and plot Taylor Diagrams (Taylor 2001).

    4b. Visualizing covariate effects. After checking the model for its fit to the data and its predictive skill on the hold-out testing data, we might want to visualize the fitted smooth functions defining the relationship between species occurrence and the environmental variables. To do that, we can run our get_vast_covariate_effects and plot_vast_covariate_effects functions.

    4c. Making projections. The final action in this stage is using the model to make projections under the future environmental conditions characterized by the ensemble of CMIP6 SSP5-8.5 scenario models. To do this, we collect the projected environmental data for each of the covariates using the vast_post_fit_pred_df function. We then use the project.fit_model_wrapper function to make the projections. This function was designed to mimic VAST’s project.fit_model function, with some adjustments to accommodate the seasonal model structure.
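The hold-out validation in step 4a reduces to comparing observed and predicted values for the observations flagged for prediction only. A Taylor Diagram summarizes that comparison with three related statistics: the correlation, the standard deviations of the two series, and the centered root-mean-square error. The sketch below computes them in plain Python. It is an illustration under assumptions, not the taylor_diagram_func function: the record layout, the "pred_tf" key, and the numbers are made up for the example.

```python
import math


def taylor_statistics(observed, predicted):
    """Return the summary statistics displayed on a Taylor Diagram, plus RMSE."""
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    dev_obs = [o - mean_obs for o in observed]
    dev_pred = [p - mean_pred for p in predicted]

    # Population standard deviations of each series.
    sd_obs = math.sqrt(sum(d * d for d in dev_obs) / n)
    sd_pred = math.sqrt(sum(d * d for d in dev_pred) / n)

    # Pearson correlation between observations and predictions.
    corr = sum(a * b for a, b in zip(dev_obs, dev_pred)) / (n * sd_obs * sd_pred)

    # Centered RMSE: error that remains after removing the mean bias.
    crmse = math.sqrt(sum((b - a) ** 2 for a, b in zip(dev_obs, dev_pred)) / n)

    # Ordinary RMSE, which also includes the mean bias.
    rmse = math.sqrt(sum((p - o) ** 2 for o, p in zip(observed, predicted)) / n)
    return {"sd_obs": sd_obs, "sd_pred": sd_pred, "corr": corr,
            "crmse": crmse, "rmse": rmse}


# Made-up records: pred_tf = 1 marks hold-out observations excluded from fitting.
records = [
    {"pred_tf": 0, "observed": 3.0, "predicted": 2.8},
    {"pred_tf": 1, "observed": 1.0, "predicted": 1.5},
    {"pred_tf": 1, "observed": 2.0, "predicted": 1.5},
    {"pred_tf": 1, "observed": 3.0, "predicted": 3.5},
    {"pred_tf": 1, "observed": 4.0, "predicted": 3.0},
]
holdout = [r for r in records if r["pred_tf"] == 1]
stats = taylor_statistics([r["observed"] for r in holdout],
                          [r["predicted"] for r in holdout])
```

The three Taylor statistics are tied together by the relation crmse² = sd_obs² + sd_pred² − 2 · sd_obs · sd_pred · corr, which is what lets a single point on the diagram represent all of them at once.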

5.2 Output

The output from this stage includes the fitted model object and results associated with the inferences we hope to make from the fitted model, including marginal effect plots of fitted covariate smooth functions, maps of predicted density, and time series of total estimated biomass within spatial regions of interest. Most relevant to our specific project goals, one of the outputs from this stage is the projected species density at grid locations within the DFO/NOAA NEFSC spatial domain for the spring, summer, and fall seasons from 1985 to 2100.

5.3 Next stages

The output from this stage is then summarized and used to produce the data and visualizations for the FishViz RShiny application.