class: title-slide, middle <img src="assets/img/pr.png" width="80px" style="padding-right:10px; border-right: 2px solid #FAFAFA;"></img> <img src="assets/img/UdeS_blanc.png" width="250px" ></img> # Visualizing spatiotemporal data with R .instructors[ GSFE01 - F. Guillaume Blanchet & Steve Vissault ] --- class: clear, middle ##
Slides available at: https://steveviss.github.io/PR-GSFE01/st_manip/index.html ##
Solutions of all practices: https://steveviss.github.io/PR-GSFE01/st_manip/practice.html --- class: inverse, center, middle # The curse of dimensionality <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- # The data multi-dimensionality problem <img src="./assets/img/data_cube.png" width="100%" style="display: block; margin: auto;" /> --- # Multi-dimensionality problem **The problem** - It is not easy to store and handle biological informations through all of these dimensions. -- .pull-left[ **Storage formats** - 2-d: Tabular data CSV, shapefiles (.shp) / rasters (.grd) - 3-d: NetCDF, Hierarchical Data Format - n-d: Spatial SQL database, split the dataset into multiples tables (e.g. PostgreSQL) ] .pull-right[ <img src="./assets/img/netcdf.png" width="100%" style="display: block; margin: auto;" /> ] **Take home** - Having the right format can help solve that problem, and thus make your life easier when comes the time to play and analyze the data. ??? - Tabular format lot of redundancies (increase the risk of having errors in the entries) - NetCDF/HDF convenient for gridded data / rasters (not always suitable for none gridded data) - (Nested structure / Star schema) - field stations metadata and locations - Samples attached to field stations - Observations attached to samples --- class: inverse, center, middle # Handle temporal aspect in R <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- # Declare `date` in R **When you import a CSV files, you may encounter those situations.** **Situation 1**: The date column is a vector of characters. ``` ## [1] "1990-01-04" "1990-05-01" "1990-08-07" ``` **Situation 2**: Date informations are scattered in 3 differents columns ``` ## year month day ## 1 1990 1 4 ## 2 1990 5 1 ## 3 1990 7 7 ``` When performing spatio-temporal analysis, we need this temporal information to be recognized properly. This is where the `Date` format comes in handy. --- # Situation 1 Dates entries might have different format (hopefully not in the same column). **Solution:** the function `as.Date()` will coerce the character vector to a vector of class `Date` allowing dates to be handle properly. ```r d1 <- as.Date(c("1990-01-04","1990-05-01","1990-08-07")) d1 ``` ``` ## [1] "1990-01-04" "1990-05-01" "1990-08-07" ``` ```r d2<- as.Date(c("1990/01/04","1990/05/01","1990/08/07")) d2 ``` ``` ## [1] "1990-01-04" "1990-05-01" "1990-08-07" ``` --- # Situation 1 If your input dates are not in a standard format, the `format` argument can be used to force a specific structure on the dates. .pull-left[ .code60[ ```r as.Date('1/15/2001',format='%m/%d/%Y') ``` ``` ## [1] "2001-01-15" ``` ```r as.Date('April 26, 2001',format='%B %d, %Y') ``` ``` ## [1] "2001-04-26" ``` ```r as.Date('22JUN01',format='%d%b%y') ``` ``` ## [1] "2001-06-22" ``` ] ] .pull-right[ .small[ | Code | Value | | --- | --- | | %d | Day of the month (decimal number) | | %m | Month (decimal number) | | %b | Month (abbreviated) | | %B | Month (full name) | | %y | Year (2 digit) | | %Y | Year (4 digit) | ] ] --- # Situation 1 A lot effort as been place in R to account for the diversity of data structure .tiny[ | Code | Meaning | Code | Meaning | | --- | --- | --- | --- | --- | | %a | Abbreviated weekday | %A | Full weekday | | %b | Abbreviated month | %B | Full month | | %c | Locale-specific date and time | %d | Decimal date | | %H | Decimal hours (24 hour) | %I | Decimal hours (12 hour) | | %j | Decimal day of the year | %m | Decimal month | | %M | Decimal minute | %p | Locale-specific AM/PM | | %S | Decimal second | %U | Decimal week of the year (starting on Sunday) | | %w | Decimal Weekday (0=Sunday) | %W | Decimal week of the year (starting on Monday) | | %x | Locale-specific Date | %X | Locale-specific Time | | %y | 2-digit year | %Y | 4-digit year | | %z | Offset from GMT | %Z | Time zone (character) | ] --- # Situation 1 If you have several dates formats... -- deep breath... Have a look at the [`lubridate`](https://lubridate.tidyverse.org/) package. -- In the R universe, you're never alone... ```r library(lubridate) sampleDates <- c("4/6/2004","4/6/2004","4/6/2004","4/7/2004", "4/6/2004","4/7/2004","2014-08-12") parse_date_time(sampleDates, c("Ymd", "mdY")) ``` ``` ## [1] "2004-04-06 UTC" "2004-04-06 UTC" "2004-04-06 UTC" "2004-04-07 UTC" ## [5] "2004-04-06 UTC" "2004-04-07 UTC" "2014-08-12 UTC" ``` ??? POSIXct is the number of seconds since the epoch. In this case the epoch Jan 1st 1970. POSIXlt is a mixed text and character format like --- # Situation 2 Date informations are scattered in 3 differents columns ``` ## year month day ## 1 1990 1 4 ## 2 1990 5 1 ## 3 1990 7 7 ``` **Solution:** Concatanate the three columns and convert the vector as dates. ```r d <- paste(s2$year, s2$month, s2$day, sep = "-") d ``` ``` ## [1] "1990-1-4" "1990-5-1" "1990-7-7" ``` ```r d <- as.Date(d) d ``` ``` ## [1] "1990-01-04" "1990-05-01" "1990-07-07" ``` --- # Taking advantage of dates format **Logical test, useful for filtering (e.g. comparing dates)** ```r d <- as.Date(c("1990-01-04","1990-05-01","1980-08-07")) d[3] > d[2] ``` ``` ## [1] FALSE ``` ```r sort(d) ``` ``` ## [1] "1980-08-07" "1990-01-04" "1990-05-01" ``` --- # Taking advantage of dates format **Reformat / reduce the resolution on the fly** ```r d <- as.Date(c("1990-01-04","1990-05-01","1980-08-07")) d_y <- format(d, "%Y") d_y ``` ``` ## [1] "1990" "1990" "1980" ``` ```r table(d_y) ``` ``` ## d_y ## 1980 1990 ## 1 2 ``` ??? Now, that time have been briefly covered in R. It's time to visualize date in a spatiotemporal context --- class: inverse, center, middle # Visualizing spatial-temporal data <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- # Import Halibut occurences Let's play with the [OBIS database](https://obis.org/). In this illustration, we want to know where the Halibut (*Hippoglossus hippoglossus*) occurs in the St. Lawrence estuary. ```r library(robis) library(sf) halibut <- occurrence(scientificname = "Hippoglossus hippoglossus", geometry="POLYGON((-64.84626027288095 48.91641792598304,-64.01129933538095 50.49356505646415,-68.82331105413095 50.11464395818477,-71.04254933538095 46.80876488134784,-64.84626027288095 48.91641792598304))" ) ``` ``` ## Retrieved 228 records of approximately 228 (100%) ``` ```r # Convert the data.frame retuned to a sf object sf_halibut <- st_as_sf(halibut, coords = c("decimalLongitude","decimalLatitude"), crs = 4326) ``` --- # Import Halibut occurences ```r nrow(sf_halibut) # number of occurrences ``` ``` ## [1] 228 ```
--- # Import Halibut occurences ```r nrow(sf_halibut) # number of occurrences ``` ``` ## [1] 228 ``` ``` ## ## DFO-IML DFO-NFLD ## 49 1 ## DFO-NG IML_MLI ## 18 158 ## Institute Maurice Lamontagne ## 2 ``` --- # Explore occurrences through times Select the date informations ```r head(names(sf_halibut)) ``` ``` ## [1] "rightsHolder" "country" "date_year" ## [4] "institutionID" "scientificNameID" "year" ``` ```r sf_halibut <- sf_halibut %>% dplyr::select(day, month, year, depth) # Create a new column with the proper dates sf_halibut$dates <- as.Date(paste(sf_halibut$day, sf_halibut$month, sf_halibut$year, sep = "-")) class(sf_halibut$dates) ``` ``` ## [1] "Date" ``` --- # Explore occurrences through times ```r library(ggplot2) ggplot(sf_halibut) + geom_sf() ``` <img src="index_files/figure-html/unnamed-chunk-19-1.png" width="504" style="display: block; margin: auto;" /> --- # Explore occurrences through times ```r library(ggplot2) ggplot(sf_halibut) + geom_sf() + facet_wrap(~year) ``` <img src="index_files/figure-html/unnamed-chunk-20-1.png" width="504" style="display: block; margin: auto;" /> --- # Explore occurrences through times **Some aesthetics** Add a third dimension: color the points based on the depth. ```r ggplot() + geom_sf(data = filter(sf_halibut, year > 2006), aes(col = depth)) + facet_wrap(~year) ``` <img src="index_files/figure-html/unnamed-chunk-21-1.png" width="720" style="display: block; margin: auto;" /> --- # Explore occurrences through times **Add more spatial context** ```r poly_ca <- raster::getData('GADM', country='CA', level=1) sf_ca <- st_as_sf(poly_ca) sf_estuary <- st_crop(sf_ca, st_as_sfc(st_bbox(sf_halibut))) ``` ``` ## although coordinates are longitude/latitude, st_intersection assumes that they are planar ``` ``` ## Warning: attribute variables are assumed to be spatially constant ## throughout all geometries ``` ```r ggplot() + geom_sf(data = sf_estuary, fill = "grey50", col = "white") + geom_sf(data = filter(sf_halibut, year > 2006), aes(col = depth)) + facet_wrap(~year) ``` --- # Explore occurrences through times **Add more spatial context** <img src="index_files/figure-html/unnamed-chunk-24-1.png" width="720" style="display: block; margin: auto;" /> --- # Produce count grids If you have a large dataset with a high density of points, it is a good idea to summarize the informations using a grid of abundance counts. ```r library(raster) # get number of years years <- unique(sf_halibut$year) # Create a reference grid ref_grid <- raster(sf_halibut, res=0.05) ``` --- # Produce count grids If you have a large dataset with a high density of points, it is a good idea to summarize the informations using a grid of abundance counts. ```r library(sf) # Open empty stack stack_count <- stack() for(i in 1:length(years)){ # Filter for one year (years[i]) sf_halibut_yr <- filter(sf_halibut, year == years[i]) # Prepare the geometry sp_halibut_yr <- as(st_geometry(sf_halibut_yr), "Spatial") # Count points per cell rs_count <- rasterize(sp_halibut_yr, ref_grid, fun = 'count') # Add the layer to the stack stack_count <- addLayer(stack_count,rs_count) } names(stack_count) <- years ``` --- # Produce count grids ```r mapview::mapview(sum(stack_count, na.rm = TRUE)) ```
--- class: clear, middle # Practice 1. Collect the occurences of your favorite species on OBIS. - `robis::occurrence(scientificname = "Hippoglossus hippoglossus")` 2. Visualize the distribution of this species across several years using `ggplot2`. 3. Produce count grids per year for this species. --- class: inverse, middle, center # Visualizing variability <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- # Spatial variability **Hovmöller plot** <img src="index_files/figure-html/unnamed-chunk-28-1.png" width="504" style="display: block; margin: auto;" /> --- # Temporal variability **Trends plots by stations** <img src="index_files/figure-html/unnamed-chunk-29-1.png" width="504" style="display: block; margin: auto;" /> --- class: inverse, center, middle # Spatio-temporal data organisation <html><div style='float:left'></div><hr color='#EB811B' size=1px width=720px></html> --- # Spatio-temporal data organisation To perform spatio-temporal variograms with `gstat`, we need to convert our data into one of the following different `spacetime` classes of object. - full grid (STF): a combination of any `sp` object and any `xts` object to represent all possible locations on the implied space-time lattice; - sparse grid (STS): same as STF, but contains only the non-missing space-time combinations on a space-time lattice; - irregular (STI): an irregular space-time data structure, where each point is allocated a spatial coordinate and a time stamp; To build these objects, the best way is to use the `stConstruct` function in the `spacetime` package. --- # Spatio-temporal data organisation The original input when dealing with the `spacetime` pacakge has to be like the one in the following example. ```r head(raw_air) ``` ``` ## station_id pm10 dates lng lat ## 1 DEBB051 8.688 2001-06-08 14.05746 52.54136 ## 2 DEBB051 7.354 2001-09-10 14.05746 52.54136 ## 3 DEBB051 15.729 2001-10-10 14.05746 52.54136 ## 4 DEBB051 26.312 2001-05-03 14.05746 52.54136 ## 5 DEBB051 13.958 2001-10-28 14.05746 52.54136 ## 6 DEBB051 39.188 2001-10-16 14.05746 52.54136 ``` Each row corresponds to an observation / sample taken at a specific location and at particular time point. The time, space and variables have distinct columns. --- # Spatio-temporal data organisation If each columns is a combination of years and variables (`wide` format), you might want to reshape the data into a `long` format. We can obtain this data structure with the function `gather()`. <img src="https://uc-r.github.io/public/images/dataWrangling/gather1.png" width="60%" style="display: block; margin: auto;" /> Have a look at [this website](https://uc-r.github.io/tidyr), and [the tidyr sheet cheat](https://github.com/rstudio/cheatsheets/raw/master/data-import.pdf). --- # Spatio-temporal data organisation Last bit of information: Convert `sf` object back to a complete `data.frame`. ```r # Add the coordinates to the data.frame halibut <- data.frame(st_coordinates(sf_halibut), sf_halibut) head(halibut) ``` ``` ## X Y day month year depth geometry dates ## 1 -67.9097 49.2012 21 8 2011 166 POINT (-67.9097 49.2012) 21-08-20 ## 2 -67.8308 48.8157 23 8 2014 85 POINT (-67.8308 48.8157) 23-08-20 ## 3 -65.2390 49.3085 22 8 2014 226 POINT (-65.239 49.3085) 22-08-20 ## 4 -66.6287 49.8437 19 8 2012 252 POINT (-66.6287 49.8437) 19-08-20 ## 5 -66.0765 49.8920 19 8 2012 269 POINT (-66.0765 49.892) 19-08-20 ## 6 -66.7442 49.9047 26 8 2014 170 POINT (-66.7442 49.9047) 26-08-20 ``` ```r # Drop the geometry column halibut$geometry <- NULL class(halibut) ``` ``` ## [1] "data.frame" ``` --- # Playing with the `stConstruct` function ```r library(spacetime) library(sp) spaceTime_air <- stConstruct(raw_air, c("lng","lat"), "dates", crs = CRS("+init=epsg:4326")) class(spaceTime_air) ``` ``` ## [1] "STIDF" ## attr(,"package") ## [1] "spacetime" ``` --- # Playing with the `stConstruct` function ```r raw_air_2005_2010 <- subset(raw_air, dates > as.Date("2005-01-01") & dates < as.Date("2010-12-31")) STIDF_air_2005_2010 <- stConstruct(raw_air_2005_2010, c("lng","lat"),"dates", crs = CRS("+init=epsg:4326")) STFDF_air_2005_2010 <- as(STIDF_air_2005_2010, "STFDF") ``` --- # Playing with the `stConstruct` function .small[ ```r summary(STFDF_air_2005_2010) ``` ``` ## Object of class STFDF ## with Dimensions (s, t, attr): (53, 1825, 2) ## [[Spatial:]] ## Object of class SpatialPoints ## Coordinates: ## min max ## lng 6.28107 14.78617 ## lat 47.80847 54.92497 ## Is projected: FALSE ## proj4string : ## [+init=epsg:4326 +proj=longlat +datum=WGS84 +no_defs +ellps=WGS84 ## +towgs84=0,0,0] ## Number of points: 53 ## [[Temporal:]] ## Index timeIndex ## Min. :2005-01-02 Min. : 1 ## 1st Qu.:2006-04-03 1st Qu.: 457 ## Median :2007-07-03 Median : 913 ## Mean :2007-07-03 Mean : 913 ## 3rd Qu.:2008-10-01 3rd Qu.:1369 ## Max. :2009-12-31 Max. :1825 ## [[Data attributes:]] ## station_id pm10 ## DETH026: 1821 Min. : 0.560 ## DENI063: 1815 1st Qu.: 9.275 ## DEBY047: 1815 Median : 13.854 ## DEMV017: 1814 Mean : 16.261 ## DEUB028: 1814 3rd Qu.: 20.333 ## (Other):65675 Max. :269.079 ## NA's :21971 NA's :21971 ``` ] --- class: clear, middle ##
Spatio-temporal variograms https://steveviss.github.io/PR-GSFE01/sp_time_vario/practice.html