Working with Raster Data


In this tutorial you will learn raster-specific pre-processing and analysis steps such as resampling and reclassification. You will learn how to execute local operations of map algebra in raster calculator and how to summarise raster data using zonal statistics. You will also learn how to create distance, slope and aspect rasters.

We will keep working on siting a solar power station in Zambia, this time using raster data – including the solar power potential.

We will use land cover, and elevation to further optimise the location of the facility: we’d like to avoid cutting trees or building in a swamp. We’d also like to avoid the steepest slopes.

Finally, using solar potential raster (Global Horizontal Irradiance in kWh/m2/year) we will be able to estimate the power which may be available in our selected zones.


Analysis Preparation

Imports

All of these libraries should have been previously installed during the environment set-up, if they have not been installed already you can use install.packages(c("sf", "ggplot2")). The exception is cartomisc which we’ll install now.

library(sf) # for handling spatial features
library(dplyr) # used for data manipulation
library(raster) # useful in some spatial operations
library(ggplot2) # for plotting
library(zeallot) # used for unpacking variables

source('../scripts/helpers.R') # helper script, note that '../' is used to change into the directory above the directory this notebook is in

Loading Data

We’ll start by again checking to see if we need to download any data

download_data()

Then by reading in a shapefile for Zambia

df_zambia <- read_sf('../data/zambia/zambia.shp')

df_zambia
A sf: 1 × 5
IDCODECOUNTRYareaQgeometry
<chr><chr><chr><dbl><POLYGON [m]>
761ZAMZambia777795418089POLYGON ((716452.6 8516873,...

We’ll also load land cover, solar radiance, and elevation raster data from .tif files

solar <- raster('../data/africa/solar.tif')
land_cover <- raster('../data/africa/land_cover.tif')
elevation <- raster('../data/africa/elevation.tif')

plot(solar)
../_images/04-Working-with-Raster-Data_7_0.png

Raster Cropping & Masking

In the same way you can clip vectors with a “cookie cutter” outline you can also clip rasters.

crop allows us to clip the raster to the extents of the vector geometry, similar to cropping an image.

df_zambia_4326 <- st_transform(df_zambia, crs=st_crs(solar))
solar_zambia <- crop(solar, df_zambia_4326)

plot(solar_zambia)
plot(df_zambia_4326['geometry'], add=TRUE)
../_images/04-Working-with-Raster-Data_9_0.png

If we want to remove all values that are outside of Zambia’s borders we can use mask

masked_solar_zambia <- mask(solar_zambia, df_zambia_4326)

plot(masked_solar_zambia)
../_images/04-Working-with-Raster-Data_11_0.png

It’s worth carrying out the crop before the mask as otherwise the returned raster will still go to the extent of the original data, e.g.

plot(mask(solar, df_zambia_4326))
../_images/04-Working-with-Raster-Data_13_0.png

Whilst the existing mask function is really useful it doesn’t give us all of the fine-grained control we might require. For example a common area for fine-tuning raster masks is deciding the cut-off point for when cells are inside/outside the border - should we include all cells that the border touches or only the ones that fall fully inside it?

To enable this flexibility we’ll create our own raster mask.

First we’ll rasterize the zambia polygon, passing an example raster which will be used to construct the grid, as well as the optional argument getCover. By setting getCover to TRUE the values of the returned raster will be the percentage of the cell that falls inside the polygon. We’ll also set all cells that do not touch the polygon to NA.

zambia_raster_mask <- rasterize(df_zambia_4326, solar_zambia, getCover=TRUE)
zambia_raster_mask[zambia_raster_mask==0] <- NA

plot(zambia_raster_mask) 
../_images/04-Working-with-Raster-Data_15_0.png

For plotting with Ggplot, we’ll convert our raster into a tibble and also remove any NA values.

tb_zambia_raster_mask <- gplot_data(zambia_raster_mask)
tb_zambia_raster_mask <- subset(tb_zambia_raster_mask, !is.na(tb_zambia_raster_mask$value))

head(tb_zambia_raster_mask)
A tibble: 6 × 4
xyvaluevariable
<dbl><dbl><dbl><chr>
30.44767-8.2620190.09layer
30.49581-8.2620190.19layer
30.54396-8.2620190.34layer
30.59210-8.2620190.58layer
30.64025-8.2620190.73layer
30.68840-8.2620190.97layer

We’re now ready to visualise the raster mask alongside the vector, we can see how the cells that are only slightly touched by the vector are darker with lower coverage percentages.

ggplot() +
    geom_tile(data=tb_zambia_raster_mask, aes(x=x, y=y, fill=value)) + 
    geom_sf(data=df_zambia_4326, fill='orange') +
    coord_sf(xlim=c(28, 30), 
             ylim=c(-13.5, -11.75))
../_images/04-Working-with-Raster-Data_19_0.png

We want to use all of the raster cells that touch the country so can directly use the raster mask we just created. However, if we only wanted cells that fully lay inside the border we could set all values less than one to NA, e.g. using zambia_raster_mask[zambia_raster_mask<1] <- NA.

N.b. we use the same mask function regardless of whether the masking feature is a raster or vector.

masked_solar_zambia <- mask(solar_zambia, zambia_raster_mask)

plot(masked_solar_zambia)
../_images/04-Working-with-Raster-Data_21_0.png

We’ll tie a number of these steps together in a single function:

  • Reprojecting the vector data to the same CRS as the raster

  • Cropping the raster to tthe vector extents

  • Rasterising the vector and returning the cover percentage value

  • Setting all cells below the defined coverage percentage to NA

  • Finally masking the raster

We’ll then apply these to both the land_cover and elevation raster datasets.

quick_mask_raster <- function(raster_data, masking_vector){
    masking_vector <- st_transform(masking_vector, st_crs(raster_data))
    cropped_raster_data <- crop(raster_data, masking_vector)
    masked_raster_data <- mask(cropped_raster_data, masking_vector)
    
    return(masked_raster_data)
}

mask_raster <- function(raster_data, masking_vector, min_mask_cover=1){
    # Ensuring masking_vector is in the same proj
    masking_vector <- st_transform(masking_vector, st_crs(raster_data))
    
    # Cropping raster to masking_vector extents
    cropped_raster <- crop(raster_data, masking_vector)
    
    # Creating raster mask according to specified cover percentage
    raster_mask <- rasterize(masking_vector, cropped_raster, getCover=TRUE)
    raster_mask[raster_mask<min_mask_cover] <- NA

    # Masking the raster
    masked_raster <- mask(cropped_raster, raster_mask)
    
    return(masked_raster)
}

masked_elevation <- quick_mask_raster(elevation, df_zambia)
masked_land_cover <- quick_mask_raster(land_cover, df_zambia)

plot(masked_land_cover)
../_images/04-Working-with-Raster-Data_23_0.png

To free up some RAM we can now delete the original unmasked raster objects

rm(solar, land_cover, elevation)

Questions

Masking Based on Grid Distance

Using the masking techniques from this tutorial and the buffering techniques from the vector tutorial you should mask the solar dataset based on whether the cells are within 10km of the Zambian electricity grid.

# 

Reprojecting Rasters

Now let’s transform the rasters to match our selected coordinate system. The raster package contains a useful function called projectRaster that lets us transform the raster from one CRS into another, in this case we’ll reproject to UTM 35S.

crs = '+proj=utm +zone=35 +south +a=6378249.145 +b=6356514.966398753 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs '
zambia_solar_reproj = projectRaster(masked_solar_zambia, crs=crs)

plot(zambia_solar_reproj)
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded ellps unknown in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +b=6356514.96639875 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded datum unknown in CRS definition,
 but +towgs84= values preserved”
../_images/04-Working-with-Raster-Data_29_1.png

Whilst the country’s shape may not appear to change we can see that the axis definitely have compared to the original data.

plot(masked_solar_zambia) 
../_images/04-Working-with-Raster-Data_31_0.png

Note that we can use a number of different algorithms when reprojecting the data. ngb (nearest neighbor) is very useful when working with categorical variables as it ensures that all values returned are values that existed in the original dataset. bilinear (bilinear interpolation) is more useful when working with continuous data, it is carried out by first using linear interpolation in one direction, then using it again in the other direction. Note that by default bilinear interpolation will be used.

As land-use is a categorical raster we’ll use ngb and conversely for elevation we’ll use bilinear.

zambia_elevation_reproj = projectRaster(masked_elevation, crs=crs, method='bilinear')
zambia_land_use_reproj = projectRaster(masked_land_cover, crs=crs, method='ngb')
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded ellps unknown in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +b=6356514.96639875 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded datum unknown in CRS definition,
 but +towgs84= values preserved”
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded ellps unknown in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +b=6356514.96639875 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded datum unknown in CRS definition,
 but +towgs84= values preserved”

Raster Reclassification

In this section we’ll reclassify the land cover raster, firstly to simplify it, then to create a binary raster layer for whether or not the land is suitable to build power plants on.

We’ll begin by reading in the land cover classes csv.

df_land_cover_classes <- read.csv('../data/africa/land_cover_classes.csv', header=TRUE)

head(df_land_cover_classes)
A data.frame: 6 × 4
IDreclasscategorysuitability
<int><int><chr><int>
1 00no_data0
2101crop 1
3112grass 1
4123tree 0
5201crop 1
6301crop 1

We want to map from the ID classes to the new reclass categories, we can do this by specifying them as by and which in the subs function.

We can see in the new zambia_land_use_reclass object that we only have values ranging from 1-7 now.

zambia_land_use_reclass <- subs(zambia_land_use_reproj, df_land_cover_classes, by='ID', which='reclass')

zambia_land_use_reclass
Warning message in if (is.factor(object)) {:
“the condition has length > 1 and only the first element will be used”
class      : RasterLayer 
dimensions : 3602, 4301, 15492202  (nrow, ncol, ncell)
resolution : 301, 307  (x, y)
extent     : -52854.47, 1241747, 7986847, 9092661  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=35 +south +a=6378249.145 +b=6356514.96639875 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs 
source     : memory
names      : reclass 
values     : 1, 7  (min, max)

When we carry out this reclassification we’re also telling R that we want to define the new RasterLayer as a categorical variable, these are linked to their values via a Raster Attribute Table (RAT). In order to plot these categorical raster layers we must ratify the objects before plotting.

plot(ratify(zambia_land_use_reclass))
../_images/04-Working-with-Raster-Data_39_0.png

Now let’s take the reclassification even further. Assume we want to locate our powerplant at a site which has no tree cover - water, built-up areas, and ice are also not the best places to build the facility.

We’ve included in the reclassfiication dataframe a column called suitability, for unsuitable land this value is 0, whereas it is 1 when the land is suitable for a power plant.

zambia_land_use_suitability <- subs(zambia_land_use_reproj, df_land_cover_classes, by='ID', which='suitability')

zambia_land_use_suitability
Warning message in if (is.factor(object)) {:
“the condition has length > 1 and only the first element will be used”
class      : RasterLayer 
dimensions : 3602, 4301, 15492202  (nrow, ncol, ncell)
resolution : 301, 307  (x, y)
extent     : -52854.47, 1241747, 7986847, 9092661  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=35 +south +a=6378249.145 +b=6356514.96639875 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs 
source     : memory
names      : suitability 
values     : 0, 1  (min, max)

When we visualise this the green areas are those that are suitable to site a plant on whilst the white areas are not.

plot(ratify(zambia_land_use_suitability))
../_images/04-Working-with-Raster-Data_43_0.png

Questions

Suitable Areas for Floating Solar/Offshore Wind

Plot the suitable areas for floating solar and offshore wind, you can use any land cover category that is specified as ‘water’.

# 

Raster Resampling

In order to combine data from two or more rasters their cells need to be perfectly aligned. This leads to the need for resampling of the rasters. The resampling operation has two components: resizing and realignment of raster cells.

Let’s resample the land cover suitability and solar potential to match the resolution and alignment of the altitude raster. To do this we can use the resample function which has the same algorithm method options as projectRaster.

zambia_land_use_suitability_regrid <- resample(zambia_land_use_suitability, zambia_elevation_reproj, method='ngb')
zambia_solar_reproj_regrid <- resample(zambia_solar_reproj, zambia_elevation_reproj, method='bilinear')

zambia_land_use_suitability_regrid
class      : RasterLayer 
dimensions : 1207, 1441, 1739287  (nrow, ncol, ncell)
resolution : 903, 922  (x, y)
extent     : -55913.23, 1245310, 7983172, 9096026  (xmin, xmax, ymin, ymax)
crs        : +proj=utm +zone=35 +south +a=6378249.145 +b=6356514.96639875 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs 
source     : memory
names      : suitability 
values     : 0, 1  (min, max)

One way to check that the resampling has worked is to confirm that the dimensions of each object are the same. For this we can use the dim function.

dim(zambia_elevation_reproj) == dim(zambia_land_use_suitability_regrid)
  1. TRUE
  2. TRUE
  3. TRUE

At this point stop and have a think about some of the issues with resampling rasters:

  • What would have happened if we’d used the default resampling method with the land_cover data?

    • Why wouldnt this have been ideal?

  • What are some of the drawbacks of resampling to a raster that is at a much higher resolution?

    • Is it objectively better to up or down-sample?

    • How does this change depending on the problem your solving?


Raster Algebra

In many ways we can view a raster as a normal mathematical matrix over which we can perform calculations and statistical operations. This is useful as we can use mathematical formulae to recalculate raster cell values.

A detailed list of the available operations can be found at rspatial.org

Many generic functions that allow for simple and elegant raster algebra have been implemented for Raster objects, including the normal algebraic operators such as +, -, *, /, logical operators such as >, >=, <, ==, ! and functions like abs, round, ceiling, floor, trunc, sqrt, log, log10, exp, cos, sin, atan, tan, max, min, range, prod, sum, any, all. In these functions you can mix raster objects with numbers, as long as the first argument is a raster object.

For example, if we wanted a raster with monthly solar power potential rather than annual we could divide all cell values by 12.

zambia_solar_monthly <- zambia_solar_reproj_regrid/12

When two rasters are using the same CRS and are placed on the same grid (as we did in the resampling step), we can do mathematical operations using both of them. A common area this is used is in raster masking, where we can multiply a binary raster (TRUE/FALSE) by another raster to return only the values where the binary raster was TRUE.

Here we’ll mask the solar data based on the land cover suitability. The output solar potential raster will be either 0 where power cannot be obtained, or the value of potential (in kWh/m2/year), where the land cover conditions are met.

zambia_suitable_solar <- zambia_solar_reproj_regrid * zambia_land_use_suitability_regrid
zambia_suitable_solar[zambia_suitable_solar==0] <- NA

plot(zambia_suitable_solar)
../_images/04-Working-with-Raster-Data_53_0.png

Questions

Estimating Resampling Errors

When doing spatial analysis it is often useful to compare the difference between your results depending on how a spatial method is used. In this task you must determine the percentage difference between the resampled solar potential depending on whether bi-linear or ngb is used.

# 

A GIS Tale: Rasterization & Back Again

When working with spatial data, typically, you will not work with only raster or only vector datasets. However, to carry out spatial operations between datasets we often require them to be in the same format. For this reason it is highly important that you udnerstand how we can convert from Raster -> Vectors and vice versa.

Before we do the rasterization we’ll load a dataset that will be more useful in explaining this visually - the country shapefiles. We’ll then create a column for the area of each country that will be used as the values in the final raster.

df_countrys <- read_sf('../data/africa/countries.shp')
df_countrys <- st_transform(df_countrys, st_crs(zambia_solar_reproj_regrid))

df_countrys$area <- st_area(df_countrys)

plot(df_countrys)
../_images/04-Working-with-Raster-Data_57_0.png

The raster package contains the function rasterize which we’ll use to convert from the country polygons to a raster. We provide it the vector objects, the raster grid we wish to rasterize onto, as well as the column that contains the values we wish to be used in the raster.

rasterized_country_areas <- rasterize(df_countrys, zambia_solar_reproj_regrid, 'area')

plot(rasterized_country_areas)
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded ellps unknown in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +rf=293.466307699995 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded datum unknown in CRS definition,
 but +towgs84= values preserved”
Warning message in showSRID(SRS_string, format = "PROJ", multiline = "NO"):
“Discarded ellps unknown in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +rf=293.466307699995 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(SRS_string, format = "PROJ", multiline = "NO"):
“Discarded datum unknown in CRS definition,
 but +towgs84= values preserved”
../_images/04-Working-with-Raster-Data_59_1.png

What happens if we want to go in the opposite direction though?

For that we can use the polygonize function which converts from rasters to polygons, we can set dissolve to TRUE which will group the resulting polygons based on the raster values (as long as the cells are adjacent).

polygonized_country_areas <- rasterToPolygons(rasterized_country_areas, dissolve=TRUE)

head(polygonized_country_areas)
A data.frame: 6 × 1
layer
<dbl>
11.202574e+11
21.282970e+12
32.343754e+12
43.907642e+11
55.800564e+11
67.535634e+11

When plotted we can see that Zambia has been grouped correctly.

plot(polygonized_country_areas)
../_images/04-Working-with-Raster-Data_63_0.png

What problems do you think rasterization could raise?


Zonal Statistics

Zonal statistics are statistics that are calculated for a given set of spatial zones and are typically defined by polygons/points based on values from a raster dataset.

For example, summarising the solar power potential available in Zambia can be done using using the extract method which can be passed a function (in this case the mean). We could alternatively ask for other statistics such as the sum or distribution, in fact we can even pass our own custom functions.

df_zambia_solar_mean <- extract(solar_zambia, df_zambia, fun=mean, df=TRUE)

df_zambia_solar_mean
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded ellps unknown in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +rf=293.4663077 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(uprojargs, format = "PROJ", multiline = "NO"):
“Discarded datum unknown in CRS definition,
 but +towgs84= values preserved”
Warning message in showSRID(SRS_string, format = "PROJ", multiline = "NO"):
“Discarded ellps Clarke 1880 (Arc) in CRS definition: +proj=utm +zone=35 +south +a=6378249.145 +rf=293.4663077 +towgs84=-143,-90,-294,0,0,0,0 +units=m +no_defs”
Warning message in showSRID(SRS_string, format = "PROJ", multiline = "NO"):
“Discarded datum Arc_1950 in CRS definition”
Warning message in .local(x, y, ...):
“Transforming SpatialPolygons to the CRS of the Raster”
A data.frame: 1 × 2
IDsolar
<int><dbl>
12331.507

Its not particularly interesting when we only have one Polygon to use for the zonal statistics, lets carry out some analysis of the solar potential at different cities. We’ll first load the Africa-wide cities dataset and then filter it for Zambia (as described in the vector data tutorial), then get the solar potential calues at those points.

df_cities <- read_sf('../data/africa/cities.shp')
df_zambia_cities <- df_cities[which(st_intersects(st_transform(df_cities, st_crs(df_zambia)), df_zambia, sparse=FALSE)), ]

df_zambia_cities_solar <- extract(solar_zambia, df_zambia_cities, df=TRUE)

head(df_zambia_cities_solar)
A data.frame: 6 × 2
IDsolar
<dbl><dbl>
112365
222382
332373
442417
552303
662330

Similarly we could extract the land cover classifications for each city as well.

head(extract(masked_land_cover, df_zambia_cities, df=TRUE))
A data.frame: 6 × 2
IDland_cover
<dbl><dbl>
11120
22190
33190
44 11
55190
66190