Module 9 Loops and conditionals
This module considers programming with loops and conditional statements in R.
A template project for this module is given on Posit Cloud (open it and use it while reading the notes).
9.1 Learning outcomes
By the end of this module, you are expected to be able to:
- Formulate conditional statements.
- Use functions any and all.
- Formulate loops in R using for and while statements.
- Use function if_else.
The learning outcomes relate to the overall learning goals number 2, 4 and 10 of the course.
9.2 Conditionals and control flow
An excellent introduction to conditionals and if statements is given in Chapter 1 of the interactive DataCamp course Intermediate R. Please complete the chapter before continuing.
Some functions are also useful when working with logical vectors. Consider this example:
x <- c(1, 3, 5, 10, 2, 17, 11, NA, 4)
x > 10 # are the elements greater than 10?
#> [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE NA FALSE
any(x > 10) # are any of the elements greater than 10?
#> [1] TRUE
all(x > 10) # are all of the elements greater than 10?
#> [1] FALSE
all(x < 20) # are all of the elements less than 20? (NA since one element is missing)
#> [1] NA
all(x < 20, na.rm = TRUE) # are all of the non-missing elements less than 20?
#> [1] TRUE
That is, functions any and all can be used to reduce a logical vector to a single logical value.
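How missing values affect any and all can be sketched as follows. The vector z is made up for this illustration, and only base R is used.

```r
# z is a hypothetical vector with one missing value
z <- c(3, NA, 12)

any(z > 10)                # one TRUE is enough, so the NA does not matter
#> [1] TRUE
all(z > 10)                # one FALSE is enough, so the NA does not matter
#> [1] FALSE
any(z > 20)                # no TRUE found and one value is unknown
#> [1] NA
any(z > 20, na.rm = TRUE)  # drop the missing value before checking
#> [1] FALSE
```

The result is only NA when the missing value could change the answer.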
Some if statements can be written alternatively using the function if_else (from the dplyr package, which is loaded with the tidyverse). For example:
x <- c(-5:5, NA)
x
#> [1] -5 -4 -3 -2 -1 0 1 2 3 4 5 NA
## using if and for
res <- rep("", length(x))
for (i in seq_along(x)) {
  if (is.na(x[i])) res[i] <- "missing"
  else if (x[i] < 0) res[i] <- "negative"
  else res[i] <- "positive"
}
res
#> [1] "negative" "negative" "negative" "negative" "negative" "positive" "positive" "positive"
#> [9] "positive" "positive" "positive" "missing"
## implicit if statement
res <- rep("", length(x))
res
#> [1] "" "" "" "" "" "" "" "" "" "" "" ""
res[x < 0] <- "negative"
res[x >= 0] <- "positive"
res[is.na(x)] <- "missing"
res
#> [1] "negative" "negative" "negative" "negative" "negative" "positive" "positive" "positive"
#> [9] "positive" "positive" "positive" "missing"
## using if_else (requires dplyr, e.g. via library(tidyverse))
res <- if_else(x < 0, "negative", "positive", missing = "missing")
res
#> [1] "negative" "negative" "negative" "negative" "negative" "positive" "positive" "positive"
#> [9] "positive" "positive" "positive" "missing"
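The examples above label zero as "positive". As a minimal sketch, assuming one wanted a separate label for zero (this is not part of the original example), two if_else calls can be nested:

```r
library(dplyr)  # provides if_else

x <- c(-5:5, NA)

# The outer call handles negative and missing values,
# the inner call separates zero from the positive values
res <- if_else(x < 0, "negative",
               if_else(x == 0, "zero", "positive"),
               missing = "missing")
res[c(1, 6, 7, 12)]
#> [1] "negative" "zero"     "positive" "missing"
```

With more than a few categories, nesting becomes hard to read, and the implicit if statement shown above may be clearer.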
9.3 Loops
An excellent introduction to loops is given in Chapter 2 of the interactive DataCamp course Intermediate R. Please complete the chapter before continuing (stop when Chapter 2 finishes).
Loops in R may be slow, but not if you follow some golden rules:
- Do not use a loop when a vectorized alternative exists.
- Do not grow objects (via c, cbind, etc.) during the loop. R has to create a new object and copy across the information just to add a new element or row/column. Instead, allocate an object to hold the results and fill it in during the loop.
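The two rules can be illustrated with a small sketch. The problem size n and the calculation i^2 are made up for this illustration. All three approaches give the same result, and they differ only in how much copying R has to do.

```r
n <- 1000  # hypothetical problem size

# Discouraged: the vector is re-created and copied in every iteration
grow <- c()
for (i in 1:n) {
  grow <- c(grow, i^2)
}

# Better: allocate the result vector once, then fill it in
pre <- rep(NA_real_, n)
for (i in 1:n) {
  pre[i] <- i^2
}

# Best here: a vectorized alternative exists, so no loop is needed
vec <- (1:n)^2

identical(grow, pre) && identical(pre, vec)
#> [1] TRUE
```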
As an example, consider the for loop with 4 iterations:
i_val <- c(1,2,6,9)
res <- rep(NA,4)
res
#> [1] NA NA NA NA
for (idx in seq_along(i_val)) { # seq_along also works if i_val is empty
  res[idx] <- 6 * i_val[idx] + 9
}
res
#> [1] 15 21 45 63
Note we allocate memory for the result vector before the loop so we do not have to grow the result object. Next, we calculate results \(6i+9\) using a loop. Be careful here! This is not the same:
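A sketch of what can go wrong, assuming the pitfall in question is looping over the values of i_val instead of over its indices:

```r
i_val <- c(1, 2, 6, 9)
res <- rep(NA, 4)
for (i in i_val) {       # i takes the values 1, 2, 6, 9 (not the indices 1, 2, 3, 4)
  res[i] <- 6 * i + 9    # writes to positions 1, 2, 6 and 9
}
res
#> [1] 15 21 NA NA NA 45 NA NA 63
```

Here the values are used as positions, so R silently extends the result vector to length 9 and leaves the unused positions as NA.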
In this example, however, we can use a vectorized alternative, where the operation is applied to each element in the vector.
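A minimal sketch of such a vectorized calculation, reusing the vector i_val from the loop example:

```r
i_val <- c(1, 2, 6, 9)
res <- 6 * i_val + 9  # multiplication and addition are applied element by element
res
#> [1] 15 21 45 63
```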
Nested for loops are also possible. A simple example of a nested loop:
for (i in 1:3) {
  for (j in 1:2) {
    cat(str_c("i = ", i, " j = ", j, "\n")) # str_c is from stringr (loaded with tidyverse)
  }
}
#> i = 1 j = 1
#> i = 1 j = 2
#> i = 2 j = 1
#> i = 2 j = 2
#> i = 3 j = 1
#> i = 3 j = 2
Here we use the function cat to print a string (\n indicates a new line). Note how the nested loops are executed:
- Set i = 1 (outer loop)
  - Set j = 1 (inner loop), i stays 1
  - Set j = 2 (inner loop), i stays 1
  - Inner loop finishes, proceed with outer loop.
- Increase i = 2 (outer loop)
  - Set j = 1 (inner loop), i stays 2
  - Set j = 2 (inner loop), i stays 2
  - Inner loop finishes, proceed with outer loop.
- Increase i = 3 (outer loop)
  - Set j = 1 (inner loop), i stays 3
  - Set j = 2 (inner loop), i stays 3
  - Inner loop finishes, proceed with outer loop.
- Outer loop finishes as well (we looped over i in 1:3). Job done.
Nested loops can be used to iterate over matrices or data frames:
mat <- matrix(NA, nrow = 2, ncol = 3)
mat
#> [,1] [,2] [,3]
#> [1,] NA NA NA
#> [2,] NA NA NA
for (i in 1:nrow(mat)) {
  for (j in 1:ncol(mat)) {
    mat[i,j] <- (i-1)*ncol(mat) + j
    cat(str_c("Entry (", i, ", ", j, ") = ", mat[i,j], "\n"))
  }
}
#> Entry (1, 1) = 1
#> Entry (1, 2) = 2
#> Entry (1, 3) = 3
#> Entry (2, 1) = 4
#> Entry (2, 2) = 5
#> Entry (2, 3) = 6
mat
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 4 5 6
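Since the golden rules say to prefer a vectorized alternative when one exists, it may be worth noting that this particular matrix can also be filled without any loop. The sketch below uses the base R functions row and col, and the name mat_vec is made up for this illustration.

```r
mat <- matrix(NA, nrow = 2, ncol = 3)

# row(mat) and col(mat) hold the row and column index of every entry,
# so the formula from the loop can be applied to the whole matrix at once
mat_vec <- (row(mat) - 1) * ncol(mat) + col(mat)
mat_vec
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

# The same numbers are obtained by filling 1:6 row by row
all(mat_vec == matrix(1:6, nrow = 2, byrow = TRUE))
#> [1] TRUE
```

The nested loop is still useful when each entry needs work that cannot be vectorized, such as printing or calling an external service.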
Often you can replace nested loops with a single loop by using expand_grid:
library(tidyverse) # load function expand_grid
mat <- matrix(NA, nrow = 2, ncol = 3)
ite <- expand_grid(i = 1:2, j=1:3)
ite
#> # A tibble: 6 × 2
#> i j
#> <int> <int>
#> 1 1 1
#> 2 1 2
#> 3 1 3
#> 4 2 1
#> 5 2 2
#> 6 2 3
for (r in 1:nrow(ite)) { # iterate over rows
  i <- ite$i[r]
  j <- ite$j[r]
  mat[i,j] <- (i-1)*ncol(mat) + j
  cat(str_c("Entry (", i, ", ", j, ") = ", mat[i,j], "\n"))
}
#> Entry (1, 1) = 1
#> Entry (1, 2) = 2
#> Entry (1, 3) = 3
#> Entry (2, 1) = 4
#> Entry (2, 2) = 5
#> Entry (2, 3) = 6
mat
#> [,1] [,2] [,3]
#> [1,] 1 2 3
#> [2,] 4 5 6
Note expand_grid creates a data frame with all combinations. This way of looping is a more flexible approach since you can
- nest more loops by adding more columns to ite,
- add different values in each column.
For instance, if you only want to calculate values for row 2 and columns 1 and 3 the code becomes:
mat <- matrix(NA, nrow = 2, ncol = 3)
ite <- expand_grid(i = 2, j = c(1,3))
ite
#> # A tibble: 2 × 2
#> i j
#> <dbl> <dbl>
#> 1 2 1
#> 2 2 3
for (r in 1:nrow(ite)) { # iterate over rows
  i <- ite$i[r]
  j <- ite$j[r]
  mat[i,j] <- (i-1)*ncol(mat) + j
  cat(str_c("Entry (", i, ", ", j, ") = ", mat[i,j], "\n"))
}
#> Entry (2, 1) = 4
#> Entry (2, 3) = 6
mat
#> [,1] [,2] [,3]
#> [1,] NA NA NA
#> [2,] 4 NA 6
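The other point, nesting more loops by adding more columns, can be sketched in the same way. The third index k, the array arr and the value stored in each entry are made up for this illustration.

```r
library(tidyr)  # provides expand_grid

arr <- array(NA, dim = c(2, 3, 2))             # hypothetical 3-dimensional array
ite <- expand_grid(i = 1:2, j = 1:3, k = 1:2)  # one row per (i, j, k) combination
nrow(ite)
#> [1] 12
for (r in seq_len(nrow(ite))) {                # a single loop replaces three nested loops
  i <- ite$i[r]
  j <- ite$j[r]
  k <- ite$k[r]
  arr[i, j, k] <- 100 * i + 10 * j + k         # made-up value that encodes the indices
}
arr[2, 3, 1]
#> [1] 231
```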
9.4 Recap
Comparison/relational operators known to R are:
- < for less than,
- > for greater than,
- <= for less than or equal to,
- >= for greater than or equal to,
- == for equal to each other (and not = which is typically used for assignment!),
- != not equal to each other.
Logical operators known to R are:
- & and,
- | or,
- ! not.
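The operators && and || are meant for single logical values, as in the condition of an if statement, and they only evaluate the right-hand side when needed. Older versions of R silently compared only the first element of longer vectors, while R 4.3.0 and later give an error instead. Use & and | for element-wise comparison of vectors.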
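The difference between the element-wise and the single-value operators can be sketched as follows. The vectors a, b and v are made up for this illustration.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

a & b  # element-wise and
#> [1]  TRUE FALSE FALSE
a | b  # element-wise or
#> [1] TRUE TRUE TRUE
!a     # element-wise not
#> [1] FALSE  TRUE FALSE

# && stops as soon as the answer is known: since v is NULL,
# the comparison v[1] > 0 is never evaluated
v <- NULL
ok <- !is.null(v) && v[1] > 0
ok
#> [1] FALSE
```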
Useful functions that return a logical are any and all, which can be used to reduce a logical vector to a single logical value.
Conditional statements can be constructed using, for instance, if and while statements. Moreover, function if_else is a vectorized alternative.
Loops can be created using for and while statements.
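Since the module only shows for loops in code, here is a minimal sketch of the same calculation \(6i+9\) written as a while loop. The counter idx must be updated by hand, otherwise the loop never ends.

```r
i_val <- c(1, 2, 6, 9)
res <- rep(NA, length(i_val))   # allocate the result vector before the loop
idx <- 1                        # start at the first position
while (idx <= length(i_val)) {  # continue as long as the condition is TRUE
  res[idx] <- 6 * i_val[idx] + 9
  idx <- idx + 1                # move to the next position
}
res
#> [1] 15 21 45 63
```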
You can break out of a loop using break and jump to the next iteration (skipping the remainder of the code in the loop) using next.
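A small sketch of both statements in one loop. The rule used here (add the odd numbers and stop at the first odd number above 7) is made up for this illustration.

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next  # even number: skip the rest of this iteration
  if (i > 7) break       # first odd number above 7 (i = 9): leave the loop
  total <- total + i     # reached for i = 1, 3, 5, 7
}
total
#> [1] 16
```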
Do not use a loop when a vectorized alternative exists.
Do not grow objects during the loop. Instead, allocate an object to hold the results and fill it in during the loop.
Nested loops are possible in R. However, often they can be converted into a single loop by defining a data frame having the values of the nested loops in each row. Here function expand_grid may be useful to create the data frame.
9.5 Exercises
Below you will find a set of exercises. Always have a look at the exercises before you meet in your study group and try to solve them yourself. If you are stuck, see the help page. Some of the solutions to each exercise can be seen by pressing the button at each question. Beware, you will not learn by giving up too early. Put some effort into finding a solution! Always practice using shortcuts in RStudio (see Tools > Keyboard Shortcuts Help).
Go to the Tools for Analytics workspace and download/export the TM9 project. Open it on your laptop and have a look at the files in the exercises folder, which can be used as a starting point.
9.5.1 Exercise (conditional expressions)
Solve this exercise using a script file.
Consider object x:
What will this conditional expression return?
What will the following expressions return?
Which of the expressions above is always FALSE when at least one entry of a logical vector x is TRUE?
Consider vector:
- Use the if_else function to set elements with value below 7 to 0.
- Use the if_else function to set elements with value below 7 or above 10 to NA_integer_ (which is the NA/missing value of an integer).
Consider code
which generates a number from the vector c(1:10,NA,5.5).
Write code which sets object y equal to “even” if x is even, “odd” if x is odd, “decimal” if x has a decimal not zero and “missing” if x is NA. Hint: have a look at ?'%%' (the modulo operator).
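As a brief illustration of the hint, the modulo operator returns the remainder after division. The numbers below are made up and are not a solution to the exercise.

```r
7 %% 2    # remainder after dividing 7 by 2
#> [1] 1
8 %% 2    # no remainder
#> [1] 0
5.5 %% 1  # the remainder after dividing by 1 is the decimal part
#> [1] 0.5
NA %% 2   # missing values stay missing
#> [1] NA
```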
9.5.2 Exercise (loops)
- Using a for loop, create a vector having values \(2i + 4\) given \(i = 1, \ldots, 4\).
- Using a for loop, create a vector having values \(2i + 4\) given \(i = 2, 5, 6, 12\).
- Solve Question 2 using a while loop.
- Solve Questions 1 and 2 using a vectorized alternative.
9.5.3 Exercise (search vector)
This exercise is a slightly modified version of an exam assignment (reexam 2021-A1).
Consider the vector:
- Are any of the entries in v below or equal to 2?
- Are all of the entries in v above or equal to 2?
- Does v have missing values?
- Which entries in v are above 10? You must return the indices, e.g. the index of v[3] is 3.
- Create a vector res where res[i] is equal to v[i] if v[i] is less than 10 and otherwise zero (also if v[i] is NA).
9.5.4 Exercise (calculating distances)
Consider zip codes in Jutland:
# remotes::install_github("bss-osca/tfa-package", upgrade = FALSE) # run to upgrade
library(tidyverse)
data(zips, package = "tfa") # load the zips data from the tfa package
zips
#> # A tibble: 376 × 2
#> Zip Area
#> <dbl> <chr>
#> 1 5320 "Agedrup"
#> 2 6753 "Agerb\xe6k"
#> 3 6534 "Agerskov"
#> 4 8961 "Alling\xe5bro"
#> 5 6051 "Almind"
#> 6 8592 "Anholt"
#> 7 8643 "Ans By"
#> 8 6823 "Ansager"
#> 9 9510 "Arden"
#> 10 5466 "Asperup"
#> # ℹ 366 more rows
We want to calculate distances between a subset of zip areas:
idx <- 1:5
dat <- zips[idx,]
dat
#> # A tibble: 5 × 2
#> Zip Area
#> <dbl> <chr>
#> 1 5320 "Agedrup"
#> 2 6753 "Agerb\xe6k"
#> 3 6534 "Agerskov"
#> 4 8961 "Alling\xe5bro"
#> 5 6051 "Almind"
distanceMat <- matrix(NA, nrow = length(idx), ncol = length(idx))
colnames(distanceMat) <- str_c(dat$Zip, dat$Area, sep = " ") # dat is already the subset, so no further indexing is needed
rownames(distanceMat) <- colnames(distanceMat)
distanceMat
#> 5320 Agedrup 6753 Agerb\xe6k 6534 Agerskov 8961 Alling\xe5bro 6051 Almind
#> 5320 Agedrup NA NA NA NA NA
#> 6753 Agerb\xe6k NA NA NA NA NA
#> 6534 Agerskov NA NA NA NA NA
#> 8961 Alling\xe5bro NA NA NA NA NA
#> 6051 Almind NA NA NA NA NA
We can find average distances between two zip codes (here rows 1 and 2 in dat) using Bing maps:
key <- "AlUJdApmvPe8y2_IMrC4j4x8fzytbD2M0SvlmpemL09ae_CWS6-IuNSgrAtXoyeP"
url <- str_c("http://dev.virtualearth.net/REST/V1/Routes/Driving?wp.0=",
dat$Zip[1], ",Denmark",
"&wp.1=",
dat$Zip[2], ",Denmark",
"&avoid=minimizeTolls&key=", key)
library(jsonlite)
lst <- jsonlite::fromJSON(url)
dist <- lst$resourceSets$resources[[1]]$travelDistance
dist
#> [1] 139
lst$statusCode
#> [1] 200
lst$statusDescription
#> [1] "OK"
Note we call the Bing maps API with the two zip codes. A JSON response is returned and stored in a list. To get the average travel distance we access travelDistance. The status code should be 200 if the request succeeded.
Use nested for loops to fill distanceMat with distances. Assume that the distance from a to b is the same as from b to a. That is, you only have to call the API once for two zip codes. Use an if statement to check if the status code is okay.
9.5.5 Exercise (expand_grid)
Consider the solution of Exercise 9.5.4 and assume that you only want to calculate the distance from rows 1 and 5 to rows 2 and 3 in dat. Modify the solution using expand_grid so only one loop is used.