Module 9 Loops and conditionals

This module considers programming with loops and conditional statements in R.

A template project for this module is given on Posit Cloud (open it and use it while reading the notes).

Learning path diagram

It is recommended that you follow the green learning path; however, you may prefer a different learning style. The learning path diagram contains links to alternative online content (video or reading). Note that this content is an alternative to the standard learning path, which you may use instead (you should not do both). The learning path may also contain extra content that is NOT part of the syllabus (only look at it if you want more information).

9.1 Learning outcomes

By the end of this module, you are expected to be able to:

  • Formulate conditional statements.
  • Use functions any and all.
  • Formulate loops in R using for and while statements.
  • Use function if_else.

The learning outcomes relate to overall learning goals 2, 4 and 10 of the course.

9.2 Conditionals and control flow

An excellent introduction to conditionals and if statements is given in Chapter 1 of the interactive DataCamp course Intermediate R. Please complete the chapter before continuing.

Some functions are also useful for comparing logical data types. Consider this example:

x <- c(1, 3, 5, 10, 2, 17, 11, NA, 4)
x > 10  # are the elements greater than 10
#> [1] FALSE FALSE FALSE FALSE FALSE  TRUE  TRUE    NA FALSE
any(x > 10)  # are any of the elements greater than 10
#> [1] TRUE
all(x > 10)  # are all of the elements greater than 10
#> [1] FALSE
all(x < 20)  # are all of the elements less than 20 (NA since one element is missing)
#> [1] NA
all(x < 20, na.rm = TRUE)  # are all of the non-missing elements less than 20
#> [1] TRUE

That is, functions any and all can be used to join logical values in vectors.
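Since any and all return a single logical value, they fit well inside an if condition. A minimal sketch, reusing the vector x from above (the object msg and its messages are only illustrative):

```r
x <- c(1, 3, 5, 10, 2, 17, 11, NA, 4)

# any() returns one logical value, so it can be used as an if condition
if (any(is.na(x))) {
  msg <- "x has missing values"
} else {
  msg <- "x is complete"
}
msg
#> [1] "x has missing values"

# na.rm = TRUE drops the missing values before the check
all(x > 0, na.rm = TRUE)
#> [1] TRUE
any(x > 20, na.rm = TRUE)
#> [1] FALSE
```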

Some if statements can be written alternatively using the vectorized function if_else from the dplyr package (loaded with tidyverse):

if_else(condition, true, false, missing = NULL)

For example:

x <- c(-5:5, NA)
x
#>  [1] -5 -4 -3 -2 -1  0  1  2  3  4  5 NA

## using if and for
res <- rep("", length(x))
for (i in seq_along(x)) {
  if (is.na(x[i])) res[i] <- "missing"
  else if (x[i] < 0) res[i] <- "negative"
  else res[i] <- "positive"
}
res
#>  [1] "negative" "negative" "negative" "negative" "negative" "positive" "positive" "positive"
#>  [9] "positive" "positive" "positive" "missing"

## implicit if statement
res <- rep("", length(x))
res
#>  [1] "" "" "" "" "" "" "" "" "" "" "" ""
res[x < 0] <- "negative"
res[x >= 0] <- "positive"
res[is.na(x)] <- "missing"
res
#>  [1] "negative" "negative" "negative" "negative" "negative" "positive" "positive" "positive"
#>  [9] "positive" "positive" "positive" "missing"

## using if_else
res <- if_else(x < 0, "negative", "positive", "missing")
res
#>  [1] "negative" "negative" "negative" "negative" "negative" "positive" "positive" "positive"
#>  [9] "positive" "positive" "positive" "missing"
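Note that the examples above label zero as "positive". If zero should have its own label, one option is to nest a second if_else in the false argument. This is only a sketch (the label "zero" is an assumption for illustration); note that if_else requires the true, false and missing values to have compatible types:

```r
library(dplyr)  # if_else is part of dplyr (loaded with tidyverse)

x <- c(-5:5, NA)
# the inner if_else is only used where x < 0 is FALSE;
# entries where x < 0 is NA get the value of the missing argument
res <- if_else(x < 0, "negative",
               if_else(x == 0, "zero", "positive"),
               missing = "missing")
res[5:7]  # the entries for x equal to -1, 0 and 1
#> [1] "negative" "zero"     "positive"
res[12]   # the entry for NA
#> [1] "missing"
```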

9.3 Loops

An excellent introduction to loops is given in Chapter 2 of the interactive DataCamp course Intermediate R. Please complete the chapter before continuing (stop when Chapter 2 finishes).

Loops in R may be slow. However, this is rarely a problem if you follow some golden rules:

  • Do not use a loop when a vectorized alternative exists.
  • Do not grow objects (via c, cbind, etc.) during the loop, since R has to create a new object and copy across the information just to add a new element or row/column. Instead, allocate an object to hold the results and fill it in during the loop.
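The rules can be illustrated with a small sketch. The three approaches below give the same result; the object names and the choice of squaring the numbers 1 to 1000 are only for illustration:

```r
n <- 1000

# discouraged: the vector grows in each iteration, so R copies it repeatedly
res_grow <- c()
for (i in 1:n) {
  res_grow <- c(res_grow, i^2)
}

# better: allocate the full vector first and fill it in
res_alloc <- rep(NA_real_, n)
for (i in 1:n) {
  res_alloc[i] <- i^2
}

# best here: a vectorized alternative needs no loop at all
res_vec <- (1:n)^2

identical(res_grow, res_alloc)
#> [1] TRUE
identical(res_alloc, res_vec)
#> [1] TRUE
```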

As an example, consider the for loop with 4 iterations:

i_val <- c(1,2,6,9)
res <- rep(NA,4)
res
#> [1] NA NA NA NA
for (idx in 1:length(i_val)) {
  res[idx] <- 6 * i_val[idx] + 9
}
res
#> [1] 15 21 45 63

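Note we allocate memory for the result vector before the loop so we do not have to grow the result object. Next, we calculate the results \(6i+9\) using a loop. Be careful here! The following is not the same, since the values in i_val (1, 2, 6 and 9) are now used as indices in res, which therefore grows to length 9: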

res <- rep(NA,4)
for (i in i_val) {
  res[i] <- 6 * i + 9
}
res
#> [1] 15 21 NA NA NA 45 NA NA 63

In this example, however, we can use a vectorized alternative:

res <- 6 * i_val + 9
res
#> [1] 15 21 45 63

where the operation is applied to each element in the vector.
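A related pitfall when a loop is needed: the first loop above iterates over 1:length(i_val), which works as long as the vector is not empty. The sketch below (with a hypothetical empty vector) shows why seq_along is often the safer choice:

```r
empty <- c()          # a vector of length zero
1:length(empty)       # counts down from 1 to 0, so a loop would run twice
#> [1] 1 0
seq_along(empty)      # an empty sequence, so a loop would not run at all
#> integer(0)

count <- 0
for (idx in seq_along(empty)) {
  count <- count + 1  # never reached since there is nothing to iterate over
}
count
#> [1] 0
```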

Nested for loops are also possible. A simple example of a nested loop:

for (i in 1:3) {
  for (j in 1:2) {
    cat(str_c("i =", i, " j = ",j, "\n"))
  }
}
#> i =1 j = 1
#> i =1 j = 2
#> i =2 j = 1
#> i =2 j = 2
#> i =3 j = 1
#> i =3 j = 2

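Here we use the function cat to print a string (\n indicates a new line) and the function str_c (from the stringr package, loaded with tidyverse) to join the parts of the string. Note how the nested loops are executed: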

  • Set i = 1 (outer loop)
    • Set j = 1 (inner loop), i stays 1
    • Set j = 2 (inner loop), i stays 1
    • Inner loop finishes, proceed with outer loop.
  • Increase i = 2 (outer loop)
    • Set j = 1 (inner loop), i stays 2
    • Set j = 2 (inner loop), i stays 2
    • Inner loop finishes, proceed with outer loop.
  • Increase i = 3 (outer loop)
    • Set j = 1 (inner loop), i stays 3
    • Set j = 2 (inner loop), i stays 3
    • Inner loop finishes, proceed with outer loop.
  • Outer loop finishes as well (we looped over i in 1:3). Job done.

Nested loops can be used to iterate over matrices or data frames:

mat <- matrix(NA, nrow = 2, ncol = 3)
mat
#>      [,1] [,2] [,3]
#> [1,]   NA   NA   NA
#> [2,]   NA   NA   NA
for (i in 1:nrow(mat)) {
  for (j in 1:ncol(mat)) {
     mat[i,j] <- (i-1)*ncol(mat) + j
     cat(str_c("Entry (", i, ", ", j, ") = ", mat[i,j], "\n"))
  }
}
#> Entry (1, 1) = 1
#> Entry (1, 2) = 2
#> Entry (1, 3) = 3
#> Entry (2, 1) = 4
#> Entry (2, 2) = 5
#> Entry (2, 3) = 6
mat
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

Often you can replace nested loops with a single loop by using expand_grid:

library(tidyverse)  # load function expand_grid
mat <- matrix(NA, nrow = 2, ncol = 3)
ite <- expand_grid(i = 1:2, j=1:3)
ite
#> # A tibble: 6 × 2
#>       i     j
#>   <int> <int>
#> 1     1     1
#> 2     1     2
#> 3     1     3
#> 4     2     1
#> 5     2     2
#> 6     2     3
for (r in 1:nrow(ite)) { # iterate over rows
   i <- ite$i[r]
   j <- ite$j[r]
   mat[i,j] <- (i-1)*ncol(mat) + j
   cat(str_c("Entry (", i, ", ", j, ") = ", mat[i,j], "\n"))
}
#> Entry (1, 1) = 1
#> Entry (1, 2) = 2
#> Entry (1, 3) = 3
#> Entry (2, 1) = 4
#> Entry (2, 2) = 5
#> Entry (2, 3) = 6
mat
#>      [,1] [,2] [,3]
#> [1,]    1    2    3
#> [2,]    4    5    6

Note expand_grid creates a data frame with one row for each combination of the values of i and j. This way of looping is a more flexible approach since you can

  • nest more loops by adding more columns to ite,
  • add different values in each column.

For instance, if you only want to calculate values for row 2 and columns 1 and 3 the code becomes:

mat <- matrix(NA, nrow = 2, ncol = 3)
ite <- expand_grid(i = 2, j = c(1,3))
ite
#> # A tibble: 2 × 2
#>       i     j
#>   <dbl> <dbl>
#> 1     2     1
#> 2     2     3
for (r in 1:nrow(ite)) { # iterate over rows
   i <- ite$i[r]
   j <- ite$j[r]
   mat[i,j] <- (i-1)*ncol(mat) + j
   cat(str_c("Entry (", i, ", ", j, ") = ", mat[i,j], "\n"))
}
#> Entry (2, 1) = 4
#> Entry (2, 3) = 6
mat
#>      [,1] [,2] [,3]
#> [1,]   NA   NA   NA
#> [2,]    4   NA    6
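Adding a column to ite corresponds to nesting one more loop. As a sketch, assume a hypothetical three-dimensional array arr that should be filled; three nested loops are then replaced by a single loop over the rows of ite:

```r
library(tidyr)  # expand_grid is part of tidyr (loaded with tidyverse)

# hypothetical example: fill a 2 x 3 x 2 array using a single loop
arr <- array(NA, dim = c(2, 3, 2))
ite <- expand_grid(i = 1:2, j = 1:3, k = 1:2)
nrow(ite)  # one row for each of the 2 * 3 * 2 combinations
#> [1] 12
for (r in 1:nrow(ite)) {  # iterate over rows
  i <- ite$i[r]
  j <- ite$j[r]
  k <- ite$k[r]
  arr[i, j, k] <- 100 * i + 10 * j + k
}
arr[2, 3, 1]
#> [1] 231
```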

9.4 Recap

Comparison/relational operators known to R are:

  • < for less than,
  • > for greater than,
  • <= for less than or equal to,
  • >= for greater than or equal to,
  • == for equal to each other (and not = which is typically used for assignment!),
  • != not equal to each other.

Logical operators known to R are:

  • & and,
  • | or,
  • ! not.

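The operators && and || work on single logical values and are mainly used in if and while conditions. From R 4.3, using them on vectors with more than one element gives an error (older versions only compared the first elements).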
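A small sketch of the difference between the two sets of operators (the objects x, y and msg are only illustrative):

```r
x <- c(1, 5, 12)

# & and | work element by element and return a vector
x > 2 & x < 10
#> [1] FALSE  TRUE FALSE

# && and || are meant for single logical values, e.g. in an if condition;
# they stop evaluating as soon as the result is known
y <- NULL
if (!is.null(y) && y > 0) {  # y > 0 is never evaluated since y is NULL
  msg <- "y is positive"
} else {
  msg <- "y is NULL or not positive"
}
msg
#> [1] "y is NULL or not positive"
```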

Useful functions that return a logical are any and all which can be used to join logical values in vectors.

Conditional statements can be constructed using, for instance, if and while statements. Moreover, the function if_else is a vectorized alternative.

Loops can be created using for and while statements.
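A while loop keeps running as long as its condition is TRUE. A minimal sketch (the objects val and n_ite are assumptions for illustration):

```r
# double a value until it exceeds 100 and count the iterations
val <- 1
n_ite <- 0
while (val <= 100) {
  val <- val * 2
  n_ite <- n_ite + 1
}
val
#> [1] 128
n_ite
#> [1] 7
```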

You can break out of a loop using break and jump to the next iteration (skipping the remainder of the code in the loop) using next.
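As a sketch of how next and break work, the loop below sums the odd numbers up to 7 (the object total is only illustrative):

```r
total <- 0
for (i in 1:10) {
  if (i %% 2 == 0) next  # even number: jump to the next iteration
  if (i > 7) break       # stop the loop when i exceeds 7
  total <- total + i     # only reached for i = 1, 3, 5, 7
}
total
#> [1] 16
i  # the loop stopped when i was 9
#> [1] 9
```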

Do not use a loop when a vectorized alternative exists.

Do not grow objects during the loop. Instead, allocate an object to hold the results and fill it in during the loop.

Nested loops are possible in R. However, often they can be converted into a single loop by defining a data frame having the values of the nested loops in each row. Here function expand_grid may be useful to create the data frame.

You may also have a look at the slides for this module.

9.5 Exercises

Below you will find a set of exercises. Always have a look at the exercises before you meet in your study group and try to solve them yourself. If you are stuck, see the help page. Some of the solutions to each exercise can be seen by pressing the button at each question. Beware, you will not learn by giving up too early. Put some effort into finding a solution! Always practice using shortcuts in RStudio (see Tools > Keyboard Shortcuts Help).

Go to the Tools for Analytics workspace and download/export the TM9 project. Open it on your laptop and have a look at the files in the exercises folder which can be used as a starting point.

9.5.1 Exercise (conditional expressions)

Solve this exercise using a script file.

  1. Consider object x:

    x <- c(1,2,-3,4)

    What will this conditional expression return?

    if(all(x>0)){
     print("All positives")
    } else {
     print("Not all positives")
    }
  2. What will the following expressions return?

    x <- c(TRUE, FALSE, TRUE, TRUE)
    all(x)
    any(x)
    any(!x)
    all(!x)
  3. Which of the expressions above is always FALSE when at least one entry of a logical vector x is TRUE?

Consider the vector:

library(tidyverse)
x <- 1:15
x
#>  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
  1. Use the if_else function to set elements with value below 7 to 0.
  2. Use the if_else function to set elements with value below 7 or above 10 to NA_integer_ (which is the NA/missing value of an integer).
  3. Consider the code

    x <- sample(c(1:10,NA,5.5), 1)
    x
    #> [1] 7

    which generates a number from the vector c(1:10,NA,5.5).

    Write code which sets object y equal to “even” if x is even, “odd” if x is odd, “decimal” if x has a non-zero decimal part and “missing” if x is NA. Hint: have a look at ?'%%' (the modulo operator).
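As a hedged illustration of the hint (not a solution), the modulo operator returns the remainder after division. The numbers below are chosen for illustration only:

```r
7 %% 2          # the remainder after division by 2 is 1 for an odd number
#> [1] 1
8 %% 2          # and 0 for an even number
#> [1] 0
5.5 %% 1        # the remainder after division by 1 is the decimal part
#> [1] 0.5
is.na(NA %% 2)  # missing values stay missing
#> [1] TRUE
```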

9.5.2 Exercise (loops)

  1. Using a for loop, create a vector having values \(2i + 4\) given \(i=1,\ldots,4\).
  2. Using a for loop, create a vector having values \(2i + 4\) given \(i=2,5,6,12\).
  3. Solve Question 2 using a while loop.
  4. Solve Questions 1 and 2 using a vectorized alternative.

9.5.3 Exercise (search vector)

This exercise is a slightly modified version of an exam assignment (reexam 2021-A1).

Consider the vector:

v <- c(9, 19, 2, 8, NA, 12, 9, 23, NA, 34)
v
#>  [1]  9 19  2  8 NA 12  9 23 NA 34
  1. Are any of the entries in v below or equal to 2?
  2. Are all of the entries in v above or equal to 2?
  3. Does v have missing values?
  4. Which entries in v are above 10? You must return the indices, e.g. the index of v[3] is 3.
  5. Create a vector res where res[i] is equal to v[i] if v[i] is less than 10 and otherwise zero (also if v[i] is NA).

9.5.4 Exercise (calculating distances)

Consider zip codes in Jutland:

# remotes::install_github("bss-osca/tfa-package", upgrade = FALSE)  # run to upgrade
library(tidyverse)
data(zips, package = "tfa")  # load the zips data from the tfa package
zips
#> # A tibble: 376 × 2
#>      Zip Area           
#>    <dbl> <chr>          
#>  1  5320 "Agedrup"      
#>  2  6753 "Agerbæk"      
#>  3  6534 "Agerskov"     
#>  4  8961 "Allingåbro"   
#>  5  6051 "Almind"       
#>  6  8592 "Anholt"       
#>  7  8643 "Ans By"       
#>  8  6823 "Ansager"      
#>  9  9510 "Arden"        
#> 10  5466 "Asperup"      
#> # ℹ 366 more rows

We want to calculate distances between a subset of zip areas:

idx <- 1:5
dat <- zips[idx,]
dat
#> # A tibble: 5 × 2
#>     Zip Area           
#>   <dbl> <chr>          
#> 1  5320 "Agedrup"      
#> 2  6753 "Agerbæk"      
#> 3  6534 "Agerskov"     
#> 4  8961 "Allingåbro"   
#> 5  6051 "Almind"
distanceMat <- matrix(NA, nrow = length(idx), ncol = length(idx))
colnames(distanceMat) <- str_c(dat$Zip[idx], dat$Area[idx], sep = " ") 
rownames(distanceMat) <- colnames(distanceMat)
distanceMat
#>                 5320 Agedrup 6753 Agerbæk 6534 Agerskov 8961 Allingåbro 6051 Almind
#> 5320 Agedrup              NA           NA            NA              NA          NA
#> 6753 Agerbæk              NA           NA            NA              NA          NA
#> 6534 Agerskov             NA           NA            NA              NA          NA
#> 8961 Allingåbro           NA           NA            NA              NA          NA
#> 6051 Almind               NA           NA            NA              NA          NA

We can find average distances between two zip codes (here rows 1 and 2 in dat) using Bing maps:

key <- "AlUJdApmvPe8y2_IMrC4j4x8fzytbD2M0SvlmpemL09ae_CWS6-IuNSgrAtXoyeP"
url <- str_c("http://dev.virtualearth.net/REST/V1/Routes/Driving?wp.0=",
             dat$Zip[1], ",Denmark",
             "&wp.1=",
             dat$Zip[2], ",Denmark",
             "&avoid=minimizeTolls&key=", key)
library(jsonlite)
lst <- jsonlite::fromJSON(url)
dist <- lst$resourceSets$resources[[1]]$travelDistance
dist
#> [1] 139
lst$statusCode
#> [1] 200
lst$statusDescription
#> [1] "OK"

Note we call the Bing Maps API with the two zip codes. The JSON returned is stored in a list. To get the average travel distance we access travelDistance. The status code should be 200 if the request went okay.

Use nested for loops to fill distanceMat with distances. Assume that the distance from a to b is the same as from b to a. That is, you only have to call the API once for two zip codes. Use an if statement to check if the status code is okay.

9.5.5 Exercise (expand_grid)

Consider the solution of Exercise 9.5.4 and assume that you only want to calculate the distance from rows 1 and 5 to rows 2 and 3 in dat. Modify the solution using expand_grid so only one loop is used.