Module 10 Functions
To understand computations in R, two slogans are helpful:
Everything that exists is an object.
Everything that happens is a function call.
John Chambers
Writing functions is a core activity of an R programmer. It represents the key step of the transition from a user to a programmer. Functions have inputs and outputs. Functions (and control structures) are what make your code more dynamic.
Functions are often used to encapsulate a sequence of expressions that needs to be executed numerous times, perhaps under slightly different conditions. Functional programming is a programming paradigm, that is, a style of writing code, built around functions. Rather than repeating code, functions and control structures allow you to build code in blocks. As a result, your code becomes more structured, more readable and much easier to maintain and debug (find errors).
A template project for this module is given on Posit Cloud (open it and use it while reading the notes).
Learning path diagram
10.1 Learning outcomes
By the end of this module, you are expected to be able to:
- Call a function.
- Formulate a function with different input arguments.
- Describe why functions are important in R.
- Set defaults for input arguments.
- Return values from functions.
- Explain how variable scope and precedence work.
- Document functions.
The learning outcomes relate to the overall learning goals number 2, 3, 4 and 10 of the course.
10.2 DataCamp course
An excellent introduction to functions is given in Chapter 3 in the DataCamp course Intermediate R. Please complete the chapter before continuing.
10.3 Functions returning multiple objects
A function in R returns only a single object. However, that object may be a list. That is, if you want to return multiple values, store them in a list. A simple example:
test <- function() {
  # the function does some stuff and calculates some results
  res1 <- 45
  res2 <- "Success"
  res3 <- c(4, 7, 9)
  res4 <- list(cost = 23, profit = 200)
  lst <- list(days = res1, run = res2, id = res3, money = res4)
  return(lst)
}
test()
#> $days
#> [1] 45
#>
#> $run
#> [1] "Success"
#>
#> $id
#> [1] 4 7 9
#>
#> $money
#> $money$cost
#> [1] 23
#>
#> $money$profit
#> [1] 200
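Once several results are returned in a list, the individual elements can be extracted by name. The following is a minimal sketch; the function `summary_stats` and its element names are hypothetical and only used for illustration.

```r
# Hypothetical function (not from the notes) returning two results in a list
summary_stats <- function(x) {
  list(mean = mean(x), range = range(x))
}

res <- summary_stats(c(4, 7, 9))
res$mean         # extract an element by name using $
res[["range"]]   # extract an element by name using [[ ]]
res$range[2]     # second entry of the range vector (the maximum)
```

Storing the result in a variable first (here `res`) avoids calling the function once for each element you need.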
10.4 The ... argument
The special argument ... indicates a variable number of arguments and is usually used to pass arguments on to functions called inside the function. Consider the example:
library(stringr)  # provides str_c()
my_name <- function(first = "Lars", last = "Nielsen") {
  str_c(first, last, sep = " ")
}
my_name()
#> [1] "Lars Nielsen"
cite_text <- function(text, ...) {
  str_c(text, ', -', my_name(...))
}
cite_text("Learning by doing is the best way to learn how to program!")
#> [1] "Learning by doing is the best way to learn how to program!, -Lars Nielsen"
cite_text("Learning by doing is the best way to learn how to program!", last = "Relund")
#> [1] "Learning by doing is the best way to learn how to program!, -Lars Relund"
cite_text("To be or not to be", first = "Shakespeare", last = "")
#> [1] "To be or not to be, -Shakespeare "
Note that in the first call, we use the defaults in my_name. In the second call, we change the default last name, and in the last call, we change both arguments.
If you need to retrieve/capture the content of the ... argument, put it in a list, e.g. with args <- list(...) inside the function.
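A minimal sketch of capturing the content of `...` could look as follows. The function `count_args` is a hypothetical name chosen for this illustration; only `list(...)`, `length()` and `names()` are standard R.

```r
# Hypothetical function: inspect the arguments captured by ...
count_args <- function(...) {
  args <- list(...)   # capture the content of ... in a list
  list(n = length(args), names = names(args))
}

res <- count_args(a = 1, b = "text", c = 1:3)
res$n       # number of arguments passed via ...
res$names   # the names of these arguments
```

After the capture, `args` is an ordinary list, so its elements can be accessed with `$` or `[[ ]]` like any other list.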
10.5 Documenting your functions
It is always a good idea to document your functions. This is in fact always done for functions in a package. For instance, try ?mutate and see the documentation in the Help tab.
Assume that you have written a function subtract that returns the difference between its two arguments x and y.
In RStudio you can insert a Roxygen documentation skeleton by placing the cursor on the first line of the function and choosing Code > Insert Roxygen Skeleton (Ctrl+Alt+Shift+R):
#' Title
#'
#' @param x
#' @param y
#' @return
#' @export
#' @examples
subtract <- function(x, y) {
  return(x - y)
}
You can now modify the documentation to:
#' Subtract two vectors
#'
#' @param x First vector.
#' @param y Vector to be subtracted.
#' @return The difference.
#' @export
#' @examples
#' subtract(x = c(5,5), y = c(2,3))
subtract <- function(x, y) {
  return(x - y)
}
Note:
- Parameters/function arguments are documented using the @param tag.
- The return value is documented using the @return tag.
- Under the @examples tag you can insert some examples.
- Ignore the @export tag. It is used if you include your function in your own package. Package development is beyond the scope of this course. If you are interested, have a look at the book by Hadley Wickham (2015).
A list of further tags can be seen in the vignette Rd (documentation) tags.
10.6 Example - Job sequencing
Recall the job sequencing problem in Section 5.8, which considers the problem of determining the best sequencing of jobs on a machine. A set of startup costs is given for the 5 jobs (stored in the vector startup_costs used below).
Moreover, when changing from one job to another job, the setup costs are given as:
setup_costs <- matrix(c(
  NA, 35, 22, 44, 12,
  49, NA, 46, 38, 17,
  46, 12, NA, 29, 41,
  23, 37, 31, NA, 26,
  17, 23, 28, 34, NA),
  byrow = TRUE, nrow = 5)
setup_costs
#>      [,1] [,2] [,3] [,4] [,5]
#> [1,]   NA   35   22   44   12
#> [2,]   49   NA   46   38   17
#> [3,]   46   12   NA   29   41
#> [4,]   23   37   31   NA   26
#> [5,]   17   23   28   34   NA
For instance, the setup cost from Job 2 to Job 4 is 38.
The goal of the problem is to determine a sequence of jobs which minimizes the total setup cost including the startup cost.
One possible way to find a sequence is to use a greedy strategy:
Greedy Algorithm
Step 0: Start with the job which has minimal startup cost.
Step 1: Select the next job as the job not already done
with minimal setup cost given current job.
Step 2: Set next job in Step 1 to current job and
go to Step 1 if not all jobs are done.
In R the greedy algorithm can be implemented as:
#' Calculate a job sequence based on a greedy algorithm
#'
#' @param startup Startup costs.
#' @param setup Setup costs.
#' @return A list with the job sequence and total setup costs.
greedy <- function(startup, setup) {
  jobs <- nrow(setup)
  cur_job <- which.min(startup)  # job with minimal startup cost
  cost <- startup[cur_job]
  # cat("Start job:", cur_job, "\n")
  job_seq <- cur_job
  setup[, cur_job] <- NA  # mark the job as done
  for (i in seq_len(jobs - 1)) {
    next_job <- which.min(setup[cur_job, ])  # cheapest job not yet done
    # cat("Next job:", next_job, "\n")
    cost <- cost + setup[cur_job, next_job]
    job_seq <- c(job_seq, next_job)
    cur_job <- next_job
    setup[, cur_job] <- NA
  }
  # print(setup)
  return(list(seq = job_seq, cost = cost))
}
greedy(startup_costs, setup_costs)
#> $seq
#> [1] 5 1 3 2 4
#>
#> $cost
#> [1] 115
First, the job with minimum startup cost is found using the function which.min, and we define cost as the startup cost. The commented-out cat calls can be used as debugging statements. We initialize job_seq with the first job. Next, we have to find a way of ignoring jobs already done. We do that here by setting the columns of the setup cost matrix equal to NA for jobs already done. Hence, they will not be selected by which.min. The for loop runs 4 times (once for each remaining job), selects the next job and accumulates the total cost. Finally, the job sequence and the total cost are returned as a list.
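The way `which.min` skips `NA` entries can be traced by hand for the first two steps. The sketch below copies the setup costs from the text; the starting job 5 is taken from the output shown above rather than computed, since the startup costs are not repeated here. The names `step1` and `step2` are introduced only for this illustration.

```r
# Setup costs copied from the text
setup <- matrix(c(
  NA, 35, 22, 44, 12,
  49, NA, 46, 38, 17,
  46, 12, NA, 29, 41,
  23, 37, 31, NA, 26,
  17, 23, 28, 34, NA),
  byrow = TRUE, nrow = 5)

cur_job <- 5                          # assumed: first job in the sequence above
setup[, cur_job] <- NA                # job 5 can no longer be selected
step1 <- which.min(setup[cur_job, ])  # cheapest job to change to from job 5
setup[, step1] <- NA                  # mark that job as done too
step2 <- which.min(setup[step1, ])    # cheapest remaining job from there
c(cur_job, step1, step2)              # first three jobs of the sequence
```

Because `which.min` ignores `NA` values, blanking out a column is enough to exclude a finished job from all later selections.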
A well-known better strategy is to:
Better Algorithm
Step 0: Subtract minimum of startup and setup cost for each job from setup and
startup costs (that is columnwise)
Step 1: Call the greedy algorithm with the modified costs. Note that the total
cost returned has to be modified a bit.
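The following short derivation sketches why the returned cost only needs a simple correction. The symbols \(s_j\), \(c_{ij}\) and \(m_j\) are notation introduced here for illustration and are not part of the original notes. Let \(s_j\) be the startup cost of job \(j\), \(c_{ij}\) the setup cost when changing from job \(i\) to job \(j\), and \(m_j\) the minimum of \(s_j\) and all \(c_{ij}\) in column \(j\). A sequence \((j_1, \ldots, j_n)\) has total cost
\[ C = s_{j_1} + \sum_{k=2}^{n} c_{j_{k-1} j_k}. \]
With the modified costs \(s'_j = s_j - m_j\) and \(c'_{ij} = c_{ij} - m_j\), every job enters the sequence exactly once (either as the first job or as the job changed to), so each \(m_j\) is subtracted exactly once:
\[ C' = s'_{j_1} + \sum_{k=2}^{n} c'_{j_{k-1} j_k} = C - \sum_{j=1}^{n} m_j. \]
Hence the cost with respect to the original data is \(C = C' + \sum_{j=1}^{n} m_j\), that is, the sum of the column minima has to be added to the cost returned by the greedy algorithm.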
The better strategy implemented in R:
#' Calculate a job sequence based on a better (greedy) algorithm
#'
#' @param startup Startup costs.
#' @param setup Setup costs.
#' @return A list with the job sequence and total setup costs.
better <- function(startup, setup) {
  jobs <- nrow(setup)
  min_col_val <- apply(rbind(startup, setup), 2, min, na.rm = TRUE)
  startup <- startup - min_col_val
  min_mat <- matrix(rep(min_col_val, jobs), ncol = jobs, byrow = TRUE)
  setup <- setup - min_mat
  lst <- greedy(startup, setup)
  lst$cost <- lst$cost + sum(min_col_val)
  return(lst)
}
better(startup_costs, setup_costs)
#> $seq
#> [1] 4 1 3 2 5
#>
#> $cost
#> [1] 109
First, the number of jobs is identified.
Next, we need to find the minimum value in each column. Here we use the apply function. The first argument is the setup matrix with the startup costs added as a row. The second argument is 2, indicating that we should apply the third argument to each column (if it were 1, then to each row). The third argument is the function to apply to each column (here min). The last argument is an optional argument passed on to the min function. With the current values, min_col_val equals 17, 12, 22, 29, and 12.
Afterwards, the minimum values are subtracted in each column. Note that to subtract the minimum values from the setup costs, we first need to create a matrix with the minimum values (min_mat).
Finally, we call the greedy algorithm with the new costs and correct the returned cost by adding back the sum of the minimum values.
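The column-wise reduction can be tried out on the setup costs alone. This is a sketch under a stated assumption: the startup costs are left out (they are not repeated in this section), which gives the same minima as stated above provided no startup cost is smaller than the minimum setup cost in its column. The use of `sweep` is an alternative shown for comparison, not the approach used in the notes.

```r
# Setup costs copied from the text
setup_costs <- matrix(c(
  NA, 35, 22, 44, 12,
  49, NA, 46, 38, 17,
  46, 12, NA, 29, 41,
  23, 37, 31, NA, 26,
  17, 23, 28, 34, NA),
  byrow = TRUE, nrow = 5)

# Minimum of each column (MARGIN = 2), ignoring NA via the extra argument
min_col_val <- apply(setup_costs, 2, min, na.rm = TRUE)

# Approach from the notes: build a matrix repeating the minima in each row
min_mat <- matrix(rep(min_col_val, 5), ncol = 5, byrow = TRUE)
reduced1 <- setup_costs - min_mat

# Alternative: sweep() subtracts a vector from each column directly
reduced2 <- sweep(setup_costs, 2, min_col_val)
```

Both approaches give the same reduced matrix, in which every column has a minimum of 0.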
10.7 Recap
Writing functions is a core activity of an R programmer. It represents the key step of the transition from a user to a programmer. Functions have inputs and outputs. Functions (and control structures) are what make your code more dynamic.
Functions are often used to encapsulate a sequence of expressions that needs to be executed numerous times, perhaps under slightly different conditions. Functional programming is a programming paradigm, that is, a style of writing code, built around functions. Rather than repeating code, functions and control structures allow you to build code in blocks. As a result, your code becomes more structured, more readable and much easier to maintain and debug (find errors).
Functions can be defined using the function() directive.
The named arguments (input values) can have default values. Moreover, R passes arguments by value. That is, an R function cannot change the variable that you input to that function.
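That a function works on a copy of its input can be checked with a small experiment. The function `add_one` below is a hypothetical example written for this illustration.

```r
# Hypothetical function that modifies its argument inside the body
add_one <- function(v) {
  v[1] <- v[1] + 1   # changes the local copy only
  v
}

x <- c(1, 2, 3)
y <- add_one(x)
x   # still 1 2 3: the variable passed to the function is unchanged
y   # the modified copy returned by the function
```

To keep a change made by a function, assign the returned value to a variable, as done with `y` here.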
A function can be called using its name and its arguments can be specified by name or by position in the argument list.
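The different ways of specifying arguments can be illustrated with a small function that has one default value. The function `power` and the variable names are hypothetical and chosen only for this sketch.

```r
# Hypothetical function with a default value for the second argument
power <- function(base, exponent = 2) {
  base^exponent
}

by_default  <- power(3)                     # exponent uses its default (2)
by_position <- power(3, 3)                  # arguments matched by position
by_name     <- power(exponent = 3, base = 2)  # arguments matched by name
```

When arguments are given by name, their order does not matter, which makes calls with many arguments easier to read.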
A function returns the value of the last expression evaluated in the function body, or the value given to return() if it is called (using return() explicitly is good coding practice).
Scoping refers to the rules R uses to look up the value of variables. A function first looks for a variable inside its own body (including its arguments). If the variable is found there, no further search is required. Otherwise, R looks one level up, in the environment where the function was defined, to see if the variable exists there, and so on.
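The lookup rules can be seen in a small experiment. The functions `f` and `g` are hypothetical examples written for this sketch.

```r
x <- 10

f <- function() {
  y <- 2
  x + y    # x is not defined in the body, so R finds it one level up
}

g <- function() {
  x <- 1   # a local x that hides the outer x inside g
  x + 2
}

res_f <- f()   # uses the outer x
res_g <- g()   # uses the local x
x              # the outer x is not changed by g
```

Relying on variables defined outside a function makes it harder to reuse, so it is usually safer to pass such values as arguments.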
Functions can be assigned to R objects just like any other R object.
Document your functions using the Roxygen skeleton!
10.8 Exercises
Below you will find a set of exercises. Always have a look at the exercises before you meet in your study group and try to solve them yourself. If you are stuck, see the help page. Solutions to some of the exercises can be seen by pressing the button at each question. Beware, you will not learn by giving up too early. Put some effort into finding a solution! Always practice using shortcuts in RStudio (see Tools > Keyboard Shortcuts Help).
Go to the Tools for Analytics workspace and download/export the TM10 project. Open it on your laptop and have a look at the files in the exercises folder, which can be used as a starting point.
10.8.1 Exercise (defining functions)
Solve this exercise using a script file.
1. Create a function sum_n that for any given value, say \(n\), computes the sum of the integers from 1 to \(n\) (inclusive). Use the function to determine the sum of integers from 1 to 5000. Document your function too.
2. Write a function compute_s_n that for any given \(n\) computes the sum \(S_n = 1^2 + 2^2 + 3^2 + \dots + n^2\). Report the value of the sum when \(n=10\).
3. Define an empty numerical vector s_n of size 25 using s_n <- vector("numeric", 25) and store in it the results of \(S_1, S_2, \dots, S_{25}\) using a for-loop. Confirm that the formula for the sum is \(S_n = n(n+1)(2n+1)/6\) for \(n = 1, \ldots, 25\).
4. Write a function biggest which takes two integers as arguments. Let the function return 1 if the first argument is larger than the second and return 0 otherwise.
5. Write a function that returns the shipping cost as 10% of the total cost of an order (input argument).
6. Given Question 5, rewrite the function so the percentage is an input argument with a default of 10%.
7. Given Question 5, the shipping cost can be split into parts. One part is gasoline, which is 50% of the shipping cost. Write a function that has total cost as input argument and calculates the gasoline cost, using the function defined in Question 5 inside it.
8. Given Question 6, the shipping cost can be split into parts. One part is gasoline, which is 50% of the shipping cost. Write a function that has total cost as input argument and calculates the gasoline cost, using the function defined in Question 6 inside it. Hint: Use the ... argument to pass arguments to shipping_cost.
9. Given Question 8, write a function costs that, given total cost, returns the total cost, shipping cost and gasoline cost.
10.8.2 Exercise (Euclidean distances)
This exercise is a slightly modified version of an exam assignment (exam 2021-A1).
The Euclidean distance between two points \(p = (p_1,p_2)\) and \(q = (q_1,q_2)\) can be calculated using the formula \[ d(p,q) = \sqrt{(p_1-q_1)^2 + (p_2-q_2)^2}.\]
1. Calculate the distance between points \(p = (10,10)\) and \(q = (4,3)\) using the formula.

Consider 4 points in a matrix (one in each row):
p_mat <- matrix(c(0, 7, 8, 2, 10, 16, 8, 12), nrow = 4)
p_mat
#>      [,1] [,2]
#> [1,]    0   10
#> [2,]    7   16
#> [3,]    8    8
#> [4,]    2   12
The distance matrix of p_mat is a 4 times 4 matrix where entry (i,j) contains the distance from the point in row i to the point in row j.

2. Calculate the distance matrix of p_mat.
3. Create a function calc_distances with the following features (implement as many as you can):
   - Takes a matrix p_mat with a point in each row as input argument.
   - Takes two additional input arguments from and to with default values 1:nrow(p_mat).
   - Returns the distance matrix with values calculated for rows in the from input argument and columns in the to input argument. The other entries equal NA.
   - Works for different p_mat (you may assume that the matrix always has two columns).
   You may test your code using:
10.8.3 Exercise (scope)
1. After running the code below, what is the value of variable x?
2. Are there any problems with the following code?
3. Have a look at the documentation for operator <<- (run ?'<<-'). After running the code below, what is the value of variable x?
4. After running the code below, what is the value of variable x and the output of the function call?
10.8.4 Exercise (time conversion)
This exercise is a slightly modified version of an exam assignment (exam 2022-A1).
1. Make functions:
   - SecToMin, which takes an input argument sec in seconds and returns the number converted to minutes.
   - SecToHours, which takes an input argument sec in seconds and returns the number converted to hours.
   - MinToSec, which takes an input argument min in minutes and returns the number converted to seconds.
   - MinToHours, which takes an input argument min in minutes and returns the number converted to hours.
   - HoursToMin, which takes an input argument hours in hours and returns the number converted to minutes.
   - HoursToSec, which takes an input argument hours in hours and returns the number converted to seconds.
   All numbers may be decimal numbers, e.g. 90 seconds is 1.5 minutes and 1.5 hours is 90 minutes.
2. Make a function ConvertTime which takes two input arguments:
   - val: A number.
   - unit: A string that can take values "sec", "min" and "hours".
   The function should return val converted to seconds, minutes and hours with features:
   - works for all possible values for unit,
   - uses the functions in Question 1,
   - returns a vector with 3 numbers (seconds, minutes and hours) or NA if unit does not equal "sec", "min" or "hours".