Loops are a fundamental element of computer programming. They facilitate iteration by running a piece of code repeatedly and provide a flexible approach to solving many problems that you may encounter.
Loops have gained something of a bad reputation amongst the R community, largely owing to the fact that poor implementation of loops in R can result in code that is slow and inefficient. However, with proper use loops can be a powerful and effective tool for the R programmer.
for loops
Let’s start with a simple example: a loop to print “a”, “b” and “c” to the console.
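A minimal sketch of such a loop, built from the header and body that are broken down below:

```r
# Print each element of the vector, one per iteration
for (x in c("a", "b", "c")) {
  print(x)
}
#' [1] "a"
#' [1] "b"
#' [1] "c"
```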
To understand what’s happening with our loop we can break it down into the component parts.
The header
for (x in c("a", "b", "c"))
The header of our for loop defines our loop variable, the thing that changes during each iteration. During each iteration x will take on a different value from c("a", "b", "c"). In effect, executing the for loop assigns a temporary value to x for each iteration.
1st iteration
x = "a"
2nd iteration
x = "b"
3rd iteration
x = "c"
The body
print(x)
The body of our for loop is used to define the action or actions to be performed during each iteration. The body will always use the current value of x, which changes with each iteration. Therefore, our for loop actually produces 3 different outputs.
1st iteration
[1] "a"
2nd iteration
[1] "b"
3rd iteration
[1] "c"
Constructing the header
The header of a for loop is structured a little differently from typical R syntax, and this can cause confusion. The simplest way to think about writing the loop is to consider the left-hand side (LHS) and right-hand side (RHS) of in. The variable that you name on the LHS of in will take the value of the next element of the vector passed to the RHS of in with each iteration.
Beyond being a valid R name, there are no rules as to what the LHS variable should be called. All the below examples would produce the same result.
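As an illustration, the three loops sketched here differ only in the name of the loop variable (the names letter and i are arbitrary choices for this sketch), and each prints "a", "b" and "c":

```r
for (x in c("a", "b", "c")) {
  print(x)
}

for (letter in c("a", "b", "c")) {
  print(letter)
}

for (i in c("a", "b", "c")) {
  print(i)
}
```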
The RHS can realistically be any R object that can be iterated over. The following code examples produce the same output.
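A short sketch of a few possible RHS objects; my_letters is a hypothetical name used only for illustration, and letters is R's built-in constant. Each loop prints "a", "b" and "c":

```r
# A vector created inline
for (x in c("a", "b", "c")) {
  print(x)
}

# A vector stored in a variable
my_letters <- c("a", "b", "c")
for (x in my_letters) {
  print(x)
}

# A list, where each element is a length-one character vector
for (x in list("a", "b", "c")) {
  print(x)
}

# A subset of the built-in letters constant
for (x in letters[1:3]) {
  print(x)
}
```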
Constructing the body
The body will contain more traditional R code, but should (almost always) be wrapped in curly braces ({ }). If we try to write a simple loop without enclosing the body with { }, things may not work as intended. Let’s look at a loop with 2 print statements written with and without the curly braces.
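The two variants might look like the sketch below. Note that the indentation in the second loop is purely cosmetic; R ignores it when deciding what belongs to the loop.

```r
# Body enclosed in braces: both print() calls run on every iteration
for (x in c("a", "b", "c")) {
  print(x)
  print(x)
}

# No braces: only the first print() belongs to the loop.
# The second print() runs once, after the loop has finished.
for (x in c("a", "b", "c"))
  print(x)
  print(x)
```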
In the example with the enclosed body, R has executed all of the code contained within the curly braces for each iteration, so our letters have been printed twice. In the example without an enclosed body, R has only included the line immediately following the header in the loop iterations; our second print statement has therefore only run once, after the loop has completed. The extra “c” that was printed came from the final value assigned to x.
Information: It is important to note that unlike function calls, loops do not execute within an enclosed environment. This means that any temporary assignments made in the body will exist in your workspace once the loop completes. We can see this by running print(x) in isolation.
print(x)
#' [1] "c"
x now exists in our global environment and its value is as per the last element of the vector we iterated over in our loop (c("a", "b", "c")).
Storing an output
Often we will want to use a loop to apply the same manipulation to multiple elements of a vector. Let’s start with an example of a loop that adds 1 to another number and prints the result to the console.
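A minimal sketch, assuming a vector named numbers that holds the values 1 to 5 (consistent with the values used later in this section):

```r
numbers <- c(1, 2, 3, 4, 5)

# Add 1 to each element and print the result
for (n in numbers) {
  print(n + 1)
}
#' [1] 2
#' [1] 3
#' [1] 4
#' [1] 5
#' [1] 6
```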
If we want to store the outputs as a vector rather than print them to the console, however, we can’t simply assign to a new object inside the loop body and expect it to capture all of the iterations. If we do, the output might not be what we initially expect.
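A sketch of the naive attempt, again assuming numbers holds 1 to 5:

```r
numbers <- c(1, 2, 3, 4, 5)

# Assigning inside the body replaces output on every iteration
for (n in numbers) {
  output <- n + 1
}

output
#' [1] 6
```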
In the above example each iteration overwrites the previous value of output, so we only capture the value from the final loop iteration. We actually need to create the output vector before we execute the loop. Let’s create an empty vector of the correct type named output with the vector() function. We can then use c() inside the body to append the output of each iteration to output.
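The approach described above can be sketched as follows, assuming numbers holds 1 to 5:

```r
numbers <- c(1, 2, 3, 4, 5)

# Start with an empty numeric vector
output <- vector(mode = "numeric", length = 0)

# Append the result of each iteration to the end of output
for (n in numbers) {
  output <- c(output, n + 1)
}

output
#' [1] 2 3 4 5 6
```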
It worked! We managed to store all of our outputs in the output vector. However, just because we can do something doesn’t mean that we should. It’s important to understand that this method can be used, and it is typically the solution most people will come up with first, but in practice NEVER DO THIS.
Instead, let’s look at pre-allocation.
Pre-allocation
Whilst the example above achieves our desired outcome, it is inefficient and should be avoided. When used with toy data to provide a simple example the approach won’t produce any noticeable performance issues. However, if we scaled this to a larger number of iterations or increased the complexity, we would have slow and memory-hungry code. This issue is a key driver of negativity within the R community around the use of loops.
To understand why this approach doesn’t work well, we need to look at R’s copy-on-modify behaviour. Every R object exists in memory (the computer’s RAM specifically); the object has an address so that R knows where to find it, and it has an amount of space allocated to it. In simple terms, the space an object is given is the amount it requires in its current form.
When we add more elements to a vector using c(), the new object requires more space than was assigned to the original one; c(1,2,3,4,5) won’t fit in the space allocated to c(1,2,3,4). R therefore has to move the object to a new space each time that the vector increases in length, and moving is time consuming as it involves making a copy of the original object. There is also a compounding effect in that the bigger the object, the more time it takes to move, so each iteration takes longer.
Thankfully, there is a simple solution to the problem in the form of pre-allocation.
Pre-allocation allows us to create an object to store our output up front, ensuring that it has enough memory assigned to it from the beginning, and then to populate it. Whilst it may sound quite technical, the implementation is rather simple.
Firstly, we need to identify the size that we require output to be. We know that our loop will produce one value during each iteration, and the number of iterations will be equal to the length of numbers. Let’s start by capturing the length of numbers.
We can now create our vector, output, ensuring that it is sufficient in size to capture all of our loop outputs. We need to ensure that our vector is of the correct type using the mode argument of the vector() function. In this case we are working with numeric values so we specify mode = "numeric". We can pass the len_numbers object that we created earlier to the length argument.
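The two steps above, capturing the length and creating the pre-allocated vector, can be sketched as follows (assuming numbers holds 1 to 5):

```r
numbers <- c(1, 2, 3, 4, 5)

# Number of values the loop will produce
len_numbers <- length(numbers)

# Pre-allocated numeric vector; numeric vectors are filled with zeros
output <- vector(mode = "numeric", length = len_numbers)

output
#' [1] 0 0 0 0 0
```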
We now need to rewrite our loop header a little to make use of pre-allocation. Our original for (n in numbers) allows us to work on the elements of numbers directly; however, we now want to access the elements of numbers by index, which will also allow us to assign the output of each iteration to the matching element of our output vector. We can use the seq_len() function for this.
seq_len(len_numbers) returns a vector of the numbers 1 to len_numbers (5).
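A quick sketch of what seq_len() returns, assuming len_numbers is 5. The second call shows one reason it is often preferred over 1:n for loop headers: a length of zero produces an empty vector, so the loop body would never run.

```r
len_numbers <- 5

seq_len(len_numbers)
#' [1] 1 2 3 4 5

seq_len(0)
#' integer(0)
```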
So n’s value during the 1st iteration of the loop will be 1, then 2 during the second iteration and so on. We use the value of n along with R’s indexing system to access the individual elements of numbers during each iteration and assign the output to the corresponding element of output.
For example:
In our 1st iteration:
- n equals 1
- element 1 of numbers (numbers[1]) has 1 added to it (1 + 1 = 2)
- the value is assigned to element 1 of output (output[1])
In our 2nd iteration:
- n equals 2
- element 2 of numbers (numbers[2]) has 1 added to it (2 + 1 = 3)
- the value is assigned to element 2 of output (output[2])
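Putting the pieces together, the pre-allocated loop might look like this sketch (assuming numbers holds 1 to 5):

```r
numbers <- c(1, 2, 3, 4, 5)
len_numbers <- length(numbers)

# Create the output vector at its final size before the loop starts
output <- vector(mode = "numeric", length = len_numbers)

# Iterate over positions rather than values, filling output by index
for (n in seq_len(len_numbers)) {
  output[n] <- numbers[n] + 1
}

output
#' [1] 2 3 4 5 6
```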
Benchmarking pre-allocation
To demonstrate the difference between growing a vector and utilising pre-allocation we can use a benchmarking tool. I recommend the microbenchmark package.
Let’s compare 2 approaches to adding 1 to a vector of length 10,000: one using pre-allocation, and one which grows the output with every iteration.
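One way to set up such a comparison is sketched below. The function names grow_output and preallocate_output are hypothetical, as is the choice of 20 repetitions, and the benchmark only runs if the microbenchmark package is installed.

```r
numbers <- seq_len(10000)

# Approach 1: grow the output vector with c() on every iteration
grow_output <- function(numbers) {
  output <- vector(mode = "numeric", length = 0)
  for (n in numbers) {
    output <- c(output, n + 1)
  }
  output
}

# Approach 2: pre-allocate the output vector, then fill it by index
preallocate_output <- function(numbers) {
  output <- vector(mode = "numeric", length = length(numbers))
  for (n in seq_len(length(numbers))) {
    output[n] <- numbers[n] + 1
  }
  output
}

# Time both approaches (skipped when microbenchmark is not available)
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    grow = grow_output(numbers),
    preallocate = preallocate_output(numbers),
    times = 20
  ))
}
```

Exact timings will vary with the machine and the version of R in use.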
The pre-allocation method is 38 times quicker for this example, and that ratio increases further the larger the data involved.
while loops
Where a for loop typically performs an iteration for each element of a predetermined input, a while loop keeps executing for as long as a condition remains true.
Let’s write a loop that prints numbers, starting at 1, increases the number by 1 in each iteration, and stops at 10.
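A minimal sketch of that loop, using the counter name it and the condition it <= 10 discussed below:

```r
# Start the counter at 1
it <- 1

# Keep looping for as long as the condition is TRUE
while (it <= 10) {
  print(it)
  it <- it + 1
}
```

The loop prints the numbers 1 to 10, and it is left holding 11 once the condition fails.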
Our loop first prints the current value of it to the console, then adds 1 to it and reassigns the new value. The test it <= 10 is performed before the code block runs on each iteration. Once the while condition evaluates to FALSE, when it is greater than 10, the loop ends.
break
We can use the break keyword inside a loop to cause it to stop executing if a condition is met. Let’s write a for loop to find the factors of 42, but (rather inefficiently) we are going to check whether numbers 1 through to 100 are factors of 42. We can include an if statement to ensure only factors of 42 are printed to the console.
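A sketch of such a loop, using the loop variable name ii and the modulo operator %% to test for a remainder of zero:

```r
for (ii in 1:100) {
  # ii is a factor of 42 when the division leaves no remainder
  if (42 %% ii == 0) {
    print(ii)
  }
}

# After the loop, ii holds the last value it was assigned
ii
#' [1] 100
```

The loop prints 1, 2, 3, 6, 7, 14, 21 and 42.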
It worked! But our loop executed 100 times. We can check this by looking at the value of ii.
We know we won’t find any factors once we pass the value of 42 itself, so let’s add a break condition. We will use an if statement to check whether ii is equal to 42 during each iteration, and if it is we will break the loop. To use break we simply include the keyword, in this example inside the body of the if statement.
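A sketch of the loop with a break condition added. The break check is placed after the print so that 42 itself is still reported:

```r
for (ii in 1:100) {
  if (42 %% ii == 0) {
    print(ii)
  }
  # Stop looping once we reach 42; no larger number can be a factor
  if (ii == 42) {
    break
  }
}

ii
#' [1] 42
```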
The output is unchanged, but our loop ‘broke’ once ii was equal to 42 and no more iterations were performed. Again, we can check this by looking at the value of ii once the loop has completed.
next
The next keyword causes the loop to skip the remainder of a single iteration, but unlike break it continues to perform the remaining iterations. To demonstrate the functionality, let’s print the numbers 1 to 10, but skip any numbers that are multiples of 3.
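A minimal sketch of such a loop:

```r
for (ii in 1:10) {
  # Skip the rest of this iteration when ii is a multiple of 3
  if (ii %% 3 == 0) {
    next
  }
  print(ii)
}
```

This prints 1, 2, 4, 5, 7, 8 and 10.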
Whenever ii is divisible by 3 the condition within the if statement is triggered and the next command causes the current iteration to end and the next one to begin. When next is triggered the print() function is never executed within that iteration.
Using loops - examples
In the absence of vectorisation
The examples above demonstrate how loops function, but don’t represent very efficient real world uses. In R many functions are ‘vectorised’, which means that a single call operates on every element of a vector without the need for an explicit loop. Loops offer a solution for when vectorisation isn’t available. Let’s explore this with some simple vectors, x, y, and z.
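The vectors might be defined as in the sketch below. The values of x and y are assumptions chosen to be consistent with the x + y result quoted in the text, and the value of z is purely illustrative.

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)
z <- 10  # a single value, chosen for illustration

x + y
#' [1]  7  9 11 13 15

z + y
#' [1] 16 17 18 19 20
```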
Vectorisation is the reason that x + y gives 7 9 11 13 15.
In essence x + y is actually returning:
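Written out element by element, and assuming x holds 1 to 5 and y holds 6 to 10, the addition is equivalent to this sketch:

```r
x <- c(1, 2, 3, 4, 5)
y <- c(6, 7, 8, 9, 10)

# Each element of x is paired with the element of y in the same position
c(x[1] + y[1], x[2] + y[2], x[3] + y[3], x[4] + y[4], x[5] + y[5])
#' [1]  7  9 11 13 15
```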
z + y is a little different: the value of z is recycled and used against each element of y, so we actually get the return of:
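A sketch of the recycling, assuming y holds 6 to 10 and using an illustrative value of 10 for z:

```r
y <- c(6, 7, 8, 9, 10)
z <- 10  # a single value, chosen for illustration

# The single value of z is reused for every element of y
c(z + y[1], z + y[2], z + y[3], z + y[4], z + y[5])
#' [1] 16 17 18 19 20
```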
Vectorisation is also the reason why vectorised functions like sqrt() act on each element of a vector.
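For instance, with a hypothetical vector named squares:

```r
squares <- c(1, 4, 9, 16, 25)

# One call returns the square root of every element
sqrt(squares)
#' [1] 1 2 3 4 5
```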
However, when a function isn’t vectorised we have to proceed a little differently. The digest package provides the function digest(), which can be used to generate hash digests of R objects. Given its intended usage it is not vectorised. Let’s run digest() over a vector of names.
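A sketch of that call, using a hypothetical vector of names called name_vec. The call only runs if the digest package is installed.

```r
# Hypothetical vector of names, used only for illustration
name_vec <- c("Alice", "Bob", "Carol")

if (requireNamespace("digest", quietly = TRUE)) {
  # One hash is returned for the whole vector, not one per name
  whole_hash <- digest::digest(name_vec)
  print(whole_hash)
  print(length(whole_hash))
}
```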
digest() has taken the entire vector and returned a hash for it (as intended). If we want to get a hash for each element of the vector then we will have to try something else, like a loop.
Using a simple loop we can hash each element of the vector.
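One possible version of that loop, using a hypothetical name_vec and a pre-allocated character vector named hashes. The hashing only runs if the digest package is installed.

```r
# Hypothetical vector of names, used only for illustration
name_vec <- c("Alice", "Bob", "Carol")

# Pre-allocate one slot per name
hashes <- vector(mode = "character", length = length(name_vec))

if (requireNamespace("digest", quietly = TRUE)) {
  for (i in seq_len(length(name_vec))) {
    # Hash one element at a time and store it at the same position
    hashes[i] <- digest::digest(name_vec[i])
  }
}

hashes
```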
To perform the same operation on multiple columns
We can use loops to perform the same operation multiple times on a different input. Consider the following data frame.
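As a stand-in, the sketch below builds a small hypothetical data frame with five character columns named col_a to col_e; the values are invented for illustration.

```r
# Hypothetical data: every column starts out as character
df <- data.frame(
  col_a = c("1", "2", "3"),
  col_b = c("x", "y", "z"),
  col_c = c("4.5", "5.5", "6.5"),
  col_d = c("p", "q", "r"),
  col_e = c("7", "8", "9"),
  stringsAsFactors = FALSE
)

str(df)
```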
If we want to specify the type of col_a, col_c, and col_e as double we could do something like this:
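A base R sketch, run against a small hypothetical data frame of character columns:

```r
# Hypothetical data: every column starts out as character
df <- data.frame(
  col_a = c("1", "2"), col_b = c("x", "y"), col_c = c("3.5", "4.5"),
  col_d = c("p", "q"), col_e = c("5", "6"),
  stringsAsFactors = FALSE
)

# Convert each column with its own line of code
df$col_a <- as.double(df$col_a)
df$col_c <- as.double(df$col_c)
df$col_e <- as.double(df$col_e)
```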
Or in the tidyverse;
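A sketch using dplyr's mutate(), again with a small hypothetical data frame. The conversion only runs if dplyr is installed.

```r
# Hypothetical data: every column starts out as character
df <- data.frame(
  col_a = c("1", "2"), col_b = c("x", "y"), col_c = c("3.5", "4.5"),
  col_d = c("p", "q"), col_e = c("5", "6"),
  stringsAsFactors = FALSE
)

if (requireNamespace("dplyr", quietly = TRUE)) {
  # Each column still needs its own argument
  df <- dplyr::mutate(
    df,
    col_a = as.double(col_a),
    col_c = as.double(col_c),
    col_e = as.double(col_e)
  )
}
```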
However, with 3 columns to transform the code is already cumbersome; in a larger data set the approach could easily become difficult to maintain. The tidyverse does offer a solution to reduce the amount of code required here, but it involves learning additional functions and adds complexity.
Loops provide an alternative approach here that is well suited to the task at hand, compact, and simple to understand.
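A sketch of the loop-based approach, with a small hypothetical data frame; cols_to_convert is an illustrative name for the vector of target columns.

```r
# Hypothetical data: every column starts out as character
df <- data.frame(
  col_a = c("1", "2"), col_b = c("x", "y"), col_c = c("3.5", "4.5"),
  col_d = c("p", "q"), col_e = c("5", "6"),
  stringsAsFactors = FALSE
)

# Names of the columns we want to convert
cols_to_convert <- c("col_a", "col_c", "col_e")

# Apply the same conversion to each named column in turn
for (col in cols_to_convert) {
  df[[col]] <- as.double(df[[col]])
}

str(df)
```

Converting further columns only requires adding their names to cols_to_convert.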
Next steps
Loops provide a powerful and flexible way to perform iteration. Correct usage can help to reduce the amount of code you need to write to achieve a desired outcome. Try the tasks below to put your knowledge into practice.
1. Write a for loop to calculate the mean value of all columns in the built in mtcars dataset. Store the means in a vector named mtcars_means. What is the sum of mtcars_means?
2. Create 2 objects, x and y, and assign the value of 1 to each. Write a while loop that:
- doubles the value of x in each iteration.
- adds 1 to the value of y in each iteration.
- ends when y is equal to 20.
What is the value of x when the loop ends?
3. Write a loop to print all the numbers between 1 and 100 (inclusive) which are divisible by 33. Try to include the next keyword. Which numbers printed to the console?
Answers
1. 435.6938
2. 524,288
3. 33, 66, and 99