In R, a vector is a sequence of elements that all share the same type. There are four important types that you will commonly encounter:
- Character
- Logical
- Double
- Integer
Atomic vectors
You may hear these referred to as atomic vectors. It’s also useful to note that the term ‘numeric’ is often used to cover both the double and integer types.
c()
c() is a built-in R function that combines the elements it is passed into a vector. You can provide any number of elements to c().
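As a minimal sketch of c() in use (the values and the names x and y are purely illustrative, and the <- operator used to name the results is covered under Assignment):

```r
# Combine three numbers into a single vector
x <- c(1.5, 2, 10)

# c() also accepts existing vectors and flattens everything into one vector
y <- c(x, 20, 30)
y          # 1.5  2.0 10.0 20.0 30.0
length(y)  # 5
```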
It is important to be aware that c() coerces elements to a common type. Coercion is a common occurrence in R, and understanding its implications can help you avoid errors. R is often described as a weakly typed language, which means that a value of one type will sometimes be coerced into another when code is executed.
For the four common types in R, coercion follows a hierarchy in which types on the left are converted to types further to the right when they are mixed:
Logical >>> Integer >>> Double >>> Character
You can test how this works using typeof().
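One way to try this out is to mix types inside c() and inspect the result with typeof(). The values below are illustrative only; the L suffix marks an integer literal.

```r
typeof(c(TRUE, FALSE))   # "logical"
typeof(c(TRUE, 1L))      # "integer"   (logical coerced to integer)
typeof(c(1L, 2.5))       # "double"    (integer coerced to double)
typeof(c(2.5, "a"))      # "character" (double coerced to character)

# Mixing all four types: everything ends up as character
mixed <- c(TRUE, 1L, 2.5, "a")
mixed                    # "TRUE" "1" "2.5" "a"
```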
Assignment
In order to refer to or use an object in R after its creation, we first need to assign it a name. The R assignment operator <- is the preferred way of achieving this. It is also possible to use = for this task; however, this goes against the community consensus, so it should be avoided.
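A short sketch of assignment follows. The names heights and more_heights are hypothetical and chosen only for illustration.

```r
# Create a vector and give it a name with <-
heights <- c(1.62, 1.75, 1.80)

# Typing the name prints the object
heights

# The name can now be reused to build other objects
more_heights <- c(heights, 1.90)
length(more_heights)  # 4
```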
Functions - arguments and use
In R we use functions extensively. In its simplest form, a function takes inputs (referred to as arguments) and returns an output. The sum() function returns the sum of all the values present in its arguments.
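For instance, with some illustrative values (the names total_a and total_b are hypothetical):

```r
# sum() adds up every value it is given
total_a <- sum(1, 2, 3)
total_a  # 6

# Vectors and single values can be mixed as arguments
total_b <- sum(c(1, 2, 3), 10)
total_b  # 16
```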
Most functions allow for more than one argument; sum(), for example, has an argument na.rm. The na.rm argument accepts a logical (TRUE or FALSE) value. Setting the argument to TRUE will remove missing values before the sum is calculated.
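The effect of na.rm can be seen with a small vector that contains a missing value (NA). The vector readings is a made-up example.

```r
readings <- c(4, NA, 6)

# By default the missing value propagates, so the result is NA
with_na <- sum(readings)
with_na     # NA

# With na.rm = TRUE the NA is dropped before summing
without_na <- sum(readings, na.rm = TRUE)
without_na  # 10
```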
Indices in R
In R, indices start at 1. It’s useful to know that you can define a range of numbers using the : operator. Instead of writing c(1, 2, 3, 4, 5) to generate a vector of the numbers 1 to 5, you can write 1:5.
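A quick sketch of the : operator is shown below. One detail worth knowing is that 1:5 produces integers while c(1, 2, 3, 4, 5) produces doubles, although the values compare as equal. The name one_to_five is illustrative.

```r
one_to_five <- 1:5
one_to_five                            # 1 2 3 4 5

typeof(one_to_five)                    # "integer"
typeof(c(1, 2, 3, 4, 5))               # "double"

# The values are the same even though the types differ
all(one_to_five == c(1, 2, 3, 4, 5))   # TRUE

# Ranges can also count down
countdown <- 10:7
countdown                              # 10 9 8 7
```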
You can use the index system to access elements of a vector. letters is a constant built into R; it is in effect a vector of the lower-case letters of the alphabet.
To access a single element of a vector, for example the 13th letter of the alphabet, you can use the index system.
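A single element is selected by placing its index in square brackets after the vector name (the name thirteenth is illustrative):

```r
thirteenth <- letters[13]
thirteenth  # "m"
```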
If you want to get the first 5 letters of the alphabet you can also use the index system with a range.
The next 5 letters can be accessed in the same way.
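Both ranges can be sketched as follows, with illustrative names for the results:

```r
first_five <- letters[1:5]
first_five  # "a" "b" "c" "d" "e"

next_five <- letters[6:10]
next_five   # "f" "g" "h" "i" "j"
```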
To get the 2nd, 4th, and 6th letters of the alphabet, you can use c() to build a vector of indices.
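For example, passing a vector of positions inside the square brackets (the name evens is illustrative):

```r
evens <- letters[c(2, 4, 6)]
evens  # "b" "d" "f"
```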
If you need to know the length of a vector, the length() function is particularly useful.
Knowing how to check the length of a vector is useful because it lets you do things like get everything from the 20th element onwards (inclusive), or even the last 5 elements.
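A sketch of these ideas using letters is shown below; the variable names are illustrative. Note the parentheses around n - 4: the : operator binds more tightly than -, so without them R would compute n - (4:n) instead.

```r
n <- length(letters)
n                                # 26

# Everything from the 20th element to the end
from_20 <- letters[20:n]
from_20                          # "t" "u" "v" "w" "x" "y" "z"

# The last 5 elements
last_five <- letters[(n - 4):n]
last_five                        # "v" "w" "x" "y" "z"
```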
You can also use indexing to change the order of the elements returned; for example, you can return the first 5 letters of the alphabet in reverse order.
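One way to do this is with a descending range (the name reversed is illustrative):

```r
reversed <- letters[5:1]
reversed  # "e" "d" "c" "b" "a"
```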
The which() function returns the indices of the elements of a logical vector that are TRUE. You can use it to find the index of j in the letters constant.
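A minimal sketch: the comparison letters == "j" produces a logical vector, and which() reports where it is TRUE. The name j_index is illustrative.

```r
# which() on a small logical vector
which(c(FALSE, TRUE, TRUE))       # 2 3

# Position of "j" in the alphabet
j_index <- which(letters == "j")
j_index                           # 10
```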
You can also use which() to create a subset from the letters vector.
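As an illustration, the sketch below picks out the vowels. The choice of vowels and the use of the %in% operator (which tests membership in a set) are assumptions made for this example, not something the text above prescribes.

```r
# Indices of the vowels within letters
vowel_index <- which(letters %in% c("a", "e", "i", "o", "u"))
vowel_index            # 1 5 9 15 21

# Use those indices to create the subset
vowels <- letters[vowel_index]
vowels                 # "a" "e" "i" "o" "u"
```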
However, the which() function is actually unnecessary here, as it is much simpler to subset with the logical vector directly.
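A sketch of the simpler form, again using vowels and %in% as an assumed example: a logical vector placed inside the square brackets keeps the elements where it is TRUE.

```r
is_vowel <- letters %in% c("a", "e", "i", "o", "u")

# Subset directly with the logical vector, no which() needed
vowels_direct <- letters[is_vowel]
vowels_direct  # "a" "e" "i" "o" "u"

# Same result as going through which()
identical(vowels_direct, letters[which(is_vowel)])  # TRUE
```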
Working with vectors
The Nile data-set, a time-series of length 100, is bundled with R. It provides measurements of the annual flow of the river Nile at Aswan between 1871 and 1970.
You can get the mean measurement with the mean() function.
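For example (the name avg_flow is illustrative):

```r
# Nile ships with R in the datasets package, so no loading step is needed
length(Nile)  # 100

avg_flow <- mean(Nile)
avg_flow
```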
The mean of the most recent 10 years’ worth of data can be found by using the index to get a subset of the vector.
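Since the series has 100 observations, the most recent 10 years sit at positions 91 to 100. A sketch, with illustrative names:

```r
# Last 10 observations, using fixed positions
recent <- Nile[91:100]
recent_mean <- mean(recent)
recent_mean

# Equivalent form that does not hard-code the length
n_years <- length(Nile)
mean(Nile[(n_years - 9):n_years])
```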
You could check the number of years in which the flow was over 1000.
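Comparing the whole vector against 1000 returns one logical value per year (the name over_1000 is illustrative):

```r
over_1000 <- Nile > 1000

# One TRUE or FALSE for each of the 100 years
length(over_1000)  # 100
head(over_1000)    # first few values
```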
The TRUE values indicate elements greater than 1000. However, counting these manually wouldn’t be very efficient. Helpfully, logical values can be interpreted as numbers, with TRUE being 1 and FALSE being 0. With that in mind, the sum() function can be used to get the number of observations greater than 1000.
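A short sketch of counting with sum(), first on a tiny logical vector and then on Nile (the name n_over is illustrative):

```r
# TRUE counts as 1 and FALSE as 0
sum(c(TRUE, FALSE, TRUE))  # 2

# Number of years in which the flow exceeded 1000
n_over <- sum(Nile > 1000)
n_over
```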
You might have noticed that in Nile > 1000, Nile is a vector of 100 numeric elements (doubles, to be precise), whereas the ‘1000’ is a single element. In this case R ‘recycles’ the ‘1000’ and compares it against each of the elements of Nile.
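Recycling is easier to see on a small, made-up vector. The second example goes slightly beyond the single-value case described above: a shorter vector of length 2 is also reused in order.

```r
small <- c(1, 2, 3, 4)

# The single value 2 is reused for every element
cmp <- small > 2
cmp        # FALSE FALSE  TRUE  TRUE

# A length-2 vector is recycled as 10, 20, 10, 20
added <- small + c(10, 20)
added      # 11 22 13 24
```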
Next Steps
WWWusage is another time-series data-set that is bundled with R, this time recording the numbers of users connected to the Internet through a server every minute.
1. What is the length of WWWusage?
2. What is the mean number of connected users?
3. How many observations recorded fewer than 100 users?
4. What is the sum of the observations of greater than 100 users?
5. Calculate the sum of the means of the first and last 20 observations.
Answers
1. 100
2. 137.08
3. 27
4. 11,313
5. 292.25
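One possible way to arrive at these answers is sketched below, using only the techniques covered above. The variable names are illustrative, and other approaches would work equally well.

```r
a1 <- length(WWWusage)                 # 1. length of the series
a2 <- mean(WWWusage)                   # 2. mean number of connected users
a3 <- sum(WWWusage < 100)              # 3. count of observations below 100
a4 <- sum(WWWusage[WWWusage > 100])    # 4. sum of the observations above 100

# 5. mean of the first 20 plus mean of the last 20 observations
n_obs <- length(WWWusage)
a5 <- mean(WWWusage[1:20]) + mean(WWWusage[(n_obs - 19):n_obs])

c(a1, a2, a3, a4, a5)
```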