2 Basics
2.1 Some Basic Commands
Before typing commands into the script editor, it might be useful for you to know some of the following basic commands.
- To place a comment in the script file, begin the line with
#
. - To run (execute) a line in the script file, place the cursor on the line, hit
Ctrl/Cmd + Enter
.- If we typed the command directly into the console, we only need to hit
Enter
- If we typed the command directly into the console, we only need to hit
- To run multiple lines in the script file, select the lines, hit
Ctrl/Cmd + Enter
. - To run all lines in the script file, hit
Ctrl + Shift + Enter
- To clear the console when it becomes too messy, hit
Ctrl/Cmd + L
. - To request help, type a question mark in front of the command or the package name (e.g.,
?cor
). Information about the command will appear in the Help tab (lower right pane).
2.2 Typing and Executing Commands
Now, let’s actually get R to do stuff for us! Open a script file (Ctrl/CMD + Shift + N
). Then copy and paste the following R codes into the script file. Run each line (Ctrl/Cmd + Enter
) to see what they do. Note that #
denotes a comment, and therefore it will run as a line of text. Experiment and have fun!
# Assign a single value (e.g., 9) to an object, say x.
x <- 9 # this means "x gets the value of 9".
# Get the value for x.
x # remember that R is case-sensitive. If you typed X, you'll get an error message.
X # see how R complains here that it can't find the object?
# If you want to know what objects are in the workspace (i.e., R's "short term" memory), look at the Environment tab or type ls().
ls()
# You may remove an object (e.g., x) from the workspace using rm(), where rm stands for remove.
rm(x)
# If there are too many objects in the workspace, you may remove all objects from the workspace using rm(list=ls()). To remember this command, I think of it as telling R to remove the list of objects in the Environment tab.
rm(list = ls())
# Assign a non-numerical value by putting the value in quotation marks.
y <- "hello!"
# Get the value of y. Notice the value of y is in quotation marks, indicating it is a non-numerical value.
y
# Perform the following mathematical operations in R.
11 + 10
11 - 10
11 * 10
11 / 10
11 ^ 10
11 ^ (1/2)
sqrt(11) # this number should be the same as above line
log(11) # taking natural log (log base e also known as ln)
log10(11) # taking log base 10
exp(11) # taking the exponential
# Perform mathematical operations in R with an object (e.g., a).
a <- 11 # a gets the value of 11
a + 10
a - 10
a * 10
a / 10
a ^ (1/2)
sqrt(a)
log(a)
log10(a)
exp(a)
# Perform mathematical operations with more than one object.
y <- 2 # notice that 2 now replaces the value “hello”.
a + y
a - y
a * y
a / y
a ^ (1/y)
# You cannot perform mathematical operations with non-numerical objects.
b <- "1" # recall, putting things in between quotation marks makes it non-numerical, even if 1 is a number.
a + b # you get an error here because you cannot perform mathematical operations with non-numerical objects.
# An object can store more than one value, such as a set of numbers or a set of characters. This is known as a vector and can be created using c().
num_vector <- c(1, 2, 3, 4, 5) # numeric vector
fruits <- c("apple", "banana", "orange") # non-numeric vector
# Get the values stored in the vectors
num_vector
fruits
# You can do mathematical operations with numerical vectors but not with non-numerical vectors.
num_vector * 5 # each value in num_vector is multiplied by 5
fruits * 5 # this throws an error that tells you that you cannot use non-numeric values for this operation
2.3 Entering Data Directly Into R
In reality, we rarely use R to do such simple mathematical calculations; we use R for data analyses. But before we can conduct any analysis, we need to give R some data. One way is to key the data directly into R.
Let’s say we have three people, Bob, Andrea, and Calvin. We have their ages, the number of children each of them has, and their gender.
Name | Age | No. of Children | Gender |
---|---|---|---|
Bob | 48 | 1 | Male |
Andrea | 47 | 3 | Female |
Calvin | 49 | 2 | Male |
Let’s figure out how to give R this set of data.
First, we create an object name
, and assign the three people’s names to it.
name <- c("Bob", "Andrea", "Calvin")
# the c in c() stands for combine
# here, we're telling R to combine the three names into a list of names
# notice the names are between quotation marks
# this tells R that name is a string (non-numerical) variable
If we type name
into the console, we should get:
## [1] "Bob" "Andrea" "Calvin"
Next, we create the object age
, with the three people’s ages.
age <- c(48, 47, 49)
# here, we're creating a list of ages
# the order of age should match the order of the names
If we type age
into the console, we should get:
## [1] 48 47 49
Now, let’s combine the two variables into a data frame. You can think of a data frame as a spreadsheet or as a table, where each row represents the information for one person, and name
and age
are in side-by-side columns.
We will use the data.frame()
function and call this data frame dataset
.
dataset <- data.frame(Name = name, Age = age)
# Name = name tells R that the column should have the header Name and the data for the column should be copied from the object name.
# Age = age tells R that the column should have the header Age and the data for the column should be copied from the object age.
Type dataset
into the console. You should see:
## Name Age
## 1 Bob 48
## 2 Andrea 47
## 3 Calvin 49
Notice that the output has two columns: Name
and Age
, where Name
lists the names of the individuals and Age
lists their ages.
The values of Name
and Age
in the data frame were copied from the original objects. This means that the original objects, name
and age
, are still in R’s memory. You can see that this is the case from the Environment tab or when you use the ls() function.
## [1] "age" "dataset" "name"
Let’s remove the original variables name
and age
.
Now, when you type ls() into the console, you’ll see that all is left is the data frame, dataset
.
## [1] "dataset"
If you wish, you can directly type the data into a data frame instead of first creating separate objects for each variable.
dataset2 <- data.frame(Name = c("Bob", "Andrea", "Calvin"), Age = c(48, 47, 49))
# dataset and dataset2 will look exactly the same
Now, suppose we want to use the variables in the data frame dataset
. We will need to attach dataset$
(a dollar sign after the name of the data frame) before the variable name. This will tell R that that variable comes from the data frame dataset
.
For example, if we want to know the ages of the three participants. We should type:
## [1] 48 47 49
# this tells R to go to the object called dataset, locate the column Age and then produce the values in that column
# remember that R is case-sensitive. So if you'd typed dataset$age, you'll get an error.
# typing age or Age won't work either because there isn't an object called age or Age.
From the original table, we know that there are more variables (columns) we need to add to the data frame dataset
: the number of children the person has and the person’s gender. Let’s label the number of children each person has as Children
and the gender of each person as Gender
.
dataset$Children <- c(1, 3, 2)
# this tells R to go to the object dataset and create a column called Children. Then, assign the column the values 1, 3, and 2.
dataset$Gender <- c("male", "female", "male")
# this tells R to go to the object dataset and create a column called Gender. Then, assign the column the values "male", "female", and "male"
# again, notice the order of dataset$Children and dataset$Gender match the order of the names
Now, when you type dataset
into the console, you’ll get:
## Name Age Children Gender
## 1 Bob 48 1 male
## 2 Andrea 47 3 female
## 3 Calvin 49 2 male
Note that we created Children
and Gender
within the data frame dataset
. So if you type Children
and Gender
without dataset$
, you will get an error message in the console telling you that the object cannot be found because there isn’t an object (independent of the data frame) that is called Children
or Gender
.
Now, let’s say we made a mistake and need to remove Children
from the data frame, dataset
. We can type:
dataset$Children <- NULL
# this tells R to assign the value of NULL to dataset$Children
# NULL basically represents non-existence. So, when we assign NULL to dataset$Children, it means we want that column to be non-existent / removed.
dataset
## Name Age Gender
## 1 Bob 48 male
## 2 Andrea 47 female
## 3 Calvin 49 male
Finally, if we want to save the data frame into a .csv file, we use write.csv()
.
write.csv(dataset, "bobandreacalvin.csv")
# this tells R to save the object, dataset, as a .csv file, and to name the .csv file as bobandreacalvin.csv.
# remember to specify both the name and the extension of the file (e.g., bobandreacalvin.csv).
The .csv file will now be saved as bobandreacalvin.csv
within your current directory. If you used an R project, it will be saved in your project directory. If you didn’t use an R project but set the working directory, it will be in the working directory. Otherwise, it will be in the default directory. If you don’t know what the directory is, type getwd()
into the console.
As an aside, some students might wonder why we don’t use write.csv2
instead. write.csv2
uses a comma for the decimal point whereas write.csv
uses a full stop / period for the decimal point. Since it’s more usual to use the full stop / period for the decimal point, we’ll stick to write.csv
.
Phew! That was a lot of information. Now… Try to apply what you learnt in this section to the exercise in the following section!