install.packages("swirl")
library(swirl)
swirl()
install_from_swirl("R Programming")
sudo dnf install R
sudo dnf install R-devel
sudo dnf install libxml2-devel
sudo dnf install libcurl-devel
sudo dnf groupinstall "C Development Tools and Libraries"
Installing package into ‘/usr/lib64/R/library’
(as ‘lib’ is unspecified)
also installing the dependencies ‘memoise’, ‘stringi’, ‘magrittr’, ‘crayon’, ‘jsonlite’,
’mime’, ‘curl’, ‘R6’, ‘bitops’, ‘stringr’, ‘testthat’, ‘httr’, ‘yaml’, ‘RCurl’, ‘digest’
| When you are at the R prompt (>):
| -- Typing skip() allows you to skip the current question.
| -- Typing play() lets you experiment with R on your own; swirl will ignore what you do...
| -- UNTIL you type nxt() which will regain swirl's attention.
| -- Typing bye() causes swirl to exit. Your progress will be saved.
| -- Typing main() returns you to swirl's main menu.
| -- Typing info() displays these options again.
| If at any point you'd like more information on a particular topic related to R, you can type help.start() at the prompt, which will open a menu of resources (either within RStudio
| or your default web browser, depending on your setup). Alternatively, a simple web search often yields the answer you're looking for.
| Anytime you have questions about a particular function, you can access R's built-in help files via the `?` command. For example, if you want more information on the c() function,
| type ?c without the parentheses that normally follow a function name. Give it a try.
The output type is determined from the highest type of the components in the hierarchy NULL < raw < logical < integer < double < complex < character < list < expression. Pairlists are treated as lists, but non-vector components (such as names and calls) are treated as one-element lists which cannot be unlisted even if recursive = TRUE.
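The hierarchy quoted above can be checked directly. A small sketch in base R; the variable names are only illustrative:

```r
# c() coerces its arguments to the highest type present in the hierarchy.
lgl_int <- c(TRUE, 2L)       # logical + integer  -> integer
int_dbl <- c(1L, 2.5)        # integer + double   -> double
dbl_chr <- c(1.5, "a")       # double + character -> character
mixed   <- c(1, list("a"))   # anything + list    -> list

typeof(lgl_int)  # "integer"
typeof(int_dbl)  # "double"
typeof(dbl_chr)  # "character"
typeof(mixed)    # "list"
```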
getwd()
ls() # objects in memory (the workspace)
list.files()
dir()
?list.files
args(list.files)
dir.create("testdir")
setwd("testdir")
file.create("mytest.R")
file.exists("mytest.R")
file.info("mytest.R")
file.info("mytest.R")$mode
file.rename("mytest.R", "mytest2.R")
file.copy("mytest2.R","mytest3.R")
file.path("mytest3.R")
| You can use file.path to construct file and directory paths that are independent of the operating system your R code is
| running on. Pass 'folder1' and 'folder2' as arguments to file.path to make a platform-independent pathname.
> file.path("folder1", "folder2")
[1] "folder1/folder2"
?dir.create
dir.create(file.path("testdir2", "testdir3"), recursive=TRUE)
unlink("testdir2", recursive = TRUE)
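The file commands above can be replayed safely in a throwaway directory. This is a sketch that assumes a sandbox under tempdir(); the directory name `swirl_sandbox` is made up for the example:

```r
# Work inside a temporary sandbox so the real working directory is untouched.
old_wd  <- getwd()
sandbox <- file.path(tempdir(), "swirl_sandbox")  # hypothetical name
unlink(sandbox, recursive = TRUE)                 # start from a clean slate
dir.create(sandbox, recursive = TRUE)
setwd(sandbox)

file.create("mytest.R")
file.rename("mytest.R", "mytest2.R")
file.copy("mytest2.R", "mytest3.R")
created <- list.files()                           # "mytest2.R" "mytest3.R"

# Nested directories need recursive = TRUE.
nested <- file.path("testdir2", "testdir3")
dir.create(nested, recursive = TRUE)
nested_exists <- dir.exists(nested)               # TRUE

# Clean up: go back first, then delete the sandbox and everything in it.
setwd(old_wd)
unlink(sandbox, recursive = TRUE)
```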
### Sequence of numbers
1:10
pi:10
15:1
?`:`
seq(1,20)
seq(0, 10, by=0.5)
seq(5, 10, length=30)
my_seq <- seq(5, 10, length=30)
length(my_seq)
1:length(my_seq)
seq(along.with = my_seq)
seq_along(my_seq)
rep(0, times=40)
rep(c(0, 1, 2), times = 10)
rep(c(0, 1, 2), each = 10)
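1:length(x) and seq_along(x) give the same result for a non-empty vector, but they differ when the vector is empty. A short sketch of that edge case (variable names are illustrative):

```r
empty <- numeric(0)

bad_index  <- 1:length(empty)   # 1 0 : counts DOWN from 1 to 0
good_index <- seq_along(empty)  # integer(0)

# A loop over seq_along() therefore runs zero times, as intended.
n_iterations <- 0
for (i in seq_along(empty)) n_iterations <- n_iterations + 1
n_iterations  # 0
```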
### Vectors
| In previous lessons, we dealt entirely with numeric vectors, which are one type of atomic vector. Other types of atomic
| vectors include logical, character, integer, and complex. In this lesson, we'll take a closer look at logical and
| character vectors.
num_vect <- c(0.5, 55, -10, 6)
tf <- num_vect < 1
| If we have two logical expressions, A and B, we can ask whether at least one is TRUE with A | B (logical 'or' a.k.a.
| 'union') or whether they are both TRUE with A & B (logical 'and' a.k.a. 'intersection'). Lastly, !A is the negation of
| A and is TRUE when A is FALSE and vice versa.
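The three operators work element by element on logical vectors. A minimal sketch reusing num_vect from above; `small` and `positive` are names chosen for the example:

```r
num_vect <- c(0.5, 55, -10, 6)
small    <- num_vect < 1   # TRUE FALSE  TRUE FALSE
positive <- num_vect > 0   # TRUE  TRUE FALSE  TRUE

either  <- small | positive  # TRUE  TRUE  TRUE  TRUE
both    <- small & positive  # TRUE FALSE FALSE FALSE
negated <- !small            # FALSE TRUE FALSE  TRUE
```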
my_char <- c("My", "name", "is")
paste(my_char, collapse = " ")
my_name <- c(my_char, "Ray")
paste("Hello", "world!", sep = " ")
> paste(c(1:3), c("X", "Y", "Z"), sep="")
[1] "1X" "2Y" "3Z"
| Vector recycling! Try paste(LETTERS, 1:4, sep = "-"), where LETTERS is a predefined variable in R containing a
| character vector of all 26 letters in the English alphabet.
> paste(LETTERS, 1:4, sep = "-")
[1] "A-1" "B-2" "C-3" "D-4" "E-1" "F-2" "G-3" "H-4" "I-1" "J-2" "K-3" "L-4" "M-1" "N-2" "O-3" "P-4" "Q-1" "R-2" "S-3"
[20] "T-4" "U-1" "V-2" "W-3" "X-4" "Y-1" "Z-2"
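Recycling is not specific to paste(); arithmetic recycles the shorter vector in the same way. A small sketch with made-up values:

```r
# The shorter vector c(0, 10) is repeated to match the longer one.
recycled <- c(1, 2, 3, 4) + c(0, 10)   # 1 12 3 14

# paste() recycles silently, even when the lengths do not divide evenly.
labels <- paste(c("a", "b", "c"), 1:2, sep = "-")   # "a-1" "b-2" "c-1"

# Arithmetic still recycles in that case, but R emits a warning
# (suppressed here so the example runs quietly).
uneven <- suppressWarnings(c(1, 2, 3) + c(0, 10))   # 1 12 3
```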
### Missing values
x <- c(44, NA, 5, NA)
| To make things a little more interesting, let's create a vector containing 1000 draws from a standard normal
| distribution with y <- rnorm(1000).
> y <- rnorm(1000)
| Next, let's create a vector containing 1000 NAs with z <- rep(NA, 1000).
> z <- rep(NA, 1000)
| Finally, let's select 100 elements at random from these 2000 values (combining y and z) such that we don't know how
| many NAs we'll wind up with or what positions they'll occupy in our final vector -- my_data <- sample(c(y, z), 100).
> my_data <- sample(c(y, z), 100)
| Let's first ask the question of where our NAs are located in our data. The is.na() function tells us whether each
| element of a vector is NA. Call is.na() on my_data and assign the result to my_na.
> my_na <- is.na(my_data)
| So, back to the task at hand. Now that we have a vector, my_na, that has a TRUE for every NA and FALSE for every
| numeric value, we can compute the total number of NAs in our data.
| The trick is to recognize that underneath the surface, R represents TRUE as the number 1 and FALSE as the number 0.
| Therefore, if we take the sum of a bunch of TRUEs and FALSEs, we get the total number of TRUEs.
| Let's give that a try here. Call the sum() function on my_na to count the total number of TRUEs in my_na, and thus the
| total number of NAs in my_data. Don't assign the result to a new variable.
> sum(my_na)
[1] 51
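The count above depends on the random sample, so it will differ from run to run. The same idea on a fixed vector, as a sketch:

```r
x <- c(44, NA, 5, NA)

my_na <- is.na(x)             # FALSE TRUE FALSE TRUE
n_missing     <- sum(my_na)   # 2   (TRUE counts as 1)
share_missing <- mean(my_na)  # 0.5 (proportion of NAs)

# Comparing with == does not find NAs: any comparison with NA is itself NA.
compared <- x == NA           # NA NA NA NA
```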
Inf - Inf --> NaN
0 / 0 --> NaN
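NaN ("not a number") is related to NA but not the same thing. A short sketch of how the two test functions treat them:

```r
inf_diff <- Inf - Inf   # NaN
zero_div <- 0 / 0       # NaN
one_div  <- 1 / 0       # Inf, not NaN

is.nan(zero_div)  # TRUE
is.na(zero_div)   # TRUE : is.na() also reports NaN as missing
is.nan(NA)        # FALSE: NA is not NaN
```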
### Subsetting Vectors
x[1:10]
x[is.na(x)]
y <- x[!is.na(x)]
y[y>0]
x[!is.na(x) & x>0]
x[c(3,5,7)]
| Luckily, R accepts negative integer indexes. Whereas x[c(2, 10)] gives us ONLY the 2nd and 10th elements of x, x[c(-2,
| -10)] gives us all elements of x EXCEPT for the 2nd and 10th elements. Try x[c(-2, -10)] now to see this.
> x[c(-2, -10)]
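In the lesson x is random, so here is the same set of index types on a small fixed vector. The values are invented for the example:

```r
x <- c(10, NA, -3, 7, NA, 2)

picked  <- x[c(1, 4)]             # positive indexes keep:  10 7
dropped <- x[c(-1, -4)]           # negative indexes drop:  NA -3 NA 2
same    <- x[-c(1, 4)]            # shorthand for the line above
pos_val <- x[!is.na(x) & x > 0]   # logical index:          10 7 2

# Mixing signs, e.g. x[c(-1, 4)], is an error, so it is not run here.
```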
| Create a numeric vector with three named elements using vect <- c(foo = 11, bar = 2, norf = NA).
> vect <- c(foo = 11, bar = 2, norf = NA)
names(vect)
vect2 <- c(11, 2, NA)
names(vect2) <- c("foo", "bar", "norf")
> identical(vect, vect2)
[1] TRUE
vect["bar"]
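Subsetting by name keeps the names on the result; double brackets return the bare value. A brief sketch:

```r
vect <- c(foo = 11, bar = 2, norf = NA)

one_named <- vect["bar"]            # named vector of length 1 (bar = 2)
two_named <- vect[c("foo", "bar")]  # foo = 11, bar = 2
bare      <- vect[["bar"]]          # 2, with the name dropped
```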
### 7: Matrices and Data Frames
| In this lesson, we'll cover matrices and data frames. Both represent 'rectangular' data types, meaning that they are
| used to store tabular data, with rows and columns.
| The main difference, as you'll see, is that matrices can only contain a single class of data, while data frames can
| consist of many different classes of data.
my_vector <- 1:20
> dim(my_vector)
NULL
> length(my_vector)
[1] 20
dim(my_vector) <- c(4, 5)
> dim(my_vector)
[1] 4 5
> attributes(my_vector)
$dim
[1] 4 5
> my_vector
[,1] [,2] [,3] [,4] [,5]
[1,] 1 5 9 13 17
[2,] 2 6 10 14 18
[3,] 3 7 11 15 19
[4,] 4 8 12 16 20
> class(my_vector)
[1] "matrix"
my_matrix <- my_vector
?matrix
my_matrix2 <- matrix(1:20, 4, 5)
> identical(my_matrix, my_matrix2)
[1] TRUE
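Both dim<- and matrix() fill column by column, which is why the two results above are identical. A sketch on a smaller example, with byrow = TRUE shown for contrast:

```r
by_col <- matrix(1:6, nrow = 2, ncol = 3)                # default: fill columns
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)  # fill rows instead

by_col[1, ]  # 1 3 5
by_row[1, ]  # 1 2 3

# Setting dim() on a plain vector matches the column-wise fill.
v <- 1:6
dim(v) <- c(2, 3)
same_as_by_col <- identical(v, by_col)  # TRUE
```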
patients <- c("Bill", "Gina", "Kelly", "Sean")
cbind(patients, my_matrix)
> cbind(patients, my_matrix)
patients
[1,] "Bill" "1" "5" "9" "13" "17"
[2,] "Gina" "2" "6" "10" "14" "18"
[3,] "Kelly" "3" "7" "11" "15" "19"
[4,] "Sean" "4" "8" "12" "16" "20"
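The quotes around the numbers in the output above come from coercion: a matrix holds a single type, so cbind() turns everything into character. A sketch that makes this visible:

```r
patients  <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, 4, 5)

combined <- cbind(patients, my_matrix)
typeof(my_matrix)  # "integer"
typeof(combined)   # "character"

# Arithmetic on the combined matrix needs an explicit conversion back.
first_value <- as.integer(combined[1, 2]) + 1L  # 2
```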
my_data <- data.frame(patients, my_matrix)
> my_data
patients X1 X2 X3 X4 X5
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20
> class(my_data)
[1] "data.frame"
cnames <- c("patient", "age", "weight", "bp", "rating", "test")
colnames(my_data) <- cnames
> my_data
patient age weight bp rating test
1 Bill 1 5 9 13 17
2 Gina 2 6 10 14 18
3 Kelly 3 7 11 15 19
4 Sean 4 8 12 16 20
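Unlike the cbind() result, the data frame keeps each column's own class. A sketch that checks this; stringsAsFactors = FALSE is set explicitly here (an addition to the lesson's call) so the result does not depend on the R version:

```r
patients  <- c("Bill", "Gina", "Kelly", "Sean")
my_matrix <- matrix(1:20, 4, 5)

my_data <- data.frame(patients, my_matrix, stringsAsFactors = FALSE)
colnames(my_data) <- c("patient", "age", "weight", "bp", "rating", "test")

col_classes <- sapply(my_data, class)
col_classes[["patient"]]  # "character"
col_classes[["age"]]      # "integer"

# Numeric columns remain usable for arithmetic.
mean_age <- mean(my_data$age)  # 2.5
```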
#
ints <- sample(10)
> ints
[1] 8 7 2 1 5 6 9 10 3 4
> ints > 5
[1] TRUE TRUE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE
> which(ints > 7)
[1] 1 7 8
| Like the which() function, the functions any() and all() take logical vectors as their argument. The any() function
| will return TRUE if one or more of the elements in the logical vector is TRUE. The all() function will return TRUE if
| every element in the logical vector is TRUE.
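A quick sketch of any() and all(), using the values printed for ints above as a fixed vector (sample(10) would give a different order on each run):

```r
ints <- c(8, 7, 2, 1, 5, 6, 9, 10, 3, 4)

any_big      <- any(ints > 9)    # TRUE : 10 is in the vector
all_positive <- all(ints > 0)    # TRUE
any_negative <- any(ints < 0)    # FALSE
big_at       <- which(ints > 7)  # 1 7 8

# Edge case: on an empty logical vector all() is TRUE and any() is FALSE.
all(logical(0))  # TRUE
any(logical(0))  # FALSE
```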
Sys.Date()
| The mean() function takes a vector of numbers as input, and returns the average of all of the numbers in the input
| vector. Inputs to functions are often called arguments. Providing arguments to a function is also sometimes called
| passing arguments to that function. Arguments you want to pass to a function go inside the function's parentheses. Try
| passing the argument c(2, 4, 5) to the mean() function.
| To understand computations in R, two slogans are helpful: 1. Everything that exists is an object. 2. Everything that
| happens is a function call.
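The second slogan can be taken literally: even operators and indexing are functions that can be called by name with backticks. A small sketch:

```r
plus_result  <- `+`(2, 3)   # same as 2 + 3, gives 5

x <- c(10, 20, 30)
index_result <- `[`(x, 2)   # same as x[2], gives 20

is.function(`+`)  # TRUE
is.function(`[`)  # TRUE
```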
sum(my_vector)/length(my_vector)
Modulus : num %% divisor
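A few worked values for %% and its companion %/% (integer division). The helper `is_even` is a name made up for the example:

```r
remainder <- 7 %% 3    # 1
quotient  <- 7 %/% 3   # 2

# In R the result of %% takes the sign of the divisor.
neg_left  <- (-7) %% 3   # 2
neg_right <- 7 %% (-3)   # -2

# Typical use: test divisibility.
is_even <- function(n) n %% 2 == 0
is_even(c(1, 2, 3, 4))   # FALSE TRUE FALSE TRUE
```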
---
# You can pass functions as arguments to other functions just like you can pass
# data to functions. Let's say you define the following functions:
#
# add_two_numbers <- function(num1, num2){
# num1 + num2
# }
#
# multiply_two_numbers <- function(num1, num2){
# num1 * num2
# }
#
# some_function <- function(func){
# func(2, 4)
# }
#
# As you can see we use the argument name "func" like a function inside of
# "some_function()." By passing functions as arguments
# some_function(add_two_numbers) will evaluate to 6, while
# some_function(multiply_two_numbers) will evaluate to 8.
#
# Finish the function definition below so that if a function is passed into the
# "func" argument and some data (like a vector) is passed into the dat argument
# the evaluate() function will return the result of dat being passed as an
# argument to func.
#
# Hints: This exercise is a little tricky so I'll provide a few examples of how
# evaluate() should act:
# 1. evaluate(sum, c(2, 4, 6)) should evaluate to 12
# 2. evaluate(median, c(7, 40, 9)) should evaluate to 9
# 3. evaluate(floor, 11.1) should evaluate to 11
evaluate <- function(func, dat){
  # Write your code here!
  # Remember: the last expression evaluated will be returned!
  func(dat)
}
evaluate(function(x){x+1}, 6)
evaluate(sd, c(1.4, 3.6, 7.9, 8.8))
evaluate(function(x){x[1]}, c(8, 4, 0))
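The calls above pass anonymous functions, that is, functions defined inline without a name. A sketch with expected results, plus the same pattern with sapply():

```r
evaluate <- function(func, dat) {
  func(dat)
}

plus_one <- evaluate(function(x) x + 1, 6)                   # 7
first    <- evaluate(function(x) x[1], c(8, 4, 0))           # 8
last     <- evaluate(function(x) x[length(x)], c(8, 4, 0))   # 0

# sapply() applies the anonymous function to each element in turn.
squares <- sapply(1:4, function(n) n^2)                      # 1 4 9 16
```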
---
Standard Deviation : sd
?paste
paste("Programming", "is", "fun!")
# The ellipses can be used to pass on arguments to other functions that are
# used within the function you're writing. Usually a function that has the
# ellipses as an argument has the ellipses as the last argument. The usage of
# such a function would look like:
#
# ellipses_func(arg1, arg2 = TRUE, ...)
#
# In the above example arg1 has no default value, so a value must be provided
# for arg1. arg2 has a default value, and other arguments can come after arg2
# depending on how they're defined in the ellipses_func() documentation.
# Interestingly the usage for the paste function is as follows:
#
# paste (..., sep = " ", collapse = NULL)
#
# Notice that the ellipses is the first argument, and all other arguments after
# the ellipses have default values. This is a strict rule in R programming: all
# arguments after an ellipses must have default values. Take a look at the
# simon_says function below:
#
# simon_says <- function(...){
# paste("Simon says:", ...)
# }
#
# The simon_says function works just like the paste function, except the
# beginning of every string is prepended by the string "Simon says:"
#
# Telegrams used to be peppered with the words START and STOP in order to
# demarcate the beginning and end of sentences. Write a function below called
# telegram that formats sentences for telegrams.
# For example the expression `telegram("Good", "morning")` should evaluate to:
# "START Good morning STOP"
telegram <- function(...){
  paste("START", ..., "STOP")
}
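Arguments that come after the ellipsis can only be matched by name, as with sep in paste(). The variant below adds a `sep` argument to telegram(); that extension is an assumption for illustration and not part of the exercise:

```r
# Hypothetical variant of telegram() with a named argument after `...`.
telegram <- function(..., sep = " ") {
  paste("START", ..., "STOP", sep = sep)
}

plain  <- telegram("Good", "morning")             # "START Good morning STOP"
joined <- telegram("Good", "morning", sep = "_")  # "START_Good_morning_STOP"
```

Without the name, "_" would simply be absorbed by the ellipsis as one more word.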
---
# Let's explore how to "unpack" arguments from an ellipses when you use the
# ellipses as an argument in a function. Below I have an example function that
# is supposed to add two explicitly named arguments called alpha and beta.
#
# add_alpha_and_beta <- function(...){
# # First we must capture the ellipsis inside of a list
# # and then assign the list to a variable. Let's name this
# # variable `args`.
#
# args <- list(...)
#
# # We're now going to assume that there are two named arguments within args
# # with the names `alpha` and `beta.` We can extract named arguments from
# # the args list by using the name of the argument and double brackets. The
# # `args` variable is just a regular list after all!
#
# alpha <- args[["alpha"]]
# beta <- args[["beta"]]
#
# # Then we return the sum of alpha and beta.
#
# alpha + beta
# }
#
# Have you ever played Mad Libs before? The function below will construct a
# sentence from parts of speech that you provide as arguments. We'll write most
# of the function, but you'll need to unpack the appropriate arguments from the
# ellipses.
mad_libs <- function(...){
  # Do your argument unpacking here!
  args <- list(...)
  place <- args[['place']]
  adjective <- args[['adjective']]
  noun <- args[['noun']]

  # Don't modify any code below this comment.
  # Notice the variables you'll need to create in order for the code below to
  # be functional!
  paste("News from", place, "today where", adjective, "students took to the streets in protest of the new", noun, "being installed on campus.")
}
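One thing to watch with this unpacking pattern: a name that was not supplied comes back as NULL rather than raising an error. The sketch below adds a guard to the add_alpha_and_beta example; the guard is an addition for illustration, not part of the lesson:

```r
add_alpha_and_beta <- function(...) {
  args  <- list(...)
  alpha <- args[["alpha"]]   # NULL if no argument named `alpha` was passed
  beta  <- args[["beta"]]

  # Added guard (not in the lesson): fail clearly instead of returning numeric(0).
  if (is.null(alpha) || is.null(beta)) {
    stop("both `alpha` and `beta` must be supplied by name")
  }
  alpha + beta
}

total <- add_alpha_and_beta(alpha = 1, beta = 2)   # 3

# Unnamed arguments are not found by name, so this call stops with an error.
failed <- inherits(try(add_alpha_and_beta(1, 2), silent = TRUE), "try-error")
```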
---
# The syntax for creating new binary operators in R is unlike anything else in
# R, but it allows you to define a new syntax for your function. I would only
# recommend making your own binary operator if you plan on using it often!
#
# User-defined binary operators have the following syntax:
# %[whatever]%
# where [whatever] represents any valid variable name.
#
# Let's say I wanted to define a binary operator that multiplied two numbers and
# then added one to the product. An implementation of that operator is below:
#
# "%mult_add_one%" <- function(left, right){ # Notice the quotation marks!
# left * right + 1
# }
#
# I could then use this binary operator like `4 %mult_add_one% 5` which would
# evaluate to 21.
#
# Write your own binary operator below from absolute scratch! Your binary
# operator must be called %p% so that the expression:
#
# "Good" %p% "job!"
#
# will evaluate to: "Good job!"
"%p%" <- function(left, right){ # Remember to add arguments!
  paste(left, right)
}
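A usage sketch for the operator. User-defined %...% operators are evaluated left to right, so they can be chained:

```r
"%p%" <- function(left, right) {
  paste(left, right)
}

greeting <- "Good" %p% "job!"      # "Good job!"
chained  <- "a" %p% "b" %p% "c"    # ("a" %p% "b") %p% "c" gives "a b c"
```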
---
http://archive.ics.uci.edu/ml/datasets/Flags
head(flags)
dim(flags)
class(flags)
| The lapply() function takes a list as input, applies a function to each element of the list, then returns a list of the
| same length as the original one. Since a data frame is really just a list of vectors (you can see this with
| as.list(flags)), we can use lapply() to apply the class() function to each column of the flags dataset. Let's see it in
| action!
cls_list <- lapply(flags, class)
class(cls_list)
| You may remember from a previous lesson that lists are most helpful for storing multiple classes of data. In this case,
| since every element of the list returned by lapply() is a character vector of length one (i.e. "integer" and "vector"),
| cls_list can be simplified to a character vector. To do this manually, type as.character(cls_list).
as.character(cls_list)
| sapply() allows you to automate this process by calling lapply() behind the scenes, but then attempting to simplify
| (hence the 's' in 'sapply') the result for you. Use sapply() the same way you used lapply() to get the class of each
| column of the flags dataset and store the result in cls_vect. If you need help, type ?sapply to bring up the
| documentation.
cls_vect <- sapply(flags, class)
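The flags dataset has to be loaded first, so here is the same lapply()/sapply() contrast on a small made-up data frame (`toy` and its columns are invented for the example):

```r
toy <- data.frame(
  name = c("A", "B"),
  area = c(10L, 20L),
  red  = c(1L, 0L),
  stringsAsFactors = FALSE
)

cls_list <- lapply(toy, class)   # a list with one element per column
cls_vect <- sapply(toy, class)   # simplified to a named character vector

class(cls_list)      # "list"
class(cls_vect)      # "character"
cls_vect[["area"]]   # "integer"
```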
| Therefore, if we want to know the total number of countries (in our dataset) with, for example, the color orange on
| their flag, we can just add up all of the 1s and 0s in the 'orange' column. Try sum(flags$orange) to see this.
> sum(flags$orange)
flag_colors <- flags[, 11:17]
head(flag_colors)
lapply(flag_colors, sum)
sapply(flag_colors, sum)
sapply(flag_colors, mean)
flag_shapes <- flags[, 19:23]
| The range() function returns the minimum and maximum of its first argument, which should be a numeric vector. Use
| lapply() to apply the range function to each column of flag_shapes. Don't worry about storing the result in a new
| variable. By now, we know that lapply() always returns a list.
> lapply(flag_shapes, range)
$circles
[1] 0 4
$crosses
[1] 0 2
$saltires
[1] 0 1
$quarters
[1] 0 4
$sunstars
[1] 0 50
> sapply(flag_shapes, range)
circles crosses saltires quarters sunstars
[1,] 0 0 0 0 0
[2,] 4 2 1 4 50
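sapply() returned a matrix above because every range() result has the same length (two): each result becomes one column. A sketch on a small made-up data frame:

```r
shapes <- data.frame(circles = c(0L, 4L, 1L), crosses = c(0L, 2L, 1L))

rng_list <- lapply(shapes, range)   # list of two length-2 vectors
rng_mat  <- sapply(shapes, range)   # 2 x 2 matrix, one column per input column

dim(rng_mat)            # 2 2
rng_mat[2, "circles"]   # 4 : row 1 holds the minimums, row 2 the maximums
```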
| When given a vector, the unique() function returns a vector with all duplicate elements removed. In other words,
| unique() returns a vector of only the 'unique' elements. To see how it works, try unique(c(3, 4, 5, 5, 5, 6, 6)).
unique_vals <- lapply(flags, unique)
unique_vals
sapply(unique_vals, length)
lapply(unique_vals, function(elem) elem[2])
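When the results have different lengths, sapply() cannot simplify and falls back to returning a list, just like lapply(). A sketch with invented values:

```r
vals <- list(a = c(1, 1, 2), b = c(3, 4, 5))

u <- sapply(vals, unique)   # lengths 2 and 3 differ, so the result stays a list
is.list(u)                  # TRUE

n_unique <- sapply(u, length)                      # a = 2, b = 3
second   <- sapply(u, function(elem) elem[2])      # a = 2, b = 4
```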