class: center, middle, inverse, title-slide

.title[
# Introduction to Data Science
]
.subtitle[
## Week 2: Toolkit
]
.author[
### Ugur Aytun
]
.institute[
### METU, Department of Economics | ECON 413
]

---

# Before starting

--

- This week we learn the basics of R.

--

- Before we start, I recommend using Copilot. It is a great tool for learning R: it completes your code and gives you hints about functions and packages. You can find more information in the syllabus.

--

- Second, creating an R project for this course makes your life easier. Do not forget to specify a directory for your project.

--

- To do so, first create a folder titled "ECON413". Then create a project in this directory: File -> New Project -> Existing Directory -> Browse -> select the directory -> Create Project.

--

- Organize the folders in your project. Create folders such as "R files", "Data", "Raw data", and "Results". I always create these folders when I begin a new project.

---

# Basics - arithmetic operations

--

R is a great calculator. You can use it for basic arithmetic operations.

``` r
# 1. Addition
2+3
```

```
## [1] 5
```

``` r
# 2. Subtraction
5-3
```

```
## [1] 2
```

``` r
# 3. Division
10/2
```

```
## [1] 5
```

``` r
# 4. Multiplication
3*4
```

```
## [1] 12
```

---

# Basics - arithmetic operations (cont.)

--

R is a great calculator. You can use it for basic arithmetic operations.

``` r
# 5. Exponentiation
2^3
```

```
## [1] 8
```

``` r
# 6. Order of operations
(2+3)*4
```

```
## [1] 20
```

``` r
# 7. Integer division (%/%); the remainder (modulo) operator is %%
100 %/% 60 ## How many whole hours in 100 minutes?
```

```
## [1] 1
```

---

# Basics - logical operations

--

R is great for logical operations.

``` r
# 1. Greater than
2>3
```

```
## [1] FALSE
```

``` r
1 > 2 & 1 > 0.5 ## The "&" stands for "and"
```

```
## [1] FALSE
```

``` r
1 > 2 | 1 > 0.5 ## The "|" stands for "or"
```

```
## [1] TRUE
```

---

# Basics - logical operations (cont.)

--

R is great for logical operations.

``` r
# 2. Negation (!): this helps us filter data
!(2>3)
```

```
## [1] TRUE
```

``` r
# 3. Equality (==)
"apple" == "orange"
```

```
## [1] FALSE
```

``` r
# %in% checks whether a value is in a vector (value matching)
4 %in% c(1,2,3,4,5)
```

```
## [1] TRUE
```

``` r
# "Not-in" operator: R has no built-in one, so we define it ourselves
`%ni%` <- Negate(`%in%`)
4 %ni% 5:10
```

```
## [1] TRUE
```

---

# Basics - logical operations (cont.)

--

R is great for logical operations.

``` r
# 4. Floating-point numbers are stored approximately, so exact comparison can fail
0.1 + 0.2 == 0.3
```

```
## [1] FALSE
```

``` r
# all.equal() compares numbers up to a small tolerance, so it is safer for floating-point numbers.
# Inside if(), wrap it in isTRUE(): when values differ, all.equal() returns a text description, not FALSE.
all.equal(0.1 + 0.2, 0.3)
```

```
## [1] TRUE
```

---

# Basics - assignment

--

Use <- or = to assign values to variables.

``` r
# 1. Assign a value to a variable
x <- 5
# 2. Print the value of x
x
```

```
## [1] 5
```

``` r
# We can also use -> to assign values to variables, but we do not recommend it.
4 -> y
```

--

- We can also use =, but it has a special role in functions: it passes arguments, as in `mean(x, na.rm = TRUE)`. So it is better to use <- for assignment.

---

# Basics - help

--

Use ? to get help about a function.

``` r
# 1. Get help about the linear model function
?lm
```

```
## starting httpd help server ... done
```

``` r
# 2. Typing help(lm) gives the same result.
help(lm)
```

---

# Basics - vignettes

--

Use vignette() to read long-form guides about a package.

``` r
# 1. Get help about the ggplot2 package
vignette("ggplot2")
# 2. List all vignettes that come with a package
vignette(package = "ggplot2")
```

---

# Objects

--

- R is an object-oriented language.

--

- Vectors, matrices, data frames (and extensions such as data.table), lists, functions, etc. are all objects.

--

- Each object has its own rules and properties. For example, a vector can only contain one type of data, and the same holds for a matrix. A data frame, in contrast, can hold columns of different types.

``` r
# Create a data frame called "d"
d <- data.frame(x = 1:2, y = 3:4)
d
```

```
##   x y
## 1 1 3
## 2 2 4
```

``` r
# Convert it to (i.e. create) a matrix called "m"
m <- as.matrix(d)
m
```

```
##      x y
## [1,] 1 3
## [2,] 2 4
```

---

# Object class, type and structure
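To see what `class()` and `typeof()` reveal about objects that mix data types, here is a short sketch. The names `v`, `df_mixed` and `m_mixed` are illustrative only, and the sketch assumes R 4.0 or later, where `data.frame()` keeps text as character (not factor) by default.

```r
# Illustrative objects (hypothetical names): v, df_mixed, m_mixed

# c() coerces mixed inputs to one common type, here character
v <- c(1, "a", TRUE)
typeof(v)                 # "character"

# A data frame can hold columns of different types
df_mixed <- data.frame(num = 1:2, chr = c("a", "b"))
sapply(df_mixed, typeof)  # num: "integer", chr: "character"

# Converting to a matrix forces a single type for all cells
m_mixed <- as.matrix(df_mixed)
class(m_mixed)            # "matrix" "array" (R >= 4.0)
typeof(m_mixed)           # "character"
```

The same coercion explains why a matrix built from a data frame with any text column turns every number into text.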
``` r
d <- data.frame(x = 1:2, y = 3:4)
class(d)  # class of d
```

```
## [1] "data.frame"
```

``` r
typeof(d) # type of d
```

```
## [1] "list"
```

``` r
str(d)    # structure of d
```

```
## 'data.frame': 2 obs. of  2 variables:
##  $ x: int  1 2
##  $ y: int  3 4
```

- A data frame is stored as a list of equal-length columns, which is why typeof(d) is "list".

---

# Global environment

``` r
# Create a data frame called "d"
d <- data.frame(x = 1:2, y = 3:4)
# Let's try to regress y on x without telling lm() where the data are.
# lm() then looks for x and y in the global environment. In a fresh session this fails with:
# Error in eval(predvars, data, env): object 'y' not found
# Here it silently uses x <- 5 and y <- 4 from the assignment slide instead!
lm(y ~ x)
```

```
## 
## Call:
## lm(formula = y ~ x)
## 
## Coefficients:
## (Intercept)            x  
##           4           NA  
```

``` r
# Supplying data = d tells lm() to look for x and y inside d
lm(y ~ x, data = d)
```

```
## 
## Call:
## lm(formula = y ~ x, data = d)
## 
## Coefficients:
## (Intercept)            x  
##           2            1  
```

---

# Reserved words

These words have a special meaning in R, so they cannot be used as object names:

- if
- else
- repeat
- while
- function
- for
- in
- next
- break
- TRUE
- FALSE
- NULL
- Inf
- NaN
- NA

---

# Indexing

--

- We can use "[" to index a vector, matrix, or data frame, and "[[" to extract a single element of a list.

``` r
a <- 1:10
a[4] # fourth element of a
```

```
## [1] 4
```

``` r
my_list <- list(a = "hello", b = c(1,2,3), c = data.frame(x = 1:5, y = 6:10))
my_list[[1]]    # first element of the list
```

```
## [1] "hello"
```

``` r
my_list[[2]][3] # third value of the second element
```

```
## [1] 3
```

---

# Indexing (cont.)

--

- We can also use "$" to index a list or data frame by name.

``` r
my_list <- list(a = "hello", b = c(1,2,3), c = data.frame(x = 1:5, y = 6:10))
my_list$a
```

```
## [1] "hello"
```

``` r
my_list$b
```

```
## [1] 1 2 3
```

``` r
my_list$c
```

```
##   x  y
## 1 1  6
## 2 2  7
## 3 3  8
## 4 4  9
## 5 5 10
```

``` r
my_list$b[3]
```

```
## [1] 3
```

---

# Indexing (cont.)

--

- "$" can be combined with "[" and chained to reach nested elements.

``` r
my_list <- list(a = "hello", b = c(1,2,3), c = data.frame(x = 1:5, y = 6:10))
my_list$b[3]
```

```
## [1] 3
```

``` r
my_list$c$x
```

```
## [1] 1 2 3 4 5
```

---

# Indexing (cont.)

--

- "$" also works inside a model formula, although supplying data = d is usually cleaner.

``` r
d <- data.frame(x = 1:2, y = 3:4)
lm(d$y ~ d$x)
```

```
## 
## Call:
## lm(formula = d$y ~ d$x)
## 
## Coefficients:
## (Intercept)          d$x  
##           2            1  
```

---

# Functions

--

- Functions are the building blocks of R.
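Examples of functions are mean(), sd(), and lm(). We can also write our own functions with function():

``` r
example_function <- function(a, b) {
  output <- a + b
  return(output)
}
example_function(1, 2)
```

```
## [1] 3
```

---

# Libraries

--

- Packages are collections of functions written by others, for example data.table. We load an installed package with library().

``` r
library(data.table)
```

```
## Warning: package 'data.table' was built under R version 4.3.3
```

- This warning only means that the package was built under a newer version of R than the one you are running; it is usually harmless.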
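A package has to be installed once per machine with install.packages(), but loaded with library() in every new R session. The sketch below combines both steps; `load_or_install()` is a hypothetical helper written for this example, not a function from any package, and the install step assumes an internet connection to CRAN.

```r
# load_or_install() is a hypothetical helper for this sketch
load_or_install <- function(pkg) {
  # requireNamespace() returns FALSE (instead of an error) if pkg is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # downloads from CRAN; needs an internet connection
  }
  # character.only = TRUE lets library() accept the package name as a string
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# The :: operator calls one function from a package without attaching the package
utils::head(datasets::mtcars, 3)
```

For this course, `load_or_install("data.table")` would install data.table on first use and simply load it afterwards.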