Data Science

Introduction to Data Analysis with R

Using Basic Data Analysis functions on the mtcars dataset

Let’s Start

# Copying mtcars data frame to our new data frame myCars
myCars <- mtcars

Which car has the highest horsepower (hp) ? 

#find and display the car with the highest horsepower index <- which.max(myCars$hp)
# Display the car name along with the rest of the row myCars[index,]
##                mpg cyl disp hp drat  wt  qsec vs am  gear carb ## Maserati Bora  15   8  301 335 3.54 3.57 14.6  0  1    5    8

Maserati Bora has the highest horsepower at 335

Exploring miles per gallon (mpg) of the cars

# find and display the car with the highest mpg
index<-which.max(myCars$mpg)
myCars[index,]
##                 mpg cyl disp hp drat    wt qsec vs am gear carb ## Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.9  1  1    4    1
# Creating a sorted dataframe, based on mpg
highMPGcars <- myCars[ order(-myCars$mpg),]
head(highMPGcars)
mpg cyl  disp  hp drat    wt  qsec vs am gear carb ## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1 ## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2 ## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1 ## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2

Which car has the “best” combination of mpg and hp?

# Best car combination of mpg and hp, where mpg and hp must be given equal # weight
bestCombo<- myCars$hp / myCars$mpg
myCars[which.max(bestCombo),]
##                mpg cyl disp  hp drat   wt qsec vs am gear carb ## Maserati Bora  15   8  301 335 3.54 3.57 14.6  0  1    5    8

The Maserati Bora hp to mpg is ~ 22hp per gallon 

Manipulating Data Frames in R

Learn To Manipulate Data Frames Using The “mtcars” Dataset

Task 1: Create a new column to find Displacement per Cylinder 

Create a new variable (DisplacementPerCylinder), to calculate the total displacement per cylinder in cubic inches for each vehicle from the mtcars dataset.

# "str" allows you to display the internal structure of an R object
str(mtcars) 
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
# As a backup we can copy the original data frame into a new one to work with
# That way if there is any issues we can go back

my_mtcars <- mtcars
# Calculate Displacement Per Cylinder by dividing the values (disp) and (cyl)

my_mtcars$DisplacementPerCylinder <- my_mtcars$disp / my_mtcars$cyl

# Report a summary of the variable
summary(my_mtcars$DisplacementPerCylinder)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.77   26.92   34.48   35.03   43.19   59.00

Task 2: Create your own data frame

Gather data from family & friends on the number of pets they have, the birth order they are in their family and the number of siblings. 

# Family/Friends ID
friendID  <- c(1, 2, 3, 4, 5)

# Number of pets they have
Pets <- c(4, 4, 2, 3, 1)

# The birth order they are in their family
Order <- c(1, 2, 2, 1, 1)

# Number of Siblings 
Siblings <- c(2, 2, 1, 2, 0)

# Binding the vectors into a data frame called myFriends
myFriends <- data.frame(friendID, + Pets, + Order, + Siblings)

# Command to report the structure of the data frame myFriends
str(myFriends)

## 'data.frame':    5 obs. of  4 variables:
##  $ friendID  : num  1 2 3 4 5
##  $ X.Pets    : num  4 4 2 3 1
##  $ X.Order   : num  1 2 2 1 1
##  $ X.Siblings: num  2 2 1 2 0
# Rename the columns to get rid of the "x." in front of the names
colnames(myFriends) <- c("FriendID", "Pets", "Order", "Siblings")
str(myFriends)
## 'data.frame':    5 obs. of  4 variables:
##  $ FriendID: num  1 2 3 4 5
##  $ Pets    : num  4 4 2 3 1
##  $ Order   : num  1 2 2 1 1
##  $ Siblings: num  2 2 1 2 0
# Listing the values of the vector friendID from the data frame myFriends
myFriends$FriendID 
## [1] 1 2 3 4 5
# Listing the values of the vector Pets from the data frame myFriends
myFriends$Pets
## [1] 4 4 2 3 1
# Listing the values of the vector Order from the data frame myFriends
myFriends$Order
## [1] 1 2 2 1 1
# Listing the values of the vector Siblings from the dataframe myFriends
myFriends$Siblings
# [1] 2 2 1 2 0
# Report a summary of the dataframe
summary(myFriends)
##     FriendID      Pets         Order        Siblings  
##  Min.   :1   Min.   :1.0   Min.   :1.0   Min.   :0.0  
##  1st Qu.:2   1st Qu.:2.0   1st Qu.:1.0   1st Qu.:1.0  
##  Median :3   Median :3.0   Median :1.0   Median :2.0  
##  Mean   :3   Mean   :2.8   Mean   :1.4   Mean   :1.4  
##  3rd Qu.:4   3rd Qu.:4.0   3rd Qu.:2.0   3rd Qu.:2.0  
##  Max.   :5   Max.   :4.0   Max.   :2.0   Max.   :2.0