Introduction to Data Analysis with R

Using Basic Data Analysis functions on the mtcars dataset

Let’s Start

# Copying mtcars data frame to our new data frame myCars
myCars <- mtcars

Which car has the highest horsepower (hp)?

# Find and display the car with the highest horsepower
index <- which.max(myCars$hp)
# Display the car name along with the rest of the row
myCars[index,]
##                mpg cyl disp  hp drat   wt qsec vs am gear carb
## Maserati Bora  15   8  301 335 3.54 3.57 14.6  0  1    5    8

The Maserati Bora has the highest horsepower, at 335 hp.

Exploring miles per gallon (mpg) of the cars

# find and display the car with the highest mpg
index<-which.max(myCars$mpg)
myCars[index,]
##                 mpg cyl disp hp drat    wt qsec vs am gear carb
## Toyota Corolla 33.9   4 71.1 65 4.22 1.835 19.9  1  1    4    1
# Creating a sorted data frame, based on mpg (highest first)
highMPGcars <- myCars[order(-myCars$mpg),]
head(highMPGcars)
##                 mpg cyl  disp  hp drat    wt  qsec vs am gear carb
## Toyota Corolla 33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
## Fiat 128       32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
## Honda Civic    30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
## Lotus Europa   30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
## Fiat X1-9      27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
## Porsche 914-2  26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2

Which car has the “best” combination of mpg and hp?

# "Best" combination of mpg and hp, scored here as the ratio of hp to mpg
# (note: this ratio rewards high hp and low mpg; it does not weight the two equally)
bestCombo<- myCars$hp / myCars$mpg
myCars[which.max(bestCombo),]
##                mpg cyl disp  hp drat   wt qsec vs am gear carb
## Maserati Bora  15   8  301 335 3.54 3.57 14.6  0  1    5    8

The Maserati Bora has the highest hp-to-mpg ratio, at roughly 22 hp per mpg (335 / 15 ≈ 22.3).
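The ratio of hp to mpg goes up when mpg goes down, so it may not be what "equal weight" is meant to capture. One possible alternative, shown here only as a sketch, is to standardize both variables and add them together. The names scoredCars and comboScore are made up for this illustration, and the z-score approach is just one assumption about what "best" could mean.

```r
# Work on a copy so the built-in mtcars data stays untouched
scoredCars <- mtcars

# Standardize each variable (mean 0, standard deviation 1) so that
# neither one dominates simply because of its units
zMpg <- as.numeric(scale(scoredCars$mpg))
zHp  <- as.numeric(scale(scoredCars$hp))

# Equal-weight score: higher mpg and higher hp both raise the score
scoredCars$comboScore <- zMpg + zHp

# Show the cars with the highest combined score
head(scoredCars[order(-scoredCars$comboScore), c("mpg", "hp", "comboScore")])
```

With this score a car is rewarded for being above average on both measures, which is closer to the idea of weighting mpg and hp equally.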

Manipulating Data Frames in R

Learn To Manipulate Data Frames Using The “mtcars” Dataset

Task 1: Create a new column to find Displacement per Cylinder 

Create a new variable, DisplacementPerCylinder, that holds the displacement per cylinder (in cubic inches) for each vehicle in the mtcars dataset.

# "str" allows you to display the internal structure of an R object
str(mtcars) 
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
# As a backup we can copy the original data frame into a new one to work with
# That way, if there are any issues, we can go back to the original

my_mtcars <- mtcars
# Calculate Displacement Per Cylinder by dividing displacement (disp) by the number of cylinders (cyl)

my_mtcars$DisplacementPerCylinder <- my_mtcars$disp / my_mtcars$cyl

# Report a summary of the variable
summary(my_mtcars$DisplacementPerCylinder)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   17.77   26.92   34.48   35.03   43.19   59.00
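As a small follow-up sketch, the same column can also be built with transform(), and which.max() can then point to the vehicle behind the maximum in the summary. This is only one alternative way to write it; the variable name largest is made up for this example.

```r
# Work on a copy, as in the steps above
my_mtcars <- mtcars

# transform() adds the derived column in a single call;
# disp and cyl are looked up inside the data frame
my_mtcars <- transform(my_mtcars, DisplacementPerCylinder = disp / cyl)

# Pull out the row with the largest displacement per cylinder
largest <- my_mtcars[which.max(my_mtcars$DisplacementPerCylinder), ]

# Show the car name and the columns used in the calculation
largest[, c("disp", "cyl", "DisplacementPerCylinder")]
```

Looking at the row itself, rather than only the summary, shows which car produces the maximum value.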

Task 2: Create your own data frame

Gather data from family and friends on the number of pets they have, their birth order in their family, and their number of siblings.

# Family/Friends ID
friendID  <- c(1, 2, 3, 4, 5)

# Number of pets they have
Pets <- c(4, 4, 2, 3, 1)

# The birth order they are in their family
Order <- c(1, 2, 2, 1, 1)

# Number of Siblings 
Siblings <- c(2, 2, 1, 2, 0)

# Binding the vectors into a data frame called myFriends
myFriends <- data.frame(friendID, + Pets, + Order, + Siblings)

# Command to report the structure of the data frame myFriends
str(myFriends)

## 'data.frame':    5 obs. of  4 variables:
##  $ friendID  : num  1 2 3 4 5
##  $ X.Pets    : num  4 4 2 3 1
##  $ X.Order   : num  1 2 2 1 1
##  $ X.Siblings: num  2 2 1 2 0
# Rename the columns to get rid of the "X." in front of the names
# (R added the prefix because of the "+" signs inside data.frame() above)
colnames(myFriends) <- c("FriendID", "Pets", "Order", "Siblings")
str(myFriends)
## 'data.frame':    5 obs. of  4 variables:
##  $ FriendID: num  1 2 3 4 5
##  $ Pets    : num  4 4 2 3 1
##  $ Order   : num  1 2 2 1 1
##  $ Siblings: num  2 2 1 2 0
# Listing the values of the vector friendID from the data frame myFriends
myFriends$FriendID 
## [1] 1 2 3 4 5
# Listing the values of the vector Pets from the data frame myFriends
myFriends$Pets
## [1] 4 4 2 3 1
# Listing the values of the vector Order from the data frame myFriends
myFriends$Order
## [1] 1 2 2 1 1
# Listing the values of the vector Siblings from the dataframe myFriends
myFriends$Siblings
## [1] 2 2 1 2 0
# Report a summary of the dataframe
summary(myFriends)
##     FriendID      Pets         Order        Siblings  
##  Min.   :1   Min.   :1.0   Min.   :1.0   Min.   :0.0  
##  1st Qu.:2   1st Qu.:2.0   1st Qu.:1.0   1st Qu.:1.0  
##  Median :3   Median :3.0   Median :1.0   Median :2.0  
##  Mean   :3   Mean   :2.8   Mean   :1.4   Mean   :1.4  
##  3rd Qu.:4   3rd Qu.:4.0   3rd Qu.:2.0   3rd Qu.:2.0  
##  Max.   :5   Max.   :4.0   Max.   :2.0   Max.   :2.0
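The "X." prefixes seen earlier appear because each "+ Pets" style argument is an expression, not a plain name, so R has to turn it into a valid column name. A minimal sketch of a way around the rename step is to name each column when the data frame is built. The name myFriends2 is made up here so the original data frame is left alone.

```r
# Same vectors as above
friendID <- c(1, 2, 3, 4, 5)
Pets     <- c(4, 4, 2, 3, 1)
Order    <- c(1, 2, 2, 1, 1)
Siblings <- c(2, 2, 1, 2, 0)

# A leading "+" turns the argument into an expression, and R
# converts "+Pets" into the syntactically valid name "X.Pets"
names(data.frame(friendID, +Pets))

# Naming each argument sets the column names up front,
# so no colnames() step is needed afterwards
myFriends2 <- data.frame(FriendID = friendID,
                         Pets     = Pets,
                         Order    = Order,
                         Siblings = Siblings)
str(myFriends2)
```

Either approach ends up with the same four columns; naming them at creation just saves a step.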

Fak3 N3w5

Where do we get a large portion of our news and information from? Most likely a media source that makes money from advertising, donations, or some type of funding.

Other sources to think about are person-to-person communication and rumor. There are also personal experiences that many people can draw from. And what facilitates much of this consumption and interaction? The Internet, television, and to a lesser extent print and radio broadcast.

The buzzword of the moment is of course “Fake News”. This is nothing new; in the old days we referred to propaganda, advertising, and fiction as “Fake”. What’s considered fake to some people could be someone else’s reality. A lot of effort can be put into supporting, persuading, or disproving what others believe. The current atmosphere is fertile for breeding distrust, especially in what we are presented with from multiple news and information sources. Can we believe anything anymore?

How do we differentiate between “Real News” (hopefully this refers to news based on truth) and “Fake News” (possibly based on disinformation and dishonesty)? What do we do when we consume information: follow the crowd, follow our own beliefs, or roll the dice and hope for the best?

A current trend is to remove, block, or label some sources of “Fake News”. One group decides another group’s information is incorrect because they believe they are right, and anything they judge to be wrong is wrong. This is truly an oversimplification of the dilemma, but my point is not to determine what is “Real News” and what is “Fake News”, but to ask how anyone can navigate through all the noise. Usually when there is a problem, we look for solutions, and to produce solutions we often use tools. People used to use rational thought, but what you consider rational thought may not be what someone else considers rational thought. So let’s concentrate on tools for now.

I checked out a few Chrome extensions that are presented as capable of differentiating between what is “Real” and what is “Fake” or “click-bait”. They also allow you to flag certain information as “Fake” or “Real”, and these flags are factored into the “Algorithm” that determines the “Authenticity” of the information presented. Links to “Fake” web sources are highlighted by some extensions. I tried out a few of these “tools” and wasn’t impressed. I’m not sure that I want to put too much faith into artificial intelligence and how many “flags” viewers assign to websites. This doesn’t mean I won’t do a little research when I come across something that doesn’t look or sound quite believable. A healthy dose of skepticism often works for me. I also rely on faith, not always a popular discourse, but sometimes faith is all some have to work with.

I shall abandon the search for an easy solution to filter out Real News from Fake News and leverage the power of the computer only to assist me in determining truth from fiction. We are not limited to a few popular (or formerly popular) major news outlets, nor are we limited to only the most popular search engines. This is true for the moment, but will these options still be available in the future? Imagine if free speech was only allowed for the few and not for all. Imagine if one group of like-minded individuals controlled all the information. Imagine if History was subjected to only one point of view, and if the future was subjected to restrictions instead of freedoms.

Imagine if we allowed a computer program to determine what was real and what was not.

Would it matter who wrote that program? Maybe what we currently believe to be real is actually fake, and what we perceive as fake is remarkably real. The problems show up when we create Fake News to facilitate an agenda, when propaganda is used as a means to inflict harm in order to build something up or tear something down. All this talk about Fake News sounds somewhat similar to malware, viruses, and adware, yet an antivirus program is possibly more efficient in filtering out the bad stuff than some of the current Fake News filters. You would think a list of Fake News sites and a heuristic approach to identifying Fake News would be the most efficient way to weed out garbage, but even antivirus programs have their weaknesses.

How would such a program label religions, alternative medicine, or subjective reasoning?

If we include politics, we can see that what we are presented with often appears to lean one way or another. Not all news agencies appear to follow this objective, but most people can spot the entertainment value. Don’t forget “errors”. Even in the world of computers we deal with errors. Garbage in almost always results in garbage out. Just try not to process too much garbage in. The difference between humans and computers, at least presently, is not that you have the ability to think, but that you have the freedom to think. Don’t let that freedom slip away.

You may need it someday.

Chromebook Hooked

Too much “Distro-Hopping” looking for the perfect Linux setup can get to be a bore after a while. It sometimes feels like a great time-wasting endeavor. I prefer Linux to Windows and OS X because I can usually get a lot of work done without needing a powerful or expensive laptop. I also don’t need all the free built-in bloatware or the cost of purchasing all the actual programs I would use. Most Linux distributions have everything I need either rolled up into their base release, or available within their repositories. If you’re already invested in the Apple ecosystem, staying in that ecosystem usually makes financial sense, and your workflow doesn’t get flipped on its head too often. Windows is ingrained in a lot of “work” related use, especially if you work for a company that requires centrally managed devices via their IT department. You may also want a Windows OS available to support software that only works with that operating system. Also remember that there may be licensed applications, security concerns, and more factors that I really don’t need to go into.

I use Windows, OS X, and Linux. I prefer Linux, love the MacBook Pro hardware, and usually load Linux onto a machine that comes with Microsoft Windows. All these systems have their strengths and weaknesses. I’ve been thinking about simplifying down to only taking one laptop with me when I travel. Whether it’s local travel or long distance, dragging multiple laptops is not fun. A powerful (expensive) laptop with multiple operating systems partitioned, virtual machines, or a low-powered dedicated machine seems like a better option. Even these choices have their drawbacks. If I bought a new MacBook Pro to take on the road, I’d be worried about losing it or having it stolen. I think I’d prefer to leave it at home and take a cheap laptop on the road, but then I’m stuck with a less than stellar laptop that might not have the battery power or speed to get much work done. I have relied on a Linux netbook as my preferred setup, but I’ve recently become interested in Chromebooks.

I can see the need for a powerful laptop computer as the preferred all in one mobile office.

Smartphones have put a big dent into that requirement. I have a nice little keyboard I can use with my iPhone, but sometimes I need a laptop with me as well. There will be times when I will need a Windows, Mac, or Linux machine. There will be times when all I need to get some work done is my phone. I was thinking I should just stick with the MacBook Pro, although it’s not the latest hardware (still a good laptop despite its age). I thought about upgrading to a newer, more powerful MacBook Pro, but why would I, if a Chromebook that is a lot less expensive can serve the same function?

I don’t need a Chromebook with a touch screen, or one that doubles as a tablet. I prefer the laptop form. That really lowered the cost, and I was able to pick up a nice little Samsung Chromebook 3 with 4 GB of RAM and a 32 GB drive, on sale. It didn’t take long to get up and running. As someone who works with Linux more often than any other operating system, even I was impressed with the startup: almost instant on, and even quicker than my Linux laptop. My first thought of course was to replace the Chromebook OS with Linux, but after a few days of using the laptop for basic tasks, I decided to stay with the original OS for now. If the option to run Linux apps becomes available for this model, I would be even more reluctant to overwrite the system. I have a nice lightweight Ideapad running Mint 19 (for now) if I need to bring a Linux laptop with me. The idea is to see how much I can do with the Chromebook alone for a few weeks, or longer.

If all you want a laptop for is basic content creation, then maybe the Chromebook is for you. To blog, it works great. The Samsung has a nice keyboard, feels rugged enough to survive the hazards of the road, and has a good screen. For the price, Chromebooks make for a nice mobile office alternative. I don’t get the impression that I’m compromising on any level. I do believe that I saved a lot of money by purchasing a Chromebook instead of upgrading my main mobile laptop to a new Windows laptop or Apple device. I could still end up with a different opinion as time goes by. I don’t see myself abandoning Linux or Apple anytime soon.

If you need to work with software programs that require a lot of processing power or a specific operating system, then Chromebooks might not work for your needs. If you require better than average battery life and a budget-friendly cost, have Internet access available, and usually work within a browser, then a Chromebook might be a nice alternative. I find that I’m already rethinking my work habits to accommodate using the Chromebook.

Looking way down the road, I might not be too surprised to find myself purchasing another Chromebook. It’s too soon to actually say, but a lot will be determined by how one adapts to working with this type of system and ecosystem. It will be interesting to see whether more people who decide to purchase a new computer choose a Chromebook rather than a tablet or a laptop with a major OS preinstalled. For blogging, this is a workable solution for travel. Everything gets backed up to the cloud, so I don’t worry about losing any work I’ve done if the laptop fails while I’m writing. I can still work offline, but I probably won’t need to very often. The Chromebook was a worthwhile investment that I don’t regret. The more I use it, the more I like it. I’m still learning what I can do with this Chromebook.

Waiting For Darkness

The nights will soon be growing longer as the dog days of summer draw to a close and we slowly ease closer to Fall. Cooler nights, rain, and wind will become the norm. We retreat to the inner sanctum with the radio (podcasts) playing in the background. A hot cup of coffee, a dim light on above the workbench, trusty low-powered laptops displaying simple terminals which facilitate exploring the possibilities of learning some new tips and tricks. No GUIs to distract into a mindless point-and-click wandering.

Cold dark dreary weather is perfect for perusing through some deep technical papers, thick computer books (yes, actual books made of paper and ink), or “help” files often neglected but often associated with our favorite coding language, IDE, or debugging programs. Like Alchemists searching for the great enlightenment where all pseudo-code and real code become one……

Great conclusions to be sought from toiling away the hours exploring the possibilities of perfection from the command line cursor…..

A perfect setting for exploring the new 1.0 release of Julia. You can go back to the announcement “Why We Created Julia” on the julialang.org website and read a quick explanation of why the language was created. I hadn’t considered this language until I read the 1.0 release announcement posted on August 8, 2018. It wasn’t the possibility of the “speed of C” or the “Matlab like notation”; it was the possibility of using Julia as a “general programming language”. Sure we’ve got Python for that, but this is new to me, so it’s kind of cool (IMHO) to try something that is newer, yet somewhat familiar to learn about.

I won’t try to do a full description or review of the Julia language, because I haven’t used it enough yet. You can find a lot of useful info on their website, and download a version for your OS and start trying it out yourself. The bottom line is that you can use these dark dreary days and nights to enhance your old skills or learn some new ones. If reading boring technical manuals is your thing, then you’ll probably feel right at home reading every bit of documentation you can for whatever programming language you wish to work with. You can view the videos from “juliacon” on YouTube; there are some interesting ones from 2017 and 2018. There’s a large community, and a few “Julia Bloggers”, along with some very specific tutorials, all available via https://julialang.org

So pour yourself a hot cup of coffee, get out of the sun, and enjoy the distant sound of thunder as the rain begins to fall.

..and the wind begins to howl……