Tag: Data Science

What is Data Science?

Recent Posts, Data Science
Data science covers a wide scope, from collecting and managing data to analysis and visualization. By combining these areas, a data scientist can extract information from data and create visualizations to communicate the results.

Collect and organize data

Collecting and organizing data is arguably the most important part of data science. You cannot do anything without data to work with, so you need a method of collecting it. You can do this on your own, for example by scraping the web or applications, or by conducting a survey. You may also have access to data that has already been collected, either through open source repositories or through sites such as Kaggle. You may get the d...
Renaming Columns with R

Recent Posts, Data Science Using R
Often the data you’re working with has abstract column names, such as (x1, x2, x3…). Typically, the first step I take when renaming columns with R is opening my web browser. For some reason, no matter how many times I do this, it’s just one of those things. (Hoping that writing about it will change that.)

The dataset cars contains data from the 1920s on "Speed and Stopping Distances of Cars". It has only two columns, shown below.

colnames(datasets::cars)
[1] "speed" "dist"

If we want to rename the column "dist" to make it clearer what the data means, we can do so in a few different ways.

Using dplyr:

cars %>% rename("Stopping Distance (ft)" = dist) %>% colnames()
[1] "speed" "Stopping Distance (ft)"

cars %>% rename("Stopping Di
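For comparison, here is a minimal base R sketch of the same kind of rename, with no packages loaded. It is one possible approach and not necessarily one the post goes on to show. The object name `cars_renamed` and the new column name `stopping_distance_ft` are assumptions chosen for illustration.

```r
# Work on a copy so the built-in dataset is left untouched
# (cars_renamed is an illustrative name, not from the post)
cars_renamed <- datasets::cars

# Find the column currently called "dist" and overwrite its name
# (stopping_distance_ft is an assumed, syntactically valid name)
names(cars_renamed)[names(cars_renamed) == "dist"] <- "stopping_distance_ft"

colnames(cars_renamed)
# [1] "speed"                "stopping_distance_ft"
```

A name without spaces or parentheses can be used with `$` without backticks, which is the main reason this sketch uses one.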

How To Select Multiple Columns Using Grep & R

Data Science Using R, Recent Posts
Why you need to be using grep when programming with R. There's a reason that grep is still available in most, if not all, programming languages 44 years after its creation: it's useful and simple to use. Below is an example of using grep to make selecting multiple columns in R simple and easy to read. The dataset below has the following column names.

names(data) # Column Names
[1] "fips" "state" "county" "metro_area"
[5] "population" "med_hh_income" "poverty_rate" "population_lowaccess"
[9] "lowincome_lowaccess" "no_vehicle_lowaccess" "s_grocery" "s_supermarket"
[13] "s_convenience" "s_specialty" "s_farmers_market" "r_fastfood"
[17] "r_full_servi...
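As a rough sketch of the idea, the snippet below selects every column whose name starts with `s_`. The data frame is a hypothetical stand-in: only the column names come from the excerpt, and the values, the pattern, and the names `store_cols` and `stores` are assumptions made for illustration.

```r
# Hypothetical stand-in data frame; only the column names come from the excerpt
data <- data.frame(
  fips          = c("01001", "01003"),
  state         = c("AL", "AL"),
  population    = c(100, 200),
  s_grocery     = c(1, 2),
  s_supermarket = c(0, 1),
  s_convenience = c(3, 4),
  r_fastfood    = c(5, 6)
)

# grep() returns the positions of the names that match the pattern;
# "^s_" matches names that begin with "s_"
store_cols <- grep("^s_", names(data))

# Use those positions to select the matching columns
stores <- data[, store_cols]
names(stores)
# [1] "s_grocery"     "s_supermarket" "s_convenience"
```

Passing `value = TRUE` to `grep()` returns the matching names instead of their positions, which can be easier to read when printing.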

Exploring Employee Attrition and Performance with R

Data Science Using R, Data Science
Based on a fictional data set created by IBM's data scientists.

Introduction: Employee attrition is when an employee leaves a company through normal means (loss of customers, retirement, or resignation) and no one fills the vacancy. Can a company identify employees who are likely to leave? A high employee attrition rate is often a sign of underlying problems and can affect a company very negatively. One such effect is the cost of finding and training a replacement, as well as the strain placed on other workers who have to cover in the meantime.

Preprocessing: This dataset was produced by IBM and has just under 1500 observations of 31 different variables, including attrition. 4 of the variables (EmployeeNumber, Over18

Introduction to Data Analysis with R

Data Science, Data Science Using R
Using basic data analysis functions on the mtcars dataset

Let's Start

# Copy the mtcars data frame to our new data frame, myCars
myCars <- mtcars

Which car has the highest horsepower (hp)?

# Find the car with the highest horsepower
index <- which.max(myCars$hp)
# Display the car name along with the rest of the row
myCars[index,]
##                mpg cyl disp hp drat  wt  qsec vs am  gear carb
## Maserati Bora  15   8  301 335 3.54 3.57 14.6  0  1    5    8

The Maserati Bora has the highest horsepower, at 335.

Exploring miles per gallon (mpg) of the cars

# find and display the car with the highest mpgind...
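The `which.max()` pattern above can be wrapped in a small function so it works for any numeric column. This is only a sketch: `top_row` is a hypothetical helper, not something from the post, and it is shown here on the same `hp` column.

```r
# Hypothetical helper: return the row holding the largest value in `column`.
# drop = FALSE keeps the result as a data frame, so the row name survives.
top_row <- function(df, column) {
  df[which.max(df[[column]]), , drop = FALSE]
}

myCars <- mtcars
top_hp <- top_row(myCars, "hp")

rownames(top_hp)  # "Maserati Bora"
top_hp$hp         # 335
```

Note that `which.max()` returns only the first position of the maximum, so ties beyond the first row are not reported.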