Terry

Tech Enthusiast with a passion for teaching others new technology.
How to rename columns in r

Rename Columns | R

Often data you’re working with has abstract column names, such as (x1, x2, x3…). Typically, the first step I take when renaming columns with r is opening my web browser. 

For some reason no matter the amount of times doing this it’s just one of those things. (Hoping that writing about it will change that)

The dataset cars is data from the 1920s on “Speed and Stopping Distances of Cars”. There is only 2 columns shown below.

colnames(datasets::cars)
[1] "speed" "dist" 

If we wanted to rename the column “dist” to make it easier to know what the data is/means we can do so in a few different ways.

Using dplyr:

cars %>% 
  rename("Stopping Distance (ft)" = dist) %>% 
  colnames()

[1] "speed"             "Stopping Distance (ft)"
cars %>%
  rename("Stopping Distance (ft)" = dist, "Speed (mph)" = speed) %>%
  colnames()

[1] "Speed (mph)"            "Stopping Distance (ft)"

Using Base r:

colnames(cars)[2] <-"Stopping Distance (ft)"

[1] "speed"                  "Stopping Distance (ft)"

colnames(cars)[1:2] <-c("Speed (mph)","Stopping Distance (ft)")

[1] "Speed (mph)"            "Stopping Distance (ft)"

Using GREP:

colnames(cars)[grep("dist", colnames(cars))] <-"Stopping Distance (ft)"

"speed"                  "Stopping Distance (ft)"

How To Select Multiple Columns Using Grep & R

Why you need to be using Grep when programming with R.

There’s a reason that grep is included in most if not all programming language to this day 44 years later from creation. It’s useful and simple to use. Below is an example of using grep to make selecting multiple columns in R simple and easy to read.

The dataset below has the following column names.

names(data) # Column Names
 [1] "fips"                 "state"                "county"               "metro_area"          
 [5] "population"           "med_hh_income"        "poverty_rate"         "population_lowaccess"
 [9] "lowincome_lowaccess"  "no_vehicle_lowaccess" "s_grocery"            "s_supermarket"       
[13] "s_convenience"        "s_specialty"          "s_farmers_market"     "r_fastfood"          
[17] "r_full_service"      

How can we select only the columns we need to work with?

  • metro_area
  • med_hh_income
  • poverty_rate
  • population_lowaccess
  • lowincome_lowaccess
  • no_vehicle_lowaccess
  • s_grocery
  • s_supermarket
  • s_convenience
  • s_specialty
  • s_farmers_market
  • r_fastfood
  • r_full_service

We can tell R exactly by listing each column as below

data[c("metro_area","med_hh_income", "poverty_rate", "population_lowaccess", "lowincome_lowaccess", "no_vehicle_lowaccess","s_grocery","s_supermarket","s_convenience","s_specialty","s_farmers_market", "r_fastfood", "r_full_service")]

OR

We can tell R where each column we want is.

data[c(4,6,7:17)]

First, writing out each individual column is time consuming and chances are you’re going to make a typo (I did when writing it). Second option we have to first figure out where the columns are located to then tell R. Well looking at the columns we are trying to access vs the others theirs a specific difference. All these columns have a “_” located in there name, and we can use regular expressions (grep) to select these.

data[grep("_", names(data))])

FYI… to get the column locations you can actually use…

grep("_", names(data))
[1]  4  6  7  8  9 10 11 12 13 14 15 16 17

You will rarely have a regular expression as easy at “_” to select multiple columns, a very useful resource to learn and practice is https://regexr.com

Data was obtained from https://www.ers.usda.gov/data-products/food-access-research-atlas/download-the-data/

Creating Excel Workbooks with multiple sheets in R

Create Excel Workbooks

Generally, when doing anything in R I typically work with .csv files, their fast and straightforward to use. However, I find times, where I need to create a bunch of them to output and having to go and open each one individually, can be a pain for anyone. In this case, it’s much better to create a workbook where each of the .csv files you would have created will now be a separate sheet.



Below is a simple script I use frequently that gets the job done. Also included is the initial process of creating dummy data to outline the process.

EXAMPLE CODE:

Libraries used

library(tidyverse)
library(openxlsx)

Creating example files to work with

products <- c("Monitor", "Laptop", "Keyboards", "Mice")
Stock <- c(20,10,25,50)
Computer_Supplies <- cbind(products,Stock)
products <- c("Packs of Paper", "Staples")
Stock <- c(100,35)
Office_Supplies <- cbind(products,Stock)
# Write the files to our directory
write.csv(Computer_Supplies, "Data/ComputerSupplies.csv", row.names = FALSE)
write.csv(Office_Supplies, "Data/OfficeSupplies.csv", row.names = FALSE)

Point to directory your files are located in (.csv here) and read each in as a list

# Get the file name read in as a column
read_filename <- function(fname) {
  read_csv(fname, col_names = TRUE) %>%
    mutate(filename = fname)
}
tbl <-
  list.files(path = "Data/",
             pattern ="*.csv",
             full.names = TRUE) %>%
  map_df(~read_filename(.))

Removing path from the file names

*Note: Max length of a Workbook’s name is 31 characters

tbl$filename <- gsub("Data/", "", tbl$filename)
tbl$filename <- gsub(".csv", "", tbl$filename)

Split the “tbl” object into individual lists

 mylist <- tbl %>% split(.$filename)
names(mylist)
## [1] "/ComputerSupplies" "/OfficeSupplies"

Creating an Excel workbook and having each CSV file be a separate sheet

wb <- createWorkbook()
lapply(seq_along(mylist), function(i){
  addWorksheet(wb=wb, sheetName = names(mylist[i]))
  writeData(wb, sheet = i, mylist[[i]][-length(mylist[[i]])])
})
#Save Workbook
saveWorkbook(wb, "test.xlsx", overwrite = TRUE

Reading in sheets from an Excel file

(The one we just created)

 df_ComputerSupplies <- read.xlsx("test.xlsx", sheet = 1)

Loading and adding a new sheet to an already existing Excel workbook

wb <- loadWorkbook("test.xlsx")
names(wb)
## [1] "/ComputerSupplies" "/OfficeSupplies"
addWorksheet(wb, "News Sheet Name")
names(wb)
## [1] "/ComputerSupplies" "/OfficeSupplies" "News Sheet Name"

Sort

The command sort is used to sort files line by line.  Lines starting with a number go first. Lines that come next in order go alphabetical with uppercase letters appearing before lowercase ones.

Use cat to create “testsort” for the example.

~/Test>cat testsort
A line 1
a line 2
8 line 3
line 4
5 line 5
~/Test>sort testsort
5 line 5
8 line 3
A line 1
a line 2
line 4

R sorts by using a random hash of keys

~/Test>sort -R testsort
a line 2
5 line 5
A line 1
8 line 3
line 4
~/Test>sort -R testsort
5 line 5
A line 1
a line 2
line 4
8 line 3

Egrep & Fgrep

EGREP:

            The Command egrep is the same as running grep –E. egrep is used to search for a pattern using extended regular expressions.

Terry@f:~/FinderDing>cat testsort
A line 1	
a line 2	
8 line 3	
line 4
5 line 5	
Terry@f:~/FinderDing>egrep '^[a-zA-Z]' testsort	
A line 1
a line 2
line 4

*Show lines that start with a letter from alphabet

Terry@f:~/FinderDing>cat html
<!DOCTYPE html>
<html>	
<body>
<h1>My First Heading</h1>
<p>My first paragraph.</p>
</body>
</html>
Terry@f:~/FinderDing>egrep "My|first" html
<h1>My First Heading</h1>
<p>My first paragraph.</p>

`*Find lines with pattern My first from html file

FGREP:

The command fgrep is the same as running grep –F. The Command searches for fixed character strings in a file, which means regular expressions can’t be used.

Terry@f:~/FinderDing>fgrep "My" html
<h1>My First Heading</h1>
<p>My first paragraph.</p>l
%d bloggers like this: