R Data Frame Cheat Sheet



subsetting rows
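
A minimal sketch of positional row subsetting in base R, using the built-in `mtcars` data set as a stand-in for your own data frame.

```r
# Rows by position: first five rows, all columns.
first_five <- mtcars[1:5, ]

# Rows by row name (mtcars stores car models as row names).
one_car <- mtcars["Mazda RX4", ]

# Drop rows with a negative index: everything except the first ten rows.
rest <- mtcars[-(1:10), ]
```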

Filtering / extracting / subsetting data frames based on attribute value
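
A sketch of three equivalent ways to filter on attribute values, assuming `mtcars` as example data and that dplyr is installed. The thresholds are arbitrary.

```r
library(dplyr)

# Base R: logical condition inside the row index.
base_result <- mtcars[mtcars$cyl == 6 & mtcars$mpg > 20, ]

# Base R: subset() lets you refer to columns without the `mtcars$` prefix.
subset_result <- subset(mtcars, cyl == 6 & mpg > 20)

# dplyr: conditions separated by commas are combined with AND.
dplyr_result <- mtcars %>% filter(cyl == 6, mpg > 20)
```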

Filtering for rows where at least one column is missing
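
One way to do this in base R is `complete.cases()`, sketched here against the built-in `airquality` data set (my choice of example data). The dplyr line in the comment assumes dplyr 1.0.4 or later for `if_any()`.

```r
# TRUE for rows that contain at least one NA in any column.
has_na <- !complete.cases(airquality)

# Keep only those rows.
rows_with_na <- airquality[has_na, ]

# dplyr alternative:
# dplyr::filter(airquality, dplyr::if_any(dplyr::everything(), is.na))
```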

Exploring data
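
The usual first-look functions in base R, shown on `mtcars` as a stand-in data frame.

```r
str(mtcars)        # column types plus the first few values of each column
summary(mtcars)    # min / quartiles / mean / max per numeric column
head(mtcars, 3)    # first three rows
cars_dim <- dim(mtcars)   # number of rows, number of columns
```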

Prevent garbage characters when using read.csv on data exported from SQL
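
I am assuming the garbage characters come from a UTF-8 byte-order mark (BOM) at the start of the exported file, which is a common cause but not the only one. Under that assumption, `fileEncoding = "UTF-8-BOM"` tells `read.csv` to strip it. The sketch writes a small BOM-prefixed file to a temp path so it is reproducible.

```r
# Build a tiny CSV that starts with a UTF-8 BOM (bytes EF BB BF).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)
writeBin(charToRaw("id,name\n1,a\n2,b\n"), con)
close(con)

# Read it back, asking R to drop the BOM from the first column name.
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM")
names(clean)
```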

Count occurrences of unique values
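
Two common options, sketched on `mtcars$cyl`: base R `table()` and `dplyr::count()`.

```r
library(dplyr)

# Base R: named vector of counts per unique value.
cyl_table <- table(mtcars$cyl)

# dplyr: one row per unique value, sorted by descending count.
cyl_counts <- mtcars %>% count(cyl, sort = TRUE)
```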

Build and install vignettes

String filtering when using a database and dbplyr

The approaches above won’t always work when filtering against data in a database. Reference: https://github.com/tidyverse/dplyr/issues/3090

Solution from: https://stackoverflow.com/questions/38962585/pass-sql-functions-in-dplyr-filter-function-on-database/47198795#47198795

This works when I test against a Microsoft SQL database but not SQLite. SQLite returns the error: Error in stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : object 'name' not found

Get class names of items in a dataframe/vector
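
A sketch using `sapply()` with `class`, on the built-in `iris` data frame and a plain vector.

```r
# One class name per column of a data frame.
col_classes <- sapply(iris, class)

# Class of a single vector.
vec_class <- class(letters)
```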

Printing more than default rows/columns from a table
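
Assuming the table is a tibble, `print()` takes `n` (rows) and `width` (columns) arguments. The values below are arbitrary.

```r
library(tibble)

cars_tbl <- as_tibble(mtcars)

# Show 20 rows and every column, regardless of console width.
print(cars_tbl, n = 20, width = Inf)

# n = Inf prints all rows.
printed <- capture.output(print(cars_tbl, n = Inf, width = Inf))
```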

Search all objects, including functions, in global environment for string

Search key words or phrases in help pages, vignettes or task views
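
The heading matches the description of `utils::RSiteSearch()`, so I am guessing that was the original function. It opens a browser, so the sketch guards it with `interactive()`. `help.search()` is the local equivalent for installed help pages; the search terms are only examples.

```r
# Search installed help pages (same as ??regression, limited to one package).
local_hits <- help.search("regression", package = "stats")

# Search help pages, vignettes and task views on search.r-project.org.
if (interactive()) {
  RSiteSearch("logistic regression")
}
```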

Timezone stuff
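
A few base R basics, sketched with an arbitrary timestamp: check the session time zone, parse a time as UTC, and show the same instant in another zone.

```r
# Time zone of the current R session.
Sys.timezone()

# Parse a timestamp as UTC.
t_utc <- as.POSIXct("2024-01-15 12:00:00", tz = "UTC")

# Format the same instant in New York time.
ny_text <- format(t_utc, tz = "America/New_York", usetz = TRUE)

# Changing the tzone attribute changes how it prints, not the instant itself.
t_ny <- t_utc
attr(t_ny, "tzone") <- "America/New_York"
```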

filter for records within last n years

Works with dbplyr too. This is not a great example, but I wanted it to work for at least a few more years.
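
A local sketch of the idea, computing the cutoff relative to today so it keeps working. The `records` data frame and its `event_date` column are made up for illustration, and I have only run this against an in-memory data frame, not a database.

```r
library(dplyr)

n_years <- 5

# Date exactly n_years before today.
cutoff <- seq(Sys.Date(), length.out = 2, by = paste0("-", n_years, " years"))[2]

# Hypothetical records: roughly 1 month, 3 years and 11 years old.
records <- data.frame(
  id = 1:3,
  event_date = Sys.Date() - c(30, 1000, 4000)
)

recent <- records %>% filter(event_date >= cutoff)
```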

Browse vignettes for a given package
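
A sketch with the `grid` package, which ships with R; swap in any installed package name.

```r
# List the vignettes of one package in the console.
grid_vignettes <- vignette(package = "grid")
grid_vignettes

# Open the same list in a browser (interactive sessions only).
if (interactive()) {
  browseVignettes("grid")
}
```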

Open function documentation from within RStudio

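Use F1 with the cursor on a function name to open its help page. F2 jumps to the function's source code instead.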

Details about built-in data sets
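
A sketch using `data()` to list what is available and `?` to read about one data set.

```r
# Every data set in the `datasets` package, with a one-line title.
all_sets <- data(package = "datasets")
head(all_sets$results[, c("Item", "Title")])

# Help page for a single data set (opens the viewer in interactive use).
if (interactive()) {
  help("mtcars")
}
```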

Use GitHub to search for packages using a particular function

Put this into the GitHub search box to see how packages on CRAN use the llply() function from plyr

Conditional mutate

NOTE: should probably also look at recode here
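
A minimal sketch with `if_else()` for a two-way split and `case_when()` for several conditions. The new column names and thresholds are made up.

```r
library(dplyr)

cars <- mtcars %>%
  mutate(
    # Two outcomes: if_else() checks that both branches have the same type.
    economy = if_else(mpg >= 25, "high", "regular"),
    # Several outcomes: the first matching condition wins, TRUE is the fallback.
    size = case_when(
      cyl == 4 ~ "small",
      cyl == 6 ~ "medium",
      TRUE     ~ "large"
    )
  )
```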

extracting nested list into a tibble

SOURCE: https://cfss.uchicago.edu/webdata004_simplifying_lists.html
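
One possible approach, assuming tidyr 1.0 or later: put the list in a list column and spread each element's fields into columns with `unnest_wider()`. The `people` list is invented for illustration, and this may differ from what the linked source does.

```r
library(tibble)
library(tidyr)

# Hypothetical nested list, e.g. parsed from JSON.
people <- list(
  list(name = "Ana", age = 31, city = "Lima"),
  list(name = "Ben", age = 45, city = "Oslo")
)

# One row per list element, one column per field.
people_tbl <- unnest_wider(tibble(person = people), person)
```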

Interactively explore plots

Sort bar plot by counts - when using stat = “identity”
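
A sketch using `reorder()` on the x aesthetic, with counts precomputed from `mtcars`. `geom_col()` is the shorthand for `geom_bar(stat = "identity")`.

```r
library(dplyr)
library(ggplot2)

cyl_counts <- mtcars %>% count(cyl)

# reorder() sorts the factor levels by -n, so the tallest bar comes first.
p <- ggplot(cyl_counts, aes(x = reorder(factor(cyl), -n), y = n)) +
  geom_col() +
  labs(x = "cyl", y = "count")

bar_order <- levels(reorder(factor(cyl_counts$cyl), -cyl_counts$n))
```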

Sort bar plots by counts, within facets, when using stat = “identity”

SOURCE: https://www.programmingwithr.com/how-to-reorder-arrange-bars-with-in-each-facet-of-ggplot/
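
A sketch assuming the tidytext package is installed: `reorder_within()` orders the bars separately inside each facet, and `scale_x_reordered()` cleans up the axis labels. The `mtcars` grouping is only an example.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

counts <- mtcars %>% count(gear, cyl)

p <- ggplot(counts, aes(x = reorder_within(cyl, -n, gear), y = n)) +
  geom_col() +
  scale_x_reordered() +                     # strips the facet suffix from labels
  facet_wrap(~ gear, scales = "free_x") +   # each facet needs its own x axis
  labs(x = "cyl", y = "count")
```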

filter correlations at a cutoff value

I don’t really like the formatting here. It is too hard to match columns, so let’s find a better way.

There is corrr::correlations(), but it is not available on CRAN, and there is also an open issue with the newest version of dplyr.

Using the deprecated reshape2 package because tidyr doesn’t handle matrices. This is fine for now.
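
An alternative sketch that avoids reshape2 altogether: base R `as.table()` turns the correlation matrix into long format. This is a substitute technique, not the original code, and the 0.8 cutoff is arbitrary.

```r
cutoff <- 0.8

cors <- cor(mtcars)

# Blank out the diagonal and upper triangle so each pair appears once.
cors[upper.tri(cors, diag = TRUE)] <- NA

# Long format: one row per variable pair.
long <- as.data.frame(as.table(cors), stringsAsFactors = FALSE)
names(long) <- c("var1", "var2", "r")

# Keep pairs at or above the cutoff, strongest first.
strong <- subset(long, !is.na(r) & abs(r) >= cutoff)
strong <- strong[order(-abs(strong$r)), ]
```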

plot distribution of all variables

Just the numeric variables

SOURCE: (https://drsimonj.svbtle.com/quick-plot-of-all-variables)
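
A sketch of the numeric-only version, assuming dplyr 1.0 or later (for `where()`) and tidyr 1.0 or later (for `pivot_longer()`). It reshapes to long format and draws one histogram per variable with free scales.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

long <- iris %>%
  select(where(is.numeric)) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value")

p <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ variable, scales = "free")
```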

RMarkdown: insert date document was knitted

add this to the header:

SOURCE: (https://stackoverflow.com/questions/23449319/yaml-current-date-in-rmarkdown)

Plot percentage of attributes that are NA for each outcome

Plot the percentage of rows that have at least 1 NA attribute, by outcome

Plot the attributes (predictors) that are most likely to be missing
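
A sketch of the overall version (not split by outcome), using `airquality` as example data: compute the percentage of NA values per column, then plot the columns ordered by missingness.

```r
library(ggplot2)

# Percentage of missing values in each column.
pct_na <- colMeans(is.na(airquality)) * 100

na_df <- data.frame(
  attribute = names(pct_na),
  pct_missing = as.numeric(pct_na)
)

p <- ggplot(na_df, aes(x = reorder(attribute, pct_missing), y = pct_missing)) +
  geom_col() +
  coord_flip() +
  labs(x = "attribute", y = "% missing")
```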

Plot the attributes (predictors) that are most likely to be missing, by outcome

Create a matrix that shows whether or not a particular combination of values is in the data

SOURCE: https://stackoverflow.com/a/37897416
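
A base R sketch that may differ from the linked answer: cross-tabulate two columns with `table()` and compare against zero to get a TRUE/FALSE matrix.

```r
# Counts for every cyl x gear combination.
combo_counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# TRUE where the combination occurs at least once.
combo_present <- combo_counts > 0
combo_present
```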

Convert unix style epoch time to human readable time
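
A base R sketch with `as.POSIXct()`. The epoch value is arbitrary, and it assumes seconds; divide by 1000 first if your source stores milliseconds.

```r
epoch_seconds <- 1700000000

# Seconds since 1970-01-01 00:00:00 UTC.
ts <- as.POSIXct(epoch_seconds, origin = "1970-01-01", tz = "UTC")

readable <- format(ts, "%Y-%m-%d %H:%M:%S")
```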

Clean Data: remove columns where no rows contain a value
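
A base R sketch on a made-up data frame: keep only columns that have at least one non-NA value.

```r
# Hypothetical data: `empty` has no values at all, `note` is only partly missing.
df <- data.frame(
  id = 1:3,
  empty = NA,
  note = c("a", NA, "c")
)

# colSums(!is.na(df)) counts the non-missing values per column.
cleaned <- df[, colSums(!is.na(df)) > 0, drop = FALSE]
```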

Print one plot for each data frame in a list column

There is probably a better way to meet the need.

My goal was to only show the values that are present in a given group.

ggplot2::facet_wrap() shows all values present in any group in the plots for all groups.

Step 1: make the list

Step 2: give the list members names

Step 3: Create plotting function

Step 4: Use your plotting function to create the plots
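
The four steps above can be sketched as follows. This uses a plain list built with `split()` rather than a list column, and `mtcars` with made-up names, so treat it as an illustration of the idea rather than the original code.

```r
library(ggplot2)

# Step 1: make the list (one data frame per cylinder count).
groups <- split(mtcars, mtcars$cyl)

# Step 2: give the list members descriptive names.
names(groups) <- paste0("cyl_", names(groups))

# Step 3: plotting function; only gear values present in `df` get a bar.
plot_group <- function(df, title) {
  ggplot(df, aes(x = factor(gear))) +
    geom_bar() +
    labs(title = title, x = "gear", y = "count")
}

# Step 4: build one plot per list member, then print each one.
plots <- Map(plot_group, groups, names(groups))
for (p in plots) print(p)
```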

One plot for each attribute in a data frame - different scale for each attribute

Get detailed information for each column in a database table

Get just the columns and datatypes for each column in a database table
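
A sketch against an in-memory SQLite database, assuming the DBI and RSQLite packages are installed. `PRAGMA table_info()` is SQLite-specific; other databases expose similar details through their own catalogs (for example `INFORMATION_SCHEMA.COLUMNS`), so adjust the query for your backend.

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

# Detailed information: position, name, type, not-null flag, default, primary key.
detail <- dbGetQuery(con, "PRAGMA table_info(mtcars)")

# Just the column names and data types.
cols_and_types <- detail[, c("name", "type")]

# Column names only, portable across DBI backends.
fields <- dbListFields(con, "mtcars")

dbDisconnect(con)
```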

Table of contents for rmarkdown document

Add this to the document header.