subsetting rows
Filtering / extracting / subsetting data frames based on attribute value
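A minimal sketch of filtering on attribute values, assuming dplyr is installed and using the built-in mtcars data purely as a stand-in (the column names and thresholds are illustrative):

```r
library(dplyr)

# Keep rows that satisfy every condition; comma-separated conditions are ANDed.
small_efficient <- filter(mtcars, cyl == 4, mpg > 30)

# Base R equivalent using logical row indexing.
small_efficient_base <- mtcars[mtcars$cyl == 4 & mtcars$mpg > 30, ]
```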
Filtering for rows where at least one column is missing
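One possible way to do this, assuming dplyr 1.0.4 or later for if_any(); the built-in airquality data is used only because it contains NA values:

```r
library(dplyr)

# Rows where at least one column is NA.
rows_with_na <- filter(airquality, if_any(everything(), is.na))

# Base R equivalent: complete.cases() is TRUE for rows without any NA.
rows_with_na_base <- airquality[!complete.cases(airquality), ]
```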
Exploring data
Prevent garbage characters when using read.csv on data exported from SQL
Count occurrences of unique values
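A short sketch, assuming dplyr is available; mtcars and the cyl column are just example inputs:

```r
library(dplyr)

# One row per unique value of cyl, with the count in column n, largest first.
cyl_counts <- count(mtcars, cyl, sort = TRUE)

# Base R equivalent: a named table of counts.
cyl_table <- table(mtcars$cyl)
```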
Build and install vignettes
String filtering when using a database and dbplyr
The approaches mentioned previously won’t always work when filtering against data in a database. Reference: https://github.com/tidyverse/dplyr/issues/3090
Solution from: https://stackoverflow.com/questions/38962585/pass-sql-functions-in-dplyr-filter-function-on-database/47198795#47198795
This works when I test against a Microsoft SQL database but not against SQLite, which returns the error: Error in stri_detect_regex(string, pattern, opts_regex = opts(pattern)) : object 'name' not found
Get class names of items in a dataframe/vector
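A base R sketch using the built-in iris data as an example input:

```r
# Class of every column in a data frame, as a named character vector.
# Note: sapply() returns a list instead when a column has several classes
# (for example POSIXct columns).
col_classes <- sapply(iris, class)

# Class of a single vector.
vector_class <- class(iris$Species)
```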
Printing more than default rows/columns from a table
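A sketch for tibbles, assuming the tibble package is installed; the row count of 20 is arbitrary:

```r
library(tibble)

cars_tbl <- as_tibble(mtcars)

# Show 20 rows and every column, regardless of console width.
print(cars_tbl, n = 20, width = Inf)

# Show all rows.
print(cars_tbl, n = Inf)
```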
Search all objects, including functions, in global environment for string
Search key words or phrases in help pages, vignettes or task views
Timezone stuff
filter for records within last n years
Works with dbplyr too. This is not a great example, but I wanted it to work for at least a few more years.
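A sketch of the idea on a local data frame, assuming dplyr; the events table, its column names, and the fixed reference_date are hypothetical (in practice Sys.Date() would replace the fixed date). This version has not been checked against a database backend.

```r
library(dplyr)

# Hypothetical example data.
events <- tibble(
  id = 1:4,
  event_date = as.Date(c("2010-06-01", "2018-03-15", "2021-11-30", "2023-01-10"))
)

n_years <- 5
reference_date <- as.Date("2024-01-01")  # use Sys.Date() for "today"

# Step back n_years from the reference date using base seq().
cutoff <- seq(reference_date, by = paste0("-", n_years, " years"), length.out = 2)[2]

recent <- filter(events, event_date >= cutoff)
```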
Browse vignettes for a given package
Open function documentation from within RStudio
Use F2
Details about built-in data sets
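A base R sketch; mtcars is only an example of a data set to inspect:

```r
# Matrix describing every data set shipped in the datasets package.
builtin <- data(package = "datasets")$results
head(builtin[, c("Item", "Title")])

# Structure of one data set: column types and first values.
str(mtcars)

# In an interactive session, ?mtcars opens the documentation page.
```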
Use Github to search for packages using a particular function
Put this into the GitHub search box to see how packages on CRAN use the llply() function from plyr
Conditional mutate
NOTE: should probably also look at recode here
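A minimal sketch of a conditional mutate, assuming dplyr; the efficiency column and its thresholds are made up for illustration:

```r
library(dplyr)

# case_when() evaluates conditions in order and uses the first match.
cars_labeled <- mtcars %>%
  mutate(
    efficiency = case_when(
      mpg >= 25 ~ "high",
      mpg >= 18 ~ "medium",
      TRUE      ~ "low"
    ),
    # if_else() covers the simple two-way case.
    transmission = if_else(am == 1, "manual", "automatic")
  )
```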
extracting nested list into a tibble
SOURCE: https://cfss.uchicago.edu/webdata004_simplifying_lists.html
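One way this can be done, assuming purrr and tibble; the people list is hypothetical and this is not necessarily the approach taken in the source above:

```r
library(purrr)
library(tibble)

# Hypothetical nested list, e.g. parsed from JSON.
people <- list(
  list(name = "Ada",   age = 36, address = list(city = "London")),
  list(name = "Linus", age = 28, address = list(city = "Helsinki"))
)

# Pull one element per list member into a typed column.
# A list of names plucks a nested element (address, then city).
people_tbl <- tibble(
  name = map_chr(people, "name"),
  age  = map_dbl(people, "age"),
  city = map_chr(people, list("address", "city"))
)
```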
Interactively explore plots
Sort bar plot by counts - when using stat = “identity”
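A sketch assuming ggplot2 and pre-computed counts; the counts data frame is made up:

```r
library(ggplot2)

# Hypothetical pre-computed counts.
counts <- data.frame(category = c("a", "b", "c"), n = c(5, 12, 8))

# reorder() sorts the factor levels by -n, so the tallest bar comes first.
p <- ggplot(counts, aes(x = reorder(category, -n), y = n)) +
  geom_bar(stat = "identity") +
  labs(x = "category")
```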
Sort bar plots by counts, within facets, when using stat = “identity”
SOURCE: https://www.programmingwithr.com/how-to-reorder-arrange-bars-with-in-each-facet-of-ggplot/
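A possible sketch using tidytext::reorder_within() and scale_x_reordered(); this assumes the tidytext package is installed, and it may differ from the source's approach. The sales data is made up.

```r
library(dplyr)
library(ggplot2)
library(tidytext)

# Hypothetical pre-computed counts per facet.
sales <- data.frame(
  region  = rep(c("north", "south"), each = 3),
  product = rep(c("a", "b", "c"), times = 2),
  units   = c(5, 12, 8, 9, 3, 7)
)

p <- sales %>%
  # Order product by units separately within each region.
  mutate(product = reorder_within(product, units, region)) %>%
  ggplot(aes(x = product, y = units)) +
  geom_bar(stat = "identity") +
  # Strip the suffix that reorder_within() appends to the labels.
  scale_x_reordered() +
  # free_x lets each facet keep its own ordering.
  facet_wrap(~ region, scales = "free_x")
```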
filter correlations at a cutoff value
I don’t really like the formatting here; it is too hard to match columns. Let’s find a better way.
There is corrr::correlations(), but it is not available on CRAN and there is also an open issue with the newest version of dplyr.
Using the deprecated reshape2 package because tidyr doesn’t handle matrices. This is fine for now.
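As an alternative sketch that avoids reshape2, base R's as.table() can turn the correlation matrix into long form. This is a swapped-in technique, not the approach in these notes, and the 0.8 cutoff is arbitrary:

```r
cor_matrix <- cor(mtcars)

# Blank out the diagonal and upper triangle so each pair appears once.
cor_matrix[upper.tri(cor_matrix, diag = TRUE)] <- NA

# Long format: one row per variable pair.
cor_long <- as.data.frame(as.table(cor_matrix), stringsAsFactors = FALSE)
names(cor_long) <- c("var1", "var2", "correlation")

# Keep pairs at or above the cutoff, strongest first.
cutoff <- 0.8
strong <- cor_long[!is.na(cor_long$correlation) &
                     abs(cor_long$correlation) >= cutoff, ]
strong <- strong[order(-abs(strong$correlation)), ]
```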
plot distribution of all variables
Just the numeric variables
SOURCE: (https://drsimonj.svbtle.com/quick-plot-of-all-variables)
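A sketch in the same spirit, assuming dplyr, tidyr (1.0 or later, for pivot_longer()) and ggplot2; it is not copied from the source, and iris is just an example input:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Keep numeric columns and stack them into variable/value pairs.
long <- iris %>%
  select(where(is.numeric)) %>%
  pivot_longer(everything(), names_to = "variable", values_to = "value")

# One histogram per variable, each with its own axis scales.
p <- ggplot(long, aes(x = value)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ variable, scales = "free")
```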
RMarkdown: insert date document was knitted
add this to the header:
SOURCE: (https://stackoverflow.com/questions/23449319/yaml-current-date-in-rmarkdown)
Plot percentage of attributes that are NA for each outcome
Plot the percentage of rows that have at least 1 NA attribute, by outcome
Plot the attributes (predictors) that are most likely to be missing
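A rough sketch, assuming ggplot2 and using the built-in airquality data as a stand-in for the real predictors:

```r
library(ggplot2)

# Percentage of rows that are NA in each column.
na_pct <- colMeans(is.na(airquality)) * 100
na_summary <- data.frame(
  attribute   = names(na_pct),
  pct_missing = as.numeric(na_pct)
)

# Horizontal bars, most frequently missing attribute at the top.
p <- ggplot(na_summary, aes(x = reorder(attribute, pct_missing), y = pct_missing)) +
  geom_col() +
  coord_flip() +
  labs(x = "attribute", y = "% of rows missing")
```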
Plot the attributes (predictors) that are most likely to be missing, by outcome
Create a matrix that shows whether or not a particular combination of values is in the data
SOURCE: https://stackoverflow.com/a/37897416
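One simple way to get such a matrix in base R, which may differ from the linked answer; the orders data is hypothetical:

```r
# Hypothetical data: which customer bought which product.
orders <- data.frame(
  customer = c("ann", "ann", "bob", "cat"),
  product  = c("tea", "coffee", "tea", "milk")
)

# Cross-tabulate, then test whether each combination occurs at least once.
present <- table(orders$customer, orders$product) > 0
```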
Convert unix style epoch time to human readable time
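A base R sketch, assuming the epoch value is in seconds (divide by 1000 first if it is in milliseconds):

```r
epoch_seconds <- 1500000000

# Interpret as seconds since 1970-01-01 UTC.
readable <- as.POSIXct(epoch_seconds, origin = "1970-01-01", tz = "UTC")

format(readable, "%Y-%m-%d %H:%M:%S")
# "2017-07-14 02:40:00"
```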
Clean Data: remove columns where no rows contain a value
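A minimal sketch, assuming dplyr 1.0 or later for where(); the raw tibble is made up:

```r
library(dplyr)

# Hypothetical data with one entirely empty column.
raw <- tibble(
  id    = 1:3,
  note  = c(NA, NA, NA),
  score = c(1.5, NA, 3)
)

# Keep only columns that have at least one non-NA value.
cleaned <- select(raw, where(~ !all(is.na(.x))))

# Base R equivalent.
cleaned_base <- raw[, colSums(!is.na(raw)) > 0, drop = FALSE]
```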
Print one plot for each data frame in a list column
There is probably a better way to meet the need.
My goal was to show only the values that are present in a given group. ggplot2::facet_wrap() shows all values present in any group in the plots for all groups.
Step 1: make the list
Step 2: give the list members names
Step 3: Create plotting function
Step 4: Use your plotting function to create the plots
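The four steps could be sketched as follows, assuming tidyr, purrr and ggplot2; mtcars, the cyl grouping, and the plot_one() helper are illustrative choices rather than the original code:

```r
library(tidyr)
library(purrr)
library(ggplot2)

# Step 1: make the list (one data frame per cyl group, in a list column).
nested <- nest(mtcars, data = -cyl)
frames <- nested$data

# Step 2: give the list members names.
names(frames) <- as.character(nested$cyl)

# Step 3: create the plotting function (hypothetical helper).
# Each plot shows only the carb values present in its own group.
plot_one <- function(df, group_name) {
  ggplot(df, aes(x = factor(carb))) +
    geom_bar() +
    labs(title = paste("cyl =", group_name), x = "carb")
}

# Step 4: use the plotting function to create the plots.
plots <- imap(frames, plot_one)

# Print them one by one in an interactive session:
# walk(plots, print)
```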
One plot for each attribute in a data frame - different scale for each attribute
Get detailed information for each column in a database table
Get just the columns and datatypes for each column in a database table
Table of contents for rmarkdown document
Add this to the document header.