Title: | Create Messy Data from Clean Data Frames |
---|---|
Description: | For the purposes of teaching, it is often desirable to show examples of working with messy data and how to clean it. This R package creates messy data from clean, tidy data frames so that students have a clean example to work towards. |
Authors: | Nicola Rennie [aut, cre, cph] |
Maintainer: | Nicola Rennie <[email protected]> |
License: | CC BY 4.0 |
Version: | 0.1.0.9000 |
Built: | 2025-01-03 04:59:05 UTC |
Source: | https://github.com/nrennie/messy |
Add special characters to strings
add_special_chars(data, cols = NULL, messiness = 0.1)
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
messiness |
Percentage of values to change. Must be between 0 and 1. Default 0.1. |
a dataframe the same size as the input data.
add_special_chars(mtcars)
Randomly add whitespace to the end of some values in all or a subset of columns.
add_whitespace(data, cols = NULL, messiness = 0.1)
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
messiness |
Percentage of values to change. Must be between 0 and 1. Default 0.1. |
a dataframe the same size as the input data.
add_whitespace(mtcars)
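Since the package exists to give students something to clean, a cleaning counterpart may be useful here. The following is a minimal base R sketch that assumes a hand-made data frame (`messy_df` and its `city` column are hypothetical) rather than actual output from `add_whitespace()`:

```r
# Hand-made stand-in for data with trailing whitespace (hypothetical values).
messy_df <- data.frame(
  city = c("Leeds", "York  ", "Hull ", "Bath"),
  stringsAsFactors = FALSE
)

# Trailing spaces make otherwise equal strings compare as different.
any(messy_df$city == "York")

# trimws() is base R; which = "right" strips trailing whitespace only.
clean_df <- messy_df
clean_df$city <- trimws(clean_df$city, which = "right")
any(clean_df$city == "York")
```

Restricting `trimws()` to the right-hand side leaves any intentional leading whitespace untouched.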
Randomly switch between title case and lowercase for character strings
change_case(data, cols = NULL, messiness = 0.1, case_type = "word")
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
messiness |
Percentage of values to change. Must be between 0 and 1. Default 0.1. |
case_type |
Whether the case should change based on
the |
a dataframe the same size as the input data.
change_case(mtcars)
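As an illustration of why inconsistent case matters, here is a short base R sketch on a hand-made character vector (the values are hypothetical, not output from `change_case()`). Normalising to lowercase is only one possible cleaning choice:

```r
# Hypothetical values where the same category appears in two cases.
fruit <- c("Apple", "apple", "Banana", "banana", "Apple")

# Before cleaning, each spelling counts as its own category.
n_before <- length(unique(fruit))

# tolower() maps every value to a single canonical case.
fruit_clean <- tolower(fruit)
n_after <- length(unique(fruit_clean))

c(before = n_before, after = n_after)
```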
Duplicate rows and insert them into the dataframe in order or at random
duplicate_rows(data, messiness = 0.1, shuffle = FALSE)
data |
input dataframe |
messiness |
Percentage of rows to duplicate. Must be between 0 and 1. Default 0.1. |
shuffle |
Insert duplicated data underneath original data or insert randomly |
A dataframe with duplicated rows inserted
Philip Leftwich
duplicate_rows(mtcars, messiness = 0.1)
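To show the cleaning step that duplicated rows call for, the sketch below builds a small data frame by hand, appends two repeated rows with `rbind()`, and removes them again. It uses base R only; `clean_df` and `messy_df` are hypothetical names, and the duplicates are constructed manually rather than by `duplicate_rows()`:

```r
# A small tidy data frame with ten distinct rows.
clean_df <- data.frame(
  id  = 1:10,
  grp = rep(c("a", "b"), 5),
  stringsAsFactors = FALSE
)

# Append copies of rows 3 and 7 to mimic duplicated records.
messy_df <- rbind(clean_df, clean_df[c(3, 7), ])

# duplicated() flags the second and later occurrences of each row.
n_dupes <- sum(duplicated(messy_df))

# Keep only the first occurrence of every row and reset the row names.
deduped <- messy_df[!duplicated(messy_df), ]
rownames(deduped) <- NULL
```

`duplicated()` compares row contents and ignores row names, so this approach should work whether duplicates sit directly below the original or are shuffled through the data.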
Randomly make values missing in all data columns, or a subset of columns
make_missing(data, cols = NULL, messiness = 0.1, missing = NA)
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
messiness |
Percentage of values to change. Must be between 0 and 1. Default 0.1. |
missing |
A single value, vector, or list of what the missing values will be replaced with. If length is greater than 1, values will be replaced randomly. Default NA. |
a dataframe the same size as the input data.
make_missing(mtcars)
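When several missing-value markers are in play, a typical cleaning step is to map them all back to `NA`. The sketch below is base R on hand-made data; the markers `""` and `"N/A"` and the `score` column are assumptions chosen for illustration, not values the package is known to produce:

```r
# Hypothetical column where missing values are encoded in two different ways.
messy_df <- data.frame(
  score = c("10", "N/A", "7", "", "3"),
  stringsAsFactors = FALSE
)

# Markers that should be treated as missing (an assumption for this example).
markers <- c("", "N/A")

# Replace every marker with NA, then restore the numeric type.
messy_df$score[messy_df$score %in% markers] <- NA
messy_df$score <- as.numeric(messy_df$score)

# Count missing values per column.
n_missing <- colSums(is.na(messy_df))
```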
Make a data frame messier.
messy(data, messiness = 0.1, missing = NA, case_type = "word")
data |
input dataframe |
messiness |
Percentage of values to change per function. Must be between 0 and 1. Default 0.1. |
missing |
A single value, vector, or list of what the missing values will be replaced with. If length is greater than 1, values will be replaced randomly. Default NA. |
case_type |
Whether the case should change based on
the |
a dataframe the same size as the input data.
messy(mtcars)
Adds special characters and randomly capitalises characters in the column names of a data frame.
messy_colnames(data, messiness = 0.2)
data |
data.frame to alter column names |
messiness |
Percentage of values to change per function. Must be between 0 and 1. Default 0.2. |
data.frame with messy column names
Athanasia Monika Mowinckel
messy_colnames(mtcars)
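A possible cleaning step for altered column names is sketched below in base R. The messy names are invented for illustration, and which special characters `messy_colnames()` actually inserts is not stated here, so the regular expression is an assumption:

```r
# Hypothetical data frame whose names contain stray symbols and mixed case.
df <- data.frame(a = 1:3, b = 4:6)
names(df) <- c("M*pG", "c!YL")

# Drop anything that is not a letter, digit, underscore, or dot,
# then convert to lowercase.
clean_names <- tolower(gsub("[^A-Za-z0-9_.]", "", names(df)))
names(df) <- clean_names
names(df)
```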
Takes any date(time) column and transforms it into a character column, sampling at random from any number of valid character representations.
messy_datetime_formats(data, cols = NULL, formats = c("%Y/%m/%d %H:%M:%S", "%d/%m/%Y %H:%M:%S"))
messy_date_formats(data, cols = NULL, formats = c("%Y/%m/%d", "%d/%m/%Y"))
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
formats |
A vector of any number of valid |
a dataframe the same size as the input data.
Jack Davison
Other Messy date(time) functions: messy_datetime_tzones(), split_datetimes()
data <- data.frame(dates = rep(Sys.Date(), 10))
messy_date_formats(data)
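The idea of mixing several valid representations in one column, and then parsing them back, can be sketched in base R. This is an illustration of the technique on hand-made dates, not the package's implementation; it reuses the two default date formats shown in the usage above and tells them apart by whether the string starts with a four-digit year:

```r
set.seed(1)
dates   <- as.Date(c("2024-01-15", "2024-02-20", "2024-03-25", "2024-04-30"))
formats <- c("%Y/%m/%d", "%d/%m/%Y")

# Pick one format per value and render each date with its chosen format.
chosen    <- sample(formats, length(dates), replace = TRUE)
messy_chr <- vapply(
  seq_along(dates),
  function(i) format(dates[i], format = chosen[i]),
  character(1)
)

# Cleaning: decide the format from the string's shape, then parse each group.
is_ymd <- grepl("^[0-9]{4}/", messy_chr)
parsed <- rep(as.Date(NA), length(messy_chr))
parsed[is_ymd]  <- as.Date(messy_chr[is_ymd],  format = "%Y/%m/%d")
parsed[!is_ymd] <- as.Date(messy_chr[!is_ymd], format = "%d/%m/%Y")
```

Classifying each string before parsing avoids trying every format in turn, which can silently accept a string written in a different format.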
Takes any number of datetime columns and changes their timezones either totally at random, or from a user-provided list of timezones.
messy_datetime_tzones(data, cols = NULL, tzones = OlsonNames(), force = FALSE)
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
tzones |
Valid time zones to sample from. By default samples from all OlsonNames(). |
force |
By default ( |
a dataframe the same size as the input data.
Jack Davison
Other Messy date(time) functions: messy_datetime_formats(), split_datetimes()
data <- data.frame(dates = rep(Sys.time(), 10))
data$dates
attr(data$dates, "tzone")
messy <- messy_datetime_tzones(data, tzones = "Poland")
messy$dates
attr(messy$dates, "tzone")
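For readers unfamiliar with the `"tzone"` attribute inspected in the example, the base R sketch below shows what changing it does to a `POSIXct` value. It illustrates only the attribute-change case and makes no claim about how `messy_datetime_tzones()` or its `force` argument behave; the time zone `"Europe/Warsaw"` is chosen for the example:

```r
# A fixed instant, stored and displayed in UTC.
x <- as.POSIXct("2024-06-01 12:00:00", tz = "UTC")

# Changing the tzone attribute alters how the value is printed...
y <- x
attr(y, "tzone") <- "Europe/Warsaw"
format(x, usetz = TRUE)
format(y, usetz = TRUE)

# ...but not the underlying instant, so the two values still compare equal.
same_instant <- x == y
```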
These functions can split the "date" and "time" components of POSIXt columns and the "year", "month", and "day" components of Date columns into multiple columns.
split_datetimes(data, cols = NULL, class = c("character", "date"))
split_dates(data, cols = NULL)
data |
input dataframe |
cols |
set of columns to apply transformation to. If |
class |
For |
a dataframe
Jack Davison
Other Messy date(time) functions: messy_datetime_formats(), messy_datetime_tzones()
# split datetimes
data <- data.frame(today = Sys.time())
split_datetimes(data)
# split dates
data <- data.frame(today = Sys.Date())
data
split_dates(data)
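Splitting a datetime into separate date and time columns, and joining them again during cleaning, can be sketched in base R as follows. The column names `when_date` and `when_time` are hypothetical and may not match what `split_datetimes()` produces:

```r
# One datetime column with a fixed value in UTC.
df <- data.frame(when = as.POSIXct("2024-06-01 12:34:56", tz = "UTC"))

# Split into character date and time components.
df$when_date <- format(df$when, "%Y-%m-%d")
df$when_time <- format(df$when, "%H:%M:%S")

# Cleaning step: paste the parts together and parse them as one datetime.
rejoined <- as.POSIXct(paste(df$when_date, df$when_time), tz = "UTC")
```

Parsing with the same time zone that was used for the split is what makes the round trip recover the original instant.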