Package 'messy' reference manual

Title:	Create Messy Data from Clean Data Frames
Description:	For the purposes of teaching, it is often desirable to show examples of working with messy data and how to clean it. This R package creates messy data from clean, tidy data frames so that students have a clean example to work towards.
Authors:	Nicola Rennie [aut, cre, cph]
Maintainer:	Nicola Rennie <[email protected]>
License:	CC BY 4.0
Version:	0.1.0.9000
Built:	2025-03-04 05:03:33 UTC
Source:	https://github.com/nrennie/messy

Add special characters to strings

Description

Randomly add special characters to character strings in all or a subset of columns.

Usage

add_special_chars(data, cols = NULL, messiness = 0.1)

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all columns. Default `NULL`.
`messiness`	Proportion of values to change. Must be between 0 and 1. Default 0.1.

Value

a dataframe the same size as the input data.

Examples

add_special_chars(mtcars)
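
In a teaching setting, the messy output is usually paired with a cleaning step. The following is a minimal base R sketch, assuming the inserted characters are punctuation symbols. The `fruit` values are hand-made for illustration and are not actual package output.

```r
# Hand-made strings with stray special characters
# (hypothetical values; not generated by add_special_chars()).
fruit <- c("app!le", "ban*ana", "cherry", "da@te")

# Remove every punctuation character; letters, digits and spaces are kept.
cleaned <- gsub("[[:punct:]]", "", fruit)
cleaned
```

This pattern also strips legitimate punctuation such as hyphens and apostrophes, so a narrower pattern may be needed on real data.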

Add whitespaces

Description

Randomly add whitespace to the end of some values in all or a subset of columns.

Usage

add_whitespace(data, cols = NULL, messiness = 0.1)

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all columns. Default `NULL`.
`messiness`	Proportion of values to change. Must be between 0 and 1. Default 0.1.

Value

a dataframe the same size as the input data.

Examples

add_whitespace(mtcars)
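
Trailing whitespace is easy to miss when printed, so a cleaning exercise typically starts by showing that it breaks comparisons. A minimal base R sketch, using a hand-made data frame (the `name` column is hypothetical, not package output):

```r
# Hand-made data with trailing whitespace on some values.
df <- data.frame(
  name = c("Ann ", "Bob", "Cy  "),
  stringsAsFactors = FALSE
)

# Trailing whitespace makes otherwise equal values compare as different.
df$name == "Ann"

# Strip whitespace from the right-hand side only.
df$name <- trimws(df$name, which = "right")
df$name == "Ann"
```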

Change case

Description

Randomly switch between title case and lowercase for character strings

Usage

change_case(data, cols = NULL, messiness = 0.1, case_type = "word")

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all columns. Default `NULL`.
`messiness`	Proportion of values to change. Must be between 0 and 1. Default 0.1.
`case_type`	Whether the case should change per `"word"` or per `"letter"`. Default `"word"`.

Value

a dataframe the same size as the input data.

Examples

change_case(mtcars)
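
Inconsistent case matters mainly because it splits one category into several. A minimal base R sketch of detecting and undoing that, using a hand-made character vector (the `species` values are hypothetical, not package output):

```r
# Hand-made categorical values with inconsistent case.
species <- c("Adelie", "adelie", "Gentoo", "gentoo", "Adelie")

# Inconsistent case inflates the number of distinct categories.
n_before <- length(unique(species))

# Normalise everything to lowercase before counting or grouping.
species_clean <- tolower(species)
n_after <- length(unique(species_clean))

c(before = n_before, after = n_after)
```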

Duplicate rows and insert them into the dataframe in order or at random

Description

Duplicate rows and insert them into the dataframe in order or at random

Usage

duplicate_rows(data, messiness = 0.1, shuffle = FALSE)

Arguments

`data`	input dataframe
`messiness`	Proportion of rows to duplicate. Must be between 0 and 1. Default 0.1.
`shuffle`	Whether to insert duplicated rows at random positions (`TRUE`) or directly underneath the original rows (`FALSE`). Default `FALSE`.

Value

A dataframe with duplicated rows inserted

Author(s)

Philip Leftwich

Examples

duplicate_rows(mtcars, messiness = 0.1)
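
The corresponding cleaning step is to find and drop the repeated rows. A minimal base R sketch follows; the duplicates are simulated by hand with row indexing rather than produced by `duplicate_rows()`, so it runs without the package.

```r
df <- head(mtcars, 3)

# Simulate duplicated rows placed directly underneath the originals.
dup <- df[c(1, 1, 2, 3, 3), ]

# duplicated() compares column values and ignores row names.
n_dup <- sum(duplicated(dup))

# Keep only the first occurrence of each row.
deduped <- dup[!duplicated(dup), ]
nrow(deduped)
```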

Make missing

Description

Randomly make values missing in all data columns, or a subset of columns

Usage

make_missing(data, cols = NULL, messiness = 0.1, missing = NA)

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all columns. Default `NULL`.
`messiness`	Proportion of values to change. Must be between 0 and 1. Default 0.1.
`missing`	A single value, vector, or list of what the missing values will be replaced with. If length is greater than 1, values will be replaced randomly. Default `NA`.

Value

a dataframe the same size as the input data.

Examples

make_missing(mtcars)
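
Because `missing` accepts values other than `NA`, students may need to recode placeholder values back to `NA` before analysis. A minimal base R sketch, assuming hypothetical sentinel values (`-999`, `"N/A"` and the empty string) in a hand-made data frame:

```r
# Hand-made data where missing values were recorded with sentinels.
df <- data.frame(
  score = c(10, -999, 7, NA, -999),
  grade = c("A", "N/A", "B", "", "C"),
  stringsAsFactors = FALSE
)

# Hypothetical sentinel values chosen for this illustration.
sentinels_num <- c(-999)
sentinels_chr <- c("N/A", "")

# Recode sentinels to NA; existing NA values are left untouched.
df$score[df$score %in% sentinels_num] <- NA
df$grade[df$grade %in% sentinels_chr] <- NA

colSums(is.na(df))
```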

Messy

Description

Make a data frame messier.

Usage

messy(data, messiness = 0.1, missing = NA, case_type = "word")

Arguments

`data`	input dataframe
`messiness`	Proportion of values to change per function. Must be between 0 and 1. Default 0.1.
`missing`	A single value, vector, or list of what the missing values will be replaced with. If length is greater than 1, values will be replaced randomly. Default `NA`.
`case_type`	Whether the case should change per `"word"` or per `"letter"`. Default `"word"`.

Value

a dataframe the same size as the input data.

Examples

messy(mtcars)
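
To build intuition for what a `messiness` of 0.1 could mean in practice, the sketch below flags each value independently with probability `messiness`. This mechanism is an assumption made for illustration only and is not taken from the package source; an implementation could equally change an exact count of values.

```r
set.seed(123)

x <- seq_len(1000)
messiness <- 0.1

# Flag each value independently with probability `messiness`
# (assumed mechanism, for illustration only).
hit <- runif(length(x)) < messiness

# Replace the flagged values with NA, leaving the rest unchanged.
x_messy <- replace(x, hit, NA)

# The realised share is close to, but generally not exactly, `messiness`.
mean(is.na(x_messy))
```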

Make column names messy

Description

Adds special characters and randomly capitalises characters in the column names of a data frame.

Usage

messy_colnames(data, messiness = 0.2)

Arguments

`data`	data.frame to alter column names
`messiness`	Proportion of values to change per function. Must be between 0 and 1. Default 0.2.

Value

data.frame with messy column names

Author(s)

Athanasia Monika Mowinckel

Examples

messy_colnames(mtcars)
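
A typical follow-up exercise is to normalise the column names again. A minimal base R sketch, using hand-made names (hypothetical, not actual `messy_colnames()` output) and assuming that only letters, digits and underscores should survive:

```r
# Hand-made messy column names for illustration.
messy_names <- c("M!pG", "c*yL", "DI$sp", "h_P")

# Drop everything except letters, digits and underscores, then lowercase.
clean_names <- tolower(gsub("[^[:alnum:]_]", "", messy_names))
clean_names
```

The result can be assigned back with `names(data) <- clean_names`.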

Make date(time) formats inconsistent

Description

Takes any date(time) column and transforms it into a character column, sampling at random from any number of valid character representations.

Usage

messy_datetime_formats(
  data,
  cols = NULL,
  formats = c("%Y/%m/%d %H:%M:%S", "%d/%m/%Y %H:%M:%S")
)

messy_date_formats(
  data,
  cols = NULL,
  formats = c("%Y/%m/%d", "%d/%m/%Y")
)

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all POSIXt columns (for `messy_datetime_formats()`) or Date columns (for `messy_date_formats()`).
`formats`	A vector of any number of valid `strptime()` formats. Multiple formats will be sampled at random.

Value

a dataframe the same size as the input data.

Author(s)

Jack Davison

Examples

data <- data.frame(dates = rep(Sys.Date(), 10))
messy_date_formats(data)
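
Parsing a column that mixes the two default date formats back into `Date` values is a useful cleaning exercise. The base R sketch below uses hand-made strings (not package output) and picks the format from the shape of each string, rather than relying on a failed parse to signal the wrong format.

```r
# Hand-made strings mixing "%Y/%m/%d" and "%d/%m/%Y".
x <- c("2025/03/04", "04/03/2025", "2024/12/31", "31/12/2024")

# Decide the format from the position of the four-digit year.
is_ymd <- grepl("^[0-9]{4}/[0-9]{2}/[0-9]{2}$", x)
is_dmy <- grepl("^[0-9]{2}/[0-9]{2}/[0-9]{4}$", x)

# Start from an all-NA Date vector and fill in each group.
parsed <- as.Date(rep(NA, length(x)))
parsed[is_ymd] <- as.Date(x[is_ymd], format = "%Y/%m/%d")
parsed[is_dmy] <- as.Date(x[is_dmy], format = "%d/%m/%Y")

parsed
```

Strings that match neither pattern stay `NA`, which makes unexpected formats easy to spot.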

Change the timezone of datetime columns

Description

Takes any number of datetime columns and changes their timezones either totally at random, or from a user-provided list of timezones.

Usage

messy_datetime_tzones(data, cols = NULL, tzones = OlsonNames(), force = FALSE)

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all POSIXt columns.
`tzones`	Valid time zones to sample from. By default samples from all `OlsonNames()`, but can be set to options more relevant to the data.
`force`	By default (`force = FALSE`) the datetimes will have their actual hour/minute values changed along with the timezones. If `force = TRUE`, which requires lubridate, the datetime values will remain the same and only the timezone will differ.

Value

a dataframe the same size as the input data.

Author(s)

Jack Davison

Examples

data <- data.frame(dates = rep(Sys.time(), 10))

data$dates
attr(data$dates, "tzone")

messy <- messy_datetime_tzones(data, tzones = "Poland")
messy$dates
attr(messy$dates, "tzone")
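
The difference between the two `force` settings can be illustrated in base R without the package. This sketch is an interpretation of the argument description, not the package's implementation: changing only the `tzone` attribute keeps the instant and shifts the displayed clock time (as described for `force = FALSE`), while re-parsing the printed clock time in a new timezone keeps the clock time and shifts the instant (as described for `force = TRUE`).

```r
x <- as.POSIXct("2025-03-04 12:00:00", tz = "UTC")

# Same instant, different displayed clock time.
shifted <- x
attr(shifted, "tzone") <- "Europe/Warsaw"
format(shifted, "%H:%M")
as.numeric(shifted) == as.numeric(x)

# Same displayed clock time, different instant.
forced <- as.POSIXct(format(x, "%Y-%m-%d %H:%M:%S"), tz = "Europe/Warsaw")
format(forced, "%H:%M")

# Offset between the two instants, in seconds.
as.numeric(x) - as.numeric(forced)
```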

Splits date(time) column(s) into multiple columns

Description

These functions can split the "date" and "time" components of POSIXt columns and the "year", "month", and "day" components of Date columns into multiple columns.

Usage

split_datetimes(data, cols = NULL, class = c("character", "date"))

split_dates(data, cols = NULL)

Arguments

`data`	input dataframe
`cols`	set of columns to apply transformation to. If `NULL` will apply to all POSIXt columns (for `split_datetimes()`) or Date columns (for `split_dates()`).
`class`	For `split_datetimes()`. The desired output of the separate "date" and "time" columns. `"character"` leaves the columns as character vectors. `"date"` will reformat the date as a "Date" and the time as a "POSIXct" object, with a dummy date appended to it. In `split_dates()`, all returned columns are integers.

Value

a dataframe

Author(s)

Jack Davison

Examples

# split datetimes
data <- data.frame(today = Sys.time())
split_datetimes(data)
# split dates
data <- data.frame(today = Sys.Date())
data
split_dates(data)
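
The reverse operation, rebuilding a `Date` from separate integer columns, is a natural cleaning exercise. A minimal base R sketch follows; the column names `year`, `month` and `day` are assumed for illustration and may differ from the names `split_dates()` actually produces.

```r
# Hand-made data frame with integer date parts (hypothetical column names).
parts <- data.frame(year = 2025L, month = 3L, day = 4L)

# Zero-pad the parts into an ISO 8601 string, then parse it.
rebuilt <- as.Date(
  sprintf("%04d-%02d-%02d", parts$year, parts$month, parts$day)
)
rebuilt
```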
