One-Hot-Encode unordered factor columns of a data.table. From ben519's "mltools" package.
one_hot(dt, cols = "auto", sparsifyNAs = FALSE, naCols = FALSE, dropCols = TRUE, dropUnusedLevels = FALSE)
Argument | Description |
---|---|
dt | A data.table |
cols | Which column(s) should be one-hot-encoded? DEFAULT = "auto" encodes all unordered factor columns. |
sparsifyNAs | Should NAs be converted to 0s? |
naCols | Should columns be generated to indicate the presence of NAs? Only applies to factor columns with at least one NA. |
dropCols | Should the resulting data.table exclude the original columns which are one-hot-encoded? |
dropUnusedLevels | Should unused factor levels be dropped? If FALSE, a column of all 0s is generated for each unused level. |
One-hot-encoding converts an unordered categorical vector (i.e. a factor) to multiple binarized vectors, where each binary vector of 1s and 0s indicates the presence of a class (i.e. level) of the original vector.
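To make the idea concrete, here is a minimal base-R sketch of the encoding for a single factor. It is an illustration only, not how mltools implements `one_hot`; the helper name `one_hot_vector` and its `prefix` argument are hypothetical and do not exist in the package.

```r
# Hypothetical helper (not part of mltools): one-hot-encode one unordered factor by hand.
one_hot_vector <- function(x, prefix) {
  stopifnot(is.factor(x), !is.ordered(x))
  # One integer column per level: 1 where x equals the level, 0 otherwise.
  # Comparing an NA entry yields NA, so missing values stay NA in every column.
  cols <- lapply(levels(x), function(lv) as.integer(x == lv))
  names(cols) <- paste(prefix, levels(x), sep = "_")
  as.data.frame(cols)
}

color <- factor(c("red", NA, "blue", "blue"), levels = c("blue", "green", "red"))
encoded <- one_hot_vector(color, "color")
print(encoded)
```

Because the columns are built from `levels(x)` rather than from the observed values, the unused level `green` still gets a column of 0s, which corresponds to the `dropUnusedLevels = FALSE` default described above.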
library(data.table)
library(mltools)  # provides one_hot()

dt <- data.table(
  ID = 1:4,
  color = factor(c("red", NA, "blue", "blue"), levels=c("blue", "green", "red"))
)

# Default: NAs in the factor stay NA in every generated column
one_hot(dt)
#>    ID color_blue color_green color_red
#> 1:  1          0           0         1
#> 2:  2         NA          NA        NA
#> 3:  3          1           0         0
#> 4:  4          1           0         0

# Convert NAs to 0s
one_hot(dt, sparsifyNAs=TRUE)
#>    ID color_blue color_green color_red
#> 1:  1          0           0         1
#> 2:  2          0           0         0
#> 3:  3          1           0         0
#> 4:  4          1           0         0

# Add an indicator column for NAs
one_hot(dt, naCols=TRUE)
#>    ID color_NA color_blue color_green color_red
#> 1:  1        0          0           0         1
#> 2:  2        1         NA          NA        NA
#> 3:  3        0          1           0         0
#> 4:  4        0          1           0         0

# Keep the original factor column
one_hot(dt, dropCols=FALSE)
#>    ID color color_blue color_green color_red
#> 1:  1   red          0           0         1
#> 2:  2  <NA>         NA          NA        NA
#> 3:  3  blue          1           0         0
#> 4:  4  blue          1           0         0

# Omit columns for unused levels (here "green")
one_hot(dt, dropUnusedLevels=TRUE)
#>    ID color_blue color_red
#> 1:  1          0         1
#> 2:  2         NA        NA
#> 3:  3          1         0
#> 4:  4          1         0