Convert a continuous variable into groups
The egen function is used to convert a continuous variable into groups by discretizing it into intervals. It is a deprecated function that has been replaced by the cut function from the mStats package. The main difference between egen and cut is the input they accept.
- egen works with data frames or tibbles, allowing variable grouping within the context of the entire dataset.
- cut operates on a vector, performing grouping directly on that vector.
The egen function now serves only as a wrapper around the cut function. It issues a deprecation warning recommending that cut be called directly.
library(mStats)
#>
#> Attaching package: 'mStats'
#> The following objects are masked from 'package:base':
#>
#> append, cut
data <- data.frame(x = 1:10)
egen(data, x, at = c(3, 7), label = c("low", "medium", "high"))
#> Warning: `egen()` was deprecated in mStats 3.4.0.
#> ℹ Please use `cut()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#> x
#> 1 low
#> 2 low
#> 3 medium
#> 4 medium
#> 5 medium
#> 6 medium
#> 7 high
#> 8 high
#> 9 high
#> 10 high
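The output above is consistent with left-closed intervals: 1 and 2 fall below 3, 3 through 6 fall in [3, 7), and 7 through 10 fall at or above 7. As a rough illustration of how a data-frame wrapper around a vector-level function can be put together, here is a minimal sketch in base R. The name `egen_sketch` is hypothetical and this is not the mStats implementation; it assumes the cut points are padded with -Inf and Inf and that intervals are left-closed, and it calls `base::cut` explicitly because mStats masks `cut` when attached.

```r
# Hypothetical stand-in for a deprecated data-frame wrapper; not mStats code.
egen_sketch <- function(data, var, at, label) {
  # Plain warning used here in place of a real deprecation mechanism.
  warning("egen_sketch() is deprecated; call the vector function directly.")
  name <- deparse(substitute(var))   # capture the unquoted column name
  breaks <- c(-Inf, at, Inf)         # pad the interior cut points (assumption)
  # right = FALSE gives left-closed intervals: [-Inf, 3), [3, 7), [7, Inf)
  data[[name]] <- base::cut(data[[name]], breaks = breaks,
                            labels = label, right = FALSE)
  data
}

df <- data.frame(x = 1:10)
out <- suppressWarnings(
  egen_sketch(df, x, at = c(3, 7), label = c("low", "medium", "high"))
)
print(out)
```

The wrapper only handles column lookup and assignment; the grouping itself is done by the vector-level call, which is why calling that function directly is the simpler route.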
egen versus mutate + cut
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Example 1: Using egen() function
data <- data.frame(x = 1:10)
data <- egen(data, var = x, at = c(3, 7), label = c("low", "medium", "high"))
# Example 2: Using mutate() and cut() functions
data2 <- data.frame(x = 1:10)
data2 <- mutate(data2, x = cut(x, at = c(-Inf, 3, 7, Inf), label = c("low", "medium", "high")))
# Check if the results are the same
identical(data, data2) # Should be TRUE
#> [1] TRUE
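Both examples create a data frame (data and data2, respectively) with a single variable x. The goal is to group the values of x into three categories, “low”, “medium”, and “high”, based on the specified breakpoints. Note that the egen call passes only the interior cut points, at = c(3, 7), whereas the cut call also spells out the outer bounds, at = c(-Inf, 3, 7, Inf).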
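To see how the breakpoints partition the values 1 to 10, the grouping can be reproduced and tabulated with base R. This is only a sketch: it assumes left-closed intervals (`right = FALSE`), which matches the output shown earlier, and it uses `base::cut` with its `breaks` and `labels` arguments rather than the mStats interface.

```r
x <- 1:10

# Left-closed intervals: [-Inf, 3), [3, 7), [7, Inf)
groups <- base::cut(x,
                    breaks = c(-Inf, 3, 7, Inf),
                    labels = c("low", "medium", "high"),
                    right = FALSE)

# Count how many values land in each group
counts <- table(groups)
print(counts)
```

Under these assumptions, two values land in "low", four in "medium", and four in "high".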