The egen function converts a continuous variable into groups by discretizing it into intervals. It is deprecated in favor of the cut function from the mStats package. The main difference between egen and cut is the input they accept:

  • egen works with data frames or tibbles, allowing variable grouping within the context of the entire dataset.
  • cut operates on a vector, performing grouping directly on that vector.

Internally, egen is a wrapper around cut. When called, it issues a deprecation warning (displayed at most once every 8 hours) recommending that cut be used directly.
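The relationship between a data-frame-level wrapper and a vector-level function can be sketched in base R. This is only an illustration, not the mStats implementation: egen_sketch is a hypothetical helper, it takes the column name as a string, and it calls base::cut explicitly because mStats masks cut once it is attached. The right = FALSE setting is an assumption chosen to match the grouping shown in this article's output.

```r
# Hypothetical helper, NOT part of mStats: applies a vector-level grouping
# to one column of a data frame, using base R only.
egen_sketch <- function(data, var, at, label) {
  # Pad the inner breakpoints so every value falls into some interval
  breaks <- c(-Inf, at, Inf)
  # Left-closed intervals: [-Inf, 3), [3, 7), [7, Inf) for at = c(3, 7)
  data[[var]] <- base::cut(
    data[[var]],
    breaks = breaks,
    labels = label,
    right = FALSE
  )
  data
}

df <- data.frame(x = 1:10)
out <- egen_sketch(df, "x", at = c(3, 7), label = c("low", "medium", "high"))
table(out$x)
```

The wrapper only selects the column and hands it to the vector-level function, which is why the same result can be obtained by calling the vector-level function inside mutate.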

library(mStats)
#> 
#> Attaching package: 'mStats'
#> The following objects are masked from 'package:base':
#> 
#>     append, cut

data <- data.frame(x = 1:10)

egen(data, x, at = c(3, 7), label = c("low", "medium", "high"))
#> Warning: `egen()` was deprecated in mStats 3.4.0.
#>  Please use `cut()` instead.
#> This warning is displayed once every 8 hours.
#> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
#> generated.
#>         x
#> 1     low
#> 2     low
#> 3  medium
#> 4  medium
#> 5  medium
#> 6  medium
#> 7    high
#> 8    high
#> 9    high
#> 10   high

egen versus mutate + cut

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Example 1: Using egen() function
data <- data.frame(x = 1:10)
data <- egen(data, var = x, at = c(3, 7), label = c("low", "medium", "high"))

# Example 2: Using mutate() and cut() functions
data2 <- data.frame(x = 1:10)
data2 <- mutate(data2, x = cut(x, at = c(-Inf, 3, 7, Inf), label = c("low", "medium", "high")))

# Check if the results are the same
identical(data, data2)  # Should be TRUE
#> [1] TRUE

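Both examples create a data frame (data and data2, respectively) with a single variable x and group its values into three categories, “low”, “medium”, and “high”, based on the specified breakpoints. Note that egen takes only the inner breakpoints, at = c(3, 7), whereas the cut call also spells out the outer bounds, at = c(-Inf, 3, 7, Inf).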
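To see how two inner breakpoints produce three groups, the assignment can be traced by hand with base R's findInterval. This is a rough sketch that does not use mStats. It assumes left-closed intervals, which is consistent with the output shown earlier (3 is grouped as "medium" and 7 as "high").

```r
x <- 1:10
at <- c(3, 7)                          # inner breakpoints only
labels <- c("low", "medium", "high")

# findInterval() counts how many breakpoints are <= each value:
# 0 for values below 3, 1 for values in [3, 7), 2 for values of 7 or more
idx <- findInterval(x, at)

# Shift by one to index into the labels and keep the level order
groups <- factor(labels[idx + 1], levels = labels)

data.frame(x = x, group = groups)
```

With n breakpoints there are always n + 1 groups, so the number of labels must be one more than the number of inner breakpoints.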