
The codebook function creates a codebook: a compact summary of a data frame in R. For each variable it reports the name, type, number of missing values, completeness, number of unique values, and variable label (if one is set). This helps you understand a dataset and make informed decisions before and during analysis.

This vignette shows how to use the codebook function from the mStats package to explore and summarise your datasets. No advanced knowledge of R programming is required.

To get started, install and load the mStats package. If it is not installed yet, uncomment the first line below to install it. The block also loads the labelled package, which is used later to add variable labels:

# install.packages("mStats")
library(mStats)
#> 
#> Attaching package: 'mStats'
#> The following objects are masked from 'package:base':
#> 
#>     append, cut
library(labelled)

Once the packages are loaded, you can call codebook. It requires only one input: the data frame you want to summarise. The example below summarises the built-in iris dataset, then adds variable labels and generates the codebook again:

# Generate a codebook for the 'iris' dataset
codebook(iris)
#> $ Codebook
#>   dataset: iris
#>   Row: 150
#>   Col: 5
#>   name         type  miss complete unique label
#> 1 Sepal.Length <dbl> 0    1.00     35          
#> 2 Sepal.Width  <dbl> 0    1.00     23          
#> 3 Petal.Length <dbl> 0    1.00     43          
#> 4 Petal.Width  <dbl> 0    1.00     22          
#> 5 Species      <fct> 0    1.00      3

# Add a variable label to each column with the labelled package
labelled::var_label(iris) <- c(
  "sepal length", "sepal width", "petal length",
  "petal width", "species"
)

# Generate codebook again
codebook(iris)
#> $ Codebook
#>   dataset: iris
#>   Row: 150
#>   Col: 5
#>   name         type  miss complete unique label       
#> 1 Sepal.Length <dbl> 0    1.00     35     sepal length
#> 2 Sepal.Width  <dbl> 0    1.00     23     sepal width 
#> 3 Petal.Length <dbl> 0    1.00     43     petal length
#> 4 Petal.Width  <dbl> 0    1.00     22     petal width 
#> 5 Species      <fct> 0    1.00      3     species

Each codebook starts with the dataset name and its number of rows and columns, followed by one row per variable showing its name, type (for example `<dbl>` for double and `<fct>` for factor), number of missing values (miss), completeness (complete, where 1.00 means no values are missing), number of unique values (unique), and variable label. In the first run the label column is empty; after the labels are assigned with var_label(), the second run displays them.
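To make the miss, complete, and unique columns concrete, the sketch below recomputes them with base R. It is not how mStats implements codebook: the helper `codebook_sketch()` is hypothetical, it reports each column's class rather than the `<dbl>`-style type, and its definitions (miss = count of `NA`, complete = share of non-missing values, unique = number of distinct values, with `NA` counted as one value) are assumptions that happen to match the iris output above. The built-in airquality dataset is included because, unlike iris, it has missing values.

```r
# Hypothetical helper, not part of mStats: recomputes codebook-style columns
# under the assumed definitions described above.
codebook_sketch <- function(df) {
  data.frame(
    name     = names(df),
    class    = vapply(df, function(x) class(x)[1], character(1)),
    # number of NA values in each column
    miss     = vapply(df, function(x) sum(is.na(x)), integer(1)),
    # share of non-missing values (1 means nothing is missing)
    complete = vapply(df, function(x) mean(!is.na(x)), numeric(1)),
    # number of distinct values; NA counts as one value here (assumption)
    unique   = vapply(df, function(x) length(unique(x)), integer(1)),
    row.names = NULL
  )
}

codebook_sketch(iris)
codebook_sketch(airquality)  # Ozone and Solar.R contain missing values
```

For iris the unique counts (35, 23, 43, 22, 3) agree with the codebook above; for airquality, Ozone has 37 missing values out of 153 rows, so its completeness is about 0.76.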

codebook also works at the end of a magrittr pipe (`%>%`):

# Generate a codebook for a piped dataset
mtcars %>% codebook()
#> $ Codebook
#>   dataset: <Piped Data>
#>   Row: 32
#>   Col: 11
#>    name type  miss complete unique label
#> 1  mpg  <dbl> 0    1.00     25          
#> 2  cyl  <dbl> 0    1.00      3          
#> 3  disp <dbl> 0    1.00     27          
#> 4  hp   <dbl> 0    1.00     22          
#> 5  drat <dbl> 0    1.00     22          
#> 6  wt   <dbl> 0    1.00     29          
#> 7  qsec <dbl> 0    1.00     30          
#> 8  vs   <dbl> 0    1.00      2          
#> 9  am   <dbl> 0    1.00      2          
#> 10 gear <dbl> 0    1.00      3          
#> 11 carb <dbl> 0    1.00      6

With `%>%`, the data reach the function as the placeholder `.`, so codebook labels the dataset as “<Piped Data>” instead of by its name.
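One plausible mechanism, sketched here as an assumption rather than taken from the mStats source, is that the function captures its argument expression with `substitute()` and turns it into text with `deparse()`. Under magrittr's `%>%`, that expression is the placeholder `.`, which can then be replaced by a friendlier label. The helper `dataset_name()` is hypothetical, and the pipe part runs only if magrittr is installed.

```r
# Hypothetical helper (not from mStats): return the name of the object passed in,
# replacing magrittr's "." placeholder with "<Piped Data>".
dataset_name <- function(data) {
  nm <- deparse(substitute(data))
  if (identical(nm, ".")) "<Piped Data>" else nm
}

dataset_name(mtcars)
#> [1] "mtcars"

# With magrittr's pipe, the function receives the placeholder "." as its argument
if (requireNamespace("magrittr", quietly = TRUE)) {
  `%>%` <- magrittr::`%>%`
  print(mtcars %>% dataset_name())
}
```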

The codebook function also integrates with the labelled package: if a dataset contains variable labels, codebook automatically includes them in the label column. To ensure variable labels are handled properly, install and load the labelled package before using codebook.
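Variable labels set with `labelled::var_label()` are stored in a `"label"` attribute on each column. The sketch below sets and reads that attribute using base R only. It assumes, without confirmation from the mStats documentation, that codebook reads this same attribute and leaves the label cell empty when it is absent; the data frame and its columns are made up for illustration.

```r
# Illustrative data frame (hypothetical columns)
df <- data.frame(height = c(170, 182), weight = c(65, 80), id = 1:2)

# Base R equivalent of labelled::var_label(df$height) <- "height in cm", etc.
attr(df$height, "label") <- "height in cm"
attr(df$weight, "label") <- "weight in kg"

# Collect the labels, using "" for columns without one (like the empty label cells)
labels <- vapply(df, function(x) {
  lab <- attr(x, "label", exact = TRUE)
  if (is.null(lab)) "" else lab
}, character(1))
labels
```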

If you use R's native pipe operator `|>` instead, codebook reports the name of the input data (here, mtcars) rather than “<Piped Data>”:

# Generate a codebook using the native pipe operator
mtcars |> codebook()
#> $ Codebook
#>   dataset: mtcars
#>   Row: 32
#>   Col: 11
#>    name type  miss complete unique label
#> 1  mpg  <dbl> 0    1.00     25          
#> 2  cyl  <dbl> 0    1.00      3          
#> 3  disp <dbl> 0    1.00     27          
#> 4  hp   <dbl> 0    1.00     22          
#> 5  drat <dbl> 0    1.00     22          
#> 6  wt   <dbl> 0    1.00     29          
#> 7  qsec <dbl> 0    1.00     30          
#> 8  vs   <dbl> 0    1.00      2          
#> 9  am   <dbl> 0    1.00      2          
#> 10 gear <dbl> 0    1.00      3          
#> 11 carb <dbl> 0    1.00      6
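The native pipe behaves differently because R's parser rewrites `lhs |> f()` into `f(lhs)` before anything is evaluated, so a function that inspects its argument expression sees the original object name. This can be checked in base R (version 4.1 or later) without calling codebook, since `quote()` does not evaluate its input:

```r
# The parser turns `x |> f()` into `f(x)`; quote() shows the rewritten call
# without evaluating it, so codebook does not need to be loaded here.
piped <- quote(mtcars |> codebook())
print(piped)
#> codebook(mtcars)

identical(piped, quote(codebook(mtcars)))
#> [1] TRUE
```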