Fixed vs. random effects for browsing data - a simulation
When you work with trace data — data that emerge when people interact with technology — you will notice that such data often have properties that open up questions about statistical modelling. I currently work with browsing records, obtained at several times from the same users (i.e., a panel data set). A first typical characteristic of such data: Browsing behaviors are skewed. If you’re interested in people reading politically extreme web sites, you will find a few people doing it a lot, and most people doing little to none.
A second characteristic relates to the panel nature of the data. If you’re looking at people’s visits to online shops, they do not change their habits much over time — at least these within-person differences are not as great as the differences across people. Panel data lend itself to hierarchical modelling, for example with (1) fixed effects (FE) or (2) random-effects multilevel modelling (RE). For commonalities and differences between these two modelling approaches, see for example Gelman and Hill (2006).[1] The amount of within-person variation relative to between-person variation has important implications for these two approaches.
Below,
I simulate the performance of FE vs. RE models, with these data
characteristics in mind. I am mainly interested in the statistical power
of each model, although my code can be easily adapted to examine, for
example, bias and the RSME. The code builds on two sources:
Conceptually, on the paper “Should I Use Fixed or Random Effects?” by
Clark and Linzer (2015)[2]. In terms of code architecture, on the R
package simhelpers
, and its documentation.[3]
I consider a model with independent variable xᵢ and dependent variable yᵢ, for which we have i = 1, … m observations grouped in j = 1, … n units. Importantly, we model theunit effect (or unit intercept) aⱼ, which intuitively can be understood as explaining everything about y that cannot be explained by xᵢ at the unit level. The model formally: yᵢ = aⱼ+ bxᵢ + eᵢ. To understand how the unit effects are estimated differently by FE and RE, I again recommend Gelman and Hill (2007). One key difference between RE and FE is that for the former to be unbiased, it requires a zero correlation between unit effects aⱼ and xᵢ.
Let’s start with the required packages.
library(tidyverse)
library(broomExtra)
library(mnonr)
library(MASS)
library(plm)
library(lme4)
library(lmerTest)
library(simhelpers)
In the simulation, I assume a small true effect b that is kept constant across simulations. I vary the following four parameters: (1) the number of units (500, 1000 or 2000); (2) the number of observations per unit (2, 3 or 5); (3) the true correlation between unit effects aⱼ and xᵢ (0 or 0.8); (4) the within-unit standard deviation in xᵢ (between 0.1 and 1, while the standard deviation of xᵢ between units is fixed at 1).
design_factors <- list(
n = c(500, 1000, 2000),
n_perunit = c(2, 3, 5),
truecor = c(0, 0.8),
b_1 = c(0.2),
x_sd = seq(0.1, 1, 0.1))
I start by defining a data-generating function. It first creates a unit-level average of xᵢ, as well as the unit effects aⱼ. To account for the skewness of browsing data, I draw the unit-level averages of xᵢ from a skewed normal distribution (note that the values of 3 for skew and 33 for kurtosis are motivated by actual browsing data). I then create values of yᵢ according to the model outlined above, adding the normally distributed error eᵢ.
generate_dat <- function(n, n_perunit, truecor, b_1, x_sd) {
# Generate unit effects and unit-level means
v <- unonr(n, c(0, 0),
matrix(c(1, truecor, truecor, 1), 2, 2),
skewness = c(0, 3), kurtosis = c(0, 33))
b_0 <- v[, 1]
xmean <- v[, 2]
# Generate observations per unit
dat <- NULL
for (unit in 1:n) {
x <- rnorm(n_perunit, xmean[unit], x_sd)
y <- b_0[unit] + b_1*x + rnorm(n_perunit, 0, 1)
dat <- rbind(dat, cbind(x, y, unit = unit))
}
return(dat)
}
Next, I define a function that estimates the models , and one that pulls the estimate, standard error and p-value for b. Note that I also estimate a standard “pooled” model for reference.
pull_stats <- function(model, method) {
estimate <- broomExtra::tidy(model) %>%
filter(term == “x”) %>% pull(estimate)
p.value <- broomExtra::tidy(model) %>%
filter(term == “x”) %>% pull(p.value)
std.error <- broomExtra::tidy(model) %>%
filter(term == “x”) %>% pull(std.error)
return(tibble(method = method,
est = estimate,
se = std.error,
p = p.value))
}
estimate <- function(dat){
# Pooled model
reg_pool <- lm(y ~ x, data = dat)
# Random-effects model
reg_fe <- plm(y ~ x, data = dat, index = “unit”, model = “within”)
# Fixed-effects model
reg_re <- lmerTest::lmer(y ~ x + (1|unit), data = dat)
results <- bind_rows(
pull_stats(reg_pool, "pooled"),
pull_stats(reg_fe, “FE”),
pull_stats(reg_re, “RE”))
return(results)
}
Next, the function driving the simulation:
run_sim <- function(iterations, n, n_perunit, truecor, b_1, x_sd, seed = NULL) {
if (!is.null(seed)) set.seed(seed)
results <- rerun(iterations, {
dat <- generate_dat(n, n_perunit, truecor, b_1, x_sd)
estimate(dat)}) %>% bind_rows()
}
At last, run the simulation:
set.seed(20220516)
params <-
cross_df(design_factors) %>%
mutate(
iterations = 1000,
seed = round(runif(1) * 2³⁰) + 1:n()
)
system.time(
results <-
params %>%
mutate(
res = pmap(., .f = run_sim)
) %>%
unnest(cols = c(res))
)
Now you can inspect and plot the results.
power <- results %>%
group_by(method, n, n_perunit, truecor, x_sd, b_1) %>%
do(calc_rejection(., p_values = p))
power %>%
as.data.frame() %>%
filter(truecor == 0) %>%
rename(“Number of units” = n,
“Observations per unit” = n_perunit) %>%
ggplot(aes(x = x_sd, y = rej_rate,
group = method, color = method)) +
geom_smooth(se = FALSE) +
scale_y_continuous(name = “Rejection rate”) +
scale_x_continuous(name = “Within-unit SD”) +
scale_color_discrete(name = “Method”) +
theme_light() +
theme(legend.position = “bottom”) +
facet_wrap(~ `Number of units` + `Observations per unit`,
labeller = label_both)
The plot below shows that, as intuition would tell us, when the within-unit variation is small, FE models have a hard time detecting a true effect, especially when the number of units, and the number of observations per unit, are small.
Kommentare
Kommentar veröffentlichen