Mixed models vs multilevel SEM

keywords multilevel, path analysis, lavaan, lme4, jamovi, semlj, gamlj

Draft version, mistakes may be around

In this page we show how SEMLj (and thus the R package lavaan) and the GAMLj jamovi module (and thus the R package lme4) can be set up to estimate the same mixed model, and thus produce the same results. We choose a very simple model that all of these packages can estimate, so that the results can be compared directly.

Research data

For this example we use a simple simulated dataset available from the jamovi GAMLj help page, which can be downloaded here

Mixed model

The data simulate a hypothetical study in which the number of beers and the number of smiles are measured in a sample of bar customers. We imagine that we sampled a number of bars (15 in this example) in a city, and in each bar we measured how many beers customers consumed that evening and how many smiles they produced in a given time unit (say, every minute). The aim of the analysis is to estimate the relationship between number of beers and number of smiles, expecting a positive relationship.

We want to estimate the fixed effect of beer on smile, allowing random intercepts and random slopes across bars; see the GAMLj help page for details. GAMLj uses the lmer() function to estimate the mixed model, so we know where the results come from.

Module

We need to use SEMLj syntax submodule, which allows for multilevel models.

Simple mixed regression

Mixed Model

We start with the simplest mixed model: smile is the dependent variable, beer is the independent variable, bar is the clustering variable, and we set the intercept as the only random coefficient across bars. This corresponds to the lmer() model: smile~1+beer+(1|bar). For the sake of comparison, in GAMLj we set the estimator to ML and the scaling of the independent variable beer to none, so that no transformation is applied to the variable values.
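For reference, the random-intercept model behind this formula can be written out as follows. This is the usual textbook formulation, not something taken from the GAMLj output: the symbols \(\gamma_{00}\), \(\gamma_{10}\), \(u_{0j}\) and \(e_{ij}\) are notation introduced here, with \(i\) indexing customers and \(j\) indexing bars.

\[
smile_{ij} = \gamma_{00} + \gamma_{10}\, beer_{ij} + u_{0j} + e_{ij}, \qquad u_{0j} \sim N(0, \sigma_I^2), \quad e_{ij} \sim N(0, \sigma^2)
\]

In this notation \(\gamma_{00}\) is the fixed intercept, \(\gamma_{10}\) is the fixed slope of beer, \(\sigma_I^2\) is the variance of the intercepts across bars, and \(\sigma^2\) is the residual variance.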

Results are the following:

We want to highlight the fixed effect intercept (5.854), the slope (0.549) and its standard error (0.080), and the variance of the intercepts (5.382), with corresponding ICC=.788.

Multilevel path analysis

We now estimate the same model with SEMLj, thus using the R lavaan package. The syntax is the following:

What we are telling the software is to estimate a two-level model in which, at the within level (i.e. the participants level), beer predicts smile, whereas at the between level (i.e. the bars level) we estimate only the intercept. In this way, the whole effect of beer on smile is captured by the first level, as in the standard mixed model estimated above. Results are:

We can see that the effect of beer on smile is indeed 0.549. The fixed intercept can be found in the Intercept table, in the row between smile; it is equal to 5.845, as in the LMM.

The variance of the intercepts can be found in the Variances and Covariances table, in the between smile smile row. It is 5.382, as in the mixed model. In the multilevel SEM, the variance of the intercepts appears in this row because it is the variance of the smile means across bars, conditional on beer=0, which is indeed the variance of the intercepts.
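The equivalence can be sketched with two equations. This is only an outline of the logic, assuming that smile is split into a bar-level component \(\mu_j\) and a within-bar part, while beer enters at the within level only; the symbols \(\mu_j\), \(\alpha\), \(\beta\), \(u_j\) and \(e_{ij}\) are notation introduced here.

\[
\begin{aligned}
\text{within:}\quad & smile_{ij} = \mu_j + \beta\, beer_{ij} + e_{ij} \\
\text{between:}\quad & \mu_j = \alpha + u_j
\end{aligned}
\]

Substituting the second equation into the first gives \(smile_{ij} = \alpha + \beta\, beer_{ij} + u_j + e_{ij}\), which has the same form as a random-intercept mixed model. Under this reading, the variance of \(u_j\) is both the between-level variance of smile and the variance of the intercepts.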

Centering within cluster

People often like to center their level 1 variables and include the cluster means as an additional predictor. This helps disentangle the within-cluster effect from the between-clusters effect.

This means that our independent variable beer should be represented in the data by two variables: a cluster-centered version (cbeer) and a variable containing the mean of the cluster the participant belongs to (meanbeer).

Here you can see the first rows of the data in which these new variables have been computed. The first three rows are participants from the same bar, so they share the same meanbeer (1.667) and their cbeer is the deviation beer-meanbeer. The same goes for the other participants.
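The computation of the two variables can be sketched in a few lines of Python. This is only an illustration of the logic, not the procedure used to prepare the dataset: the rows below are made-up values (the first bar is chosen so that its mean is 1.667), and the function name `center_within_cluster` is hypothetical.

```python
from collections import defaultdict

# Hypothetical (bar, beer) rows; these are NOT the actual data.
rows = [("a", 1), ("a", 2), ("a", 2), ("b", 3), ("b", 5)]


def center_within_cluster(rows):
    """Add the cluster mean (meanbeer) and the deviation from it (cbeer)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for bar, beer in rows:
        totals[bar] += beer
        counts[bar] += 1
    # Mean number of beers in each bar.
    means = {bar: totals[bar] / counts[bar] for bar in totals}
    return [
        {"bar": bar, "beer": beer,
         "meanbeer": means[bar], "cbeer": beer - means[bar]}
        for bar, beer in rows
    ]


centered = center_within_cluster(rows)
for row in centered:
    print(row)
```

By construction, cbeer sums to zero within each bar, so it carries only within-bar variation, while meanbeer is constant within a bar and carries only between-bars variation.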

Mixed Model

We simply need to set cbeer and meanbeer as independent variables, and leave the intercept as the random coefficient. GAMLj results are the following:

The within-bars effect of beer is now 0.603, and the between-bars effect is -.863. This means that, if bars had no effect, each additional beer would increase smiles by .603 units, whereas bars that are one unit apart in their average number of beers would show, on average, .863 fewer smiles. We can also notice that the variance of the intercepts is now 1.261.
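Written as an equation, the model with the centered predictor and the cluster means looks as follows. The symbols \(\gamma_{00}\), \(\gamma_w\), \(\gamma_b\), \(u_{0j}\) and \(e_{ij}\) are notation introduced here for illustration, with \(i\) indexing customers and \(j\) indexing bars.

\[
smile_{ij} = \gamma_{00} + \gamma_{w}\, cbeer_{ij} + \gamma_{b}\, meanbeer_{j} + u_{0j} + e_{ij}
\]

In this notation, the estimates reported above are \(\hat{\gamma}_w = .603\) for the within-bars effect and \(\hat{\gamma}_b = -.863\) for the between-bars effect, and 1.261 is the estimated variance of \(u_{0j}\).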

Multilevel path analysis

To obtain the same results as the mixed model, we need to issue the following lavaan syntax:

The syntax is straightforward: at the participants level (within), the cluster-centered variable predicts the dependent variable; at the bar level (between), the bar mean of beers predicts the bar mean of smiles. Results, again, are the same as in the mixed model.

As expected, the regression coefficients are the same as in the mixed model, and the variance of the intercept (found in Variances and Covariances table, row between smile smile) is indeed 1.261.

What does not match

Parameter estimates are the same in the mixed model and in the multilevel path analysis, but a few indexes do not match. The first one is the ICC (intra-class correlation). In the last example, the mixed model gives an \(ICC=.466\), whereas the multilevel path analysis gives:

The reason for this discrepancy is that in the mixed model the ICC is computed as \(\sigma_I^2/(\sigma_I^2+\sigma^2)\), where \(\sigma_I^2\) is the variance of the intercepts and \(\sigma^2\) is the residual variance. lavaan uses a different definition of the variances involved in the ICC computation. To recover the mixed model ICC from the multilevel path analysis output, one can simply compute: variance of smile between / (variance of smile between + variance of smile within). In our case: \(1.261/(1.261+1.448)=.465\), which matches the mixed model value up to rounding.
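The arithmetic is easy to reproduce. The snippet below is a minimal sketch that plugs in the two variances reported above; the function name `icc` is introduced here for illustration.

```python
def icc(var_between, var_within):
    """Mixed-model ICC: intercept variance over intercept plus residual variance."""
    return var_between / (var_between + var_within)


# Variances of smile at the between and within levels, as reported above.
icc_centered = icc(1.261, 1.448)
print(round(icc_centered, 3))  # prints 0.465
```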

The other index that appears to be different in the mixed model and the multilevel SEM is the \(R^2\). GAMLj computes two \(R^2\):

The marginal R-squared is the variance explained by the fixed effects over the total variance; the conditional R-squared is the variance explained by the whole model (fixed and random effects).

The multilevel SEM computes two R-squared indexes for each endogenous variable: one for the within level and one for the between level.

For a very simple model like the ones we are analyzing, each \(R^2\) corresponds to the square of the standardized regression coefficient at that level: \(R_w^2=.433^2=.187\) and \(R_b^2=(-.749)^2=.561\).
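As a quick check, the two values can be recomputed from the standardized coefficients quoted above. This is a minimal sketch; the variable names are introduced here for illustration.

```python
# Standardized regression coefficients at each level, as quoted above.
beta_within = 0.433
beta_between = -0.749

# With a single predictor per level, R-squared is the squared standardized slope.
r2_within = beta_within ** 2
r2_between = beta_between ** 2

print(round(r2_within, 3), round(r2_between, 3))  # prints 0.187 0.561
```

Note that the negative coefficient has to be squared as a whole: storing it in a variable (or writing `(-0.749) ** 2`) does that, whereas the literal expression `-0.749 ** 2` is evaluated by Python as `-(0.749 ** 2)` and gives a negative number.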

Examples

Some worked-out practical examples can be found here

Comments?

Got comments, issues or spotted a bug? Please open an issue on SEMLj at github or send me an email