Estimating a psychometric network with qgraph
Learn how to estimate a psychometric network with qgraph
By Gabriel R. R. in network psychometrics tutorials qgraph
August 10, 2021
We’re gonna need data
To estimate a psychological network we’re going to need data. Let’s take the 25-item Big Five questionnaire from the psych package.
# Install any missing package, then attach it
load_libraries <- function() {
  for (pkg in c("dplyr", "psych", "qgraph")) {
    if (!requireNamespace(pkg, quietly = TRUE)) install.packages(pkg)
    library(pkg, character.only = TRUE)
  }
}
load_libraries()
df <- bfi[, 1:25] # keep the 25 personality items, drop the demographic columns
glimpse(df)
## Rows: 2,800
## Columns: 25
## $ A1 <int> 2, 2, 5, 4, 2, 6, 2, 4, 4, 2, 4, 2, 5, 5, 4, 4, 4, 5, 4, 4, 5, 1, 1~
## $ A2 <int> 4, 4, 4, 4, 3, 6, 5, 3, 3, 5, 4, 5, 5, 5, 5, 3, 6, 5, 4, 4, 4, 6, 5~
## $ A3 <int> 3, 5, 5, 6, 3, 5, 5, 1, 6, 6, 5, 5, 5, 5, 2, 6, 6, 5, 5, 6, 2, 6, 6~
## $ A4 <int> 4, 2, 4, 5, 4, 6, 3, 5, 3, 6, 6, 5, 6, 6, 2, 6, 2, 4, 4, 5, 1, 1, 5~
## $ A5 <int> 4, 5, 4, 5, 5, 5, 5, 1, 3, 5, 5, 5, 4, 6, 1, 3, 5, 5, 3, 5, 2, 5, 6~
## $ C1 <int> 2, 5, 4, 4, 4, 6, 5, 3, 6, 6, 4, 5, 5, 4, 5, 5, 4, 5, 5, 1, 4, 5, 4~
## $ C2 <int> 3, 4, 5, 4, 4, 6, 4, 2, 6, 5, 3, 4, 4, 4, 5, 5, 4, 5, 4, 1, 6, 4, 3~
## $ C3 <int> 3, 4, 4, 3, 5, 6, 4, 4, 3, 6, 5, 5, 3, 4, 5, 5, 4, 5, 5, 1, 5, 4, 2~
## $ C4 <int> 4, 3, 2, 5, 3, 1, 2, 2, 4, 2, 3, 4, 2, 2, 2, 3, 4, 4, 4, 5, 5, 2, 4~
## $ C5 <int> 4, 4, 5, 5, 2, 3, 3, 4, 5, 1, 2, 5, 2, 1, 2, 5, 4, 3, 6, 6, 4, 3, 5~
## $ E1 <int> 3, 1, 2, 5, 2, 2, 4, 3, 5, 2, 1, 3, 3, 2, 3, 1, 1, 2, 1, 1, 3, 1, 2~
## $ E2 <int> 3, 1, 4, 3, 2, 1, 3, 6, 3, 2, 3, 3, 3, 2, 4, 1, 2, 2, 2, 1, 3, 2, 1~
## $ E3 <int> 3, 6, 4, 4, 5, 6, 4, 4, NA, 4, 2, 4, 3, 4, 3, 6, 5, 4, 4, 4, 5, 4, ~
## $ E4 <int> 4, 4, 4, 4, 4, 5, 5, 2, 4, 5, 5, 5, 2, 6, 6, 6, 5, 6, 5, 5, 5, 3, 5~
## $ E5 <int> 4, 3, 5, 4, 5, 6, 5, 1, 3, 5, 4, 4, 4, 5, 5, 4, 5, 6, 5, 6, 4, 4, 2~
## $ N1 <int> 3, 3, 4, 2, 2, 3, 1, 6, 5, 5, 3, 4, 1, 1, 2, 4, 4, 6, 5, 5, 1, 2, 2~
## $ N2 <int> 4, 3, 5, 5, 3, 5, 2, 3, 5, 5, 3, 5, 2, 1, 4, 5, 4, 5, 6, 5, 3, 5, 2~
## $ N3 <int> 2, 3, 4, 2, 4, 2, 2, 2, 2, 5, 4, 3, 2, 1, 2, 4, 4, 5, 5, 5, 3, 5, 2~
## $ N4 <int> 2, 5, 2, 4, 4, 2, 1, 6, 3, 2, 2, 2, 2, 2, 2, 5, 4, 4, 5, 1, 2, 4, 2~
## $ N5 <int> 3, 5, 3, 1, 3, 3, 1, 4, 3, 4, 3, NA, 2, 1, 3, 5, 5, 4, 2, 1, 1, 6, ~
## $ O1 <int> 3, 4, 4, 3, 3, 4, 5, 3, 6, 5, 5, 4, 4, 5, 5, 6, 5, 5, 4, 4, 6, 5, 6~
## $ O2 <int> 6, 2, 2, 3, 3, 3, 2, 2, 6, 1, 3, 6, 2, 3, 2, 6, 1, 1, 2, 1, 1, 1, 1~
## $ O3 <int> 3, 4, 5, 4, 4, 5, 5, 4, 6, 5, 5, 4, 4, 4, 5, 6, 5, 4, 2, 5, 3, 6, 5~
## $ O4 <int> 4, 3, 5, 3, 3, 6, 6, 5, 6, 5, 6, 5, 5, 4, 5, 3, 6, 5, 4, 3, 2, 6, 5~
## $ O5 <int> 3, 3, 2, 5, 3, 1, 1, 3, 1, 2, 3, 4, 2, 4, 5, 2, 3, 4, 2, 2, 4, 2, 2~
Note that we have ordinal data. This is not a problem: we’ll mostly deal with it using the cor_auto function from qgraph. The cor_auto function estimates a correlation matrix with a method appropriate for our data, which it chooses by automatically (hence the name) identifying the data type. For ordinal items like these, it computes polychoric correlations.
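To get a feel for what cor_auto does with these items, the sketch below compares its output with plain Pearson correlations computed on the same data. It assumes the psych and qgraph packages are installed; the object names pearson and auto are only illustrative.

```r
library(psych)   # provides the bfi data
library(qgraph)  # provides cor_auto()

df <- bfi[, 1:25]

# Pearson correlations, treating the 1-6 responses as continuous
pearson <- cor(df, use = "pairwise.complete.obs")

# cor_auto() detects the ordinal items and picks a suitable estimator
auto <- cor_auto(df)

# Compare the two estimates for the first three Agreeableness items
round(pearson[1:3, 1:3], 2)
round(auto[1:3, 1:3], 2)

# Average difference in absolute size across all item pairs
mean(abs(auto[upper.tri(auto)]) - abs(pearson[upper.tri(pearson)]))
```

Polychoric estimates are usually somewhat larger in absolute value than Pearson ones for the same ordinal items, so a positive average difference would not be surprising here.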
We now have three initial options for estimating a network with the qgraph package:
- Estimate network with a correlation matrix
- Estimate network with a partial correlation matrix
- Estimate network using EBICglasso
Before going into the actual code, let’s prepare a grouping vector that assigns each node to its trait. qgraph will use it to color the nodes by group.
traits <- rep(c('Agreeableness',
'Conscientiousness',
'Extraversion',
'Neuroticism',
'Openness'),
each = 5)
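Because this vector is matched to the columns by position, it can be worth checking that each item lands in the intended group. A minimal sketch in base R, where the items vector is rebuilt by hand (an assumption standing in for names(df)) so the check runs without loading any data:

```r
# Item names following the bfi convention: trait initial plus item number
items <- paste0(rep(c("A", "C", "E", "N", "O"), each = 5), 1:5)

traits <- rep(c('Agreeableness',
                'Conscientiousness',
                'Extraversion',
                'Neuroticism',
                'Openness'),
              each = 5)

# One row per node: which item belongs to which trait
node_map <- data.frame(item = items, trait = traits)
head(node_map, 7)

# Each trait should own exactly five items
table(traits)

# The first letter of every item should match the first letter of its trait
stopifnot(substr(items, 1, 1) == substr(traits, 1, 1))
```

With the real data, replacing items with names(df) gives the same check.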
1. Estimate network with a correlation matrix
network <- qgraph(cor_auto(df),
graph = 'cor',
layout = 'spring',
theme = 'colorblind',
groups = traits)
The correlation network plots the bivariate association between each pair of variables. When estimating an edge between two variables, it does not take into account what those two variables share with the other variables in the data. In my understanding, a graph like this would mostly be used to show the bivariate associations among a small number of variables.
2. Estimate network with a partial correlation matrix
network <- qgraph(cor_auto(df),
layout = 'spring',
graph = 'pcor', # note this new argument
theme = 'colorblind',
groups = traits)
One way to take into account other variables' influence while estimating the unique association between two variables is by estimating a partial correlation matrix. This can be done with the graph = 'pcor' argument.
A graph that plots the unique association between each pair of nodes is called a Gaussian Graphical Model (GGM). According to Burger et al. (in press), in a GGM the parameters represent the unique association between two variables after conditioning on all other variables in the network.
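To make "conditioning on all other variables" concrete, here is a small base R sketch on simulated data (not the bfi items). It assumes a toy structure in which x influences both y and z, so y and z correlate only through x. Partial correlations are obtained by standardizing the inverse of the correlation matrix and flipping the sign.

```r
set.seed(1)
n <- 5000
x <- rnorm(n)
y <- 0.7 * x + rnorm(n)  # y depends on x only
z <- 0.7 * x + rnorm(n)  # z depends on x only

R <- cor(cbind(x, y, z))

# Partial correlations from the inverse correlation (precision) matrix K:
# pcor_ij = -K_ij / sqrt(K_ii * K_jj)
K <- solve(R)
pcor <- -cov2cor(K)
diag(pcor) <- 1

round(R, 2)     # y and z are clearly correlated
round(pcor, 2)  # their partial correlation is close to zero

# Same quantity the long way: correlate what is left of y and z
# after regressing each of them on x
res_y <- resid(lm(y ~ x))
res_z <- resid(lm(z ~ x))
all.equal(unname(pcor["y", "z"]), cor(res_y, res_z))
```

In a correlation network y and z would be connected; in the partial correlation network that edge all but disappears, because x accounts for their association.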
3. Estimate network using EBICglasso
Note that the graph above plots the unique association between every pair of variables. A possible problem is that we may be plotting false positives, since any partial correlation different from 0 is shown. A partial correlation of 0.02, for example, would be plotted following the procedure above. The thing is: such tiny edges may not be present in the true model - they’re probably Type I errors.
We can prune and threshold our network to fix that, but for now I’d like to present another way of estimating a GGM while controlling for Type I errors. This method is called EBICglasso: a graphical LASSO network selected with the Extended Bayesian Information Criterion (Foygel & Drton, 2010).
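To see why false positives are a real concern, the base R sketch below simulates variables that are truly independent and computes their sample partial correlations. The sample size and number of variables are arbitrary choices for illustration.

```r
set.seed(123)
n <- 500  # observations
p <- 10   # variables, all independent by construction
X <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Sample partial correlations from the inverse correlation matrix
K <- solve(cor(X))
pcor <- -cov2cor(K)
edges <- pcor[upper.tri(pcor)]

length(edges)           # number of possible edges
sum(edges != 0)         # how many of them are not exactly zero
round(range(edges), 3)  # their size, driven purely by sampling noise
```

The true network here is empty, yet an unregularized plot would draw every one of these edges.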
The EBICglasso code is fairly simple and, as you can see, the network is sparser (i.e., the network has fewer parameters, or edges). The only new argument is the sample size used in estimating the variance-covariance matrix. This could be a number, sampleSize = 2800, or you can pull it directly from the data frame (if you don’t have missing values), sampleSize = nrow(df). Keep in mind that the bfi items do contain some missing responses (see the NA entries in the output above), so nrow(df) is an upper bound on the sample size behind each correlation here.
network <- qgraph(cor_auto(df),
layout = 'spring',
graph = 'glasso',
sampleSize = nrow(df), # new argument!
theme = 'colorblind',
groups = traits)
Note that it’s sparser! How does it achieve that? The EBICglasso follows these steps:
- Start from the sample correlation (variance-covariance) matrix S, here the output of cor_auto.
- Run the graphical LASSO on S n times, each time with a different value of the penalty parameter λ. Larger values of λ shrink more edges to exactly zero.
- Among the n networks, use the EBIC to select the final network.
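These steps can be inspected by calling qgraph’s EBICglasso function directly instead of going through qgraph(). The sketch below assumes psych and qgraph are installed and relies on the element names that EBICglasso returns when returnAllResults = TRUE (lambda, ebic, optnet); treat those names as an assumption to verify against your installed version.

```r
library(psych)   # provides the bfi data
library(qgraph)  # provides cor_auto() and EBICglasso()

df <- bfi[, 1:25]

# Step 1: the input correlation matrix
S <- cor_auto(df)

# Steps 2 and 3: fit the whole path of penalized networks and keep everything
fit <- EBICglasso(S, n = nrow(df), returnAllResults = TRUE)

length(fit$lambda)          # how many penalty values were tried
best <- which.min(fit$ebic) # the network with the lowest EBIC is selected
fit$lambda[best]            # the penalty value that was chosen

# The selected network, as a matrix of regularized partial correlations
net <- fit$optnet
mean(net[upper.tri(net)] == 0)  # proportion of edges shrunk to exactly zero
```

The last line gives a quick measure of how sparse the selected network is compared with the unregularized partial correlation network, where essentially no edge is exactly zero.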
One important detail is that the EBIC uses a hyperparameter called γ, which is usually set between 0 and 1. A value of γ = 0 leans toward discovery and allows for more Type I errors. A value of γ = 1 is conservative and will probably commit more Type II errors. The recommended value of γ is 0.5 according to Epskamp and Fried (2018). This is also the default in qgraph.
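For reference, the criterion can be written as follows, using the notation of Epskamp and Fried (2018): L is the log-likelihood of the network, E the number of nonzero edges, n the sample size, and p the number of nodes.

```latex
\mathrm{EBIC} = -2L + E \log(n) + 4 \gamma E \log(p)
```

The last term is the extra penalty that γ controls. With γ = 0 it vanishes and the EBIC reduces to the ordinary BIC, which is why low values of γ retain more edges.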
References
Burger, J., Isvoranu, A. M., Lunansky, G., Haslbeck, J. M. B., Epskamp, S., Hoekstra, R. H. A., Fried, E. I., Borsboom, D., & Blanken, T. F. (in press). Reporting standards for psychological network analyses in cross-sectional data. https://psyarxiv.com/4y9nz/
Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mõttus, R., Waldorp, L. J., & Cramer, A. O. J. (2015). State of the aRt personality research: A tutorial on network analysis of personality data in R. Journal of Research in Personality, 54, 13–29. https://doi.org/10.1016/j.jrp.2014.07.003
Epskamp, S., Cramer, A. O. J., Waldorp, L. J., Schmittmann, V. D., & Borsboom, D. (2012). qgraph: Network visualizations of relationships in psychometric data. Journal of Statistical Software, 48(4), 1–18. https://doi.org/10.18637/jss.v048.i04
Epskamp, S., & Fried, E. I. (2018). A tutorial on regularized partial correlation networks. Psychological Methods, 23(4), 617–634. https://doi.org/10.1037/met0000167
Foygel, R., & Drton, M. (2010). Extended Bayesian Information Criteria for Gaussian graphical models. Proceedings of the 23rd International Conference on Neural Information Processing Systems, 604–612. https://arxiv.org/pdf/1011.6640.pdf