Covariate-varying networks (CVNs)
are high-dimensional dynamic graphical models that vary with multiple
external covariates. The methodology is explained in detail in the
pre-print by Dijkstra, Godt
and Foraita (2024). In this vignette, we describe how to apply the
method to data with the associated R package CVN
.
A graphical model is a powerful tool for exploring complex dependency structures in high-throughput datasets, such as genomics, epigenomics, and proteomics data. It allows the investigation of, e.g., biologically meaningful patterns that aid in understanding biological processes and in generating new hypotheses. Despite the widespread application of Gaussian graphical models, methods connecting graph structures to external covariates have been lacking.
A CVN is a Gaussian graphical model for high-dimensional data that can change with multiple external discrete covariates. For each combination of the covariates' categories, a graph is estimated. The graphs are not estimated individually, since we allow for similarities between graphs belonging to different covariate combinations. This smoothness between the graphs in the CVN model is introduced by a meta-graph W. In this meta-graph, each node corresponds to a graph in the CVN model, and an edge between two nodes enforces smoothing between the corresponding graphs.
More formally, a CVN is represented as a graphical model by the quintuple
CVN = {X, U, 𝒰, f(X ∣ U), {G(u) = (V, E(u))}u ∈ 𝒰}, where X = (X1, X2, …, Xp)⊤ is a p-dimensional random vector and U = (U1, U2, …, Uq)⊤ is a random vector representing q external discrete covariates with K1, …, Kq categories, respectively. The external discrete covariates are not included in the graph, but they determine the smoothing between the graphs for all possible combinations of their categories.
The vector U lies in the discrete space 𝒰 with cardinality $m \leq \prod_{k=1}^q K_k$. The joint density function of X conditioned on U is f(X ∣ U). The fifth element of the CVN is a set of m graphs, one for each value u in 𝒰. The vertices of G(u), V = {1, 2, …, p}, correspond to the variables X1, X2, …, Xp and do not change with U.
We assume that X ∣ U = u follows a multivariate normal distribution with mean μ(u) = 0 and covariance matrix Σ(u). To estimate the structure of the graph, we focus on the precision matrix Θ(u) = Σ(u)−1, which has the following property under the normality assumption:
$$ \theta_{ij}(u) = 0 \Leftrightarrow X_i \perp\!\!\!\perp X_j \mid \mathbf{X}_{V \setminus \{i,j\}} \land U = u \Leftrightarrow \text{edge } \{i,j\} \notin E(u) $$ Hence, the (in)dependence structure of the CVN can be estimated by determining the zero entries of the precision matrices. Our goal is to enable a CVN to handle high-dimensional data and to identify structural similarities between graphs. We therefore introduce regularization and smoothness in the definition of the CVN estimator $\mathbf{\Theta}_{\text{CVN}} = \{ \widehat{\mathbf{\Theta}}_i \}_{i = 1}^m$ as follows:
$$ \mathbf{\Theta}_{\text{CVN}} = \text{argmin}_{\{ \mathbf{\Theta}_{i}\}_{i = 1}^m} \Bigg[ \sum_{i = 1}^m \ell\left(\mathbf{\Theta}_i\right) + \lambda_1 \sum_{i = 1}^m \left\lVert \mathbf{\Theta}_i \right\rVert_1 + \lambda_2 \sum_{i < j} w_{ij} \left\lVert \mathbf{\Theta}_i - \mathbf{\Theta}_j \right\rVert_1 \Bigg],$$
where Θi is the i-th precision matrix, ℓ(Θi) is the negative log-likelihood for the i-th precision matrix, the wij are the entries of the symmetric weighted adjacency matrix of the meta-graph (see below), and λ1, λ2 are two tuning parameters.
The CVN model differs from other graphical model approaches especially through the concept of the (undirected) meta-graph. The adjacency matrix of the meta-graph is the symmetric m × m weight matrix W = (wij). It encodes the smoothing structure between the graphs in a CVN and must be specified a priori. A weight of 0 means that no smoothing is applied between the two corresponding graphs; weights must be chosen between 0 and 1.
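For intuition, a weight matrix can also be specified by hand. The following is a minimal sketch (not from the package) for two hypothetical covariates A and B with two categories each, giving m = 4 graphs arranged in a 2 × 2 grid:
# Graphs ordered as g1 = (A1,B1), g2 = (A1,B2), g3 = (A2,B1), g4 = (A2,B2);
# a weight of 1 connects graphs that differ in exactly one covariate category
W <- matrix(c(0, 1, 1, 0,
              1, 0, 0, 1,
              1, 0, 0, 1,
              0, 1, 1, 0),
            nrow = 4, byrow = TRUE)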
The table below shows some examples of how W and the corresponding meta-graph can look for a CVN with two covariates:
Meta-graph for a CVN with 2 covariates | W |
---|---|
Grid structure | Each covariate has 3 categories |
Weighted grid structure | Each covariate has 3 categories |
Arbitrary structure | Each covariate has 3 categories |
GLASSO-type structure (no smoothing) | Each covariate has 3 categories |
Full/saturated structure (complete smoothing) | Covariates with 2 and 3 categories |
In this fictitious example, we are interested in how the independence structure of collected biomarkers differs by the two external variables body mass index (BMI) and income level. The biomarker values were transformed to normally distributed z-scores. BMI has the three categories underweight, normal weight and overweight; income level has the categories low, middle and high.
For this example, we use randomly simulated data where the variables in the graph are z-scores of various biomarkers, each following a normal distribution. We are interested in whether the graphs differ by the two external variables, each of them having K = L = 3 categories.
The CVN function requires as input data:

- A list with K ⋅ L = m data sets.
- Each list element needs to be a matrix or a data object of Gaussian data, corresponding to one combination of the categories of the two external variables.
- The number of variables p must be equal in each list element; the number of observations may differ.

The input data is then a list of 3 × 3 = 9 list elements corresponding to the combinations (low income, underweight), (low income, normal weight), …, (high income, overweight).
# Load required libraries
library(CVN)
library(dplyr)

# Simulate the dataset
set.seed(2024)
n <- 300

# Create 10 normally distributed variables for the graph
data <- as.data.frame(matrix(rnorm(n * 10), ncol = 10))
colnames(data) <- paste0("X", 1:10)

# Add two discrete external covariates as factors, so that the split
# below follows the intended category order
data$income <- factor(sample(c("low", "middle", "high"), n, replace = TRUE),
                      levels = c("low", "middle", "high"))
data$bmi    <- factor(sample(c("underweight", "normal weight", "overweight"),
                             n, replace = TRUE),
                      levels = c("underweight", "normal weight", "overweight"))

# Split the dataset into subsets based on income and bmi
data_list <- data %>%
  group_by(income, bmi) %>%
  group_split() %>%
  lapply(function(df) df %>% select(-income, -bmi))

# Name each list element; group_split() varies the last grouping
# variable (bmi) fastest, so expand.grid() must do the same
names(data_list) <-
  apply(expand.grid(bmi = levels(data$bmi),
                    income = levels(data$income)),
        1,
        function(x) paste0(x["income"], "_", x["bmi"]))
We set the meta-graph W in our example to the following:
In the case of two external covariates, the function
create_weight_matrix
can be used to generate simple weight matrices of the types
full (equals the saturated graph), grid,
glasso (no smoothing) and uniform-random
(random weights).
# plots can be turned on by setting plot = TRUE. Requires the igraph package
W_grid <- create_weight_matrix(type = "grid", k = 3, l = 3, plot = FALSE)
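The other structure types mentioned above can be generated analogously (the type names follow the prose above; see ?create_weight_matrix for the full list):
# Saturated meta-graph (complete smoothing) and GLASSO-type (no smoothing)
W_full   <- create_weight_matrix(type = "full", k = 3, l = 3, plot = FALSE)
W_glasso <- create_weight_matrix(type = "glasso", k = 3, l = 3, plot = FALSE)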
Weight matrices can be plotted as a meta-graph or as a heatmap. This can be done in several ways:

- A meta-graph plot can be generated when creating the weight matrix with create_weight_matrix by setting the parameter plot = TRUE, or
- by using the plot_weight_matrix function (see below).
- A heatmap plot is generated with the function hd_weight_matrix.

The first two options require the igraph package.
A heatmap plot looks like the following. Note that it has dimension m × m. The axis labeling follows the order of your data_list containing the m data sets.
W_random <- round(create_weight_matrix(type = "uniform-random", k = 3, l = 2), 2)
hd_weight_matrix(W_random) # randomly chosen weights
As known from other methods, e.g., the GLASSO, we need to tune our model. A CVN requires the selection of two tuning parameters that control the regularization applied to the CVN. The choice of the tuning parameters affects how dense the graphs are and how strongly the edges between the graphs are smoothed. These parameters are usually searched for along a predefined regularization path.
- λ1 governs the sparsity of the CVN: the graphs become sparser the larger λ1 is chosen.
- λ2 regulates the smoothness, or similarity, between the graphs by penalizing the differences between precision matrices that are connected in the meta-graph. It encourages similar values in the precision matrices and hence also agreement on whether an edge is included. The graphs in the CVN become more and more similar the larger λ2 is selected.
The CVN
function fits all combinations for a given
selection of λ1 and
λ2 values.
A CVN graphical model is fitted using an alternating direction method
of multipliers (ADMM). By default, the CVN
function parallelizes the computation when multiple values of λ1 or λ2 are provided.
The number of cores used can be adjusted by setting, e.g.,
n_cores = 1
.
The CVN is then estimated as follows:
# Define the regularization paths for both tuning parameters
lambda1 <- c(0.5, 1.25, 2)
lambda2 <- c(1, 1.5)

cvn <- CVN(data = data_list,
           W = W_grid,
           lambda1 = lambda1,
           lambda2 = lambda2,
           eps = 1e-2,       # makes it faster but less precise; default = 1e-4
           maxiter = 500,
           n_cores = 1,      # no parallelization
           warmstart = TRUE, # uses the GLASSO for a warm start
           verbose = FALSE)
# Print results
print(cvn)
#> Covariate-varying Network (CVN)
#>
#> ✓ all converged
#>
#> Number of graphs (m) : 9
#> Number of variables (p) : 10
#> Number of lambda pairs : 6
#>
#> Weight matrix (W):
#> 9 x 9 sparse Matrix of class "dsCMatrix"
#>
#> [1,] . 1 . 1 . . . . .
#> [2,] 1 . 1 . 1 . . . .
#> [3,] . 1 . . . 1 . . .
#> [4,] 1 . . . 1 . 1 . .
#> [5,] . 1 . 1 . 1 . 1 .
#> [6,] . . 1 . 1 . . . 1
#> [7,] . . . 1 . . . 1 .
#> [8,] . . . . 1 . 1 . 1
#> [9,] . . . . . 1 . 1 .
#>
#> id lambda1 lambda2 gamma1 gamma2 converged value
#> 1 1 0.50 1.0 0.001234568 0.0006172840 TRUE 0.008176557
#> 2 2 1.25 1.0 0.003086420 0.0006172840 TRUE 0.008932883
#> 3 3 2.00 1.0 0.004938272 0.0006172840 TRUE 0.009769567
#> 4 4 0.50 1.5 0.001234568 0.0009259259 TRUE 0.009720175
#> 5 5 1.25 1.5 0.003086420 0.0009259259 TRUE 0.009410001
#> 6 6 2.00 1.5 0.004938272 0.0009259259 TRUE 0.009319307
#> n_iterations aic bic ebic edges_median edges_iqr
#> 1 16 5255.323 6071.667 8586.090 29 6
#> 2 19 4763.677 5258.571 6778.278 18 3
#> 3 22 4476.400 4769.034 5662.437 10 2
#> 4 19 5724.758 6922.427 10606.563 44 1
#> 5 22 4682.392 5122.325 6476.245 16 3
#> 6 26 4404.656 4651.844 5407.092 8 1
In our manuscript, we also introduce an alternative tuning parameterization. Due to the algorithm’s computational complexity, performing an exhaustive naive search across a broad range of potential values is usually not feasible. We realized that introducing a different external covariate, thereby altering the number of graphs m, or considering different variables, which changes p, impacts the interpretation and meaning of the tuning parameters λ1 and λ2. In other words, if one selects optimal values for λ1 and λ2 and then slightly alters the dataset, these values may no longer be informative for the new dataset.
To address this issue, we propose a new parameterization, denoted as γ1 and γ2, replacing λ1 and λ2. Unlike the previous tuning parameters that penalize the entire precision matrix and differences between precision matrices, γ1 and γ2 are used to penalize individual edges.
This alternative tuning parameterization remains robust when changing the number of variables or graphs in the CVN model, which might be beneficial for other applications as well.
The results table of the CVN model includes the gamma tuning
parameters by default, which
are much smaller than the lambda values (see our manuscript for more
information).
id | lambda1 | lambda2 | gamma1 | gamma2 | converged | value | n_iterations | aic | bic | ebic | edges_median | edges_iqr |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 0.50 | 1.0 | 0.0012346 | 0.0006173 | TRUE | 0.0081766 | 16 | 5255.323 | 6071.667 | 8586.090 | 29 | 6 |
2 | 1.25 | 1.0 | 0.0030864 | 0.0006173 | TRUE | 0.0089329 | 19 | 4763.677 | 5258.571 | 6778.278 | 18 | 3 |
3 | 2.00 | 1.0 | 0.0049383 | 0.0006173 | TRUE | 0.0097696 | 22 | 4476.400 | 4769.034 | 5662.437 | 10 | 2 |
4 | 0.50 | 1.5 | 0.0012346 | 0.0009259 | TRUE | 0.0097202 | 19 | 5724.758 | 6922.427 | 10606.563 | 44 | 1 |
5 | 1.25 | 1.5 | 0.0030864 | 0.0009259 | TRUE | 0.0094100 | 22 | 4682.392 | 5122.325 | 6476.245 | 16 | 3 |
6 | 2.00 | 1.5 | 0.0049383 | 0.0009259 | TRUE | 0.0093193 | 26 | 4404.656 | 4651.844 | 5407.092 | 8 | 1 |
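As an informal plausibility check (a back-calculation from the table above, not the manuscript's formal definition), the gamma values correspond to rescaling the lambdas by the number of penalized edge parameters:
# Back-calculation for the first row of the table; m = 9 graphs, p = 10 variables
m <- 9; p <- 10
0.50 / (m * p * (p - 1) / 2)                  # = 0.0012346 (gamma1)
1.0  / (m * (m - 1) / 2 * p * (p - 1) / 2)    # = 0.0006173 (gamma2)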
After fitting the CVN models, the next step is selecting suitable values for the tuning parameters (λ1, λ2) or (γ1, γ2). The choice of the tuning parameters determines the network structure and is thus an important and challenging step.
The CVN package provides the Akaike information criterion (AIC), the Bayesian information criterion (BIC) and the extended BIC (eBIC), which can be used to select suitable values for the tuning parameters.
The calculation of the eBIC requires setting the parameter γeBIC ∈ [0, 1], which is by default set to γeBIC = 0.5. All three information criteria are returned in the CVN results table.
# print results
cvn$results
#> id lambda1 lambda2 gamma1 gamma2 converged value
#> 1 1 0.50 1.0 0.001234568 0.0006172840 TRUE 0.008176557
#> 2 2 1.25 1.0 0.003086420 0.0006172840 TRUE 0.008932883
#> 3 3 2.00 1.0 0.004938272 0.0006172840 TRUE 0.009769567
#> 4 4 0.50 1.5 0.001234568 0.0009259259 TRUE 0.009720175
#> 5 5 1.25 1.5 0.003086420 0.0009259259 TRUE 0.009410001
#> 6 6 2.00 1.5 0.004938272 0.0009259259 TRUE 0.009319307
#> n_iterations aic bic ebic edges_median edges_iqr
#> 1 16 5255.323 6071.667 8586.090 29 6
#> 2 19 4763.677 5258.571 6778.278 18 3
#> 3 22 4476.400 4769.034 5662.437 10 2
#> 4 19 5724.758 6922.427 10606.563 44 1
#> 5 22 4682.392 5122.325 6476.245 16 3
#> 6 26 4404.656 4651.844 5407.092 8 1
# which combination of lambda values shows the best AIC?
best.aic <- cvn$results[which.min(cvn$results$aic), "id"]
cvn$results[best.aic,]
#> id lambda1 lambda2 gamma1 gamma2 converged value
#> 6 6 2 1.5 0.004938272 0.0009259259 TRUE 0.009319307
#> n_iterations aic bic ebic edges_median edges_iqr
#> 6 26 4404.656 4651.844 5407.092 8 1
👍 Rule of thumb: The smaller the IC value, the better the fit.
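The same which.min pattern can be used for the other criteria:
# Which combinations of lambda values show the best BIC and eBIC?
best.bic  <- cvn$results$id[which.min(cvn$results$bic)]
best.ebic <- cvn$results$id[which.min(cvn$results$ebic)]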
The package also contains an option for plotting the information criteria. The plot is a map of all IC values for every tuning parameter combination fitted by CVN. The yellow dot marks the tuning parameter combination with the smallest IC value.
It is also possible to change the γ-value for the eBIC. The default is γ = 0.5.
(dic <- determine_information_criterion_cvn(cvn, gamma = 0.9))
#> [,1] [,2] [,3] [,4] [,5] [,6]
#> aic 5255.323 4763.677 4476.4 5724.758 4682.392 4404.656
#> bic 6071.667 5258.571 4769.034 6922.427 5122.325 4651.844
#> ebic 10597.63 7994.042 6377.159 13553.87 7559.381 6011.29
IC | selected id |
---|---|
minimal AIC: | 6 |
minimal BIC: | 6 |
minimal eBIC with gamma = 0.5: | 6 |
minimal eBIC with gamma = 0.9: | 6 |
Based on the information criteria, we select CVN 6. Here, all IC measures decide in favour of ID 6; however, this is not always the case. In general, the AIC selects slightly denser models than the BIC. For tuning graphs, many prefer the eBIC, although a value for γeBIC has to be specified as well.
Our simulations showed better results in detecting the true underlying graph when using the AIC rather than the BIC (the eBIC was not included in the comparison). In general, however, both perform rather poorly in practice.
Although ICs are objective measures, they sometimes yield unusable graphs (too sparse, too dense, …).
There are other approaches that can be used to find a suitable graph structure, such as stability approaches. These, however, are based on resampling strategies that demand a lot of computational power.
Others prefer to assume a sparsity or density value and select the graph that comes closest to that threshold, as sketched below. We therefore also output the median number of edges and the interquartile range (IQR) across the m estimated graphs in the CVN results output (see above).
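For example, a minimal sketch (the target of 15 edges per graph is an arbitrary assumption for illustration):
# Pick the tuning parameter pair whose median edge count across the
# m graphs is closest to an assumed target density
target_edges <- 15
best.density <- cvn$results$id[which.min(abs(cvn$results$edges_median - target_edges))]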
Once a graph has been selected, there is usually an interest in investigating its structure further. This becomes easier if the CVN model of interest is extracted as a single object from the list of fitted CVNs. In our example, we examine the CVN with the lowest AIC value, which corresponds to the model with ID 6 and which we extract below.
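The following sketch assumes an extraction helper named extract_cvn(); please check the package index for the exact function name and signature. The extracted object fit6 is used in the remainder of this vignette.
# Extract the CVN with id = 6 as a single object for further analysis
# (extract_cvn() and its 'id' argument are assumed here; consult the
# package documentation for the exact helper)
fit6 <- extract_cvn(cvn, id = 6)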
All subgraphs of a CVN model can be easily plotted. The function
requires the visNetwork
package to visualize the network
structure of each CVN subgraph; a minimal call follows the list below.
The plot shows edges with different color coding:

- Red edges are present in all m subgraphs and are referred to as core edges.
- Blue edges are absent from at least one of the m subgraphs.
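A minimal call for the extracted model (visnetwork_cvn() is the same function used in the interpolation example at the end of this vignette):
# Create interactive plots for all m subgraphs of fit6
plots6 <- visnetwork_cvn(fit6, verbose = FALSE)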
It is also possible to generate all plots for every tuning parameter constellation over all m fitted graphs with a single function call.
In order to examine the differences between the subgraphs of a CVN model, it is possible to determine the number of edges in each CVN subgraph and to analyze the Hamming distances between the subgraphs.
The Hamming distance between two graphs with the same set of nodes is the number of edges that differ between them. It is calculated by counting the edges that need to be added or removed to transform one graph into the other.
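A toy example illustrating the definition:
# Two graphs on 3 nodes: A1 has edges {1,2} and {2,3}; A2 has {1,2} and {1,3}
A1 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow = 3)
A2 <- matrix(c(0, 1, 1,
               1, 0, 0,
               1, 0, 0), nrow = 3)
sum(abs(A1 - A2)) / 2  # Hamming distance = 2 (remove {2,3}, add {1,3})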
The following code shows the number of edges in each subgraph (E(g1), …, E(g9)), the number of edges that are shared by all graphs (E(core)) and the number of edges that are unique for each subgraph (E(g1_u),…, E(g9_u)).
cvn_edge_summary(cvn)
#> E(g1) E(g2) E(g3) E(g4) E(g5) E(g6) E(g7) E(g8) E(g9) E(core) E(g1_u) E(g2_u)
#> 1 68 52 56 64 56 56 68 68 58 12 6 0
#> 2 44 32 36 38 32 30 44 38 36 1 14 0
#> 3 28 22 20 22 18 16 36 18 14 0 12 2
#> 4 90 88 88 88 88 88 90 90 90 44 0 0
#> 5 36 28 30 32 32 28 38 38 32 2 12 0
#> 6 28 18 16 18 16 16 22 16 14 1 12 0
#> E(g3_u) E(g4_u) E(g5_u) E(g6_u) E(g7_u) E(g8_u) E(g9_u)
#> 1 0 0 0 0 0 0 0
#> 2 0 0 2 0 4 0 0
#> 3 4 0 2 0 10 0 2
#> 4 0 0 0 0 0 0 0
#> 5 4 0 2 0 4 2 2
#> 6 2 0 2 4 4 2 2
You can also print the summary for one extracted CVN (fit6 in the example). The output shows the number of edges in each subgraph (edges), the number of edges that are shared by all subgraphs (core_edges), and the number of edges that are unique to the particular subgraph (unique_edges).
cvn_edge_summary(fit6)
#> edges core_edges unique_edges
#> 1 28 1 12
#> 2 18 1 0
#> 3 16 1 2
#> 4 18 1 0
#> 5 16 1 2
#> 6 16 1 4
#> 7 22 1 4
#> 8 16 1 2
#> 9 14 1 2
The result can be returned as a matrix or as a heatmap plot. The number in each cell reflects the number of differing edges between two graphs.
# Calculate the Hamming distance matrix and plot it as a heat map
# plot_hamming_distances_cvn(fit6)
hd_matrix <- hamming_distance_adj_matrices(fit6$adj_matrices[[1]])
plot_hamming_distances(hd_matrix)
As expected, the heatmap shows that the graph structures change randomly along the two covariates; no pattern can be seen.
We have also proposed a method to interpolate a subgraph for which no data were collected. This can be done using an estimated CVN model based on m graphs. Let us assume that Gm + 1 represents the graph we want to interpolate.
For this, we need to specify a vector of smoothing coefficients that determines the amount of smoothing between the new graph Gm + 1 and the m original graphs.
For example, suppose we want to interpolate a graph that is smoothed toward graphs 8 and 9, each weighted by 0.5.
# interpolate generates objects of class 'cvn_interpolate'
interpolate6 <- interpolate(fit6, c(0,0,0,0,0,0,0,0.5,0.5), truncate = 0.05)
cvn6 <- combine_cvn_interpolated(fit6, interpolate6)
# plot with title
plot10 <- visnetwork_cvn(cvn6, verbose = FALSE,
title = lapply(1:cvn6$n_lambda_values,
function(i){sapply(1:cvn6$m, function(j) paste("Graph", j))}
))
Note that this function is still work in progress and we have not yet investigated its performance in a simulation study. It should therefore be used with caution.