Note
Go to the end to download the full example code.
Basic example
We demonstrate how to use GGLasso for a SGL problem.
First, we generate a sparse Erdos-Renyi network of 20 nodes. We generate an according precision matrix and sample from it. Here, we use a large
number of samples (N=1000) to demonstrate that it is possible to recover (approximately) the original graph if sufficiently many samples are available.
In many practical applications however, we face the situation of p>N.
# sphinx_gallery_thumbnail_number = 2
from gglasso.helper.data_generation import generate_precision_matrix, group_power_network, sample_covariance_matrix
from gglasso.problem import glasso_problem
from gglasso.helper.basic_linalg import adjacency_matrix
import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
p = 20
N = 1000
Sigma, Theta = generate_precision_matrix(p=p, M=1, style='erdos', prob=0.1, seed=1234)
S, sample = sample_covariance_matrix(Sigma, N)
print("Shape of empirical covariance matrix: ", S.shape)
print("Shape of the sample array: ", sample.shape)
Shape of empirical covariance matrix: (20, 20)
Shape of the sample array: (20, 1000)
Draw the graph of the true precision matrix.
A = adjacency_matrix(Theta)
G = nx.from_numpy_array(A)
pos = nx.drawing.layout.spring_layout(G, seed = 1234)
plt.figure()
nx.draw_networkx(G, pos = pos, node_color = "darkblue", edge_color = "darkblue", font_color = 'white', with_labels = True)

Basic usage of glasso_problem.
We now create an instance of glasso_problem. The problem formulation is derived automatically from the input shape of S.
P = glasso_problem(S, N, reg_params = {'lambda1': 0.05}, latent = False, do_scaling = False)
print(P)
SINGLE GRAPHICAL LASSO PROBLEM
Regularization parameters:
{'lambda1': 0.05, 'mu1': None}
Next, do model selection by solving the problem on a range of \(\lambda_1\) values.
lambda1_range = np.logspace(0, -3, 30)
modelselect_params = {'lambda1_range': lambda1_range}
P.model_selection(modelselect_params = modelselect_params, method = 'eBIC', gamma = 0.1)
# regularization parameters are set to the best ones found during model selection
print(P.reg_params)
ADMM terminated after 12 iterations with status: optimal.
ADMM terminated after 12 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 10 iterations with status: optimal.
ADMM terminated after 10 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 20 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 24 iterations with status: optimal.
ADMM terminated after 11 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 39 iterations with status: optimal.
ADMM terminated after 10 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 29 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 40 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 29 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 41 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 43 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 13 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 35 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 35 iterations with status: optimal.
ADMM terminated after 33 iterations with status: optimal.
ADMM terminated after 29 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 27 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 24 iterations with status: optimal.
ADMM terminated after 22 iterations with status: optimal.
ADMM terminated after 22 iterations with status: optimal.
ADMM terminated after 21 iterations with status: optimal.
ADMM terminated after 21 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
{'lambda1': 0.09236708571873861, 'mu1': 0}
Plotting the recovered graph and matrix
The solution (i.e. estimated precision matrix) is stored in the P.solution object. We calculate an adjacency matrix thresholding all entries smaller than 1e-4 in absolute value (optional) and draw the corresponding graph.
#tmp = P.modelselect_stats
sol = P.solution.precision_
P.solution.calc_adjacency(t = 1e-4)
fig, axs = plt.subplots(2,2, figsize=(10,8))
node_size = 100
font_size = 9
nx.draw_networkx(G, pos = pos, node_size = node_size, node_color = "darkblue", edge_color = "darkblue", \
font_size = font_size, font_color = 'white', with_labels = True, ax = axs[0,0])
axs[0,0].axis('off')
axs[0,0].set_title("True graph")
G1 = nx.from_numpy_array(P.solution.adjacency_)
nx.draw_networkx(G1, pos = pos, node_size = node_size, node_color = "peru", edge_color = "peru", \
font_size = font_size, font_color = 'white', with_labels = True, ax = axs[0,1])
axs[0,1].axis('off')
axs[0,1].set_title("Recovered graph")
sns.heatmap(Theta, cmap = "coolwarm", vmin = -0.5, vmax = 0.5, linewidth = .5, square = True, cbar = False, \
xticklabels = [], yticklabels = [], ax = axs[1,0])
axs[1,0].set_title("True precision matrix")
sns.heatmap(sol, cmap = "coolwarm", vmin = -0.5, vmax = 0.5, linewidth = .5, square = True, cbar = False, \
xticklabels = [], yticklabels = [], ax = axs[1,1])
axs[1,1].set_title("Recovered precision matrix")

Text(0.5, 1.0, 'Recovered precision matrix')
Total running time of the script: (0 minutes 3.083 seconds)