Basic example

We demonstrate how to use GGLasso for a SGL problem. First, we generate a sparse Erdos-Renyi network of 20 nodes. We generate an according precision matrix and sample from it. Here, we use a large number of samples (N=1000) to demonstrate that it is possible to recover (approximately) the original graph if sufficiently many samples are available.

In many practical applications however, we face the situation of p>N.

# sphinx_gallery_thumbnail_number = 2

from gglasso.helper.data_generation import generate_precision_matrix, group_power_network, sample_covariance_matrix
from gglasso.problem import glasso_problem
from gglasso.helper.basic_linalg import adjacency_matrix

import networkx as nx
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

p = 20
N = 1000

Sigma, Theta = generate_precision_matrix(p=p, M=1, style='erdos', prob=0.1, seed=1234)

S, sample = sample_covariance_matrix(Sigma, N)

print("Shape of empirical covariance matrix: ", S.shape)
print("Shape of the sample array: ", sample.shape)
Shape of empirical covariance matrix:  (20, 20)
Shape of the sample array:  (20, 1000)

Draw the graph of the true precision matrix.

A = adjacency_matrix(Theta)

G = nx.from_numpy_array(A)
pos = nx.drawing.layout.spring_layout(G, seed = 1234)

plt.figure()
nx.draw_networkx(G, pos = pos, node_color = "darkblue", edge_color = "darkblue", font_color = 'white', with_labels = True)
plot basic example

Basic usage of glasso_problem.

We now create an instance of glasso_problem. The problem formulation is derived automatically from the input shape of S.

P = glasso_problem(S, N, reg_params = {'lambda1': 0.05}, latent = False, do_scaling = False)
print(P)
SINGLE GRAPHICAL LASSO PROBLEM
Regularization parameters:
{'lambda1': 0.05, 'mu1': None}

Next, do model selection by solving the problem on a range of \(\lambda_1\) values.

lambda1_range = np.logspace(0, -3, 30)
modelselect_params = {'lambda1_range': lambda1_range}

P.model_selection(modelselect_params = modelselect_params, method = 'eBIC', gamma = 0.1)

# regularization parameters are set to the best ones found during model selection
print(P.reg_params)
ADMM terminated after 12 iterations with status: optimal.
ADMM terminated after 12 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 10 iterations with status: optimal.
ADMM terminated after 10 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 20 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 24 iterations with status: optimal.
ADMM terminated after 11 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 39 iterations with status: optimal.
ADMM terminated after 10 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 29 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 40 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 29 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 41 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 43 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 13 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 9 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 8 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 35 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 35 iterations with status: optimal.
ADMM terminated after 33 iterations with status: optimal.
ADMM terminated after 29 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 30 iterations with status: optimal.
ADMM terminated after 27 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 24 iterations with status: optimal.
ADMM terminated after 22 iterations with status: optimal.
ADMM terminated after 22 iterations with status: optimal.
ADMM terminated after 21 iterations with status: optimal.
ADMM terminated after 21 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
{'lambda1': 0.09236708571873861, 'mu1': 0}

Plotting the recovered graph and matrix

The solution (i.e. estimated precision matrix) is stored in the P.solution object. We calculate an adjacency matrix thresholding all entries smaller than 1e-4 in absolute value (optional) and draw the corresponding graph.

#tmp = P.modelselect_stats
sol = P.solution.precision_
P.solution.calc_adjacency(t = 1e-4)


fig, axs = plt.subplots(2,2, figsize=(10,8))
node_size = 100
font_size = 9

nx.draw_networkx(G, pos = pos, node_size = node_size, node_color = "darkblue", edge_color = "darkblue", \
                 font_size = font_size, font_color = 'white', with_labels = True, ax = axs[0,0])
axs[0,0].axis('off')
axs[0,0].set_title("True graph")

G1 = nx.from_numpy_array(P.solution.adjacency_)
nx.draw_networkx(G1, pos = pos, node_size = node_size, node_color = "peru", edge_color = "peru", \
               font_size = font_size, font_color = 'white', with_labels = True, ax = axs[0,1])
axs[0,1].axis('off')
axs[0,1].set_title("Recovered graph")

sns.heatmap(Theta, cmap = "coolwarm", vmin = -0.5, vmax = 0.5, linewidth = .5, square = True, cbar = False, \
            xticklabels = [], yticklabels = [], ax = axs[1,0])
axs[1,0].set_title("True precision matrix")

sns.heatmap(sol, cmap = "coolwarm", vmin = -0.5, vmax = 0.5, linewidth = .5, square = True, cbar = False, \
            xticklabels = [], yticklabels = [], ax = axs[1,1])
axs[1,1].set_title("Recovered precision matrix")
True graph, Recovered graph, True precision matrix, Recovered precision matrix
Text(0.5, 1.0, 'Recovered precision matrix')

Total running time of the script: (0 minutes 3.083 seconds)

Gallery generated by Sphinx-Gallery