Nonconforming Group Graphical Lasso experiment

Example script for Group Graphical Lasso with non-conforming dimension, i.e. some variables exist in some instances but not in all. We generate one underlying precision matrix and then drop one block of variables in each instance.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
import seaborn as sns

from gglasso.helper.ext_admm_helper import create_group_array, construct_indexer
from gglasso.helper.utils import sparsity
from gglasso.helper.data_generation import generate_precision_matrix, group_power_network, sample_covariance_matrix
from gglasso.helper.ext_admm_helper import check_G, consensus

from gglasso.problem import glasso_problem

K = 4
p = 50
M = 10
B = int(p/M)
N = 200

Generating the data

We generate one precision matrix, sample observations, and finally drop one block of variables in each instance. It is important that the observations for each instance is a DataFrame of shape (p_k,N_k) where the index has unique ids for each variable.

p_arr = (p-B)*np.ones(K, dtype = int)
num_samples = N*np.ones(K)

Sigma, Theta = generate_precision_matrix(p=p, M=M, style = 'powerlaw', gamma = 2.8, prob = 0.1, seed = 3456)

all_obs = dict()
S = dict()
for k in np.arange(K):

    _, obs = sample_covariance_matrix(Sigma, N, seed = 456)

    # drop the k-th block starting from the end
    all_obs[k] = pd.DataFrame(obs).drop(np.arange(p-(k+1)*B, p-k*B), axis = 0)
    S[k] = np.cov(all_obs[k], bias = True)

Creating the groups

In this section, we create two important objects for the non-conforming case using functionalities of GGLasso:

ix_location is a dataframe with every variable as index and each columns is one dataset. It contains the index of the variable in the respective instance (NaN when not present).

G is a bookeeping array which keeps count of the indices of each group of overlapping variables. It is needed in the solver later on.

Important: We only consider pairs of variables which appear at least in 3 instances here!

ix_exist, ix_location = construct_indexer(list(all_obs.values()))

G = create_group_array(ix_exist, ix_location, min_inst = K-1)

check_G(G, p)

print("Dimensions p_k: ", p_arr)

print("Sample sizes N_k: ", num_samples)

print("Number of groups found: ", G.shape[1])
Creation of bookeeping array...
10% finished
20% finished
30% finished
40% finished
50% finished
60% finished
70% finished
80% finished
90% finished
Dimensions p_k:  [45 45 45 45]
Sample sizes N_k:  [200. 200. 200. 200.]
Number of groups found:  1075

Visualizing

We visualize the case of non-conforming variables by plotting the given empirical covariance matrices. Missing variable observations are in white.

fig, axs = plt.subplots(2,2, figsize = (8,8))
for k in range(K):
    ind = ix_exist.index[ix_exist.loc[:,k]]
    S_k = pd.DataFrame(S[k], index = ind, columns = ind)

    # extend matrix by nonexistent variables
    S_k = S_k.reindex(columns = ix_exist.index, index = ix_exist.index)

    ax = axs.ravel()[k]
    sns.heatmap(S_k, ax = ax, cmap = plt.cm.coolwarm, linewidth = 0.005, linecolor = 'lightgrey',\
                cbar = False, vmin = -.5, vmax = .5, xticklabels = [], yticklabels = [])
    ax.set_title(f"Empirical covariance, instance {k}")
Empirical covariance, instance 0, Empirical covariance, instance 1, Empirical covariance, instance 2, Empirical covariance, instance 3

Defining the GGL problem

We now create the instance of Group Graphical Lasso problem. As we are in the non-conforming case, we need to spcify the array G which we created before.

P = glasso_problem(S = S, N = N, reg = "GGL", reg_params = None, latent = True, G = G, do_scaling = True)
print(P)
GROUP GRAPHICAL LASSO PROBLEM WITH LATENT VARIABLES
Regularization parameters:
{'lambda1': None, 'lambda2': None, 'mu1': None}

Model selection

Set the regularization parameter grids and do model selection.

l1 =    np.logspace(0,-2,7)
mu1 =   np.logspace(1,-1,3)
l2 =    np.logspace(0,-2,4)

modelselect_params = {'lambda1_range' : l1, 'mu1_range': mu1, 'lambda2_range': l2}

P.model_selection(modelselect_params = modelselect_params, method = 'eBIC', gamma = 0.1, tol = 1e-7, rtol = 1e-7)

print(P.reg_params)
------------Range search for instance 0------------
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 81 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 81 iterations with status: optimal.
ADMM terminated after 27 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 80 iterations with status: optimal.
ADMM terminated after 33 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 95 iterations with status: optimal.
ADMM terminated after 36 iterations with status: optimal.
ADMM terminated after 38 iterations with status: optimal.
ADMM terminated after 93 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 47 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
------------Range search for instance 1------------
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 82 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 82 iterations with status: optimal.
ADMM terminated after 27 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 81 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 100 iterations with status: optimal.
ADMM terminated after 37 iterations with status: optimal.
ADMM terminated after 39 iterations with status: optimal.
ADMM terminated after 90 iterations with status: optimal.
ADMM terminated after 22 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 49 iterations with status: optimal.
ADMM terminated after 33 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
------------Range search for instance 2------------
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 81 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 81 iterations with status: optimal.
ADMM terminated after 27 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 80 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 94 iterations with status: optimal.
ADMM terminated after 37 iterations with status: optimal.
ADMM terminated after 39 iterations with status: optimal.
ADMM terminated after 52 iterations with status: optimal.
ADMM terminated after 23 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 49 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
------------Range search for instance 3------------
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 82 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 82 iterations with status: optimal.
ADMM terminated after 27 iterations with status: optimal.
ADMM terminated after 25 iterations with status: optimal.
ADMM terminated after 81 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
ADMM terminated after 33 iterations with status: optimal.
ADMM terminated after 95 iterations with status: optimal.
ADMM terminated after 37 iterations with status: optimal.
ADMM terminated after 39 iterations with status: optimal.
ADMM terminated after 98 iterations with status: optimal.
ADMM terminated after 24 iterations with status: optimal.
ADMM terminated after 26 iterations with status: optimal.
ADMM terminated after 51 iterations with status: optimal.
ADMM terminated after 33 iterations with status: optimal.
ADMM terminated after 31 iterations with status: optimal.
ADMM terminated after 34 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 18 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 19 iterations with status: optimal.
ADMM terminated after 21 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 32 iterations with status: optimal.
ADMM terminated after 20 iterations with status: optimal.
ADMM terminated after 28 iterations with status: optimal.
ADMM terminated after 45 iterations with status: optimal.
ADMM terminated after 43 iterations with status: optimal.
ADMM terminated after 144 iterations with status: optimal.
ADMM terminated after 143 iterations with status: optimal.
ADMM terminated after 271 iterations with status: optimal.
ADMM terminated after 214 iterations with status: optimal.
ADMM terminated after 140 iterations with status: optimal.
ADMM terminated after 144 iterations with status: optimal.
ADMM terminated after 254 iterations with status: optimal.
ADMM terminated after 176 iterations with status: optimal.
ADMM terminated after 208 iterations with status: optimal.
ADMM terminated after 219 iterations with status: optimal.
ADMM terminated after 225 iterations with status: optimal.
ADMM terminated after 165 iterations with status: optimal.
{'lambda1': 0.2154434690031884, 'lambda2': 0.01, 'mu1': array([10., 10., 10., 10.])}

Evaluation of results

We print the ratio of non-zero entries per instance. We plot the distribution of non-zero entries per group. With group sparsity regularization we aim for many groups with either no non-zero or many non-zero entries per group.

print("Solution sparsity (ratio of nonzero entries): ")

for k in np.arange(K):
    print(f"Instance {k}: ", sparsity(P.solution.precision_[k]))


stats = P.modelselect_stats.copy()
nnz,_,_ = consensus(P.solution.precision_, G)

fig, ax = plt.subplots()
sns.histplot(nnz, discrete = True, ax = ax)
ax.set_yscale('log')
ax.set_title('Nonzero entries per group')
Nonzero entries per group
Solution sparsity (ratio of nonzero entries):
Instance 0:  0.03636363636363636
Instance 1:  0.03636363636363636
Instance 2:  0.03737373737373737
Instance 3:  0.03737373737373737

Text(0.5, 1.0, 'Nonzero entries per group')

Total running time of the script: (0 minutes 14.486 seconds)

Gallery generated by Sphinx-Gallery