%GSP_DEMO_LEARN_GRAPH_LARGE Tutorial for graph learning using the GSPBox
%
% This is a graph learning tutorial using the GSPBox. It covers the main
% things you need to know in order to use graph learning:
%
% 1) standard graph learning [Kalofolias 2016]
% 2) setting params automatically for given sparsity [Kalofolias, Perraudin 2019]
% 3) setting zero-edges up-front [Kalofolias, Perraudin 2019]
% 4) large scale graph learning [Kalofolias, Perraudin 2019]
%
% Let's create some artificial coordinate data:
%
% gsp_reset_seed(1);
% G = gsp_2dgrid(16);
% n = G.N;
% W0 = G.W;
% G = gsp_graph(G.W, G.coords+.05*rand(size(G.coords)));
% figure; gsp_plot_graph(G);
%
% Figure 1: Artificial data
%
%
%
% Now suppose our data is the coordinates (x, y) of the points in space.
% Note that the data will often instead be a signal on these points; see
% for example GSP_DEMO_LEARN_GRAPH :
%
% X = G.coords';
%
% Right now the weight matrix W contains only zeros and ones, but let's
% make it follow the geometry by learning weights from the new coordinates:
%
% % First compute the matrix of all pairwise distances Z
% Z = gsp_distanz(X).^2;
% % this is a highly dense matrix
% figure; imagesc(Z);
% % Let's learn a graph; we need to know the two parameters a and b
% % according to [1, Kalofolias 2016]
% a = 1;
% b = 1;
% W = gsp_learn_graph_log_degrees(Z, a, b);
% % this matrix is naturally sparse, but still has many connections unless we
% % "play" with the parameters a and b to make it sparser
% figure; imagesc(W);
% % clean edges close to zero and update weights
% W(W<1e-4) = 0;
% G = gsp_update_weights(G, W);
%
% Figure 2: Distance matrix
%
%
%
% Figure 3: Learned weights using $a=1$ and $b=1$
%
%
%
% A practical way of setting the parameters is explained in our more
% recent publication [2, Kalofolias, Perraudin 2019]. Instead of setting
% a and b, we only need one parameter theta that controls the sparsity.
% An automatic way to find a good approximation of theta for a desired
% sparsity level is provided by the function
% gsp_compute_graph_learning_theta :
%
% % suppose we want 4 edges per node on average
% k = 4;
% theta = gsp_compute_graph_learning_theta(Z, k);
% % then, instead of playing with a and b, we keep them both equal to 1
% % and multiply Z by theta for graph learning:
% t1 = tic;
% [W, info_1] = gsp_learn_graph_log_degrees(theta * Z, 1, 1);
% t1 = toc(t1);
% % clean edges close to zero
% W(W<1e-4) = 0;
% % indeed, the new adjacency matrix has around 4 nonzeros per row!
% figure; imagesc(W); title(['Average edges/node: ', num2str(nnz(W)/G.N)]);
% % update weights and plot
% G = gsp_update_weights(G, W);
% figure; gsp_plot_graph(G);
%
%
% Figure 4: Learned weights using automatic parameter selection
%
%
%
% Figure 5: Learned graph using automatic parameter selection
%
%
%
% An interesting feature is that we can give a pattern of allowed edges,
% keeping all the others at zero. This is useful for adding constraints
% to some problems, or simply for reducing the computation in others:
%
% % Suppose for example that we know beforehand that no connection could
% % or should be formed between pairs of nodes with squared distance
% % greater than 0.02. We
% % create a mask with the pattern of allowed edges and pass it to the
% % learning algorithm in a parameters structure:
% params.edge_mask = zero_diag(Z < 0.02);
% % we also set the flag fix_zeros to 1:
% params.fix_zeros = 1;
% [W2, info_2] = gsp_learn_graph_log_degrees(theta * Z, 1, 1, params);
% % clean edges close to zero
% W2(W2<1e-4) = 0;
% % indeed, the new adjacency matrix has around 4 nonzeros per row!
% figure; imagesc(W2); title(['Average edges/node: ', num2str(nnz(W2)/G.N)]);
% % update weights and plot
% G = gsp_update_weights(G, W2);
% figure; gsp_plot_graph(G);
% fprintf('The computation was %.1f times faster than before!\n', info_1.time / info_2.time);
% fprintf('Relative difference between two solutions: %.4f\n', norm(W-W2, 'fro')/norm(W, 'fro'));
% % Note that the learned graph is sparser than the mask we gave as input.
% figure; subplot(1, 2, 1); spy(W2); title('W_2'); subplot(1, 2, 2), spy(params.edge_mask); title('edge mask');
%
% Figure 6: Learned weights using a pattern of allowed edges
%
%
%
% Figure 7: Learned graph using a pattern of allowed edges
%
%
%
% Figure 8: Edges selected by the algorithm
%
%
%
% Actually this feature is the one that allows us to work with big data.
% Suppose the graph has n nodes. Classical graph learning [Kalofolias
% 2016] costs O(n^2) computations. If the mask we give has O(n) edges,
% then the computation drops to O(n) as well. The question is: how can we
% efficiently compute a mask of O(n) allowed edges for general data, even
% if we don't have prior knowledge? Note that computing the full matrix Z
% already costs O(n^2) and we want to avoid it! The solution is
% Approximate Nearest Neighbors (ANN), which computes an approximate
% sparse matrix Z with far fewer computations, roughly O(n log(n)). This
% is the idea behind [Kalofolias, Perraudin 2019] :
%
% % We compute an approximate nearest neighbors graph (using the FLANN
% % library through the GSPBox)
% params_NN.use_flann = 1;
% % we ask for an approximate k-NN graph with roughly double the number
% % of edges. The +1 is because FLANN also counts each node as its own
% % neighbor.
% params_NN.k = 2 * k + 1;
% clock_flann = tic;
% [indx, indy, dist, ~, ~, ~, NN, Z_sorted] = gsp_nn_distanz(X, X, params_NN);
% time_flann = toc(clock_flann);
% fprintf('Time for FLANN: %.3f seconds\n', time_flann);
% % gsp_nn_distanz gives the distances in a form ready to use with sparse:
% Z_sp = sparse(indx, indy, dist.^2, n, n, params_NN.k * n * 2);
% % symmetrize the distance matrix
% Z_sp = gsp_symmetrize(Z_sp, 'full');
% % get rid of the first column, which is zero (each node has distance 0 from itself)
% Z_sorted = Z_sorted(:, 2:end).^2; % first column is zero
% % Note that FLANN returns Z already sorted, which further saves
% % computation when computing the parameter theta.
% Z_is_sorted = true;
% theta = gsp_compute_graph_learning_theta(Z_sorted, k, 0, Z_is_sorted);
% params.edge_mask = Z_sp > 0;
% params.fix_zeros = 1;
% [W3, info_3] = gsp_learn_graph_log_degrees(Z_sp * theta, 1, 1, params);
% W3(W3<1e-4) = 0;
% fprintf('Relative difference between two solutions: %.4f\n', norm(W-W3, 'fro')/norm(W, 'fro'));
% % Note that the learned graph is sparser than the mask we gave as input.
% figure; subplot(1, 2, 1); spy(W3); title('W_3'); subplot(1, 2, 2), spy(params.edge_mask); title('edge mask');
%
% Figure 9: Edges selected by the algorithm
%
%
%
% Note that we can alternatively use the vector form of Z, W, and the
% mask, but always AFTER symmetrizing the distance matrix Z. Make sure
% you use our sparse version squareform_sp.m instead of Matlab's native
% squareform.m, otherwise you might run into memory issues for big graphs:
%
% z_sp = squareform_sp(Z_sp);
% params.edge_mask = z_sp > 0;
% % the output will also be in vector form if the distances were given in
% % vector form
% [w3, info_3_sp] = gsp_learn_graph_log_degrees(z_sp * theta, 1, 1, params);
% w3(w3<1e-4) = 0;
% norm(w3 - squareform_sp(W3)) / norm(w3)
%
% Enjoy!
%
% See also: gsp_demo_learn_graph
%
% References:
% V. Kalofolias. How to learn a graph from smooth signals. International
% Conference on Artificial Intelligence and Statistics (AISTATS), JMLR
% Workshop and Conference Proceedings, 2016.
%
% V. Kalofolias and N. Perraudin. Large scale graph learning from smooth
% signals. International Conference on Learning Representations, 2019.
%
%
% Url: https://epfl-lts2.github.io/gspbox-html/doc/demos/gsp_demo_learn_graph_large.html
% Copyright (C) 2013-2016 Nathanael Perraudin, Johan Paratte, David I Shuman.
% This file is part of GSPbox version 0.7.5
%
% This program is free software: you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation, either version 3 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You should have received a copy of the GNU General Public License
% along with this program. If not, see <http://www.gnu.org/licenses/>.
% If you use this toolbox please kindly cite
% N. Perraudin, J. Paratte, D. Shuman, V. Kalofolias, P. Vandergheynst,
% and D. K. Hammond. GSPBOX: A toolbox for signal processing on graphs.
% ArXiv e-prints, Aug. 2014.
% http://arxiv.org/abs/1408.5781
% code author: Vassilis Kalofolias
% date: summer 2017
% see also: test_prox_log_sum (for automation of the regularization constants)
warning off
%% Let's create some artificial coordinate data
gsp_reset_seed(1);
G = gsp_2dgrid(16);
n = G.N;
W0 = G.W;
G = gsp_graph(G.W, G.coords+.05*rand(size(G.coords)));
figure; gsp_plot_graph(G);
% Now suppose our data is the coordinates (x, y) of the points in space.
% Note that the data will often instead be a signal on these points; see
% for example gsp_demo_learn_graph.
X = G.coords';
%% Right now the weight matrix W contains only zeros and ones, but let's
%% make it follow the geometry by learning weights from the new coordinates:
% First compute the matrix of all pairwise distances Z
Z = gsp_distanz(X).^2;
% this is a highly dense matrix
figure; imagesc(Z);
% Let's learn a graph; we need to know the two parameters a and b
% according to [Kalofolias 2016]
a = 1;
b = 1;
W = gsp_learn_graph_log_degrees(Z, a, b);
% this matrix is naturally sparse, but still has many connections unless we
% "play" with the parameters a and b to make it sparser
figure; imagesc(W);
% clean edges close to zero and update weights
W(W<1e-4) = 0;
G = gsp_update_weights(G, W);
% figure; gsp_plot_graph(G);
%% A practical way of setting the parameters is explained in our more
%% recent publication [Kalofolias, Perraudin 2019].
% Instead of setting a and b, we only need one parameter theta that
% controls the sparsity. An automatic way to find a good approximation of
% theta for a desired sparsity level is provided by the function
% gsp_compute_graph_learning_theta.
% suppose we want 4 edges per node on average
k = 4;
theta = gsp_compute_graph_learning_theta(Z, k);
% then, instead of playing with a and b, we keep them both equal to 1 and
% multiply Z by theta for graph learning:
t1 = tic;
[W, info_1] = gsp_learn_graph_log_degrees(theta * Z, 1, 1);
t1 = toc(t1);
% clean edges close to zero
W(W<1e-4) = 0;
% indeed, the new adjacency matrix has around 4 nonzeros per row!
figure; imagesc(W); title(['Average edges/node: ', num2str(nnz(W)/G.N)]);
% update weights and plot
G = gsp_update_weights(G, W);
figure; gsp_plot_graph(G);
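% A quick sanity check (sketch): the realized average degree should be
% close to the requested k = 4; theta is only an approximation, so small
% deviations are expected.
fprintf('Requested edges/node: %d, obtained: %.2f\n', k, nnz(W)/n);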
%% An interesting feature is that we can give a pattern of allowed edges,
%% keeping all the others at zero. This is useful for adding constraints
%% to some problems, or simply for reducing the computation in others.
% Suppose for example that we know beforehand that no connection could
% or should be formed between pairs of nodes with squared distance
% greater than 0.02. We
% create a mask with the pattern of allowed edges and pass it to the
% learning algorithm in a parameters structure:
params.edge_mask = zero_diag(Z < 0.02);
% we also set the flag fix_zeros to 1:
params.fix_zeros = 1;
[W2, info_2] = gsp_learn_graph_log_degrees(theta * Z, 1, 1, params);
% clean edges close to zero
W2(W2<1e-4) = 0;
% indeed, the new adjacency matrix has around 4 nonzeros per row!
figure; imagesc(W2); title(['Average edges/node: ', num2str(nnz(W2)/G.N)]);
% update weights and plot
G = gsp_update_weights(G, W2);
figure; gsp_plot_graph(G);
fprintf('The computation was %.1f times faster than before!\n', info_1.time / info_2.time);
fprintf('Relative difference between two solutions: %.4f\n', norm(W-W2, 'fro')/norm(W, 'fro'));
% Note that the learned graph is sparser than the mask we gave as input.
figure; subplot(1, 2, 1); spy(W2); title('W_2'); subplot(1, 2, 2), spy(params.edge_mask); title('edge mask');
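% Sanity check (sketch): with params.fix_zeros = 1, the solver keeps every
% weight outside the allowed pattern at zero, so the complement of the
% mask should carry no weight at all.
assert(nnz(W2(~params.edge_mask)) == 0);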
%% Actually this feature is the one that allows us to work with big data.
%% Suppose the graph has n nodes. Classical graph learning [Kalofolias 2016]
%% costs O(n^2) computations. If the mask we give has O(n) edges, then the
%% computation drops to O(n) as well. The question is: how can we
%% efficiently compute a mask of O(n) allowed edges for general data, even if we
%% don't have prior knowledge? Note that computing the full matrix Z
%% already costs O(n^2) and we want to avoid it! The solution is Approximate
%% Nearest Neighbors (ANN), which computes an approximate sparse matrix Z
%% with far fewer computations, roughly O(n log(n)). This is the idea behind
%% [Kalofolias, Perraudin 2019]
% We compute an approximate nearest neighbors graph (using the FLANN
% library through the GSPBox)
params_NN.use_flann = 1;
% we ask for an approximate k-NN graph with roughly double the number of
% edges. The +1 is because FLANN also counts each node as its own neighbor.
params_NN.k = 2 * k + 1;
clock_flann = tic;
[indx, indy, dist, ~, ~, ~, NN, Z_sorted] = gsp_nn_distanz(X, X, params_NN);
time_flann = toc(clock_flann);
fprintf('Time for FLANN: %.3f seconds\n', time_flann);
% gsp_nn_distanz gives the distances in a form ready to use with sparse:
Z_sp = sparse(indx, indy, dist.^2, n, n, params_NN.k * n * 2);
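% (the sixth argument of sparse is the nzmax preallocation hint: we
% reserve room for both directions of each of the k-NN edges)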
% symmetrize the distance matrix
Z_sp = gsp_symmetrize(Z_sp, 'full');
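% (a k-NN relation is not symmetric: node j being among the k nearest
% neighbors of node i does not imply the converse, hence this step)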
% get rid of the first column, which is zero (each node has distance 0 from itself)
Z_sorted = Z_sorted(:, 2:end).^2; % first column is zero
% Note that FLANN returns Z already sorted, which further saves
% computation when computing the parameter theta.
Z_is_sorted = true;
theta = gsp_compute_graph_learning_theta(Z_sorted, k, 0, Z_is_sorted);
params.edge_mask = Z_sp > 0;
params.fix_zeros = 1;
[W3, info_3] = gsp_learn_graph_log_degrees(Z_sp * theta, 1, 1, params);
W3(W3<1e-4) = 0;
fprintf('Relative difference between two solutions: %.4f\n', norm(W-W3, 'fro')/norm(W, 'fro'));
% Note that the learned graph is sparser than the mask we gave as input.
figure; subplot(1, 2, 1); spy(W3); title('W_3'); subplot(1, 2, 2), spy(params.edge_mask); title('edge mask');
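% Optional comparison (sketch): the info structures returned by the solver
% carry the solve time, so we can contrast the ANN-masked problem with the
% full one solved earlier; the gap grows with the graph size.
fprintf('Solver time, full: %.3f s; ANN mask: %.3f s\n', info_1.time, info_3.time);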
%% Note that we can alternatively use the vector form of Z, W, and the mask,
%% but always AFTER symmetrizing the distance matrix Z. Make sure you use
%% our sparse version squareform_sp.m instead of Matlab's native squareform.m,
%% otherwise you might run into memory issues for big graphs
z_sp = squareform_sp(Z_sp);
params.edge_mask = z_sp > 0;
% the output will also be in vector form if the distances were given in vector form
[w3, info_3_sp] = gsp_learn_graph_log_degrees(z_sp * theta, 1, 1, params);
w3(w3<1e-4) = 0;
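% relative difference between the vector-form and the matrix-form solutions
% (no semicolon on purpose: the value is displayed and should be small)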
norm(w3 - squareform_sp(W3)) / norm(w3)
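
%% A minimal round-trip sketch, assuming squareform_sp (like Matlab's native
%% squareform) converts in both directions: vectorizing the symmetric,
%% zero-diagonal Z_sp and converting it back should be lossless.
Z_back = squareform_sp(z_sp);
fprintf('Round-trip error on Z: %g\n', full(norm(Z_back - Z_sp, 'fro')));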