Network Statistics

Permutation generates randomized data that mimics the key properties of the original vertex weight distribution. With enough permutations, it is possible to analyze how the results of network diffusion based on the real weights differ from those based on the permuted weights.
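The following is a minimal sketch of the idea. It assumes plain uniform shuffling, and the package's own permutation scheme may be more elaborate. It builds a matrix of permuted vertex weights with the layout used by the permweights argument of vertex_stats: rows are vertices and columns are permutations. The helper name permute_weights is hypothetical and is not part of HierarchicalHotNet.

```julia
using Random

# Hypothetical helper (not part of HierarchicalHotNet): build a matrix of
# permuted vertex weights. Rows correspond to vertices, columns to
# permutations. Uniform shuffling keeps the multiset of weights, and hence
# the weight distribution, but randomizes which vertex gets which weight.
function permute_weights(weights::AbstractVector{<:Number}, nperms::Integer;
                         rng::AbstractRNG = Random.default_rng())
    n = length(weights)
    permweights = Matrix{eltype(weights)}(undef, n, nperms)
    for j in 1:nperms
        permweights[:, j] = weights[randperm(rng, n)]
    end
    return permweights
end

weights = [0.1, 2.5, 0.0, 1.2, 3.3]
permweights = permute_weights(weights, 100; rng = MersenneTwister(42))
```

Each column contains exactly the original weights in a different order. Running the diffusion on every column would then give the walk-based counterpart (walkpermweights).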

The package supports this analysis at the level of individual vertices (HierarchicalHotNet.vertex_stats), directed edges (HierarchicalHotNet.diedge_stats), connected components (HierarchicalHotNet.conncomponents_stats), etc.

The statistics from multiple permutations and cutting thresholds can be binned (HierarchicalHotNet.bin_treecut_stats) and then aggregated to calculate the quantiles of the resulting distributions (HierarchicalHotNet.aggregate_treecut_binstats). Finally, HierarchicalHotNet.extreme_treecut_stats finds the edge cutting threshold with the maximal difference between the statistics for the real and the permuted weights.
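The aggregation step can be illustrated with a small sketch. It assumes a hypothetical input layout that differs from the package's data frames: a metric already averaged within each threshold bin, stored as a matrix with threshold bins in rows and permutations in columns. For every bin, the permuted distribution is then summarized by a few quantiles using Statistics.quantile.

```julia
using Statistics

# Hypothetical layout: a metric (e.g. the size of the largest SCC) already
# averaged within each threshold bin; rows are bins, columns are permutations.
binstats = [10.0 12.0 11.0 9.0;
             6.0  5.0  7.0 6.0;
             2.0  3.0  2.0 1.0]

# Summarize the permuted distribution of each bin by a few quantiles;
# the result has one row per bin and one column per probability.
probs = [0.025, 0.25, 0.5, 0.75, 0.975]
aggstats = [quantile(binstats[i, :], p) for i in axes(binstats, 1), p in probs]
```

The middle column holds the per-bin medians (10.5, 6.0 and 2.0 here). A real metric that falls outside the outer quantiles of its bin stands out from the permuted background.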

HierarchicalHotNet.vertex_stats (Function)
vertex_stats(weights::AbstractVector{<:Number},
             walkweights::AbstractVector{<:Number},
             [permweights::AbstractMatrix{<:Number}],
             [walkpermweights::AbstractMatrix{<:Number}]) -> DataFrame

Calculates statistics for the distribution of permuted vertex weights and how it differs from the actual weights.

Returns a data frame with the per-vertex mean, standard deviation, median, MAD, and the probability that the permuted value is greater/lower than the corresponding real value, both for the weights of the original matrix (weights) and for the random walk matrix weights (walkweights).

Parameters

  • weights: weights of the vertices in the original network
  • walkweights: weights of the vertices after network diffusion analysis (stationary random walk distribution)
  • permweights: matrix of permuted weights; rows correspond to vertices, columns to permutations
  • walkpermweights: matrix of vertex weights based on network diffusion analysis using permweights as input; rows correspond to vertices, columns to permutations
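The per-vertex summaries listed above can be sketched in plain Julia. This is an illustration, not the package's implementation. The hypothetical helper perm_vertex_summary returns a NamedTuple of vectors instead of a DataFrame, and it computes MAD without any normal-consistency scaling factor. The same function applies unchanged to walkweights and walkpermweights.

```julia
using Statistics

# Hypothetical sketch (not HierarchicalHotNet code): per-vertex statistics
# of the permuted weights and their relation to the real weights.
function perm_vertex_summary(weights::AbstractVector{<:Number},
                             permweights::AbstractMatrix{<:Number})
    @assert length(weights) == size(permweights, 1)
    n = length(weights)
    mean_w = vec(mean(permweights, dims = 2))
    std_w = vec(std(permweights, dims = 2))
    median_w = [median(permweights[i, :]) for i in 1:n]
    # unscaled median absolute deviation around the per-vertex median
    mad_w = [median(abs.(permweights[i, :] .- median_w[i])) for i in 1:n]
    # fraction of permutations in which the permuted weight is
    # greater / lower than the real weight of the vertex
    prob_perm_greater = [mean(permweights[i, :] .> weights[i]) for i in 1:n]
    prob_perm_less = [mean(permweights[i, :] .< weights[i]) for i in 1:n]
    return (; mean = mean_w, std = std_w, median = median_w, mad = mad_w,
            prob_perm_greater, prob_perm_less)
end

weights = [3.0, 0.0]
permweights = [1.0 2.0 4.0 3.0;
               0.0 1.0 0.0 2.0]
s = perm_vertex_summary(weights, permweights)
```

In this example, the first vertex's real weight is exceeded in only one of four permutations (prob_perm_greater = 0.25). The second vertex's real weight of zero is never undercut (prob_perm_less = 0.0).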
HierarchicalHotNet.diedge_stats (Function)
diedge_stats(weights::AbstractVector{<:Number},
             walkweights::AbstractVector{<:Number},
             [permweights::AbstractMatrix{<:Number}],
             [walkpermweights::AbstractMatrix{<:Number}]) -> DataFrame

Calculates statistics for the distribution of permuted directed edge weights and how it differs from the actual weights of the directed edges.

The output is similar to that of HierarchicalHotNet.vertex_stats for vertices.

HierarchicalHotNet.treecut_stats (Function)
treecut_stats(tree::SCCTree;
              [walkmatrix::AbstractMatrix],
              [maxweight::Number],
              [sources], [sinks], [sourcesinkweights], [top_count],
              [pools]) -> DataFrame

Calculate SCC network statistics for each cutting threshold of the tree.
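To illustrate what "statistics for each cutting threshold" can mean, here is a self-contained sketch. It recomputes the strongly connected components (SCCs) from scratch with Kosaraju's algorithm for every threshold, keeping only directed edges with weight at or above the threshold. It then reports the number of SCCs and the size of the largest one. The package instead cuts a precomputed SCCTree and reports its own set of metrics. The functions scc_labels and scc_threshold_stats and the dense-matrix input are hypothetical.

```julia
# Hypothetical sketch: W[i, j] is the weight of the directed edge i -> j,
# and 0 means "no edge".

# Kosaraju's algorithm: returns per-vertex component labels and their count.
function scc_labels(adj::AbstractMatrix{Bool})
    n = size(adj, 1)
    visited = falses(n)
    order = Int[]
    # first pass: record vertices in order of DFS completion
    function dfs1(v)
        visited[v] = true
        for u in 1:n
            adj[v, u] && !visited[u] && dfs1(u)
        end
        push!(order, v)
    end
    for v in 1:n
        visited[v] || dfs1(v)
    end
    # second pass: DFS on the reversed graph in reverse completion order
    labels = zeros(Int, n)
    function dfs2(v, c)
        labels[v] = c
        for u in 1:n
            adj[u, v] && labels[u] == 0 && dfs2(u, c)
        end
    end
    ncomps = 0
    for v in reverse(order)
        if labels[v] == 0
            ncomps += 1
            dfs2(v, ncomps)
        end
    end
    return labels, ncomps
end

# For every threshold, keep the edges with weight >= threshold and summarize SCCs.
function scc_threshold_stats(W::AbstractMatrix{<:Real}, thresholds)
    map(thresholds) do t
        adj = (W .>= t) .& (W .> 0)
        labels, ncomps = scc_labels(adj)
        maxcomp = maximum(count(==(c), labels) for c in 1:ncomps)
        (threshold = t, ncomps = ncomps, maxcomp = maxcomp)
    end
end

W = [0.0 0.9 0.0 0.0;
     0.8 0.0 0.2 0.0;
     0.0 0.0 0.0 0.5;
     0.0 0.0 0.4 0.0]
stats = scc_threshold_stats(W, [0.1, 0.45, 0.85])
```

As the threshold rises, edges drop out and the components split: {1,2} and {3,4} at 0.1, then {1,2}, {3}, {4} at 0.45, then four singletons at 0.85.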

HierarchicalHotNet.treecut_compstats (Function)
treecut_compstats(tree::SCCTree,
                  vertex_weights::AbstractVector,
                  vertex_walkweights::AbstractVector,
                  perm_vertex_weights::AbstractMatrix,
                  perm_vertex_walkweights::AbstractMatrix;
                  [mannwhitney_tests::Bool],
                  [pvalue_mw_max::Number],
                  [pvalue_fisher_max::Number],
                  [pools]) -> DataFrame

Calculate SCC network statistics for each cutting threshold of the tree.

HierarchicalHotNet.bin_treecut_stats (Function)
bin_treecut_stats(cutstats_df::AbstractDataFrame) -> DataFrame

Bin treecut thresholds and calculate average statistics in each bin.

Takes the output of HierarchicalHotNet.treecut_stats from multiple SCC trees (discriminated by by_cols), identifies the bins for the treecut thresholds and calculates the average metric values (stat_cols) within each bin.
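The binning idea can be sketched without data frames. Trees built from different permutations have different sets of cutting thresholds, so each threshold is mapped onto a common bin with searchsortedlast. The metric is then averaged per (tree, bin) pair. The helper bin_metric, its vector inputs and the explicit bin edges are hypothetical stand-ins for the by_cols/stat_cols columns of the real function.

```julia
using Statistics

# Hypothetical sketch: average a metric within each (tree, threshold bin).
function bin_metric(treeids::AbstractVector, thresholds::AbstractVector{<:Real},
                    metric::AbstractVector{<:Real}, binedges::AbstractVector{<:Real})
    # index of the last bin edge that is <= the threshold
    bins = [searchsortedlast(binedges, t) for t in thresholds]
    groups = Dict{Tuple{eltype(treeids), Int}, Vector{Float64}}()
    for (tree, bin, m) in zip(treeids, bins, metric)
        push!(get!(groups, (tree, bin), Float64[]), m)
    end
    return Dict(k => mean(v) for (k, v) in groups)
end

treeids = [1, 1, 1, 2, 2]
thresholds = [0.05, 0.12, 0.18, 0.11, 0.31]
metric = [10, 8, 6, 9, 3]
binned = bin_metric(treeids, thresholds, metric, [0.0, 0.1, 0.2, 0.3])
```

Here the thresholds 0.12 and 0.18 of tree 1 fall into the same bin, so their metric values are averaged (7.0). After binning, the trees can be compared bin by bin even though their raw thresholds never coincide.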

HierarchicalHotNet.extreme_treecut_stats (Function)
extreme_treecut_stats(stats_df::AbstractDataFrame) -> DataFrame

Calculate the cut threshold and the corresponding metric value at which the difference between the real metrics (taken from stats_df) and the permutation metrics (taken from perm_aggstats_df) is maximal/minimal (depending on the metric).

Arguments

  • stats_df: tree statistics calculated by treecut_stats
  • perm_aggstats_df: aggregated binned permuted tree statistics calculated by aggregate_treecut_binstats
  • extra_join_cols: optional columns, in addition to :threshold_bin, to use for joining stats_df and perm_aggstats_df
  • metric_cols: columns of stats_df and perm_aggstats_df containing treecut metrics to consider for threshold calculation (see TreecutMetrics)
  • start_maxquantile: if specified, calculates, in addition to the minimal and maximal metric values, the metric values corresponding to the given quantile and to $1 - quantile$
  • threshold_range: if given, constrains the metric statistic calculation to the given min/max thresholds
  • threshold_weight: optional function that takes stats_df row and returns the prior weight of the corresponding cut threshold
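The core of the threshold selection can be sketched as follows. For each threshold bin, the real metric is compared with the median of the permuted values in that bin. An optional per-bin prior weight, in the spirit of threshold_weight, scales the difference, and the bin with the largest weighted difference is returned. This is a hypothetical simplification: extreme_bin, its matrix input (bins in rows, permutations in columns) and the use of the median as the reference are assumptions. For metrics where the minimum is the extreme of interest, the difference would be negated.

```julia
using Statistics

# Hypothetical sketch: real_metric[i] is the real value in threshold bin i,
# perm_metric[i, j] the value for permutation j in the same bin.
function extreme_bin(real_metric::AbstractVector{<:Real},
                     perm_metric::AbstractMatrix{<:Real};
                     weight::AbstractVector{<:Real} = ones(length(real_metric)))
    perm_median = [median(perm_metric[i, :]) for i in axes(perm_metric, 1)]
    # weighted difference between the real and the typical permuted value
    delta = weight .* (real_metric .- perm_median)
    best = argmax(delta)
    return (bin = best, delta = delta[best])
end

real_metric = [5.0, 9.0, 4.0]
perm_metric = [4.0 5.0 6.0;
               3.0 4.0 5.0;
               3.0 3.0 4.0]
r = extreme_bin(real_metric, perm_metric)
r2 = extreme_bin(real_metric, perm_metric; weight = [1.0, 0.1, 1.0])
```

Without weights, bin 2 wins (9 versus a permuted median of 4). Down-weighting that bin shifts the choice to bin 3, which shows how a prior over thresholds can change the selected cut.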