Network Statistics
Permutation generates randomized data that mimics the key properties of the original vertex weight distribution. With enough permutations, it is possible to analyze how the results of network diffusion based on the real weights differ from those based on permuted weights.
The package supports this analysis at the level of individual vertices (HierarchicalHotNet.vertex_stats), directed edges (HierarchicalHotNet.diedge_stats), connected components (HierarchicalHotNet.conncomponents_stats), etc.
The statistics from multiple permutations and cutting thresholds can be binned (HierarchicalHotNet.bin_treecut_stats) and then aggregated to calculate the quantiles of the resulting distributions (HierarchicalHotNet.aggregate_treecut_binstats). Finally, HierarchicalHotNet.extreme_treecut_stats can find the edge cutting threshold with the maximal difference between the real and permuted weights.
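The permutation step described at the start of this section can be illustrated with a minimal sketch. It assumes a plain shuffle of the vertex weights, which is a simplification: the package's own permutation procedure may preserve additional properties of the weight distribution. The variable names and the toy values are hypothetical; only the Random standard library is used.

```julia
using Random

weights = [0.5, 1.0, 2.0, 4.0]   # toy vertex weights (made up)
nperms = 100                     # number of permutations to generate
rng = MersenneTwister(1)         # fixed seed for reproducibility

# Each column is one random reordering of the original weights,
# so rows correspond to vertices and columns to permutations.
permweights = reduce(hcat, [shuffle(rng, weights) for _ in 1:nperms])
```

The resulting matrix has the layout that the permweights arguments documented below expect: one row per vertex, one column per permutation.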
HierarchicalHotNet.vertex_stats — Function
vertex_stats(weights::AbstractVector{<:Number},
             walkweights::AbstractVector{<:Number},
             [permweights::AbstractMatrix{<:Number}],
             [walkpermweights::AbstractMatrix{<:Number}]) -> DataFrame
Calculates statistics for the distribution of permuted vertex weights and how it differs from the actual weights.
Returns a data frame with the per-vertex mean, standard deviation, median, MAD, and the probability that a permuted value is greater/lower than the corresponding real value, both for the weights of the original matrix (weights) and for the random walk matrix weights (walkweights).
Parameters
weights: weights of the vertices in the original network
walkweights: weights of the vertices after network diffusion analysis (stationary random walk distribution)
permweights: matrix of permuted weights; rows correspond to vertices, columns to permutations
walkpermweights: matrix of vertex weights based on network diffusion analysis using permweights as input; rows correspond to vertices, columns to permutations
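The per-vertex statistics listed above can be sketched in plain Julia. This is only an illustration of the idea, not the package's implementation: perm_summary is a hypothetical helper, the MAD here is the unscaled median absolute deviation, and ties are counted in both probabilities, which may differ from what vertex_stats does.

```julia
using Statistics

# Hypothetical helper (not part of HierarchicalHotNet): summarizes the permuted
# values of each vertex and compares them with the real value.
function perm_summary(weights::AbstractVector{<:Number},
                      permweights::AbstractMatrix{<:Number})
    size(permweights, 1) == length(weights) ||
        throw(DimensionMismatch("permweights must have one row per vertex"))
    map(eachindex(weights)) do i
        perm = view(permweights, i, :)   # permuted values of vertex i
        med = median(perm)
        (vertex = i,
         mean = mean(perm),
         std = std(perm),
         median = med,
         mad = median(abs.(perm .- med)),        # unscaled median absolute deviation
         p_greater = mean(perm .>= weights[i]),  # share of permutations >= real value
         p_less = mean(perm .<= weights[i]))     # share of permutations <= real value
    end
end

# Toy data: 2 vertices (rows), 4 permutations (columns)
weights = [3.0, 1.0]
permweights = [1.0 2.0 3.0 4.0;
               1.0 1.0 2.0 0.0]
stats = perm_summary(weights, permweights)
```

The same summary could be applied to walkweights and walkpermweights to compare the diffusion results with their permuted counterparts.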
HierarchicalHotNet.diedge_stats — Function
diedge_stats(weights::AbstractVector{<:Number},
             walkweights::AbstractVector{<:Number},
             [permweights::AbstractMatrix{<:Number}],
             [walkpermweights::AbstractMatrix{<:Number}]) -> DataFrame
Calculates statistics for the distribution of permuted directed edge weights and how it differs from the actual weights of the directed edges.
The output is similar to that of HierarchicalHotNet.vertex_stats for vertices.
HierarchicalHotNet.conncomponents_stats — Function
Calculate per-connected component statistics.
HierarchicalHotNet.treecut_stats — Function
treecut_stats(tree::SCCTree;
              [walkmatrix::AbstractMatrix],
              [maxweight::Number],
              [sources], [sinks], [sourcesinkweights], [top_count],
              [pools]) -> DataFrame
Calculate SCC network statistics for each cutting threshold of tree.
HierarchicalHotNet.treecut_compstats — Function
treecut_compstats(tree::SCCTree,
                  vertex_weights::AbstractVector,
                  vertex_walkweights::AbstractVector,
                  perm_vertex_weights::AbstractMatrix,
                  perm_vertex_walkweights::AbstractMatrix;
                  [mannwhitney_tests::Bool],
                  [pvalue_mw_max::Number],
                  [pvalue_fisher_max::Number],
                  [pools]) -> DataFrame
Calculate SCC network statistics for each cutting threshold of tree.
HierarchicalHotNet.bin_treecut_stats — Function
bin_treecut_stats(cutstats_df::AbstractDataFrame) -> DataFrame
Bin treecut thresholds and calculate average statistics in each bin.
Takes the output of HierarchicalHotNet.treecut_stats from multiple SCC trees (discriminated by by_cols), identifies the bins for the treecut thresholds, and calculates the average metric values (stat_cols) within each bin.
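The binning idea can be illustrated with a small sketch. It assumes equal-width bins over the observed threshold range, which is only one possible binning rule and is not necessarily the one bin_treecut_stats uses; bin_and_average is a hypothetical helper and the data are made up.

```julia
using Statistics

# Hypothetical helper: assigns each threshold to one of `nbins` equal-width bins
# and averages the metric values that fall into the same bin.
function bin_and_average(thresholds::AbstractVector{<:Real},
                         metric::AbstractVector{<:Real}; nbins::Integer = 4)
    length(thresholds) == length(metric) ||
        throw(DimensionMismatch("thresholds and metric must have equal length"))
    lo, hi = extrema(thresholds)
    width = (hi - lo) / nbins
    # bin index in 1:nbins; the maximal threshold is assigned to the last bin
    bins = [width > 0 ? min(nbins, floor(Int, (t - lo) / width) + 1) : 1
            for t in thresholds]
    Dict(b => mean(metric[bins .== b]) for b in unique(bins))
end

thresholds = [0.0, 0.1, 0.5, 0.6, 1.0]      # toy cut thresholds
metric     = [10.0, 20.0, 30.0, 50.0, 70.0] # toy metric value per threshold
binned = bin_and_average(thresholds, metric; nbins = 2)
```

Binning puts trees with different sets of cut thresholds on a common grid, which is what makes their metrics comparable in the aggregation step.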
HierarchicalHotNet.aggregate_treecut_binstats — Function
aggregate_treecut_binstats(binstats_df::AbstractDataFrame) -> DataFrame
Aggregate the binned treecut statistics across multiple trees.
Takes binstats_df, the output of HierarchicalHotNet.bin_treecut_stats, and calculates the metric values for the specified quantiles.
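As a rough illustration of this aggregation, the sketch below computes quantiles of a binned metric across several permuted trees using the Statistics standard library. The matrix layout, the variable names, and the quantile levels are assumptions made for the example and do not reflect the package's data frame format or defaults.

```julia
using Statistics

# Toy binned metric values: rows are threshold bins, columns are permuted trees
binned_metric = [ 1.0  2.0  3.0  4.0  5.0;
                 10.0 20.0 30.0 40.0 50.0]
quantile_levels = [0.25, 0.5, 0.75]   # assumed levels, for illustration only

# One vector of quantiles per threshold bin
bin_quantiles = [quantile(binned_metric[b, :], quantile_levels)
                 for b in axes(binned_metric, 1)]
```

Each entry of bin_quantiles describes the spread of the permuted metric within one threshold bin, which serves as the reference band for the real tree.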
HierarchicalHotNet.extreme_treecut_stats — Function
extreme_treecut_stats(stats_df::AbstractDataFrame) -> DataFrame
Calculate the cut threshold and the corresponding metric value where the difference between the real metrics (taken from stats_df) and the permutation metrics (taken from perm_aggstats_df) is maximal/minimal (depending on the metric).
Arguments
stats_df: tree statistics calculated by treecut_stats
perm_aggstats_df: aggregated binned permuted tree statistics calculated by aggregate_treecut_binstats
extra_join_cols: optional columns, in addition to :threshold_bin, to use for joining stats_df and perm_aggstats_df
metric_cols: columns of stats_df and perm_aggstats_df containing treecut metrics to consider for threshold calculation (see TreecutMetrics)
start_maxquantile: if specified, calculates (in addition to the minimal and maximal metric) the metric corresponding to the given quantile as well as $1 - quantile$
threshold_range: if given, constrains the metric statistic calculation to the given min/max thresholds
threshold_weight: optional function that takes a stats_df row and returns the prior weight of the corresponding cut threshold
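The core of the search can be sketched as follows. This is a simplified illustration under stated assumptions: a single metric, the permutation median as the reference, and no threshold weighting. All values and variable names are made up, and the package's function operates on data frames rather than plain vectors.

```julia
# Toy inputs: one metric evaluated at four cut thresholds (all values made up)
thresholds  = [0.1, 0.2, 0.3, 0.4]
real_metric = [5.0, 9.0, 7.0, 4.0]   # metric for the real tree
perm_median = [4.0, 5.0, 6.0, 4.5]   # median over permuted trees per threshold
threshold_range = (0.15, 0.4)        # only thresholds inside this range count

# Difference between the real and the permuted metric at each threshold
delta = real_metric .- perm_median

# Restrict the search to the allowed thresholds, then pick the largest difference
allowed = findall(t -> threshold_range[1] <= t <= threshold_range[2], thresholds)
best = allowed[argmax(delta[allowed])]
best_threshold = thresholds[best]
best_delta = delta[best]
```

For metrics where smaller values are more extreme, argmin would take the place of argmax.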
HierarchicalHotNet.TreecutMetrics — Constant
treecut_stats() metrics (dataframe columns) to consider for bin_treecut_stats() and extreme_treecut_stats().