Network Statistics
Permutation generates randomized data that mimics the key properties of the original vertex weight distribution. With enough permutations, it is possible to analyze how the results of network diffusion based on the real weights differ from those based on permuted weights.
The package allows doing this analysis at the level of individual vertices (HierarchicalHotNet.vertex_stats), directed edges (HierarchicalHotNet.diedge_stats), connected components (HierarchicalHotNet.conncomponents_stats) etc.
The statistics from multiple permutations and cutting thresholds can be binned (HierarchicalHotNet.bin_treecut_stats) and then aggregated to calculate the quantiles of the resulting distributions (HierarchicalHotNet.aggregate_treecut_binstats). Finally, HierarchicalHotNet.extreme_treecut_stats can find the edge cutting threshold with the maximal difference between the real and permuted weights.
HierarchicalHotNet.vertex_stats — Function
vertex_stats(weights::AbstractVector{<:Number},
             walkweights::AbstractVector{<:Number},
             [permweights::AbstractMatrix{<:Number}],
             [walkpermweights::AbstractMatrix{<:Number}]) -> DataFrame
Calculates statistics for the permuted vertex weights distribution and how it is different from the actual weights.
Returns a data frame with the per-vertex mean, standard deviation, median, MAD, and the probability that a permuted value is greater/lower than the corresponding real value, both for the weights of the original matrix (weights) and for the random walk matrix weights (walkweights).
Parameters
weights: weights of the vertices in the original network
walkweights: weights of the vertices after network diffusion analysis (stationary random walk distribution)
permweights: matrix of permuted weights; rows correspond to vertices, columns to permutations
walkpermweights: matrix of vertex weights based on network diffusion analysis using permweights as input; rows correspond to vertices, columns to permutations
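The per-vertex summary described above can be illustrated with a minimal sketch that uses only the Statistics standard library. This is not the package's implementation: `permutation_summary` is a hypothetical helper, the weights are made-up values, and the MAD here is the unscaled median absolute deviation, which may differ from the scaling the package uses.

```julia
using Statistics

# Hypothetical helper (not part of HierarchicalHotNet): summarizes the permuted
# weights of one vertex and compares them with the real weight.
function permutation_summary(realweight::Real, permuted::AbstractVector{<:Real})
    med = median(permuted)
    return (mean = mean(permuted),
            std = std(permuted),
            median = med,
            mad = median(abs.(permuted .- med)),  # unscaled MAD (assumption)
            p_greater = count(>(realweight), permuted) / length(permuted),
            p_lower = count(<(realweight), permuted) / length(permuted))
end

# Made-up data: rows correspond to vertices, columns to permutations.
weights = [3.0, 1.0]
permweights = [1.0 2.0 3.0 4.0;
               0.5 1.5 2.5 0.0]

# One summary per vertex (per row of the permutation matrix).
stats = [permutation_summary(weights[i], view(permweights, i, :))
         for i in axes(permweights, 1)]
```

The same summary could be applied to walkweights and walkpermweights to cover the diffusion-based weights.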
HierarchicalHotNet.diedge_stats — Function
diedge_stats(weights::AbstractVector{<:Number},
             walkweights::AbstractVector{<:Number},
             [permweights::AbstractMatrix{<:Number}],
             [walkpermweights::AbstractMatrix{<:Number}]) -> DataFrame
Calculates statistics for the directed edges permuted weights distribution and how it is different from the actual weights of directed edges.
The output is similar to HierarchicalHotNet.vertex_stats for vertices.
HierarchicalHotNet.conncomponents_stats — Function
Calculate per-connected component statistics.
HierarchicalHotNet.treecut_stats — Function
treecut_stats(tree::SCCTree;
              [walkmatrix::AbstractMatrix],
              [maxweight::Number],
              [sources], [sinks], [sourcesinkweights], [top_count],
              [pools]) -> DataFrame
Calculate SCC network statistic for each cutting threshold of tree.
HierarchicalHotNet.treecut_compstats — Function
treecut_compstats(tree::SCCTree,
                  vertex_weights::AbstractVector,
                  vertex_walkweights::AbstractVector,
                  perm_vertex_weights::AbstractMatrix,
                  perm_vertex_walkweights::AbstractMatrix;
                  [mannwhitney_tests::Bool],
                  [pvalue_mw_max::Number],
                  [pvalue_fisher_max::Number],
                  [pools]) -> DataFrame
Calculate SCC network statistic for each cutting threshold of tree.
HierarchicalHotNet.bin_treecut_stats — Function
bin_treecut_stats(cutstats_df::AbstractDataFrame) -> DataFrame
Bin treecut thresholds and calculate average statistics in each bin.
Takes the output of HierarchicalHotNet.treecut_stats from multiple SCC trees (discriminated by by_cols), identifies the bins for treecut thresholds and calculates the average metric values (stat_cols) within each bin.
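The idea of binning thresholds and averaging a metric within each bin can be sketched as follows. The text above does not say how the package chooses its bins, so this sketch assumes equal-width bins over a known threshold range; `bin_index` and `bin_means` are hypothetical helpers and the data are made up.

```julia
# Hypothetical helper: index of the equal-width bin that contains `x`.
bin_index(x, lo, hi, nbins) =
    clamp(floor(Int, (x - lo) / (hi - lo) * nbins) + 1, 1, nbins)

# Hypothetical helper: average `values` within each threshold bin.
function bin_means(thresholds, values; lo, hi, nbins)
    sums = zeros(nbins)
    counts = zeros(Int, nbins)
    for (t, v) in zip(thresholds, values)
        b = bin_index(t, lo, hi, nbins)
        sums[b] += v
        counts[b] += 1
    end
    # Empty bins get NaN rather than a misleading zero.
    return [c > 0 ? s / c : NaN for (s, c) in zip(sums, counts)]
end

# Made-up cut thresholds of one tree and a metric value at each threshold.
thresholds = [0.05, 0.15, 0.45, 0.55, 0.95]
ncomponents = [10.0, 8.0, 4.0, 2.0, 1.0]

binned = bin_means(thresholds, ncomponents; lo = 0.0, hi = 1.0, nbins = 2)
```

Binning puts trees with different sets of cut thresholds on a common grid, which is what makes the later aggregation across trees possible.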
HierarchicalHotNet.aggregate_treecut_binstats — Function
aggregate_treecut_binstats(binstats_df::AbstractDataFrame) -> DataFrame
Aggregate the binned treecut statistics across multiple trees.
Takes binstats_df, the output of HierarchicalHotNet.bin_treecut_stats, and calculates the metric values for the specified quantiles.
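As an illustration of the quantile aggregation step, the sketch below computes quantiles of a binned metric across several permuted trees with the Statistics standard library. The matrix layout (rows as threshold bins, columns as trees), the chosen quantiles and the values are assumptions made for the example; the package itself works with data frames.

```julia
using Statistics

# Made-up binned metric: rows are threshold bins, columns are permuted trees.
binstats = [10.0 12.0 11.0 9.0 13.0;
             4.0  5.0  3.0 6.0  2.0]

# Quantiles to report for each bin (an arbitrary choice for this example).
qs = [0.25, 0.5, 0.75]

# aggstats[b, k] is the qs[k] quantile of the metric in bin b across trees.
aggstats = [quantile(view(binstats, b, :), q)
            for b in axes(binstats, 1), q in qs]
```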
HierarchicalHotNet.extreme_treecut_stats — Function
extreme_treecut_stats(stats_df::AbstractDataFrame) -> DataFrame
Calculate the cut threshold and the corresponding metric value where the difference between the real metrics (taken from stats_df) and the permutation metrics (taken from perm_aggstats_df) is maximal/minimal (depending on the metric).
Arguments
stats_df: tree statistics calculated by treecut_stats
perm_aggstats_df: aggregated binned permuted tree statistics calculated by aggregate_treecut_binstats
extra_join_cols: optional columns, in addition to :threshold_bin, to use for joining stats_df and perm_aggstats_df
metric_cols: columns of stats_df and perm_aggstats_df containing treecut metrics to consider for threshold calculation (see TreecutMetrics)
start_maxquantile: if specified, calculates (in addition to the minimal and maximal metric) the metric corresponding to the given quantile as well as $1 - quantile$
threshold_range: if given, constrains metric statistic calculation to the given min/max thresholds
threshold_weight: optional function that takes a stats_df row and returns the prior weight of the corresponding cut threshold
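The search for the threshold bin with the largest gap between real and permuted metrics can be sketched in a few lines. `max_difference_bin` is a hypothetical helper, the values are made up, and multiplying the difference by a per-bin weight is only one plausible reading of how a prior threshold weight could enter the calculation; the package may combine them differently.

```julia
# Hypothetical helper: bin where the (optionally weighted) difference between
# the real metric and the permuted metric is largest.
function max_difference_bin(realvals, permvals;
                            weights = ones(length(realvals)))
    deltas = (realvals .- permvals) .* weights
    best = argmax(deltas)
    return (bin = best, delta = deltas[best])
end

# Made-up metric values per threshold bin.
realmetric = [10.0, 9.0, 7.0, 3.0]   # from the real tree
permmedian = [10.0, 6.0, 5.0, 2.5]   # e.g. median over permuted trees

best = max_difference_bin(realmetric, permmedian)
```

For metrics where smaller values are better, `argmin` would take the place of `argmax`.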
HierarchicalHotNet.TreecutMetrics — Constant
treecut_stats() metrics (dataframe columns) to consider for bin_treecut_stats() and extreme_treecut_stats().