Introduction
diffusr
implements algorithms for network diffusion such
as Markov random walks with restarts and weighted neighbor
classification. Network diffusion has been studied extensively in
bioinformatics, e.g.Β in the field of cancer gene prioritization. Network
diffusion algorithms generally spread information in the form of node
weights along the edges of a graph to other nodes. These weights can for
example be interpreted as temperature, an initial amount of water, the
activation of neurons in the brain, or the location of a random surfer
in the internet. The information (node weights) is iteratively
propagated to other nodes until a equilibrium state or stop criterion
occurs.
Tutorial
First load the package:
Markov random walks
A Markov random walk with restarts calculates the stationary distribution: where is a restart probability of the Markov chain, is a column-normalized stochastic matrix (we do the normalization for you) and is the starting distribution of the Markov chain. We calculate the iterative updates, it is also possible to do the math using the nullspace of the matrix (comes later).
If you want to use Markov random walks just try something like this:
# count of nodes
n <- 5
# starting distribution (has to sum to one)
p0 <- as.vector(rmultinom(1, 1, prob=rep(.2, n)))
# adjacency matrix (either normalized or not)
graph <- matrix(abs(rnorm(n*n)), n, n)
# computation of stationary distribution
pt <- random.walk(p0, graph)
The stationary distribution should have changed quite a bit from the starting distribution:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 1 0 0 0
## p.inf transition.matrix
## [1,] numeric,5 numeric,25
You can also use a matrix p0
:
p0 <- matrix(c(p0, runif(20)), nrow=n)
pt <- random.walk(p0, graph)
pt
## $p.inf
## [1] 0.20155762 0.58401309 0.04504716 0.06790495 0.10147718
##
## $transition.matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.000000000 0.62768874 0.1240731 0.32918651 0.08452962
## [2,] 0.578537804 0.00000000 0.4073473 0.03317131 0.30382198
## [3,] 0.001322467 0.08413649 0.0000000 0.34241913 0.17182459
## [4,] 0.147539127 0.09740329 0.1010815 0.00000000 0.43982381
## [5,] 0.272600602 0.19077149 0.3674981 0.29522304 0.00000000
In the later case, a random walk is done over all columns of
p0
separately.
Analytical vs iterative solution
In the last section we computed the iterative solution of the stationary distribution. You can also choose to do this analytically. In that case we need to take the inverse of the transition matrix which might lead to numerical instability, though. However, it usually runs faster than the iterative version.
pt <- random.walk(p0, graph, do.analytical=TRUE)
pt
## $p.inf
## [1] 0.2015512 0.5840193 0.0450461 0.0679042 0.1014792
##
## $transition.matrix
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.000000000 0.62768874 0.1240731 0.32918651 0.08452962
## [2,] 0.578537804 0.00000000 0.4073473 0.03317131 0.30382198
## [3,] 0.001322467 0.08413649 0.0000000 0.34241913 0.17182459
## [4,] 0.147539127 0.09740329 0.1010815 0.00000000 0.43982381
## [5,] 0.272600602 0.19077149 0.3674981 0.29522304 0.00000000
Nearest neighbors
Diffusion using nearest neighbors is done by traversing through a (weighted) graph and take all the neighbors of a node until a certain depths in the graph is reached. We find shortest paths using priority queues:
# count of nodes
n <- 10
# indexes(integer) of nodes for which neighbors should be searched
node.idxs <- c(1L, 5L)
# the adjaceny matrix (does not need to be symmetric)
graph <- rbind(cbind(0, diag(n-1)), 0)
# compute the neighbors until depth 3
neighs <- nearest.neighbors(node.idxs, graph, 3)
Letβs see what which nodes we got:
print(neighs)
## $`1`
## [1] 2 3 4
##
## $`5`
## [1] 6 7 8
Heat diffusion
A Laplacian heat diffusion process calculates the heat distribution over a graph after at a specific time : where is the initial heat distribution, is the heat distribution at time and are the eigenvalues of the of your graph.
You can use the Laplacian heat diffusion process like this:
# count of nodes
n <- 5
# starting distribution (has to sum to one)
h0 <- as.vector(rmultinom(1, 1, prob=rep(.2, n)))
# adjacency matrix (either normalized or not)
graph <- matrix(abs(rnorm(n*n)), n, n)
# computation of stationary distribution
ht <- heat.diffusion(h0, graph)
Here are the results:
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0 0 0 0 1
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.07311824 0.07082704 0.03669767 0.3714729 0.7113146
As before, p0
can also be a matrix:
h0 <- matrix(c(h0, runif(20)), nrow=n)
ht <- heat.diffusion(h0, graph)
ht
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.07311824 0.5026732 0.8447452 0.7071784 1.0029935
## [2,] 0.07082704 0.6723834 1.0313002 0.3950428 0.7453103
## [3,] 0.03669767 0.2953058 0.8854613 0.1676122 0.3530274
## [4,] 0.37147286 0.3374178 0.8467648 0.8508218 0.9281102
## [5,] 0.71131463 0.2777378 0.7588318 0.4574978 0.4870314