Author: Aaron Brooks / @scalefreegan
You can follow along on
http://scalefreegan.github.io/Teaching/DataIntegration
Some way to group elements that are more similar to each other than they are to everything else
What does it mean to be more similar? Need to define a metric
A nonnegative function g(x,y) describing the "distance" between neighboring points for a given set
$g(x,y) \ge 0$
$g(x,y) = g(y,x)$
$g(x,y) = 0 \iff x = y$
$g(x,y) + g(y,z) \ge g(x,z)$
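The four conditions above can be checked numerically on a small set of points. This is a minimal sketch, not a proof: the helper name `is_metric`, the sample points, and the tolerance `tol` are illustrative assumptions. It shows that Euclidean distance passes while squared Euclidean distance fails the triangle inequality.

```python
import itertools
import math


def is_metric(g, points, tol=1e-12):
    """Check the four metric conditions for g on all pairs and triples of points."""
    for x, y in itertools.product(points, repeat=2):
        if g(x, y) < 0:                        # g(x,y) >= 0
            return False
        if abs(g(x, y) - g(y, x)) > tol:       # g(x,y) = g(y,x)
            return False
        if (g(x, y) <= tol) != (x == y):       # g(x,y) = 0 iff x = y
            return False
    for x, y, z in itertools.product(points, repeat=3):
        if g(x, y) + g(y, z) < g(x, z) - tol:  # triangle inequality
            return False
    return True


def euclidean(x, y):
    return math.dist(x, y)


def squared_euclidean(x, y):
    return math.dist(x, y) ** 2


points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
print(is_metric(euclidean, points))          # True
print(is_metric(squared_euclidean, points))  # False: 1 + 1 < 4 for the collinear points
```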
Minimize $\sum_{k=1}^K \sum_{x_n \in C_k} ||x_n - \mu_k||^2$ with respect to $C_k$ and $\mu_k$
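A minimal sketch of the objective above, assuming each $\mu_k$ is taken to be the mean of its cluster (the choice that minimizes the sum for fixed $C_k$). The function name `kmeans_objective` and the toy points are illustrative assumptions.

```python
def kmeans_objective(clusters):
    """Sum over clusters of squared Euclidean distances to the cluster mean."""
    total = 0.0
    for members in clusters:
        dim = len(members[0])
        # mu_k: coordinate-wise mean of the points in C_k
        mu = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - mu[d]) ** 2 for p in members for d in range(dim))
    return total


# Two ways of splitting the same four points into K = 2 clusters
good = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 0.0), (10.0, 2.0)]]
bad = [[(0.0, 0.0), (10.0, 0.0)], [(0.0, 2.0), (10.0, 2.0)]]
print(kmeans_objective(good))  # 4.0
print(kmeans_objective(bad))   # 100.0
```

The grouping that keeps nearby points together gives the smaller value, which is what the minimization rewards.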
Given $k$ and initial cluster centers, $\mu_k$
Repeat (1) and (2) until $\sum_{k = 1}^K|\mu_{k_{t+1}} - \mu_{k_{t}}| \le \epsilon$
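The loop can be sketched as below. Steps (1) and (2) are not spelled out in the text, so this sketch assumes the usual pair: (1) assign each point to its nearest center, (2) move each center to the mean of its assigned points. The function name `kmeans`, the `max_iter` guard, and the handling of empty clusters are illustrative assumptions.

```python
import math


def kmeans(points, centers, eps=1e-9, max_iter=100):
    """Alternate assignment and update until the centers move by at most eps in total."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # (1) Assumed assignment step: each point joins its nearest center
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            clusters[nearest].append(p)
        # (2) Assumed update step: each center moves to the mean of its points
        new_centers = []
        for k, members in enumerate(clusters):
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[k])  # keep the center of an empty cluster
        # Stopping rule: total movement of the centers between iterations
        shift = sum(math.dist(a, b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift <= eps:
            break
    return centers, clusters


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, clusters = kmeans(points, centers=[(0.0, 0.0), (1.0, 0.0)])
print(centers)  # [(0.0, 1.0), (10.0, 1.0)]
```

The result depends on the initial centers; a different start can converge to a different local minimum.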
More detail at The Data Science Lab
What if your data looks like this?
Some advantages
von Luxburg (2007). A Tutorial on Spectral Clustering
1. Compute the similarity matrix, S (e.g. Kernel function like RBF)
2. Calculate the affinity matrix, A, from S (e.g. k-nearest neighbors algorithm)
3. Calculate a graph Laplacian, L
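$L = D - W$, where $W$ is the affinity matrix ($A$ above) and $D$ is the diagonal degree matrix, $D_{ii} = \sum_j W_{ij}$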
Wikipedia: Laplacian matrix
4. Perform k-means clustering on the matrix Z whose columns are the eigenvectors for the k smallest eigenvalues of L
Eigenvalue decomposition ⇨ Fiedler vector
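The four steps can be sketched for the two-cluster case in plain Python. Several things here are assumptions made to keep the sketch dependency-free: the function names, the toy points, `sigma` and `n_neighbors` are illustrative; power iteration stands in for a full eigenvalue decomposition; and for two clusters the sign of the Fiedler vector replaces the k-means step.

```python
import math


def rbf_similarity(points, sigma=1.0):
    """Step 1: similarity matrix S from an RBF (Gaussian) kernel."""
    return [[math.exp(-math.dist(p, q) ** 2 / (2 * sigma ** 2)) for q in points]
            for p in points]


def knn_affinity(S, n_neighbors):
    """Step 2: keep each point's most similar neighbors, symmetrized."""
    n = len(S)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: -S[i][j])
        for j in order[:n_neighbors]:
            A[i][j] = A[j][i] = S[i][j]
    return A


def laplacian(A):
    """Step 3: unnormalized graph Laplacian, degree matrix minus affinity matrix."""
    n = len(A)
    degree = [sum(row) for row in A]
    return [[(degree[i] if i == j else 0.0) - A[i][j] for j in range(n)]
            for i in range(n)]


def fiedler_vector(L, n_iter=500):
    """Eigenvector for the second-smallest eigenvalue of L, by power iteration
    on (c*I - L) while projecting out the constant vector."""
    n = len(L)
    c = 2 * max(L[i][i] for i in range(n))  # Gershgorin bound on the largest eigenvalue
    v = [i - (n - 1) / 2 for i in range(n)]  # deterministic, non-constant start
    for _ in range(n_iter):
        w = [c * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]            # remove the constant eigenvector
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (5.0, 5.0), (5.0, 6.0), (6.0, 5.0), (6.0, 6.0)]
A = knn_affinity(rbf_similarity(points, sigma=1.0), n_neighbors=2)
v = fiedler_vector(laplacian(A))
# Step 4, simplified for two clusters: split on the sign of the Fiedler vector
labels = [0 if x < 0 else 1 for x in v]
print(labels)
```

The overall sign of an eigenvector is arbitrary, so only the partition matters, not which group is labeled 0. For more than two clusters, several eigenvectors would be stacked into Z and clustered with k-means as in step 4.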
Ahn et al. (2010). Link communities reveal multiscale complexity in networks
Ahn et al. (2011). Flavor network and the principles of food pairing
Brooks and Reiss et al (2014). A system‐level model for the microbial regulatory genome
Selected from their relationship to 120 genes involved in mitosis, DNA mismatch repair, and BMP signaling
For more info: Practical 1