Visualizing and Interpreting Link-community Clusters in R
Visualizing and Interpreting Link-community Clusters in Cytoscape
- Export network
- Import network
End

In this final lesson we will apply tools to visualize and interpret the community structure of our networks.

Because link communities are more appealing for visual analysis (e.g., overlapping communities), we will focus on visualization and interpretation of these clusters, although in principle the same approaches could be applied to the spectral clusters as well.

If you haven’t done so already, please load the R libraries and data from the previous practical session:

# libraries
library(pheatmap)
library(dplyr)
library(reshape2)
library(igraph)
library(linkcomm)
library(kernlab)

# data
GITHUBDIR = "http://scalefreegan.github.io/Teaching/DataIntegration/data/"
load(url(paste(GITHUBDIR, "data.rda", sep = "")))
load(url(paste(GITHUBDIR, "kernel.rda", sep = "")))
load(url(paste(GITHUBDIR, "g_linkcomm.rda", sep = "")))

Visualizing and Interpreting Link-community Clusters in R

Link-communities can provide a richly detailed decomposition of a complex network into structured (highly-connected) communities. However, it can also be more complicated to interpret the meaning of these communities since a given node can belong to multiple communities. Here we use several tools from the linkcomm R package to identify highly connected nodes and communities in network and visualize them. These basic tools will serve as a starting place to dive deeper into understanding the structure of these networks.

Node Centrality

A first feature that one could assess on the network. Oftentimes, one may be interested in nodes that are highly “central” or influential in the network. In the case of link communities, these are the nodes that belong to highly-connected communities that in addition also weights how similar each community to which a node belongs is to the others. Formally,

\[ C_{c}(i) = \sum_{i \in j }^N \left(1-\frac{1}{m}\sum_{i \in j \cap k}^m S(j,k)\right) \]

$N$ is the number of communities to which $i$ belongs, $S(j,k)$ is the similarity between community $j$ and $k$ (Jaccard coefficient for number of shared nodes), $m$ is number of communities to which nodes $j$ and $i$ jointly belong.

Compute node centrality. Print the 5 nodes with the highest modularity

Node centrality is calculated with the function getCommunityCentrality

What are the most central genes? Why are they so central in our networks?

Solution

cc <- getCommunityCentrality(g_linkcomm,  type = "commweight" )
sort(cc, decreasing = T)[1:5]

ENSG00000132383	ENSG00000049541	ENSG00000095002	ENSG00000116062	ENSG00000163918
60.75352	57.59503	57.22532	51.75811	47.34038

What are the nodes that are highly central? Perhaps the following provides some insight:

plot(cc, degree(g)[names(cc)], xlab = "Community Centrality", ylab = "Degree")

Community Modularity

Compute community modularity. Print the 5 communities with the highest modularity

Community modularity is calculated with the function getCommunityConnectedness

Why are values for two of the communities so much higher than the others?

Solution

cm <- getCommunityConnectedness(g_linkcomm, conn = "modularity")
sort(cm, decreasing = T)[1:5]

280	327	244	278	262
0.556787	0.2585082	0.055786	0.0529202	0.0494077

Visualization

The linkcomm R package provides functionality for visually representing link-communities. In particular it allows one to visualize all communities to which a node jointly belongs using node-pies. In this case each node is represented by a pie chart that quantifies the fraction of a node’s edges belonging to each community.

Before plotting the whole network with this representation, which can be a bit bewildering, let’s focus on a simple subnetwork. To extract this subnetwork we will use another handy function, getClusterRelatedness. This function is particularly useful in context of the multiscale nature of the link communities. In principle, our original edge similarity dendrogram could have been cut a multiple heights to generate communities at varying scales. The default algorithm settings chooses this point to be the height that maximizes the density of the result communities. However, we could be justified in merging these communities into larger meta-communities. Let’s do that to find a sub-network to plot.

Compute cluster relatedness. Cut the dendrogram using parameter cutat = 1.5 to define metacommunities

Plot the metacommunity of size 4 as a sub-network with node.pies = T.

Solution

cr = getClusterRelatedness(g_linkcomm)
c_meta = cutDendrogramAt(cr, cutat = 1.5)
c_meta_i = which(sapply(c_meta,length)==4)
plot(g_linkcomm, type = "graph", shownodesin = 0, node.pies = TRUE, vlabel.cex = 0.6, clusterids = c_meta[[c_meta_i]])

Plot the entire linkcommunity network with node.pies = T.

How does this network compare to the previous sub-network?

Solution

plot(g_linkcomm, type = "graph", shownodesin = 0, node.pies = TRUE, vlabel=F)

Visualizing and Interpreting Link-community Clusters in Cytoscape

Export network

For the remainder of the course, Cytoscape will be used to extend our analysis of the integrated kernel network and the communities derived from it by link-community detection.

Export the graph and link community annotations to Cytoscape. For the purpose of this exercise only keep the edges that are assigned to a community by link community clustering.

The community IDs can be accessing the edges attribute of the linkcomm object, e.g. g_linkcomm$edges

In the Solution below I have also written out the evidence codes for each edge.

Solution

mc_edge = as_edgelist(g)
mc_edge = data.frame(node1 = mc_edge[,1], node2 = mc_edge[,2], weight = E(g)$weight)
linkcomm_edgelist = g_linkcomm$edges
towrite = merge(mc_edge,linkcomm_edgelist,by = c("node1","node2"))
towrite = merge(towrite, data, by.x = c("node1","node2"), by.y = c("gene1","gene2"))

write.table(towrite, file = "graph.txt", sep = "\t", quote = F, col.names=T, row.names = F)

Before opening this network in Cytoscape, you might check to see whether the link community network (which is a sub-network of the entire kernel matrix) has an over-representation of one of the original sources of information (e.g., only contains evidences from protein-protein interactions)

Is the link community network biased for a particular kind of evidence?

Solution

Metric	Full Network (%)	Link Community Network (%)
BP	0.0769166	0.075859
CODA80	0.0231004	0.0464079
HIPPO	0.1211711	0.1075413
PI	0.7052801	0.6684516
TM	0.0735319	0.1017403

Import network

The network is now ready to import into Cytoscape.

Analyze the link-community network. Apply techniques you learned in Matt’s first lecture to extend our analysis of the networks.

Some things to try:

Layout the network with different algorithms

Color each edge according to its link community membership

Inspect edges of related clusters. Did they originate from similar sources of information?

Use BiNGO to compute GO enrichment for obvious subnetworks.

You can find a version I made here

Stuck?

You’re on your own for this one!

via GIPHY

End

Thanks for participating in the course.

Practical 4: Cluster Visualization and Interpretation

EMBL Course: Data mining and integration with networks

Aaron Brooks

This should take about 20 minutes

Visualizing and Interpreting Link-community Clusters in R

Node Centrality

Solution

Community Modularity

Solution

Visualization

Solution

Solution

Visualizing and Interpreting Link-community Clusters in Cytoscape

Export network

Solution

Solution

Import network

Stuck?

End

Please take the online course survey to help me make this course better!