Sequence Networks¶
By using the SequenceNetwork class of pyeed, sequences can be visualized and analyzed in a network representation. The idea behind the network representation is to represent individual sequences as nodes and the relationships between sequences as edges. The relationship between two sequences can be parameterized in two ways: (1) the existence of an edge between them, and (2) the weight assigned to that edge. By default, the edge weight is the sequence identity obtained from a pairwise alignment of the two sequences. The identity is the number of identical characters between the sequences divided by the length of the sequences. Custom weights can also be introduced and used in the visualization.
Furthermore, the threshold defining the minimum identity that two sequences must share to be connected by an edge can be adjusted. A higher threshold favours the formation of regions in the network with higher identity.
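The identity weight and the threshold described above can be illustrated with a small, dependency-free sketch. It is not pyeed's implementation: the sequences are toy data that are assumed to be aligned already, the helper `identity` and the variable names are hypothetical, and the score is normalized by the alignment length, which may differ from the normalization pyeed uses.

```python
from itertools import combinations


def identity(aligned_a: str, aligned_b: str) -> float:
    """Fraction of alignment columns that hold the same residue (gaps never match)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = sum(a == b and a != "-" for a, b in zip(aligned_a, aligned_b))
    return matches / len(aligned_a)


# Toy, already aligned sequences (hypothetical data, "-" marks a gap)
aligned = {
    "seq1": "MKTAYIAK",
    "seq2": "MKTAYLAK",
    "seq3": "MQ-AYLGR",
}

# Keep an edge only if the pair reaches the minimum identity
threshold = 0.8
edges = {}
for (id_a, seq_a), (id_b, seq_b) in combinations(aligned.items(), 2):
    score = identity(seq_a, seq_b)
    if score >= threshold:
        edges[(id_a, id_b)] = score

print(edges)
```

With a threshold of 0.8 only the pair seq1/seq2 (7 of 8 identical columns) stays connected. Lowering the threshold to 0.5 would also connect seq2 and seq3.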
import json
from pyeed.core import ProteinRecord
from pyeed.network import SequenceNetwork
# load ids
with open("ids.json", "r") as f:
    ids = json.load(f)
# load sequences
sequences = ProteinRecord.get_ids(ids)
The network is created by instantiating the SequenceNetwork class and providing the sequences as a list of ProteinRecord objects. The threshold can then be adjusted with the update_threshhold method.
The network can be visualized with the visualize() method. The method can color the sequences by taxonomic rank, for example domain or phylum. The IDs of the sequences can be added to the plot by setting labels to True. The network can be saved as a PNG by specifying the save_path.
# create the network
graph = SequenceNetwork(sequences=sequences)
graph.update_threshhold(0.4)
# plot the network
graph.visualize(color="domain")
Graph connectivity¶
The threshold has a big impact on the resulting visualization and therefore has to be tuned carefully. The right value differs from graph to graph. The function update_threshhold(threshold: float, threshold_mode: str = "UNDER_THRESHOLD") takes two parameters: threshold is a float, and threshold_mode accepts two settings. With the default setting, "UNDER_THRESHOLD", all edges with a weight under the given threshold are hidden. With the other option, "ABOVE_THRESHOLD", the edges above the threshold are hidden instead.
In the most common case, where the identity is used as the weight, a threshold that is too small (with mode UNDER_THRESHOLD) results in a network with too many edges, and a threshold that is too large results in too few edges.
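The effect of the two modes can be sketched on a toy edge list. The function `visible_edges` below is a hypothetical stand-in written for this illustration, not pyeed code. In particular, keeping edges whose weight equals the threshold is an assumption.

```python
def visible_edges(edges, threshold, threshold_mode="UNDER_THRESHOLD"):
    """Return the edges that remain visible for a given threshold and mode."""
    if threshold_mode == "UNDER_THRESHOLD":
        # hide every edge whose weight lies under the threshold
        return {pair: w for pair, w in edges.items() if w >= threshold}
    if threshold_mode == "ABOVE_THRESHOLD":
        # hide every edge whose weight lies above the threshold
        return {pair: w for pair, w in edges.items() if w <= threshold}
    raise ValueError(f"unknown threshold_mode: {threshold_mode!r}")


# Toy identity weights between four sequences (hypothetical data)
edges = {
    ("A", "B"): 0.9,
    ("A", "C"): 0.5,
    ("B", "C"): 0.3,
    ("C", "D"): 0.7,
}

# Scan a few thresholds to see how quickly the network thins out
for threshold in (0.2, 0.6, 0.95):
    kept = visible_edges(edges, threshold)
    print(f"threshold={threshold}: {len(kept)} of {len(edges)} edges visible")
```

Scanning a handful of thresholds like this before plotting is one way to find a value that leaves neither a fully connected nor an empty network.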
Network Centrality¶
A central analysis method for networks is the calculation of node centrality, a measure of the importance of a node in the network. The centrality can be calculated with the calculate_centrality method of the SequenceNetwork object. Different kinds of centrality are available, namely degree, closeness, betweenness, and eigenvector centrality.
The network itself is represented as a networkx graph object, which allows for a wide range of network analysis and visualization methods. The graph object can be accessed through the network attribute of the SequenceNetwork object.
graph.update_threshhold(0.45)
graph.calculate_centrality("betweenness")
graph.visualize(size=True, color="phylum", labels=True)
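To make the centrality measures more concrete, the following dependency-free sketch computes degree and closeness centrality by hand on a toy graph. It only illustrates the usual definitions for a connected, undirected, unweighted graph and says nothing about how calculate_centrality is implemented. The adjacency data and function names are hypothetical. In practice, networkx offers ready-made functions such as `nx.degree_centrality` and `nx.closeness_centrality`.

```python
from collections import deque

# Toy undirected graph as an adjacency mapping (hypothetical node names)
adjacency = {
    "A": {"B"},
    "B": {"A", "C", "D"},
    "C": {"B"},
    "D": {"B", "E"},
    "E": {"D"},
}


def degree_centrality(adj):
    """Fraction of the other nodes that each node is directly connected to."""
    n = len(adj)
    return {node: len(neighbors) / (n - 1) for node, neighbors in adj.items()}


def closeness_centrality(adj):
    """(n - 1) divided by the summed shortest-path distance to all other nodes."""
    n = len(adj)
    result = {}
    for source in adj:
        # breadth-first search yields the shortest hop counts from the source
        dist = {source: 0}
        queue = deque([source])
        while queue:
            current = queue.popleft()
            for neighbor in adj[current]:
                if neighbor not in dist:
                    dist[neighbor] = dist[current] + 1
                    queue.append(neighbor)
        result[source] = (n - 1) / sum(dist.values())
    return result


degree = degree_centrality(adjacency)
closeness = closeness_centrality(adjacency)
most_central = max(closeness, key=closeness.get)
print(most_central, degree[most_central], closeness[most_central])
```

Node B has the most direct neighbors and the shortest average distance to all other nodes, so it ranks highest under both measures.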
Fine-tuning the network visualization¶
For a better visualization of the network, networkx provides methods to fine-tune the plot. Together with matplotlib, the network can be rendered in a more appealing way.
import networkx as nx
import matplotlib.pyplot as plt
import matplotlib as mpl
G = graph.network
# remove two selected nodes, then all remaining isolated nodes
G.remove_node("NPA53530.1")
G.remove_node("O67275")
G.remove_nodes_from(list(nx.isolates(G)))
# Settings
# Helper function to rescale a list of values to a new range
def rescale(values, newmin: float, newmax: float):
    arr = list(values)
    return [
        (x - min(arr)) / (max(arr) - min(arr)) * (newmax - newmin) + newmin for x in arr
    ]
# use the matplotlib color map
graph_colormap = mpl.colormaps["coolwarm"]
# node color varies with degree (colormap inputs above 1 are clipped to the top color)
c_map = rescale([G.degree(v) for v in G], 0.4, 4)
c_map = [graph_colormap(i) for i in c_map]
# node size varies with betweenness centrality
bc = nx.betweenness_centrality(G)
node_size = rescale([v for v in bc.values()], 800, 3000)
# Plotting of the network
pos = nx.spring_layout(G, weight="identity", iterations=100, seed=18)
plt.figure(figsize=(19, 9))
nx.draw_networkx(
    G,
    pos=pos,
    with_labels=True,
    node_color=c_map,
    node_size=node_size,
    font_color="Black",
    font_size=6,
    font_weight="bold",
    edge_color="grey",
    alpha=0.5,
    width=1,
)
plt.axis("off")
plt.show()
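The min-max rescaling used for node colors and sizes divides by the range of the input values, which fails when all values are equal, for example when every node has the same degree. The following variant is a minimal sketch of one way to guard against that case. The name `rescale_safe` and the choice of falling back to the midpoint of the target range are assumptions made for this illustration.

```python
def rescale_safe(values, new_min: float, new_max: float):
    """Min-max rescale values to [new_min, new_max], tolerating constant input."""
    arr = list(values)
    low, high = min(arr), max(arr)
    if high == low:
        # all values are equal: fall back to the middle of the target range
        midpoint = (new_min + new_max) / 2
        return [midpoint for _ in arr]
    return [(x - low) / (high - low) * (new_max - new_min) + new_min for x in arr]


# Varying values are spread over the full target range
print(rescale_safe([1, 2, 3], 800, 3000))
# Constant values no longer raise ZeroDivisionError
print(rescale_safe([5, 5, 5], 800, 3000))
```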
Export to cytoscape¶
Furthermore, the network can be visualized by using cytoscape. With a pre-installed cytoscape instance, all sequence data and annotations can be exported to a file by using the export_cytoscape_graph() method.
As an alternative to the graphical user interface, the cytoscape application can be controlled programmatically using the py4cytoscape library. In this way, the visualization can be controlled and adjusted from the Jupyter Notebook.
import py4cytoscape as p4c
# transfer the network to cytoscape
graph.create_cytoscape_graph(
threshold=0.75,
column_name="class",
)
# plot the network
p4c.notebook_export_show_image()
In cyrest_get: Cannot find local or remote Cytoscape. Start Cytoscape and then proceed.
You are connected to Cytoscape! You are connected to Cytoscape! Applying default style... Applying preferred layout