(Gephi, NetworkX) - PDF - Northeastern University

Hands on:
Gephi, NetworkX,
iGraph, Cytoscape
Roberta Sinatra
with the contribution of Pierre Deville, Marc Santolini and Michael Szell
Northeastern University, October 1st 2014
About the problem 4. of the
assignment (due on Monday):
“The red and blue communities are said to be interactive if a typical red
node is just two steps away from some blue node and vice versa.
Evaluate the fraction of purple nodes required for the communities to
become interactive.”
Hint for the solution:
Assume that a typical red node has k nearest neighbors.
Quantify how many of the k neighbors are purple.
Quantify how many blue neighbors a purple node has.
Which is the condition for a typical red node of degree k
to have at least a blue node as second neighbor?
Young discipline, young tools
Many tools out there, “The Tool” does not exist
Choice of software, programming language,
library, module, ..., depends on the your
application, field, taste, problem
See the following slides as a guidelines. If you
know already well another programming
language, tool, library, feel free to use it
Gephi
Tutorial inspired by the official Gephi tutorial on https://gephi.github.io
Victor Hugo and “Les Misérables”
Les Misérables
Do not open the file in the window at the start of Gephi!
Open the file (do one of the following):
download the attachment from the email sent earlier
today: lesmiserables.zip
go to the course website:
http://barabasilab.neu.edu/courses/phys5116/content/
lesmiserables.zip
How the file looks like: nodes
File: lesmiserables_csv/node_list.csv
How the file looks like: nodes
File: lesmiserables_csv/edge_list.csv
How the GML file looks like:
nodes
File: lesmiserables.gml
How the GML file looks like :
edges
File: lesmiserables.gml
Importing the file in Gephi
(.csv)
Importing the file in Gephi
(.csv)
1
2
Importing the file in Gephi
(.csv)
1
2
Importing the file in Gephi
(.csv)
Importing the file in Gephi
(.csv)
Alternative: Importing the GML
file in Gephi
Alternative: Importing the GML
file in Gephi
Random node position
Navigate with the mouse wheel
Edge thickness
Let’s make it pretty: Layout
Force Atlas
Force Atlas
Fruchterman Reingold
Ranking Nodes (color)
Ranking Nodes (color)
Ranking Nodes (color)
Ranking Nodes (color)
Ranking Nodes (color)
Ranking Nodes (color)
Ranking Nodes (color)
Ranking Nodes (color)
2
1
Metrics
Metrics
Metrics
Metrics
Computes metrics related to
shortest paths
Ranking Nodes (size)
Ranking Nodes (size)
Ranking Nodes (size)
Adjust Layout
Show and adjust labels
Show labels
Proportional to
node size
Change fontsize
Show and adjust labels
Change Font
Change color
Show and adjust labels
Change Font
Change color
Show and adjust labels
Display other info
Show and adjust labels
Display other info
Community Detection
1
2
3
Community Detection
Right click to randomize colors
Click to change the color
Community Detection
Filters
We can hide nodes or edges
Filters
Export the visualization
Export the visualization
1
2
Export the visualization
Play with other settings
Export the visualization
Vectorial graphics
Save the file
Data
Let’s remove the filter first
Data
Data
To plot degree distributions,
centralities, etc. in a better way
Data
Gephi
Many plugins to make it
better!
About this network
About this network
Visualizing and measuring is only the
start: always interpret your results!
NetworkX: a Python module
Inspired by the tutorial of Salvatore Scellato for the course “Social and
Technological Network Analysis”, University of Cambridge (2011)
Why python?
NetworkX: online resources
Getting started
Import NetworkX
>>> import networkx as nx
There are different Graph classes for undirected and directed
networks. Let’s create a basic Graph class
>>> g = nx.Graph() # empty graph
The graph g can be grown in several ways. NetworkX includes
many graph generator functions and facilities to read and write
graphs in many formats
Getting started
Import NetworkX
>>> import networkx as nx
There are different Graph classes for undirected and directed
networks. Let’s create a basic Graph class
>>> g = nx.Graph() # empty graph
The graph g can be grown in several ways. NetworkX includes
many graph generator functions and facilities to read and write
graphs in many formats
Getting started: add nodes
# One node at a time
>>> g.add_node(1) # method of nx.Graph
# A list of nodes
>>> g.add_nodes_from([2 ,3])
# A container of nodes
>>> h = nx.path_graph(10)
>>> g.add_nodes_from(h) # g now contains the nodes
of h
# In contrast, you can remove any node of the graph
>>> g.remove_node(2)
Getting started: add edges
# Single edge
>>> g.add_edge(1,2)
>>> e=(2,3)
>>> g.add_edge(*e) # unpack edge tuple
# List of edges
>>> g.add_edges_from([(1 ,2) ,(1 ,3)])
# Container of edges
>>> g.add_edges_from(h.edges())
# In contrast, you can remove any edge of the graph
>>> g.remove_edge(1,2)
Getting started:
access nodes and edges
>>> g.add_edges_from([(1 ,2) ,(1 ,3)]) >>>
g.add_node(‘a’)
>>>
>>>
>>>
[1,
g.number_of_nodes() # also g.order() 4
g.number_of_edges() # also g.size() 2
g.nodes()
2, 3, ‘a’]
>>> g.edges()
[(1, 2), (1, 3)]
>>> g.neighbors(1)
[2, 3]
>>> g.degree(1)
2
Getting started:
node and edge iterators
Many applications require iteration over nodes or over edges: simple
and easy in NetworkX
>>> g.add_edge(1,2)
>>> for node in g.nodes():
print node, g.degree(node)
1, 1
2, 1
>>> g.add_edge(1,3,weight=2.5)
>>> g.add_edge(1,2,weight=1.5)
>>> for n1,n2,attr in g.edges(data=True): # unpacking
print n1,n2,attr[‘weight’]
1, 2, 1.5
Getting started:
graph operators
Classic graph operations
subgraph(G, nbunch) - induce subgraph of G on nodes in nbunch
union(G1,G2) - graph union
disjoint_union(G1,G2) - graph union assuming all nodes are different
cartesian_product(G1,G2) - return Cartesian product graph
compose(G1,G2) - combine graphs identifying nodes common to both
complement(G) - graph complement
create_empty_copy(G) - return an empty copy of the same graph class
convert_to_undirected(G) - return an undirected representation of G
convert_to_directed(G) - return a directed representation of G
Getting started: graph generators
# small famous graphs
>>> petersen=nx.petersen_graph()
>>> tutte=nx.tutte_graph()
>>> maze=nx.sedgewick_maze_graph()
>>> tet=nx.tetrahedral_graph()
# classic graphs
>>> K_5=nx.complete_graph(5)
>>> K_3_5=nx.complete_bipartite_graph(3,5)
>>> barbell=nx.barbell_graph(10,10)
>>> lollipop=nx.lollipop_graph(10,20)
# random graphs
>>> er=nx.erdos_renyi_graph(100,0.15)
>>> ws=nx.watts_strogatz_graph(30,3,0.1)
>>> ba=nx.barabasi_albert_graph(100,5)
>>> red=nx.random_lobster(100,0.9,0.9)
Getting started: graph I/O
NetworkX is able to read/write graphs from/to files using
common graph formats:
edge lists
adjacency lists
GML
GEXF
Python pickle
GraphML
Pajek
LEDA
YAML
Getting started: read and write
edge lists
Read and write edge lists
g =
nx.read_edgelist(path,comments='#',create_using=No
ne,delimiter='',nodetype=None,edgetype=None,encodi
ng='utf-8')
nx.write_edgelist(g,path,comments='#',
delimiter=' ',encoding='utf-8')
Formats
• Node pairs with no data:
1 2
• Python dictionary as data:
1 2 {'weight':7, 'color':'green'}
• Arbitrary data:
1 2 7 green
Getting started: simple analysis
lesmis=
nx.read_weighted_edgelist('lesmiserables_edgelist.txt')
lesmis_und = nx.Graph(lesmis)
N,K = lesmis_und.order(), lesmis_und.size()
avg_deg = float(2*K)/N
>>> print "Nodes: ", N
Nodes: 77
>>> print "Edges: ", K
Edges: 254
>>> print "Average degree: ", avg_deg
Average degree: 6.5974025974
Getting started: simple analysis
>>> lesmis_GML=nx.read_gml('lesmiserables.gml')
>>> lesmis_GML_und=nx.Graph(lesmis_GML)
N_gml,K_gml =
lesmis_GML_und.order(), lesmis_GML_und.size()
avg_deg_gml = float(2*K_gml)/N_gml
>>> print "Nodes: ", N_gml
Nodes: 77
>>> print "Edges: ", K_gml
Edges: 254
>>> print "Average degree: ", avg_deg_gml
Average degree: 6.5974025974
Getting started: simple analysis
>>> degrees = lesmis_und.degree()
>>> degrees
{u'30': 2, u'54': 4, u'42': 3, u'43': 3,...
>>> degrees.values()
[2, 4, 3, 3, 9, 11, 13, 12, 13, 12, 10, 1,
# Clustering coefficient of all nodes (in a
dictionary)
>>> clust_coefficients = nx.clustering(lesmis_und)
# Average clustering coefficient
>>> ccs = nx.clustering(lesmis_und)
>>> avg_clust = sum(ccs.values()) / len(ccs)
>>> avg_clust
0.5731367499320134
Getting started: simple analysis
# Betweenness centrality
>>> bet_cen = nx.betweenness_centrality(lesmis_und)
# Closeness centrality
>>> clo_cen = nx.closeness_centrality(lesmis_und)
# Eigenvector centrality
>>> eig_cen = nx.eigenvector_centrality(lesmis_und)
Getting started: simple analysis
# simple network drawing
>>> nx.draw(lesmis_und)
Getting started: simple analysis
>>> import numpy as np
>>> import matplotlib.pyplot as plt
>>> kmin=min(degrees)
>>> kmax=max(degrees)
>>> logBins = np.logspace(np.log10(kmin),
>>> np.log10(kmax),num=20)
>>> logBinDensity, binedges = np.histogram(degrees,
bins=logBins, density=True)
>>> logBins = np.delete(logBins, -1)
>>>
>>>
>>>
>>>
fig = plt.figure()
ax = fig.add_subplot(111)
ax.set_xscale('log')
ax.set_yscale('log')
>>> plt.plot(logBins,logBinDensity,'x',color='blue')
>>> plt.show()
Getting started: simple analysis
Getting started: simple analysis
>>> plt.xlabel('k', fontsize=20)
>>> plt.ylabel('P(k)', fontsize=30)
Remember: compare your network
always with a null model!
>>> nx.connected_double_edge_swap(lesmis_und,
100)
64
Remember: compare your network
always with a null model!
Preserves the degree sequence, but all the rest
might have changed (clustering coefficient,
centralities, shortest path, etc.)
iGraph
Advantages
Available for python, R, Ruby, C/C++
Faster than NetworkX in most cases
(core written in C)
Great to draw graphs with specific layouts
Disadvantages
Still some bugs (not for the simple routines)
Less friendly than NetworkX
Resources
Gephi: https://gephi.github.io/
Python and much more:
https://www.enthought.com/products/canopy/
Networkx: https://gephi.github.io/
iGraph: http://igraph.org/
Matplotlib: http://matplotlib.org
Numpy: http://www.numpy.org/
One of your best
friends: stackoverflow
And remember:
practice, practice,
practice!