Describing network position and understanding

lecture 3: describing network position and understanding assortative mixing
© 2014 Aaron Clauset
003
052
002
001
Aaron Clauset
@aaronclauset
Assistant Professor of Computer Science
University of Colorado Boulder
External Faculty, Santa Fe Institute
051
Five Lectures on Networks
Network Analysis and Modeling
!
Instructor: Aaron Clauset
!
http://santafe.edu/~aaronc/courses/5352/
003
052
002
051
Full lectures notes online (~150 pages in PDF)
001
This graduate-level course will examine modern
techniques for analyzing and modeling the structure and
dynamics of complex networks. The focus will be on
statistical algorithms and methods, and both lectures
and assignments will emphasize model interpretability
and understanding the processes that generate real
data. Applications will be drawn from computational
biology and computational social science. No biological
or social science training is required. (Note: this is not a
scientific computing course, but there will be plenty of
computing for science.)
Software
Data sets
R
Python
Matlab
NetworkX [python]
graph-tool [python, c++]
GraphLab [python, c++]
Mark Newman’s network data sets
Stanford Network Analysis Project
Carnegie Mellon CASOS data sets
NCEAS food web data sets
UCI NET data sets
Pajek data sets
Linkgroup’s list of network data sets
Barabasi lab data sets
Jake Hofman’s online network data sets
Alex Arenas’s data sets
003
UCI-Net
NodeXL
Gephi
Pajek
Network Workbench
Cytoscape
yEd graph editor
Graphviz
052
!
002
Standalone editors
051
!
!
001
!
1. defining a network
2. describing a network
3. null models for networks
4. statistical inference
describing networks
position
describing networks
geometric
position = centrality:
measure of positional
“importance”
harmonic centrality
closeness centrality
betweenness centrality
connectivity
degree centrality
eigenvector centrality
PageRank
Katz centrality
many many more…
Boldi & Vigna, arxiv:1308.2140 (2013)
Borgatti, Social Networks 27, 55–71 (2005)
describing networks
position = centrality:
harmonic, closeness
centrality
importance = being in
“center” of the network
harmonic
ci =
1
n
X 1
1
dij
j6=i
length of shortest path
distance: dij =
Boldi & Vigna, arxiv:1308.2140 (2013)
Borgatti, Social Networks 27, 55–71 (2005)
⇢
`ij
1
if j reachable from i
otherwise
describing networks
position = centrality:
PageRank, Katz, eigenvector
centrality
importance = sum of
importances* of nodes that
point at you
X Ij
Ii =
k
j!i j
or, the left eigenvector of
Ax =
Boldi & Vigna, arxiv:1308.2140 (2013)
Borgatti, Social Networks 27, 55–71 (2005)
x
*modulo several technical details
network position
an example
Giovanni de Medici
network position
1993
Duomo
Palazzo Medici
Giovanni de Medici
network position: closeness
Bischeri
Peruzzi
Lamberteschi
Strozzi
Guadagni
Castellani
Ridolfi
Tornabuoni
nodes: Florence families
edges: inter-family marriages
!
Barbadori
Albizzi
Medici
Acciaiuoli
Salviati
Pazzi
Ginori
which family is
most central?
network position: closeness
B
L
Pe
Str
Gu
C
R
T
Alb
B
Medici
Gi
Ac
Sal
P
Medici
Guadagni
Albizzi
Strozzi
Ridolfi
Bischeri
Tornabuoni
Barbadori
Peruzzi
Castellani
Salviati
Acciaiuoli
Ginori
Lamberteschi
Pazzi
9.5
7.92
7.83
7.67
7.25
7.2
7.17
7.08
6.87
6.87
6.58
5.92
5.33
5.28
4.77
network position
actually, it’s complicated...
network position
an example
Boston commuter rail
559 km of rail
data
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root of
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
Gastner & Newman, J. Stat. Mech. P01015 (2006)
network position
an example
Boston commuter rail
559 km of rail
3272 km of rail
data
minimum travel time
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root of
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
Gastner & Newman, J. Stat. Mech. P01015 (2006)
network position
an example
Boston commuter rail
559 km of rail
3272 km of rail
data
minimum travel time
499 km of rail
minimum weight tree
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root of
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
Gastner & Newman, J. Stat. Mech. P01015 (2006)
network position
559 km
route factor
n
X
1
`i0
q=
n i=1 di0
mean ratio of distance along
edges `i0 to direct Euclidean
distance di0 to root 0
3272 km
q = 1.14
q =1
q = 1.61
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
of the networks to two theoretical models that are each
optimal by one of these two criteria. If one is interested
solely in short, efficient paths to the root vertex then the
optimal network is the “star graph,” in which every vertex is connected directly to the root by a single straight
edge (see Fig. 1b). Conversely, if one is interested solely
in minimizing total edge length, then the optimal network is the minimum spanning tree (MST) (see Fig. 1c).
(Given a set of n vertices at specified points on a flat
plane, the MST is the set of n − 1 edges joining them
such that all vertices belong to a single component and
the sum of the lengths of the edges is minimized [17].)
To make the comparison with the star graph, we consider the distance from each non-root vertex to the root
first along the edges of the network and second along a
simple Euclidean straight line, and calculate the mean
ratio of these two distances over all such vertices. Following Ref. [10], we refer to this quantity as the network’s
route factor, and denote it q:
q=
Gastner & Newman, J. Stat. Mech. P01015 (2006)
499 km
1
n
n
i=1
li0
,
di0
(1)
where li0 is the distance along the edges of the network
route
network
n actua
sewer system 23 922
1.5
gas (WA)
226
1.1
gas (IL)
490
1.4
126
1.1
rail
TABLE I: Number of vertices
length for each of the netwo
with the equivalent results fo
spanning trees on the same
factor for the star graph is al
from the table.)
with the optimal model, t
the real networks ranging
of the corresponding MSTs
But now consider the r
table, which give the route
total edge lengths for the st
these figures are for all ne
optimal case and, more im
the real-world networks to
is optimal in terms of tota
network position
559 km
3272 km
499 km
a simple model
embed n vertices in a plane
until all vertices connected
add edge (i, j) with
minimum value for
wij = dij + `j0
distance from i to j
q = 1.61
of the networks to two theoretical models that are each
optimal by one of these two criteria. If one is interested
solely in short, efficient paths to the root vertex then the
optimal network is the “star graph,” in which every vertex is connected directly to the root by a single straight
edge (see Fig. 1b). Conversely, if one is interested solely
in minimizing total edge length, then the optimal network is the minimum spanning tree (MST) (see Fig. 1c).
(Given a set of n vertices at specified points on a flat
plane, the MST is the set of n − 1 edges joining them
such that all vertices belong to a single component and
the sum of the lengths of the edges is minimized [17].)
To make the comparison with the star graph, we consider the distance from each non-root vertex to the root
first along the edges of the network* and second along a
simple Euclidean straight line, and calculate the mean
ratio of these two distances over all such vertices. Following Ref. [10], we refer to this quantity as the network’s
route factor, and denote it q:
minimum spanning tree
prefer shorter paths to root
q=
Gastner & Newman, J. Stat. Mech. P01015 (2006)
q =1
route length to root
parameter
=0
>0
q = 1.14
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
1
n
n
i=1
li0
,
di0
(1)
where li0 is the distance along the edges of the network
route
network
n actua
sewer system 23 922
1.5
gas (WA)
226
1.1
gas (IL)
490
1.4
126
1.1
rail
TABLE I: Number of vertices
length for each of the netwo
with the equivalent results fo
spanning trees on the same
factor for the star graph is al
from the table.)
with the optimal model, t
the real networks ranging
of the corresponding MSTs
But now consider the r
table, which give the route
total edge lengths for the st
these figures are for all ne
optimal case and, more im
the real-world networks to
is optimal in terms of tota
*this is exactly Prim’s algorithm for MSTs
network position
559 km
3272 km
499 km
a simple model
embed n vertices in a plane
until all vertices connected
add edge (i, j) with
minimum value for
wij = dij + `2j0
511 km
q = 1.11
= 0.4
Gastner & Newman, J. Stat. Mech. P01015 (2006)
q = 1.14
q =1
q = 1.61
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
of the networks to two theoretical models that are each
optimal by one of these two criteria. If one is interested
solely in short, efficient paths to the root vertex then the
optimal network is the “star graph,” in which every verh`ithe root by a single straight
tex is connected directly to
edge (see Fig. 1b). Conversely, if one is interested solely
in minimizing total edge length, then the optimal network is the minimum spanning tree (MST) (see Fig. 1c).
(Given a set of n vertices at specified points on a flat
plane, the MST is the set of n − 1 edges joining them
such that all vertices belong to a single component and
the sum of the lengths of the edges is minimized [17].)
To make the comparison with the star graph, we consider the distance from each non-root vertex to the root
first along the edges of the network and second along a
simple Euclidean straight line, and calculate the mean
ratio of these two distances over all such vertices. Following Ref. [10], we refer to this quantity as the network’s
route factor, and denote it q:
q=
1
n
n
i=1
li0
,
di0
(1)
where li0 is the distance along the edges of the network
route
parab
network
n actua
=
0.4system 23 922 To
sewer
1.5
gas (WA)
226
1.1
distri
gas (IL)
490
1.4
126
1.1
rail
sewer
TABLE I: Number oftotal
vertices
length for each of the netwo
work
with the equivalent results fo
spanning trees on the
same
called
factor for the star graph is al
some
from the table.)
decre
ond,
with the optimal model, t
netwo
the real networks ranging
of the corresponding
MSTs
optim
But now consider the r
spatia
table, which give the
route
total edge lengths for
the str
that
these figures are for all ne
posse
optimal case and, more
im
the real-world networks
to
tal ed
is optimal in terms of tota
mecha
network position
most
centralized
vast wilderness
of in-between
most
decentralized
network position
most
centralized
vast wilderness
of in-between
most
decentralized
network position
most
centralized
vast wilderness
of in-between
most
decentralized
network position
most
centralized
vast wilderness
of in-between
most
decentralized
network position
positions:
• geometric description of network structure
• core vs. periphery
• centrality = importance, influence
B
L
Pe
Str
Gu
C
R
T
Alb
B
Medici
Gi
Ac
Sal
open questions:
• position and dynamics
• what does position predict?
• when does position not matter?
P
FIG. 1: (a) Commuter rail network in the Boston area. The arrow marks the assumed root of t
(c) Minimum spanning tree. (d) The model of Eq. (3) applied to the same set of stations.
of the networks to two theoretical models that are each
optimal by one of these two criteria. If one is interested
solely in short, efficient paths to the root vertex then the
optimal network is the “star graph,” in which every vertex is connected directly to the root by a single straight
edge (see Fig. 1b). Conversely, if one is interested solely
in minimizing total edge length, then the optimal network is the minimum spanning tree (MST) (see Fig. 1c).
(Given a set of n vertices at specified points on a flat
plane, the MST is the set of n − 1 edges joining them
such that all vertices belong to a single component and
the sum of the lengths of the edges is minimized [17].)
To make the comparison with the star graph, we con-
route fac
network
n actual M
sewer system 23 922
1.59
gas (WA)
226
1.13
gas (IL)
490
1.48
126
1.14
rail
TABLE I: Number of vertices n,
length for each of the networks
with the equivalent results for th
spanning trees on the same ver
factor for the star graph is alway
from the table.)
describing networks
homophily and
assortative mixing
like links with like
assortative mixing
~vi
~vj
vertex attributes
Newman, Phys. Rev. E 67, 026126 (2003).
homophily and
assortative mixing
like links with like
!
assortativity coefficient r
quantifies homophily
three types:
scalar attributes
vertex degrees
categorical variables
assortative mixing
~vi
~vj
homophily and
assortative mixing
like links with like
!
scalar attributes:
mean value across ties
1 XX
µ=
Aij vi
2m i j
1 X
=
ki v i
2m i
Newman, Phys. Rev. E 67, 026126 (2003).
assortative mixing
~vi
homophily and
assortative mixing
like links with like
!
scalar attributes:
covariance across ties
~vj
cov(vi , vj ) =
1 X
µ=
ki vi
2m i
Newman, Phys. Rev. E 67, 026126 (2003).
!
P
ij
Aij (vi
P
ij
µ)(vj
µ)
Aij
1 X
=
Aij vi vj µ2
2m ij
✓
◆
X
1
ki kj
=
Aij
vi vj
2m ij
2m
assortative mixing
~vi
~vj
homophily and
assortative mixing
like links with like
!
assortativity coefficient (scalar)
cov(vi , vj )
r=
var(vi , vj )
P
ki kj /2m) vi vj
ij (Aij
= P
ki kj /2m
ij ki ij
[this is just a Pearson correlation across edges]
1r1
Newman, Phys. Rev. E 67, 026126 (2003).
assortative mixing
40
~vi
~vj
age of wife [years]
su
su
30
w
th
un
no
m
Pe
20
10
10
20
30
40
50
age of husband [years]
r = 0.574
strongly assortative
200
150
number
(top) scatter plot of ages of 1141
married couples at time of marriage
[1995 US National Survey of Family
Growth]
100
50
(bottom) histogram of age differences
(M-F) for same data
0
-5
0
5
10
15
20
25
age difference [years]
Newman, Phys. Rev. E 67, 026126 (2003).
FIG. 1: Top: scatter plot of the ages of 1141 married couples
at time of marriage, from the 1995 US National Survey of
w
tr
−
an
fe
da
in
do
ex
It
w
rit
ra
de
pl
pa
assortative mixing
ki
Newman, Phys. Rev. E 67, 026126 (2003).
kj
homophily and
assortative mixing
like links with like
!
degree:
just another scalar
*
* the assortativity coefficient formula simplifies somewhat
in this case. see the Ref in the left corner for more details
assortative mixing
network
⎧ physics coauthorship
⎪
⎪
biology coauthorship
⎪
⎪
⎪
⎪
⎨ mathematics coauthorship
film actor collaborations
social
⎪
⎪
company directors
⎪
⎪
⎪
⎪
⎩ student relationships
email address books
⎧
power grid
⎪
⎨
Internet
technological
⎪
⎩ World-Wide Web
software dependencies
⎧ protein interactions
⎪
⎪
⎨ metabolic network
neural network
biological
⎪
⎪
⎩ marine food web
freshwater food web
ki
type
undirected
undirected
undirected
undirected
undirected
undirected
directed
undirected
undirected
directed
directed
undirected
undirected
directed
directed
directed
size n
52 909
1 520 251
253 339
449 913
7 673
573
16 881
4 941
10 697
269 504
3 162
2 115
765
307
134
92
degree
assortativity r
0.363
0.127
0.120
0.208
0.276
−0.029
0.092
−0.003
−0.189
−0.067
−0.016
−0.156
−0.240
−0.226
−0.263
−0.326
kj
error σr
0.002
0.0004
0.002
0.0002
0.004
0.037
0.004
0.013
0.002
0.0002
0.020
0.010
0.007
0.016
0.037
0.031
ref.
a
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
TABLE II: Size n, degree assortativity coefficient r, and expected error σr on the assortativity, for a number o
technological, and biological networks, both directed and undirected. Social networks: coauthorship networks of (a) p
and biologists [43] and (b) mathematicians [44], in which authors are connected if they have coauthored one or more
in learned journals; (c) collaborations of film actors in which actors are connected if they have appeared together in
more
movies
[5, (2003).
7]; (d) directors of Fortune 1000 companies for 1999, in which two directors are connected if they si
Newman,
Phys. Rev.
E 67, 026126
assortative mixing
homophily and
assortative mixing
like links with like
!
categorical variables:
let eij be fraction of
edges connecting vertices
of type i to vertices of
type j
matrix sum
X
eij = 1
ij
marginals
X
j
Newman, Phys. Rev. E 67, 026126 (2003).
eij = ai
X
i
eij = bj
assortative mixing
homophily and
assortative mixing
like links with like
!
categorical variables:
assortativity coefficient
*
P
P
i eii P
i a i bi
r=
1
i a i bi
Tr e ||e2 ||
=
1 ||e2 ||
Newman, Phys. Rev. E 67, 026126 (2003).
* this equation is equivalent to the popular modularity measure
Q used to score the strength of community structure
assortative mixing
men
1992 study of heterosexual partnerships in San Francisco*
(bipartite network)
black
hispanic
white
other
bi
black
0.258
0.012
0.013
0.005
0.289
women
hispanic white
0.016
0.035
0.157
0.058
0.023
0.306
0.007
0.024
0.204
0.423
other
0.013
0.019
0.035
0.016
0.084
ai
0.323
0.247
0.377
0.053
TABLE I: The mixing matrix eij and the values of ai and
bi for sexual partnerships
the study of Catania et al. [23].
r =in0.621
After Morris [24].
strongly assortative
A.
Mea
To quantify
we define an a
r=
where e is the
means the sum
formula gives r
since eij = ai
perfect assorta
is perfectly dis
vertices of diffe
value
effect on network structure and behavior. The outline
of the paper is as follows. In Section II we study the
effects of assortative mixing by discrete characteristics
such as language or race. In Section III we study *Catania
mixing
Newman, Phys. Rev. E 67, 026126 (2003).
et al., Am. J. Public Health 82, 284-287 (1992).
Northeast
Midwest
South
West
Canada
assortative mixing
4388 Computer Science faculty
vertices are PhD granting institutions in North America
edge (u, v) means PhD at u and now faculty at
v
labels are US census regions + Canada
Northeast
Midwest
South
West
Canada
bi
Northeast
0.119
0.031
0.025
0.049
0.006
0.229
Midwest
0.053
0.067
0.027
0.033
0.005
0.185
South
0.074
0.061
0.083
0.043
0.005
0.267
r = 0.264
moderately assortative
West
0.055
0.026
0.024
0.073
0.005
0.184
Canada
0.022
0.011
0.006
0.011
0.085
0.135
ai
0.322
0.196
0.166
0.209
0.107
assortative mixing
~vi
~vj
homophily and
assortative mixing
like links with like
!
• random graphs tend to be
disassortative r  0
because the mixing is
uniform
• social networks (apparently)
highly assortative, in every
way (attribute, degree,
category)
• extremal values r ⇡ { 1, 1}
suggest underlying
mechanism on that variable
Newman, Phys. Rev. E 67, 026126 (2003).
describing networks
community structure
describing networks
Y
APN
QKD
13
NY
TYK
DDP
DPN
YKD
KET
KHY
assortative community structure
(edges inside the groups)
KLK
0
sitio
nt
600
7
8
Y
9
ons
D
community structure:
a group of vertices that
connect to other groups in
similar ways
community structure
“internal”
edges
Y
APN
QKD
13
NY
TYK
DDP
DPN
YKD
KET
KHY
assortative community structure
(edges inside the groups)
KLK
0
sitio
nt
600
7
8
Y
9
ons
D
“bridge”
edges
community structure:
a group of vertices that
connect to other groups in
similar ways
DAP
NY
13
PNY
DDP
YQK
t
600
7
D
ble
NY
8
s
9
ion
reg
s
C
D
es
cor
tors
ica
ind
ion
reg
aria
yv
ghl
ed
ent
nm
alig
lign
rt a
t to
nt s
me
lign
te a
c
toraslcula
C
community structure
community structure:
a group of vertices that
connect to other groups in
similar ways
mixing matrix
community structure
assortative
disassortative
ordered
core-periphery
edges within groups
edges between groups
linear group hierarchy
dense core, sparse periphery
community structure
!
• enormous interest, especially since 2000
• dozens of algorithms for extracting
various large-scale patterns
assortative
disassortative
ordered
core-periphery
• hundreds of papers published
• spanning Physics, Computer Science,
Statistics, Biology, Sociology, and more
• this was one of the first:
PNAS 2002
5700+ citations on Google Scholar
network communities
1983
most new job opportunities from
“weak ties”
• within-community links = strong
• bridge links = weak
network communities
1983
most new job opportunities from
“weak ties”
• within-community links = strong
• bridge links = weak
why?
information propagates quickly within a
community,
but slowly between communities
network communities
2004
1494 blogs
759 liberal
735 conservative
liberal
conservative
network communities
2004
amazon.com
co-purchasing network
network communities
2004
amazon.com
co-purchasing network
find partition that maximizes
assortativity r on those groups
n = 409,687 items
m = 2,464,630 edges
network communities
clustered
network
purchases = interests
interests = clustered
network communities
political books
on amazon (2004)
©2004 Valdis Krebs
network communities
• community = vertices with same pattern of intercommunity connections
• network macro-structure
• finding them like “network clustering”
• allow us to coarse grain system structure
[decompose heterogeneous structure into homogeneous blocks]
• constrains network synchronization,
information flows, diffusion, influence
network communities
• community = vertices with same pattern of intercommunity connections
• network macro-structure
• finding them like “network clustering”
• allow us to coarse grain system structure
[decompose heterogeneous structure into homogeneous blocks]
• constrains network synchronization,
information flows, diffusion, influence
open questions:
• what processes generate communities?
• what impact on dynamics? network function?
describing networks
motifs
o Arqueolo´gico is found in four of the six
own caves (22) [see review in (23)]. It is
o found on the coast of Peru in sites that
e associated with ephemeral streams (24).
e southern limit in Chile and northwest
gentina has yet to be explored.
describing networks
References and Notes
T. Dillehay, Science 245, 1436 (1989).
D. J. Meltzer et al., Am. Antiq. 62, 659 (1997).
T. F. Lynch, C. M. Stevenson, Quat. Res. 37, 117
(1992).
D. H. Sandweiss et al., Science 281, 1830 (1998).
L. Nu
´n˜ez, M. Grosjean, I. Cartajena, in Interhemispheric Climate Linkages, V. Markgraf, Ed. (Academic Press,
San Diego, CA 2001), pp. 105–117.
M. A. Geyh, M. Grosjean, L. Nu
´n˜ez, U. Schotterer,
Quat. Res. 52, 143 (1999).
J. L. Betancourt, C. Latorre, J. A. Rech, J. Quade, K.
Rylander, Science 289, 1542 (2000).
M. Grosjean et al., Global Planet. Change 28, 35
(2001).
C. Latorre, J. L. Betancourt, K. A. Rylander, J. Quade,
Geol. Soc. Am. Bull. 114, 349 (2002).
Charcoal in layers containing triangular points has
been 14C dated at Tuina-1, Tuina-5, Tambillo-1, San
Lorenzo-1, and Tuyajto-1 between 13,000 and 9000
cal yr B.P. (table S1 and fig. S1).
P. A. Baker et al., Science 291, 640 (2001).
G. O. Seltzer, S. Cross, P. Baker, R. Dunbar, S. Fritz,
Geology 26, 167 (1998).
L. G. Thompson et al., Science 282, 1858 (1998).
M. Grosjean, Science 292, 2391 (2001).
E. P. Tonni, written communication.
M. T. Alberdi, written communication.
J. Fernandez et al., Geoarchaeology 6, 251 (1991).
The histogram of middens is processed from (9).
M. Grosjean, L. Nu
´n˜ez, I. Cartajena, B. Messerli, Quat.
Res. 48, 239 (1997).
The term Silencio Arqueolo
´gico describes the midHolocene collapse of human population at those
archaeological sites of the Atacama Desert that are
vulnerable to multicentennial or millennial-scale
drought. The term Silencio Archaeolo
´gico does not
conflict with the presence of humans at sites that are
not susceptible to climate change, such as in spring
and river oases that drain large (Pleistocene) aquifers
or at sites where wetlands were created during the
arid
middle
Holocene,
such as(2002).
Tulan-67, Tulan-68,
Milo et
al., Science
298, 824-827
networks of Escherichia coli and Saccharomyces cerevisiae or from those found
in the World Wide Web. Similar motifs were found in networks that perform
information processing, even though they describe elements as different as
biomolecules within a cell and synaptic connections between neurons in Caenorhabditis elegans. Motifs may thus define universal classes of networks. This
approach may uncover the basic building blocks of most networks.
Many of the complex networks that occur in
nature have been shown to share global statistical features (1–10). These include the “small
world” property (1–9) of short paths between
any two nodes and highly clustered connections. In addition, in many natural networks,
there are a few nodes with many more connections than the average node has. In these types
of networks, termed “scale-free networks” (4,
6), the fraction of nodes having k edges, p(k),
decays as a power law p(k) ϳ k–␥ (where ␥ is
often between 2 and 3). To go beyond these
global features would require an understanding
of the basic structural elements particular to
each class of networks (9). To do this, we
developed an algorithm for detecting network
motifs: recurring, significant patterns of interconnections. A detailed application to a gene
regulation network has been presented (11).
Related methods were used to test hypotheses
on social networks (12, 13). Here we generalize
this approach to virtually any type of connectivity graph and find the striking appearance of
motifs:
small subgraphs (of interest),
which we then count
Departments of Physics of Complex Systems and
compare counts against null
Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel 76100. Cold Spring Harbor Labmodel
(random graph model)
oratory, Cold Spring Harbor, NY 11724,
USA.
1
2
*To whom correspondence should be addressed. Email: [email protected]
Fig. 1. (A) Examples
of interactions represented by directed
edges between nodes
in some of the networks used for the
present study. These
networks go from the
scale of biomolecules
(transcription factor
protein X binds regulatory DNA regions
of a gene to regulate
the production rate
of
protein
Y),
through cells (neuron
X is synaptically connected to neuron Y),
to organisms (X
describing networks
motifs:
small subgraphs (of interest),
which we then count
compare counts against null
model (random graph model)
• efficient counting is tricky
(combinatorics + graph isomorphism)
• choice of null model key
(see Lecture 2)
• lots of work in this area, mainly in
molecular biology and neuroscience
• see
Sporns and Kotter, PLoS Biol. 2, e369 (2004)
Matias et al., REVSTAT 4, 31-51 (2006)
Wong et al., Brief. in Bioinfo. 13, 202-215 (2011)
003
052
002
051
001
end of lecture 3
selected references
• The structure and function of complex networks. M. E. J.
•
•
•
•
•
•
•
•
•
•
•
Newman, SIAM Review 45, 167–256 (2003).
The Structure and Dynamics of Networks. M. E. J. Newman,
A.-L. Barabási, and D. J. Watts, Princeton University Press
(2006).
Hierarchical structure and the prediction of missing links in
networks. A. Clauset, C. Moore, and M. E. J. Newman,
Nature 453, 98–101 (2008).
Modularity and community structure in networks. M. E. J.
Newman, Proc. Natl. Acad. Sci. USA 103, 8577–8582 (2006).
Why social networks are different from other types of
networks. M. E. J. Newman and J. Park, Phys. Rev. E 68,
036122 (2003)
Random graphs with arbitrary degree distributions and their
applications. M. E. J. Newman, S. H. Strogatz, and D. J.
Watts, Phys. Rev. E 64, 026118 (2001).
Comparing community structure identification. L. Danon, A.
Diaz-Guilera, J. Duch and A. Arenas. J. Stat. Mech. P09008
(2005).
Characterization of Complex Networks: A Survey of
measurements. L. daF. Costa, F. A. Rodrigues, G. Travieso
and P. R. VillasBoas. arxiv:cond-mat/050585 (2005).
Evolution in Networks. S.N. Dorogovtsev and J. F. F.
Mendes. Adv. Phys. 51, 1079 (2002).
Revisting “scale-free” networks. E. F. Keller. BioEssays 27,
1060-1068 (2005).
Currency metabolites and network representations of
metabolism. P. Holme and M. Huss. arxiv:0806.2763 (2008).
Functional cartography of complex metabolic networks. R.
Guimera and L. A. N. Amaral. Nature 433, 895 (2005).
• Graphs over Time: Densification Laws, Shrinking Diameters
•
•
•
•
•
•
•
•
•
and Possible Explanations. J. Leskovec, J. Kleinberg and C.
Faloutsos. Proc. 11th ACM SIGKDD Intl. Conf. on Knowledge
Discovery and Data Mining 2005.
The Structure of the Web. J. Kleinberg and S. Lawrence.
Science 294, 1849 (2001).
Navigation in a Small World. J. Kleinberg. Nature 406 (2000),
845.
Towards a Theory of Scale-Free Graphs: Definitions,
Properties and Implications. L. Li, D. Alderson, J. Doyle, and
W. Willinger. Internet Mathematics 2(4), 2006. A First-Principles Approach to Understanding the Internet’s
Router-Level Topology. L. Li, D. Alderson, W. Willinger, and
J. Doyle. ACM SIGCOMM 2004.
Inferring network mechanisms: The Drosophila melanogaster
protein interaction network. M. Middendorf, E. Ziv and C. H.
Wiggins. Proc. Natl. Acad. Sci. USA 102, 3192 (2005).
Robustness Can Evolve Gradually in Complex Regulatory
Gene Networks with Varying Topology. S. Ciliberti, O. C.
Martin and A. Wagner. PLoS Comp. Bio. 3, e15 (2007).
Simple rules yield complex food webs. R. J. Williams and N.
D. Martinez. Nature 404, 180 (2000).
A network analysis of committees in the U.S. House of
Representatives. M. A. Porter, P. J. Mucha, M. E. J. Newman
and C. M. Warmbrand. Proc. Natl. Acad. Sci. USA 102, 7057
(2005).
On the Robustness of Centrality Measures under Conditions of
Imperfect Data. S. P. Borgatti, K. M. Carley and D.
Krackhardt. Social Networks 28, 124 (2006).