Representing network data with pandas

Intermediate Network Analysis in Python

Eric Ma

Data Carpentry instructor and author of nxviz package

CSV files for network data storage

  • CSV File
person,party,weight
Barrett.Samuel,LondonEnemies,1
Barrett.Samuel,StAndrewsLodge,1
Marshall.Thomas,LondonEnemies,1
Eaton.Joseph,TeaParty,1
Bass.Henry,LondonEnemies,1


  • Advantages:
    • Human-readable
    • Further analysis is possible with pandas
  • Disadvantages:
    • Repetitive, so files take up more disk space
  • A graph can be stored as two DataFrames: a node list and an edge list
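As a rough illustration of the "further analysis with pandas" point, the sample rows from the CSV slide can be read into a DataFrame and turned into a graph. This is only a sketch: the rows are held in an in-memory string rather than a real file, and the choice of `nx.from_pandas_edgelist` and the `'person'`/`'party'` partition labels are assumptions, not something the slides prescribe.

```python
import io

import networkx as nx
import pandas as pd

# The sample rows from the CSV slide, kept in memory so the sketch is self-contained.
csv_text = """person,party,weight
Barrett.Samuel,LondonEnemies,1
Barrett.Samuel,StAndrewsLodge,1
Marshall.Thomas,LondonEnemies,1
Eaton.Joseph,TeaParty,1
Bass.Henry,LondonEnemies,1
"""

edges = pd.read_csv(io.StringIO(csv_text))

# Each row becomes one edge; the 'weight' column is kept as edge metadata.
G = nx.from_pandas_edgelist(
    edges, source="person", target="party", edge_attr="weight"
)

# Label the two partitions (assumption: these label values are illustrative only).
for person in edges["person"].unique():
    G.nodes[person]["bipartite"] = "person"
for party in edges["party"].unique():
    G.nodes[party]["bipartite"] = "party"

print(G.number_of_nodes(), G.number_of_edges())
```

Because the edge list is an ordinary DataFrame, the usual pandas tools (filtering, grouping, joining) remain available before the graph is built.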

Node list and edge list

  • Node list
    • Each row is one node
    • The columns represent metadata attached to that node
  • Edge list
    • Each row is one edge
    • The columns represent the metadata attached to that edge
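The two layouts described above might look like this for a tiny graph. This is a minimal sketch: the toy graph, the column names `node1`/`node2`, and the variable names are hypothetical and chosen only for illustration.

```python
import networkx as nx
import pandas as pd

# Hypothetical toy graph (not taken from the slides).
G = nx.Graph()
G.add_node("a", bipartite=0)
G.add_node("b", bipartite=1)
G.add_node("c", bipartite=1)
G.add_edge("a", "b", weight=1)
G.add_edge("a", "c", weight=2)

# Node list: one row per node; the remaining columns are node metadata.
node_df = pd.DataFrame([{"node": n, **d} for n, d in G.nodes(data=True)])

# Edge list: one row per edge; columns are the two endpoints plus edge metadata.
edge_df = pd.DataFrame(
    [{"node1": u, "node2": v, **d} for u, v, d in G.edges(data=True)]
)

print(node_df)
print(edge_df)
```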

Pandas and graphs

list(G.nodes(data=True))
[(0, {'bipartite': 0}),
(1, {'bipartite': 0}),
(2, {'bipartite': 0}),
...]
nodelist = []

for n, d in G.nodes(data=True):
    # Start a new record with the node's identifier
    node_data = dict()
    node_data['node'] = n
    # Add the node's metadata to the record
    node_data.update(d)
    nodelist.append(node_data)


nodelist
[{'bipartite': 0, 'node': 0},
{'bipartite': 0, 'node': 1},
{'bipartite': 0, 'node': 2},
{'bipartite': 0, 'node': 3},
{'bipartite': 0, 'node': 4},...]


import pandas as pd
pd.DataFrame(nodelist)
   bipartite  node
0          0     0
1          0     1
2          0     2
3          0     3
4          0     4
5          1     5
6          1     6
7          1     7
pd.DataFrame(nodelist).to_csv('my_file.csv')
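To check that nothing is lost on disk, the node list can be written out and read back. The sketch below makes several assumptions: the short `nodelist`, the file name `nodes.csv`, and the temporary directory are hypothetical, and `index=False` is a choice made here so that the row index is not stored as an extra column.

```python
import os
import tempfile

import networkx as nx
import pandas as pd

# Hypothetical node list in the same shape as the one built from G.nodes(data=True).
nodelist = [
    {"node": 0, "bipartite": 0},
    {"node": 1, "bipartite": 0},
    {"node": 5, "bipartite": 1},
]
df = pd.DataFrame(nodelist)

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "nodes.csv")
    # index=False keeps the row index out of the file.
    df.to_csv(path, index=False)
    restored = pd.read_csv(path)

# Rebuild the nodes of a graph, metadata included, from the restored table.
G2 = nx.Graph()
G2.add_nodes_from(
    (row["node"], {"bipartite": row["bipartite"]})
    for row in restored.to_dict("records")
)

print(restored.equals(df))
```

Without `index=False`, reading the file back yields an additional unnamed index column that usually has to be dropped.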

Let's practice!
