Home Polkadot Network Visualization
Post
Cancel

Polkadot Network Visualization

Overview:

This post is on a project I have been working on named Visualizing-Substrate. It is a program that can access database .db files and use RPC/API calls to substrate-based chains (currently Polkadot and Kusama) to create nodes and edges based on addresses, balances, and transaction data for the entire history of the chain. Addresses and transactions converted to nodes can be chosen based on current address holding and or transaction sizes. The project is currently written in typescript utilizing npm SQLite packages to access the database. Output from my program is designed to be used with Gephi (an open-source network analysis and visualization software) to visualize and organize the nodes and edges with layout algorithms such as the Yifan Hu algorithm an example of which above.

30k

Description

I came up with the idea out of an interest to identify connected accounts and be able to visualize where native tokens on the Kusama blockchain were and who had them. With this program it is easy to identify where assets originate on-chain, where they are currently, and the path they took to get there. Using the community detection algorithm named Modularity also allows me to color nodes by groups depending on their connections. A good example of this is the orange, teal, green, and blue color groups below that all interact with seemingly central nodes. These different colored groups are based on the exchanges that the nodes interact with.

GroupingExample

The database that I used to create the nodes and edges in this project was created using polka-store. Polka-store is a Node.js program written in typescript which scans a Polkadot chain (Polkadot/Kusama/Westend) and stores balance-relevant transactions in a SQLite database. This is needed because Polkadot itself does not directly store transaction data, and requires an API or block explorer to get transaction data which can be slow. With an SQLite database accessing transaction data is fast and can be done with SELECT cases. The database structure looks as follows.

ColumnDescription
chainchain name
idunique id
heightbock height
blockHashblock hash
typeextrinsic method
subTypeextrinsic submethod (e.g. in a ‘utility.batch’ extrinsic)
eventthe event triggered by the extrinsic
timestampunix timestamp (in ms since Jan 1, 1970)
specVersionruntime version
transactionVersiontransaction version
authorIdthe account id of the block validator
senderIdthe account id of the block signer / transaction sender
recipientIdthe account id of the transaction recipient
amountthe amount which was sent or rewarded
totalFeethe fee which was paid by the block signer
feeBalancesthe part of the totalFee that passed to the block author
feeTreasurythe part of the totalFee that passed to the treasury
tipan additional tip paid by the block signer (is part of feeBalances)
successthe transaction was successfull

From this database I need the transactions where the event was a “balances.Transfer” event. From those transactions, I then need the senderId, recipientId, amount, and height values to create my nodes and edges. To do this a created a function to get rows with the desired values using a SELECT FROM WHERE statement. This function also enables me to request transfer transactions after a certain block height and greater than a chosen value in Dot.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
  // returns all rows with requested query in database
  GetRows(height: number, amount: number): any {
    const sql = `SELECT senderId,
                 recipientId,
                 amount,
                 height
                 FROM transactions
                 WHERE event = ?
                 AND height > ?
                 AND amount > ?`;

    const rows = db().query(sql, ["balances.Transfer"], height, amount);
    return rows;
  }

Note: Staking rewards and other niche transfers of native assets are not tracked only balance transfers for now.

With the transaction data I am able to create CSV files for nodes and edges. In this case the nodes are non repeating user addresses that have been the sender or receiver of a balance transfer transaction. For this project I also use API calls to a polkadot node to get the current balance of the address for each node. Snippet of a Node.csv file example:

idLabelBalance
16cfqPbPeCb3EBodv7Ma7CadZj4TWHegjiHJ8m3dVdGgKJk116cf…KJk10
1mndd9E8kssCxXDacCbKw3iwFQwdABiFrE8fVRKS5SeS4E41mnd…S4E40
1pqYSLsSSzLCkTdGqebAEwLRSUyPS2JQQzQu1Uvhn85siJm1pqY…siJm0
13YpnBTwkLbUjAg2raa286UkRFvvGJ3jdXSfiHeLf5AQ1kVK13Yp…1kVK1118622.73955348
1H1aMkpYrkjtcQHXc5Zqd2jXxMQqBh7ZnkkRSu62Yz2vEbK1H1a…vEbK1500674.07784768
1W6f6T5mxz2ATndjakXBRvDzX36B7eDurbHEcDKMhnRH2Zx1W6f…H2Zx1500642.48497351
124Xmmo5dn4CStgvkwnmvBX9sYTPaJRYRCCsZ4CDfA1J8F7d124X…8F7d1501230.64040718
15j4dg5GzsL1bw2U2AWgeyAk6QTxq43V7ZPbXdAmbVLjvDCK15j4…vDCK45117043.96
16BPfNA5pVPs9KDWML4cds1dHrUxtRLeozMWqXEwHjqTn6YU16BP…n6YU1923324.50734206

Edge.csv files consist of a Source (senderId), Target (recipientId), Type (directed), Weight (amount), Block (height). Snippet of a Edge.csv file example:

Note: Source and Target addresses shortened for example

SourceTargetTypeWeightBlockFinalized
14rS…oU7U153V…LNfdirected1712161.399415228901522896
16Li…YGLn13V…xMgdirected3521733.364515336171533623
13Vh…6xMg12xt…XkLWdirected3521733.347915336341533640
148G…Fgit1JVr…hathdirected1091082.65415416411541647
13wz…AgoL121H…YMq5directed399999015800901580096
15aG…awQJ12uZ…Jmt2directed120000015810531581059
13fW…WVJ12gP…XLyrdirected3653176.8399017415929491592955
12xt…XkLW13fy…ZEPdirected132088816070211607027

Importing the Node and Edge data into gephi I can create the crazy cool graphs featured here on this post. The exact method as to how I create the graphs with gephi involves quite a bit of tweaking, trial and error. The specific things I play with are the modularity, appearance tab, Yifan Hu Proportional algorithm, and labels noverlap adjustments. Example of Gephi:

gephiExample

Challenges

The size of the graph is dependent on the number of transactions included. More transactions means more nodes and edges. The graph at the top of this post includes a screenshot of part of a 30,000 transactions graph. At this scale it takes a very long time to load any changes and heavily lags my system. The largest graph I have made was with 69,000 transactions, but it takes about 10 minutes to just open the svg file and view it. Seeing as how the polkadot network transactions number will most likely grow indefinitely, I am most likely going to need a beefier computer. Until then I added functionality while creating the nodes and edges to filter out small transactions or transactions before a certain block height depending on requirements. With the filter it enables one to look at specific transactions for example only transactions greater than 10 million dollars or transactions for this month, week, or day. Filtering also cuts down on the number of nodes and edges making it more manageable.

In making the project, the biggest challenge I faced was accessing blockchain data from the Polkadot RPC. As the concept was relatively new to me and tutorial examples were far and few between, I found myself using Doc pages to build up an understanding of how to connect to endpoints and use different methods and queries to obtain the data that I needed. This is where I came to the biggest bottleneck of the project. As it turns out using api calls to get all of the transactions from a blockchain takes a very long time. Luckily after a bunch of searching I found the polka-store project created by github user TheGoldenEye who built a solution. Making api calls to a node to build a database with SQLite using polka-store means I only have to do the slow calls once. After I can access the database at a much more reasonable speed and create the nodes and edges.

Conclusion

The current outcome of the project is a functioning software that allows me to track transactions and use Gephi to visualize the information. I have future plans to expand the functionality of the project as the substrate chains Polkadot and Kusama have gained a large amount of complexity with the addition of Parachains. To do so, I will most likely have to explore and/or create new databases with para-chain transaction data.

Full project code can be found on my github at visualizing-substrate

Want to check out the graph from the beginning in more detail? Click here and you will get a white screen scroll down and right to get to the middle where the nodes are.

This post is licensed under CC BY 4.0 by the author.