Bitcoin Network Measurement and a New Approach to Infer the Topology

Ruiguang Li, Jiawei Zhu, Dawei Xu, Fudong Wu, Jiaqi Gao, Liehuang Zhu

1 School of Cyberspace Science and Technology, Beijing Institute of Technology, Beijing 100081, China

2 National Computer Network Emergency Response Technical Team/Coordination Center, Beijing 100029, China

3 School of Cyberspace Security, Changchun University, Changchun 130022, China

*The corresponding author, email: liehuangz@bit.edu.cn

Abstract: Bitcoin has had a growing impact on the world's economy and financial order, which has attracted extensive attention from researchers and regulators all over the world. Most previous studies have focused on the transaction layer rather than the network layer. In this paper, we developed BNS (Bitcoin Network Sniffer), which can find and connect to nodes in the Bitcoin network, and carried out a detailed measurement. We collected nearly 4.1 million nodes in 1.5 hours and identified 9,515 reachable nodes. We counted the reachable nodes' properties, such as service type, port number, client version and geographic distribution. In addition, we analyzed the stability of the reachable nodes in depth and found that nearly 60% remained stable over 15 days. Finally, we proposed a new approach to infer the Bitcoin network topology by analyzing the Neighbor Addresses of Adjacent Nodes and their timestamps, which achieved an accuracy of over 80%.

Keywords: Bitcoin network; reachable nodes; node properties; node stability; network topology

Figure 1.Logical structure of Bitcoin.

Bitcoin was first proposed by Satoshi Nakamoto in 2008[1]. Since it officially went into operation in January 2009, Bitcoin has been running steadily for 12 years. It is now the most successful digital currency, and its value keeps increasing. After the outbreak of COVID-19 in 2019, most countries issued much more currency into the world's economy, which led to serious inflation. A large amount of this currency flowed into the Bitcoin market and raised Bitcoin's price. In April 2021, Bitcoin's price peaked at $65,000. Bitcoin has now become an important financial means for payment, investment and capital. Some people have used Bitcoin to carry out illegal activities, which has had a bad impact on the normal economic and financial order.

The Bitcoin system can be divided into the transaction layer and the network layer, as shown in Figure 1. Most previous studies have focused on the transaction layer rather than the network layer. The Bitcoin network is a P2P network characterized by decentralization and anonymity. Decentralization means there is no central organization or trust center in the network; the participants gain trust through message interaction. Anonymity means Bitcoin users' accounts and addresses are encrypted to ensure privacy and security. All transactions are stored in the blockchain in chronological order and published to all participants. Nodes in the Bitcoin network record all blockchain data. The decentralization and anonymity of Bitcoin make supervision difficult, because the transactions are anonymous and hard to track. Therefore, it is worth studying in depth the Bitcoin network's working mechanism, online nodes, communication protocols and topology.

The main contributions of our work are: 1) We develop an efficient measuring platform, BNS, which can find nearly 4.1 million node addresses in 1.5 hours. 2) We count the reachable nodes' properties, such as service type, port number, client version and geographic distribution. 3) We propose a new approach to infer the topology of the reachable nodes, which achieves an accuracy of over 80%.

The Bitcoin network is a typical P2P network with no centralized organization and a dynamically changing topology. Each node works independently according to the agreed protocols: shaking hands, broadcasting addresses, verifying transactions, packaging blocks and competing in mining. We now focus on the network layer of Bitcoin and introduce the nodes, protocols and address management of the network.

2.1 Nodes

The basic functions of Bitcoin nodes include block storage, network routing, mining and wallet services. Nodes that have all four functions and save a complete copy of the blockchain are called “Full Nodes”. They provide stable connection services for the whole network and can be seen as “Servers”. In 2021, the main Bitcoin network consisted of 10,000-15,000 full nodes, which provide all basic functions and ensure the normal operation of the network.

Not all Bitcoin nodes have the four basic functions, and they can be divided into different types. Nodes that store only a small part of the blockchain are called “Light-weight Nodes”. Light-weight nodes don't provide connection services and can be seen as “Clients”. Some nodes verify transactions through the SPV (Simple Payment Verification) protocol, so they are called “SPV Nodes” or “SPV Wallets”. Nodes that hold a complete copy of the blockchain but do not take part in mining can be called “Edge Routers”. They are often operated by large commercial companies to support an “Exchange” or a “Browser”. In addition, some nodes are dedicated to mining and are called “Miners”. Many miners can form a “Pool” and connect to the main network through pool protocols.

Figure 2.Bitcoin network protocols.

From the perspective of the whole Bitcoin network, the nodes can be divided into “Reachable Nodes” and “Unreachable Nodes”[2]. Reachable nodes can accept connection requests from outside and shake hands with other nodes. They are usually full nodes and can be easily detected. Unreachable nodes can't accept connection requests from outside, but they can send requests and take part in the running of the network. They are usually located behind a firewall or NAT. This paper mainly discusses the reachable nodes.

2.2 Protocols

The nodes in the Bitcoin network communicate over TCP through a set of agreed protocol messages, which make the basic functions possible. The main protocols are shown in Figure 2.

PING-PONG: A preliminary handshake interaction between two parties. The message contains a pseudo-random nonce.

VERSION-VERACK: Used to confirm that a TCP link has been established successfully. The message contains basic information such as service type, IP address, port number and client version.

GETADDR-ADDR: GETADDR is used when a node wants to request the Neighbor Addresses of another node. ADDR is the reply to GETADDR and contains up to 1000 IP addresses.

INV-GETDATA-TX(BLOCK): Used to exchange transaction or block data between parties. INV contains an index of one party's block/transaction list. GETDATA is used to request the actual blocks/transactions from that party. TX(BLOCK) contains the actual transaction/block data.
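To make the message exchange concrete, the following is a minimal sketch of how a message such as GETADDR or PING could be framed before it is written to the TCP socket. It assumes the standard Bitcoin wire framing (mainnet magic bytes, a 12-byte command name, a little-endian payload length and a double SHA-256 checksum); the function names are ours for illustration and are not taken from BNS.

```python
import hashlib
import struct

# Network magic for Bitcoin mainnet (assumption: mainnet is the target).
MAINNET_MAGIC = bytes.fromhex("f9beb4d9")


def checksum(payload: bytes) -> bytes:
    """First 4 bytes of double SHA-256 over the payload."""
    return hashlib.sha256(hashlib.sha256(payload).digest()).digest()[:4]


def build_message(command: str, payload: bytes = b"") -> bytes:
    """Frame a payload as magic | command | length | checksum | payload."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command name longer than 12 bytes")
    header = (
        MAINNET_MAGIC
        + name.ljust(12, b"\x00")          # command, NUL padded to 12 bytes
        + struct.pack("<I", len(payload))  # payload length, little-endian
        + checksum(payload)
    )
    return header + payload


def build_ping(nonce: int) -> bytes:
    """PING carries a single 8-byte nonce; the peer echoes it back in PONG."""
    return build_message("ping", struct.pack("<Q", nonce))
```

GETADDR has an empty payload, so `build_message("getaddr")` is just the 24-byte header.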

2.3 Address Management

In a P2P network, it is very important to manage node addresses effectively. Each Bitcoin node has a database called addrman to store the addresses of live nodes. Addrman manages these addresses in buckets and classifies them as Tried or New. Tried means the address has already been connected to, while New means the address has not yet been tried. There are probably tens of thousands of addresses in one node's addrman, and each has a timestamp to indicate its freshness.

Definition 1.We call these node addresses stored in addrman“Neighbor Addresses”.

A node obtains Neighbor Addresses in two ways. The first is by receiving unsolicited address broadcasts. The Bitcoin network has a mechanism for broadcasting fresh addresses named “Trickling”. Addresses of newly joined nodes, or addresses with fresh timestamps, are forwarded continuously until the difference between the timestamp and the current time exceeds ten minutes. Addresses obtained this way are stored in addrman as Neighbor Addresses with priority. The second way is the GETADDR-ADDR mechanism. When a node sends a GETADDR message to another node, the target node answers with up to 1000 addresses randomly selected from its addrman. Any node can therefore obtain a large number of addresses through this mechanism.
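The GETADDR-ADDR behaviour described above can be sketched with a toy address store. This is only an illustration under simplifying assumptions: the class `ToyAddrMan` and its methods are hypothetical, the bucket structure of the real addrman is omitted, and only the two facts stated in the text are modeled (each address carries a timestamp, and a reply holds up to 1000 randomly selected addresses).

```python
import random

MAX_ADDR_PER_REPLY = 1000  # upper bound on addresses in one ADDR reply (from the text)


class ToyAddrMan:
    """Toy stand-in for addrman: address -> (timestamp, table name)."""

    def __init__(self):
        self.entries = {}

    def add(self, addr, timestamp, tried=False):
        """Insert or refresh an address, keeping the freshest timestamp."""
        old = self.entries.get(addr)
        if old is not None:
            timestamp = max(timestamp, old[0])
            tried = tried or old[1] == "tried"  # a tried address stays tried
        self.entries[addr] = (timestamp, "tried" if tried else "new")

    def answer_getaddr(self, rng=random):
        """Return up to 1000 randomly chosen (address, timestamp) pairs."""
        addrs = list(self.entries)
        k = min(MAX_ADDR_PER_REPLY, len(addrs))
        return [(a, self.entries[a][0]) for a in rng.sample(addrs, k)]
```

Because each reply is a random sample, a crawler has to repeat the request many times before it has seen most of a node's Neighbor Addresses.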

According to a specific algorithm, each node selects a number of Neighbor Addresses from addrman to initiate outgoing connections. In the Bitcoin network, each node can initiate up to 10 outgoing connections and accept up to 117 incoming connections. For outgoing connections, the node continuously updates the timestamp of the target address during the interaction, but it does not do so for incoming connections. This feature helps us infer the real-time outgoing connections of the target node.

In recent years, there have been several studies on Bitcoin network measurement and topology inference.

Bitcoin Network Measurement: Joan et al. measured the Bitcoin network[3] from Nov 2013 to Jan 2014, collected 872,000 nodes using Bitcoin-Sniffer[4], and analyzed node properties such as geographic distribution, node stability and network transmission delay. Christian Decker et al. measured the propagation of blocks and transactions[5] in the Bitcoin network in 2013. Giuseppe Pappalardo et al. did the same work[6] in 2016. Fadhil et al. measured the Bitcoin network[7] for one week and collected 313,676 nodes, of which 6430 were stable online nodes. Sehyun Park et al. measured the Bitcoin nodes[8] in 2018 and carried out a comparative study. They collected nearly 1 million nodes in 37 days and compared the result with previous works. From these related works, we can see that the number of nodes collected is closely related to the measurement time.

Topology Inferring: There are three kinds of methods to infer the topology:

1) By analyzing timestamps. In 2014, Alex Biryukov et al. proposed a topology inferring method[9] based on analyzing the address propagation mechanism. The authors collected the broadcast addresses through entry nodes and inferred the nodes' connection relationships. The idea sounds reasonable, but there is too much noise, which greatly affects the accuracy. Andrew Miller et al. proposed AddressProbe[10], which requests as many Neighbor Addresses as possible and analyzes the real connections between peers. The method had an accuracy of 70-80% but became useless after an update of the Bitcoin client. Till Neudecker et al. inferred the network topology[11] by observing the changes in the arrival time of a specific transaction and deducing the connections between nodes. The accuracy and recall were both around 40%.

2) By sending real transactions. In 2018, Sergi Delgado-Segura et al. proposed a topology inferring technique, TxProbe[12], based on analyzing “isolated transactions”. Using the mechanism of “isolated transactions”, the authors can infer the real connections of Adjacent Nodes with an accuracy of 100% and a recall of 95%. Although this method is very accurate, the transaction fees make it very expensive. Matthias Grundmann et al. proposed two topology inferring methods[13] that send “double-spend transactions” to adjacent nodes, with an accuracy of 71% and a recall of 87%. These two methods were not verified in the real Bitcoin network and would be expensive to carry out.

Figure 3.BNS Structure.

3) By simulation. Varun Deshpande et al. proposed BTCmap, a way to infer the topology[14] by simulation. The authors first collected the actual online nodes, fed them into a simulation platform, and obtained a simulated topology using the real algorithm. The result could be very different from the real network.

4.1 Bitcoin Network Sniffer(BNS)

We developed a highly efficient measuring platform named Bitcoin Network Sniffer (BNS) in Java. It can find and connect to thousands of Bitcoin nodes at the same time, collecting node information, exchanging messages and counting node properties. On this basis, BNS can infer the network topology from the network-layer data. The system structure is shown in Figure 3. BNS obtains the IP addresses of Bitcoin nodes through DNSSEED (some node addresses hard-coded in the client source), which returns 203 active IP addresses, as shown in Table 1. BNS establishes connections with these nodes and keeps obtaining Neighbor Addresses by sending GETADDR messages. After receiving the ADDR messages, BNS establishes multi-threaded connections with the new nodes, obtains their Neighbor Addresses and stores them in the IP database (only non-redundant addresses are stored). The procedure is repeated many times until the total number of addresses in the IP database no longer increases. At the same time, BNS records all the connected nodes (reachable nodes), collects the returned VERSION messages and extracts information such as service type, port number, client version and geographical location. The extracted data is sent to the analysis module.
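The discovery loop described above can be sketched as a breadth-first traversal over addresses. This is a single-threaded illustration in Python, not the Java implementation of BNS; `get_neighbors` is a hypothetical callable standing in for "connect, send GETADDR, collect the ADDR replies", and it is assumed to return `None` when the node cannot be reached.

```python
from collections import deque


def crawl(seeds, get_neighbors):
    """Breadth-first address discovery.

    seeds: iterable of initial addresses (for example, those returned by DNSSEED).
    get_neighbors: hypothetical callable returning the list of Neighbor
        Addresses of a node, or None if the node cannot be reached.
    Returns (all_known_addresses, reachable_addresses).
    """
    known = set(seeds)
    reachable = set()
    queue = deque(known)
    while queue:  # the loop ends once no new address is discovered
        addr = queue.popleft()
        neighbors = get_neighbors(addr)
        if neighbors is None:
            continue  # connection failed: treat the node as unreachable
        reachable.add(addr)
        for n in neighbors:
            if n not in known:  # store only non-redundant addresses
                known.add(n)
                queue.append(n)
    return known, reachable
```

A real crawler would run many connections in parallel and repeat GETADDR per node, since each reply contains only a random subset of the node's addresses.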

Table 1.IP Addresses returned by DNSSEED.

Table 2.Comparison of measuring results.

4.2 Experiment

We measured the Bitcoin network from 11:00 to 12:30 on Sep 1st 2021 and collected nearly 4.1 million nodes. In addition, BNS established 9515 TCP connections with reachable nodes. We compared other works with our measurement, as shown in Table 2. In 2013, the bitcoin-sniffer[4] collected nearly 873,000 nodes in 37 days, of which 5769 were online nodes. In 2018, the Bitcoin-Node-Scanner[8] collected 500,000 nodes and 8,527 online nodes in 1 day, and nearly 1.1 million nodes and 8500 online nodes in 37 days. The well-known website “BITNODES”[15] displayed 9,531 online nodes at the time of our experiment, which is very close to our result.

From the comparison, we can see that the total number of Bitcoin nodes has increased tremendously in recent years, possibly because of the coin price boom in 2021. The results also show the obvious advantages of our measuring platform in efficiency and performance. In addition, the number of reachable nodes has kept increasing.

Table 3.VERSION message format.

The VERSION message contains useful information such as protocol version, service type, timestamp, IP address, port number, block height and client version, as shown in Table 3. In Chapter 5, we present detailed statistics on these node properties.
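As an illustration of how these fields can be extracted, the sketch below parses the leading part of a VERSION payload. It assumes the standard wire layout (little-endian integers, a 26-byte network address whose port is big-endian, and a length-prefixed user agent string); the function name and the returned dictionary keys are ours, and user agents of 253 bytes or more are deliberately not handled.

```python
import ipaddress
import struct


def parse_version(payload: bytes) -> dict:
    """Parse the leading fields of a VERSION payload (sketch, not exhaustive)."""
    if len(payload) < 85:
        raise ValueError("payload too short for a VERSION message")
    # protocol version (int32), services (uint64), timestamp (int64)
    version, services, timestamp = struct.unpack_from("<iQq", payload, 0)
    # addr_recv at offset 20: services (8) + IPv6-mapped address (16) + port (2)
    recv_ip = ipaddress.IPv6Address(payload[28:44])
    (recv_port,) = struct.unpack_from(">H", payload, 44)
    # addr_from occupies bytes 46..71 and the nonce bytes 72..79 (skipped here)
    ua_len = payload[80]
    if ua_len >= 0xFD:
        raise ValueError("multi-byte length prefix not handled in this sketch")
    user_agent = payload[81:81 + ua_len].decode("ascii", errors="replace")
    (start_height,) = struct.unpack_from("<i", payload, 81 + ua_len)
    mapped = recv_ip.ipv4_mapped  # None for a native IPv6 address
    return {
        "version": version,
        "services": services,
        "timestamp": timestamp,
        "recv_ip": str(mapped) if mapped is not None else str(recv_ip),
        "recv_port": recv_port,
        "user_agent": user_agent,      # client version string
        "start_height": start_height,  # block height reported by the peer
    }
```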

In this chapter, we counted the 7,996 VERSION messages obtained in the measurement and analyzed node properties such as service type, port number, client version and geographic distribution. This helps us learn more about the reachable Bitcoin nodes.

5.1 Service Type

Bitcoin nodes can be divided into full nodes and light-weight nodes. By checking the “Services” field in the message, we can roughly judge the type of a node. The “Services” field is filled with a number (the Service Number) that encodes a set of service flags (NODE_NETWORK, NODE_WITNESS, NODE_BLOOM, etc.). Every flag indicates a specific function. For example, NODE_NETWORK means the node has a complete copy of the blockchain. By counting the NODE_NETWORK flags, we can roughly know how many full nodes there are among the reachable nodes, as shown in Figure 4. Of the 7,996 reachable nodes, 6,938 were full nodes and 1,058 were light-weight nodes. This means that 87% of the reachable nodes were “Servers” and 13% were “Clients”.
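The Service Number is a bit field, so the classification above reduces to a bit test. The sketch below uses the bit positions defined by the Bitcoin protocol for the flags named in the text, plus NODE_NETWORK_LIMITED; the helper names are ours, and the rule "full node if and only if NODE_NETWORK is set" is the rough criterion described in the text.

```python
# Bit positions of common service flags in the "Services" field.
SERVICE_FLAGS = {
    "NODE_NETWORK": 1 << 0,           # serves the complete blockchain
    "NODE_BLOOM": 1 << 2,             # supports bloom-filtered connections
    "NODE_WITNESS": 1 << 3,           # supports segregated witness data
    "NODE_NETWORK_LIMITED": 1 << 10,  # serves only recent blocks
}


def decode_services(service_number: int) -> list:
    """Return the names of the known flags set in a Service Number."""
    return [name for name, bit in SERVICE_FLAGS.items() if service_number & bit]


def is_full_node(service_number: int) -> bool:
    """Rough criterion from the text: NODE_NETWORK set means full node."""
    return bool(service_number & SERVICE_FLAGS["NODE_NETWORK"])
```

For example, a Service Number of 1033 (1024 + 8 + 1) decodes to NODE_NETWORK, NODE_WITNESS and NODE_NETWORK_LIMITED, while 1032 lacks NODE_NETWORK and would be counted as a light-weight node.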

Figure 4.Service types.

Table 4.Port numbers.

Table 5.Client versions.

5.2 Port Number

Bitcoin nodes communicate over TCP with the default port number 8333. The “Port Number” field gives the open port to which other nodes connect. We counted the port numbers used in the real network, as shown in Table 4. We found that 96.46% of all nodes communicated through port 8333, while only a small proportion used other ports.

5.3 Client Version

There were many different client versions in the Bitcoin network. The latest was the Satoshi:0.21 series, while older ones included the Satoshi:0.9 series and even earlier versions. We counted the different versions in Table 5. Considering that there were many modified versions, we classified them by the standard version they are based on. The statistics showed that 70% of the reachable nodes had upgraded to the newer Satoshi:0.20 or Satoshi:0.21 series, but 2.66% were still using very old versions.
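Grouping modified clients under their standard version can be done by normalizing the user agent string from the VERSION message. The sketch below is one possible way to do this, assuming user agents follow the usual "/Satoshi:x.y.z/" pattern; the function names and the "other" bucket are hypothetical and do not reproduce the exact rules used for Table 5.

```python
import re
from collections import Counter

# Matches the major.minor part of a Satoshi-style user agent.
_SATOSHI = re.compile(r"/Satoshi:(\d+)\.(\d+)")


def version_series(user_agent: str) -> str:
    """Map a user agent to its standard series, e.g. 'Satoshi:0.21'."""
    m = _SATOSHI.search(user_agent)
    if m is None:
        return "other"  # clients that do not advertise a Satoshi version
    return "Satoshi:%s.%s" % (m.group(1), m.group(2))


def count_series(user_agents) -> Counter:
    """Count how many nodes fall into each version series."""
    return Counter(version_series(ua) for ua in user_agents)
```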

Figure 5.Geographic distribution of Bitcoin nodes.

Table 6.Continental proportion of reachable nodes.

5.4 Geographic Distribution

Using the third-party website “BITNODES”, we obtained the longitude and latitude of the 9515 reachable nodes and drew a geographic distribution map with Python, as shown in Figure 5. From Figure 5, we can see that North America and Europe had the most reachable Bitcoin nodes. In addition, we counted the continental proportions of reachable nodes, as shown in Table 6.

The Bitcoin network is a dynamic P2P network. The set of reachable nodes is constantly changing, and the changing nodes are called “Churn Nodes”[16][17]. In our research, we connected to the reachable nodes periodically, observed how they changed and evaluated the nodes' stability.

6.1 Experiment

Figure 6.Changes of total reachable nodes.

Figure 7.Percentage change refer to Day1.

We collected the reachable nodes from 16:00 to 20:00 every day from May 12th (Day1) to May 28th (Day17) in 2021 and extracted the IP addresses of these nodes to build a dataset. During the 17 days, the total number of reachable nodes varied between 9,300 and 9,900. After comparing the nodes of each day, we found that 6,059 nodes stayed online from Day1 to Day17, as shown in Figure 6. In the figure, the blue line stands for the total online nodes, while the red line stands for the stable nodes that stayed online for all 17 days. In the Bitcoin network, the reasons for node churn may be: 1) Transmission delay makes a node appear unreachable. 2) Incoming connections overflow the upper limit of 114. 3) Hardware or software failures cause disconnections.

6.2 Node Stability

We selected the IP addresses of Day1 to Day15 from the dataset to evaluate the day-to-day relative changes of online nodes.

Percentage Change Relative to Day1: Day1 had 9307 reachable nodes, and some of them changed on the following days. For example, 10.1% of all nodes had changed by Day2, and 25.9% had changed by Day15, as shown in Figure 7. Here, “change” means that old nodes went offline and new nodes joined. In Figure 7, taking Day1 as the reference, the proportion of changed nodes kept increasing day by day, from 10.1% (Day2) to 25.9% (Day15). There was a small fluctuation on Day8 (May 20), possibly because some nodes that went offline on Day2 rejoined the network.

Figure 8.Percentage change refer to the day before.

Percentage Change Relative to the Day Before: Now we observe the changed nodes with respect to the previous day. Every day, about 9% of all nodes changed relative to the previous day, as shown in Figure 8. The maximum variation occurred on Day9 (May 21) with a change of 11.7%, and the minimum variation occurred on Day8 (May 20) with a change of 7.2%.
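The two percentage measures and the count of stable nodes can be computed from daily sets of IP addresses. The sketch below is one plausible reading, not the exact formula behind Figures 6 to 8: it assumes "change" is measured as the share of the reference day's addresses that are no longer present, and the function names are ours.

```python
def stable_nodes(snapshots):
    """Addresses present in every daily snapshot (list of sets)."""
    return set.intersection(*snapshots)


def change_vs_first(snapshots):
    """Percent of Day1 addresses missing on each later day."""
    first = snapshots[0]
    return [100.0 * len(first - day) / len(first) for day in snapshots[1:]]


def change_vs_previous(snapshots):
    """Percent of the previous day's addresses missing on each day."""
    return [
        100.0 * len(prev - cur) / len(prev)
        for prev, cur in zip(snapshots, snapshots[1:])
    ]
```

With this reading, the 6,059 stable nodes correspond to `len(stable_nodes(days))` over the 17 daily sets.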

The conclusions above are drawn from the perspective of the whole network. In fact, many P2P connections are relatively fixed. For example, in our experiment, we deployed 3 test nodes and queried their connected peers with the “getpeerinfo” command. We found that about 2/3 of the connections were very stable, and only 1/3 were constantly changing.

Bitcoin keeps the network topology highly confidential, since it would be dangerous for an adversary to learn it. Studying the Bitcoin network topology is nevertheless important for researchers and regulators, because it helps to optimize the network and analyze transactions. In this paper, we propose a new approach to infer the network topology by analyzing the Neighbor Addresses of Adjacent Nodes and their timestamps.

7.1 Methodology

When a Bitcoin node establishes peer-to-peer connections, it selects some Neighbor Addresses from its addrman to connect to, according to a built-in algorithm.

Suppose A and B are both Bitcoin nodes. If A and B have a real connection between them, A's address must be one of B's Neighbor Addresses and B's address must be one of A's Neighbor Addresses. Therefore, once we have all the Neighbor Addresses of each Bitcoin node, it becomes possible to find the real connections by analyzing the neighbor relations and the timestamps.

Specifically, we use BNS to establish multi-threaded connections with all reachable nodes in the Bitcoin network and send GETADDR messages to these nodes repeatedly. Our goal is to get all the Neighbor Addresses of the reachable nodes.

Of course, not all nodes return all of their Neighbor Addresses. Some nodes don't reply to GETADDR because of an unstable network. Some nodes close the TCP connection when they find that a peer has sent the same GETADDR message twice. Others reply with the same ADDR message every time, in which the Neighbor Addresses are identical. After many experiments, we found that nodes with a client version earlier than the Satoshi:0.20 series can return all their Neighbor Addresses.

Definition 2. We call a node an “Active Node” if it can return all its Neighbor Addresses.

In 2021, the number of Active Nodes varied between 3000 and 4000. In a node's addrman, the timestamp of an address is updated frequently if the node has made an outgoing connection to it. If the node receives an incoming connection, however, the timestamp is set when the connection is initialized and never changes. Therefore, by carefully analyzing the timestamps of neighbor nodes and finding the nodes with the latest timestamps, it is possible to find the outgoing connections and infer the real-time topology.

7.2 Model

1)Nodes Collection

Take all Active Nodes in the Bitcoin network as collection B (of size n). bi (i=1,...,n) is an arbitrary element of B, and each bi represents an Active Node.

2)Address Vector

We connected to each bi and got all its Neighbor Addresses, say bij (j=1,2,...). We made an address vector:

(bi1, bi2, ..., bij, ...)

In the vector, bij represents a Neighbor Address of node bi. Now we check, for every node bj in collection B, whether bj appears among these Neighbor Addresses, and we get a boolean vector Bi=(bi1, bi2, ..., bin). In vector Bi, each bij is a boolean value. TRUE means bj is a Neighbor Address of bi, while FALSE means bj isn't a Neighbor Address of bi.

3)Address Matrix

By repeating the process of 2), we get n address vectors (B1, B2, ..., Bn) and build an n×n boolean matrix M whose i-th row is Bi.

In matrix M, each element bij is a boolean value. TRUE means bj is a Neighbor Address of bi, while FALSE means bj isn't a Neighbor Address of bi.

4)Find Symmetric Elements

Next, we look for the symmetric elements in matrix M, where “symmetric” means that the two elements in diagonally symmetric positions are both TRUE. As shown in Figure 9, if bij is TRUE, we check whether bji in the diagonally symmetric position is also TRUE. If bij and bji are both TRUE, then bj is a Neighbor Address of bi and bi is also a Neighbor Address of bj. They are therefore a pair of symmetric elements and have a good chance of having a real connection between them. We search the whole of matrix M and store all the symmetric elements in a database.

Definition 3. We call two nodes “Adjacent Nodes” if they correspond to symmetric elements in matrix M.

5)Select Pair Nodes

In the result of 4), even if bi and bj are Adjacent Nodes, that does not mean they have a real connection between them. We select the pair nodes with a real connection by their timestamps. We analyze the nodes' timestamps carefully and eliminate the nodes with older timestamps. Because the timestamps of nodes with outgoing connections are updated frequently, the pair nodes with newer timestamps should have a real connection between them.

Figure 9.Find Symmetric Elements.

Definition 4.We call the nodes“Neighbor Nodes”,if they have a real connection between them.

6)Build Topology

After the process of 5), we have many pairs of Neighbor Nodes, from which we can extract points and undirected edges. Based on these points and edges, we can build a topology of the real Bitcoin network.
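Steps 3) to 6) can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the implementation used in BNS: the input format (a dictionary from each Active Node to its Neighbor Addresses and their timestamps), the function name `infer_edges`, and in particular the freshness rule (keep a pair if the newer of its two timestamps is at most `max_age` seconds old) and the value of `max_age` are hypothetical choices.

```python
def infer_edges(neighbors, now, max_age=1200):
    """Infer undirected edges between Active Nodes.

    neighbors: dict mapping each Active Node to {neighbor address: timestamp}.
    now: capture time in the same unit as the timestamps (seconds).
    max_age: hypothetical freshness threshold in seconds.
    """
    nodes = sorted(neighbors)
    n = len(nodes)
    # Step 3: boolean address matrix, M[i][j] is True if b_j is a Neighbor Address of b_i.
    M = [[nodes[j] in neighbors[nodes[i]] for j in range(n)] for i in range(n)]
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # Step 4: keep only symmetric elements (Adjacent Nodes).
            if not (M[i][j] and M[j][i]):
                continue
            a, b = nodes[i], nodes[j]
            # Step 5: the side with the outgoing connection keeps refreshing
            # its timestamp, so the newer of the two timestamps is checked.
            newest = max(neighbors[a][b], neighbors[b][a])
            if now - newest <= max_age:
                edges.append((a, b))  # Step 6: one undirected edge per pair
    return edges
```

Addresses that are not Active Nodes themselves never enter the matrix, which is one reason the inferred graph covers only part of the real topology.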

Error Analysis: Our approach relies on the mechanism that a node dynamically updates the timestamp of an address once it has established an outgoing connection to that address. However, the approach makes errors in the following situations: 1) A node updates the timestamp of a newly received address when it receives unsolicited ADDR messages, even though it has no connection with these nodes. 2) When a node initiates a GETADDR request, it applies a 2-hour penalty to the timestamps of the addresses in the returned ADDR. As a result, some nodes may be selected as pair nodes and mistakenly considered Neighbor Nodes.

7.3 Experiment

We measured the Bitcoin network from 11:00 to 17:00 on Sep 16th 2021 and found 2980 Active Nodes. Following our approach, we built an address matrix of size 2980×2980. After finding the symmetric elements and selecting the pair nodes, we obtained 1309 points and 13548 edges. Finally, we used Gephi to draw an abstract topology graph of the Bitcoin network, shown in Figure 10. In the graph, points and circles stand for Bitcoin nodes, and larger circles mean a higher node degree. Lines stand for the connections between Bitcoin nodes; nodes with dense connections are located in the center, while sparsely connected ones are at the edge. As is well known, the Bitcoin network is composed of reachable nodes and unreachable nodes[18]. The topology in Figure 10 only includes reachable nodes and is the core of the real Bitcoin topology. To make the abstract topology more meaningful, we matched the real longitude and latitude to the 1309 nodes and placed them on a world map, as shown in Figure 11.

Figure 10.Abstract topology.

Figure 11.Geographic topology.

From the geographic topology, we can see that the communications between Europe, North America and East Asia are very dense. This result is also consistent with Figure 5 and Table 6.

EVALUATION: We deployed 3 real Bitcoin nodes to evaluate the inferred results. The average accuracy was above 80%, but the average recall was 40%. We thus obtained a high accuracy but a low recall. Due to the unstable network and the constantly changing software versions of Bitcoin nodes, BNS could not collect all the Neighbor Addresses in the real network. In addition, there were many unreachable nodes which could not be detected by BNS. So what we obtained was only a part (but the core) of the real topology, which explains the low recall.
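The evaluation against the test nodes can be expressed as a comparison between two edge sets. The sketch below assumes that the "accuracy" reported here corresponds to precision (the share of inferred connections that are real) and that the ground truth is the list of real connections of a test node; both function names are ours.

```python
def normalize(edges):
    """Treat edges as undirected: (a, b) and (b, a) are the same edge."""
    return {frozenset(e) for e in edges}


def precision_recall(inferred, truth):
    """Return (precision, recall) of inferred edges against ground truth."""
    inf, tru = normalize(inferred), normalize(truth)
    hits = len(inf & tru)
    precision = hits / len(inf) if inf else 0.0
    recall = hits / len(tru) if tru else 0.0
    return precision, recall
```

For instance, if 5 connections are inferred for a test node with 10 real connections and 4 of the 5 are correct, the precision is 0.8 and the recall is 0.4.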

Figure 12.Node degree distribution.

Table 7.Comparison of topology parameters.

7.4 Details of the Topology

For the topology in Figure 10 and Figure 11, the average degree was 10.35. The maximum degree was 22 and the minimum was 1. The node degree distribution is shown in Figure 12. The diameter of the topology was 7. The clustering coefficient was 0.016, and the modularity was 0.252. We made a comparison with TxProbe[12], as shown in Table 7. The number of nodes was nearly double that of TxProbe, because of the effective detection by BNS. However, the average degree was lower than that of TxProbe, because only reachable nodes are shown in the topology. Many unreachable nodes are not shown, so there must be many hidden edges. Our approach is based on detecting the Active Nodes. Because the Active Nodes in the real network are limited, we could not get the whole topology of the Bitcoin network by this approach. However, the topology in Figure 10 is the core part (the Servers) of the whole topology, so it is still very meaningful.
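Degree statistics of this kind can be recomputed from the edge list. The sketch below is a plain-Python illustration (the values in the text were presumably obtained from Gephi); the function name is ours, and it assumes a non-empty list of distinct undirected edges. It reports both the number of edges per node and the mean degree 2E/N, since the two conventions differ by a factor of two: 13,548 edges over 1,309 nodes gives about 10.35 edges per node, or about 20.70 under the 2E/N convention.

```python
from collections import Counter


def degree_stats(edges):
    """Basic degree statistics for a non-empty list of undirected edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n = len(degree)
    return {
        "nodes": n,
        "edges": len(edges),
        "edges_per_node": len(edges) / n,   # E / N
        "mean_degree": 2 * len(edges) / n,  # 2E / N
        "max_degree": max(degree.values()),
        "min_degree": min(degree.values()),
        "distribution": Counter(degree.values()),  # degree -> number of nodes
    }
```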

In this paper, we measured and analyzed the real Bitcoin network. We developed a very effective network measuring platform, BNS, which can establish multi-threaded connections with the network's reachable nodes and collect the returned messages. We collected 4.1 million nodes and 9515 reachable nodes in 1.5 hours and counted the nodes' properties, such as service type, port number, client version and geographic distribution. We studied the nodes' stability in depth and found that nearly 60% remained stable over 15 days. Finally, we proposed a new approach to infer the network topology based on the address propagation mechanism and timestamps, which achieved an accuracy of over 80%. We analyzed the topology's parameters and compared them with previous work.

With our approach, we can only see the topology of the visible part (reachable nodes); we still know nothing about the invisible part (unreachable nodes). The unreachable nodes are attracting more and more attention from researchers[19][20]. In the future, we will continue to study the unreachable nodes in more depth and try to find ways to identify and analyze them. We will then be able to get a picture of the entire network and evaluate the robustness and invulnerability of the Bitcoin network[21][22][23].

ACKNOWLEDGEMENT

This work was supported by the National Key Research and Development Program of China (Grant No. 2020YFB1006105).
