Pi-Torrent

Sean Carroll and James Talmage

Introduction

The benefit of peer-to-peer downloading is that it scales better than the traditional client-server architecture. Peers can request parts of the file from each other rather than everyone connecting to the same server, which could get overwhelmed. With peer-to-peer, the more active peers there are, the faster the download. For this project we built an HTTP server to act as the tracker. A Python script was also developed which creates a metainfo (.pt) file so users can connect to the tracker and find peers involved with the desired file. We also wrote the code for the peers. The same peer code is run for the original seeder (the user who initially has the file) and all other peers. Command line options are passed to specify the port, the metainfo (.pt) file, a peer name, whether the peer already has the full file, and the name of the destination/source file. Peers can be added dynamically, and each peer will connect to the others and request pieces from them.
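
As a rough illustration, the peer's command-line interface could look like the following sketch (the flag names here are hypothetical stand-ins for the actual options):

    import argparse

    # Hypothetical sketch of the peer's command-line options; the real
    # flag names in our code may differ.
    parser = argparse.ArgumentParser(description="Pi-Torrent peer")
    parser.add_argument("--port", type=int, required=True,
                        help="port this peer listens on")
    parser.add_argument("--metainfo", required=True,
                        help="path to the metainfo (.pt) file")
    parser.add_argument("--name", required=True,
                        help="this peer's name")
    parser.add_argument("--seeder", action="store_true",
                        help="set if this peer already has the full file")
    parser.add_argument("--file", required=True,
                        help="destination (or source, for the seeder) file")
    args = parser.parse_args()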


[Figure: Client-server architecture]


[Figure: Peer-to-peer architecture]

Objective

Our goal was to implement a subset of the BitTorrent Protocol (BTP) v1.0, including features allowing multiple peers to dynamically connect to a swarm and begin downloading a file simultaneously from one another. This was implemented in Python using existing HTTP server libraries, as well as traditional socket programming.

Design

Build the HTTP tracker server

The first step in our implementation of this protocol was to build an HTTP server that acts as the tracker. The tracker is the only centralized part of BTPv1.0. By connecting to this server, a peer can advertise its own IP address and port to all other peers, while simultaneously getting a list of all active peers who are either downloading the file from others or seeding (those who have already downloaded the file and are providing access to it for others). The protocol consists purely of call and response: the peer sends an HTTP GET request to the tracker, and the server sends back a bencoded response. Bencoding is the encoding used for metainfo files and tracker responses in BTPv1.0, and it consists of a relatively simple grammar. We used an existing Python library, bencode, to handle these encodings and decodings, as the main focus of our project was on the networking aspects. The GET request is required to have certain fields that the server parses in order to add the peer to the list of active peers.
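
A minimal sketch of such a tracker, assuming the bencode library's bencode() function returns bytes (the port and field names here are illustrative):

    from http.server import HTTPServer, BaseHTTPRequestHandler
    from urllib.parse import urlparse, parse_qs
    from bencode import bencode  # assumed here to return bytes

    peers = {}  # peer_id -> announce info

    class TrackerHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            # Parse the required announce fields out of the GET request.
            params = parse_qs(urlparse(self.path).query)
            peer_id = params["peer_id"][0]
            peers[peer_id] = {"peer id": peer_id,
                              "ip": self.client_address[0],
                              "port": int(params["port"][0])}
            print("active peers:", sorted(peers))
            # Reply with the bencoded list of all active peers.
            body = bencode({"interval": 1800, "peers": list(peers.values())})
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("", 6969), TrackerHandler).serve_forever()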

Make metainfo files (PiTorrent files)

A metainfo file contains information about the file to be downloaded, as well as the IP address and port number needed to connect to the tracker server. This file is completely bencoded, and the peer software must parse it in order to retrieve the relevant information. The size of this file is directly related to how large the pieces of the torrented file are. Files are divided into “pieces” of a fixed size, with all pieces being this size except for the last one, since the file length is generally not an exact multiple of the piece size. To let peers check the validity of what they are downloading, the metainfo file, which we call a PiTorrent file, contains an entry holding the SHA1 hash of each piece, all concatenated together. This entry typically determines the actual size of the PiTorrent file, since files tend to have hundreds of pieces. Since each piece hashes to 20 bytes under SHA1 and the specification recommends this file be under 70 kilobytes, we chose 256 kilobytes as our piece size, which kept our PiTorrent file around 70 kilobytes for a file of about 1 gigabyte. To validate a piece, after downloading it in full, the peer takes its SHA1 hash and compares it with the corresponding hash in the PiTorrent file. With the ability to create this file, we can take a file set for distribution and generate a PiTorrent file that lets others find the tracker server, where they can request the list of peers involved in its download.
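
A minimal sketch of generating such a file, again assuming the bencode library; the dictionary keys follow the BTPv1.0 metainfo layout:

    import hashlib
    from bencode import bencode  # assumed here to return bytes

    PIECE_SIZE = 256 * 1024  # 256 kilobyte pieces, as discussed above

    def make_metainfo(src, tracker_url, out):
        hashes = b""
        length = 0
        with open(src, "rb") as f:
            while True:
                piece = f.read(PIECE_SIZE)
                if not piece:
                    break
                length += len(piece)
                hashes += hashlib.sha1(piece).digest()  # 20 bytes per piece
        meta = {"announce": tracker_url,
                "info": {"name": src, "length": length,
                         "piece length": PIECE_SIZE, "pieces": hashes}}
        with open(out, "wb") as f:
            f.write(bencode(meta))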

Build the peer software to connect to the server and obtain a list of active peers

Peers decode the metainfo (.pt) file to get the URL of the tracker and information about the file to download. Here we found it best to print, on the server side, the list of active peers every time someone makes a connection. The server sends back a bencoded list of dictionaries containing each peer's id, IP, and port. It is useful to print this information on the client side as well and compare it with the printout on the server side.
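
In outline, the announce step might look like this sketch, assuming bdecode() returns string-keyed dictionaries matching the tracker sketch above:

    from urllib.parse import urlencode
    from urllib.request import urlopen
    from bencode import bdecode

    def get_peers(announce_url, peer_id, port):
        # Announce ourselves and fetch the current peer list in one GET.
        query = urlencode({"peer_id": peer_id, "port": port})
        with urlopen(announce_url + "?" + query) as resp:
            reply = bdecode(resp.read())
        for p in reply["peers"]:
            # Print for comparison against the tracker-side printout.
            print(p["peer id"], p["ip"], p["port"])
        return reply["peers"]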

After getting the list of peers, a peer can connect to one other peer and exchange information on the pieces they currently have

We started with an initial seeder, which would connect to the tracker first and then to one other peer. We made sure that this peer was able to handshake with the other peer and get into a state where it was ready to exchange pieces.


[Figure: Requester state-machine]


[Figure: Seeder state-machine]
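
The handshake itself is compact; the following sketch assumes the standard BTPv1.0 framing of <pstrlen><pstr><reserved><info_hash><peer_id>, where info_hash and peer_id are 20-byte strings:

    import struct

    PSTR = b"BitTorrent protocol"

    def recv_exact(sock, n):
        # Keep reading until exactly n bytes arrive (recv may return less).
        buf = b""
        while len(buf) < n:
            chunk = sock.recv(n - len(buf))
            if not chunk:
                raise ConnectionError("peer closed during handshake")
            buf += chunk
        return buf

    def handshake(sock, info_hash, peer_id):
        # <pstrlen><pstr><8 reserved bytes><20-byte info_hash><20-byte peer_id>
        msg = struct.pack("B", len(PSTR)) + PSTR + b"\x00" * 8 + info_hash + peer_id
        sock.sendall(msg)
        reply = recv_exact(sock, len(msg))
        # A valid reply echoes the protocol string and the same info_hash.
        return reply[1:20] == PSTR and reply[28:48] == info_hash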

Add support for peer discovery and connection when a new peer comes online dynamically

Since peers are added to the cluster dynamically, there needs to be a way for peers to find peers that come online after the first connection. We handle this differently from BTP. Each peer opens a connection with every peer on the list received from the tracker. If a peer gets a connection from a new peer it did not know about, and the handshake was valid, then the new peer is added to a list of possible connections; later on, that peer can open a connection in the other direction. This way all peers can discover each other without constant check-ins with the tracker.
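
In outline (the names here are illustrative, and handshake() refers to the earlier sketch):

    import socket

    connections = {}  # (ip, port) -> connected socket
    discovered = []   # peers learned from incoming handshakes

    def connect_to_all(peer_list, info_hash, my_id):
        # Dial every peer the tracker reported that we are not yet talking to.
        for p in peer_list:
            addr = (p["ip"], p["port"])
            if addr in connections:
                continue
            sock = socket.create_connection(addr)
            if handshake(sock, info_hash, my_id):
                connections[addr] = sock
            else:
                sock.close()

    def on_incoming(addr):
        # A valid handshake from an unknown peer: remember it so we can
        # open a connection in the other direction later.
        if addr not in connections and addr not in discovered:
            discovered.append(addr)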

Peers can trade pieces of the file with one another and write them to disk

The final step is to break the file into pieces and send/receive those pieces. Each piece is further broken down into blocks. Blocks are collected in a buffer, and once all the blocks of a piece have been received, the piece is validated against the corresponding hash in the metainfo file. After a piece is validated, the peer broadcasts to everyone it is connected to that it now has the new piece. The peer then writes the piece into the file by indexing the file pointer by the piece number and the piece size. This piece can be loaded back into memory if another peer requests it. The tricky situations come with the last block and the last piece, since the file size is not necessarily a multiple of the piece size, and the piece size is not necessarily a multiple of the block size. Some cleanup code is therefore required for the last block in each piece and for the last piece in the file.
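
A sketch of the validate-and-write step, assuming the file object f was opened in "r+b" mode and piece_hashes is the concatenated SHA1 entry read from the .pt file:

    import hashlib

    def store_piece(f, index, piece, piece_hashes, piece_size):
        # Each piece's hash is a 20-byte slice at offset 20 * index.
        expected = piece_hashes[20 * index : 20 * (index + 1)]
        if hashlib.sha1(piece).digest() != expected:
            return False              # corrupt piece: discard and re-request
        f.seek(index * piece_size)    # index the file pointer by piece number
        f.write(piece)                # the last piece may simply be shorter
        f.flush()
        return True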

Testing

All of our testing was done through usage testing, as we found this to be the best way to find bugs and errors in our networking code. We would also write smaller sample applications when trying out a new feature before adding it to the main peer application. Sending a simple text file, compared to a picture or movie file, was much easier for debugging, since the pieces could be read directly to determine whether we had an offset error. We also printed relevant information on handshaking, bit-fields, sent pieces, and received pieces, and on the tracker we printed the connected peers every time someone made a request.

Results

We were able to have peers request pieces from each other rather than from a single server. This was verified by printing the peer id hash of the source of each piece. Table 1 shows results comparing a single Rpi downloading from the original seeder. The Rpi 1 Model B+ and the Rpi 2 Model B were both used to compare the differences in processor. Interestingly, the Rpi 1 had a faster single-download time than the Rpi 2; we suspect this is because of the communication overhead of running on the multiple cores available on the Rpi 2.

Single download    Time (s)
pi2                132.09
pi1                 88.53

Table 1

Then we compared both Rpis connected to the network at the same time. The downloads were started simultaneously, and then again giving each a 30 second head start. When the downloads were started at the same time, the slower Rpi is able to keep up with the faster one, since the extra connection picks up the slack; in this case both download times are around the same. In the next two cases, one Rpi has only been able to connect to the initial seeder, but the other Rpi is able to catch up by getting pieces from both. In both cases, the Rpi that started last had a shorter download time. This shows that peers benefit from other peers already having pieces, rather than only connecting to a single server for the file.

The Rpi 2 benefits the most from starting second: its speedup compared to a single download was ~3.5x. This is likely due to its multiple cores better handling the multiple threads reading from and writing to multiple sockets. Given a large network with many more peers than our budget allows, we expect the Rpi 2 to have faster download speeds than the Rpi 1. It is also possible that the network adapter in the Rpi 2 is faster or better optimized.

Conclusion

From our results, we would say that we achieved our primary goal with this project: leveraging peer-to-peer techniques to improve the overall download speed of a file across multiple nodes in a network. Our implementation was able to handle peers joining the download dynamically, and they saw a higher download speed than if they had merely downloaded in a client-server fashion. We were not able to handle peers pausing a download and then attempting to restart. We also did not handle disconnecting from the downloading party as gracefully as we would have liked, but peers can disconnect dynamically without affecting the downloads of others, so long as every piece exists on at least one node in the network of peers. We were pleased with the extent of our project and also with the room for further improvements and features.

Future Work

Future work could include implementing the rest of the BitTorrent protocol; there are some steps our code is already very close to. Currently, peers will seed the file forever, even after the download has completed. A possible adjustment would be to implement a way for a peer to disconnect from the cluster. BTP handles this by having every peer check in with the tracker periodically: if a peer fails to check in, that peer is taken off the valid list and other peers disconnect from it. This would also handle the case where a peer goes offline or quits in the middle of downloading a file. Currently, our code handles lost connections by attempting to reconnect, so threads can be wasted if the connection was actually closed rather than just timing out.

Another optimization would be requesting pieces out of order. Currently our code collects pieces in order; instead of stepping through the missing pieces from the front of the file, possible speed improvements could come from comparing each peer's “have list” with the pieces still missing. This would allow for more parallelism: if all peers are around the same point in the file, they will most likely all request pieces from the initial seeder rather than from each other.
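
One common approach in BTP clients is rarest-first selection; a hypothetical sketch (not from our code), where missing is the set of piece indices still needed and peer_have_lists maps each connected peer to the set of piece indices it holds:

    def pick_piece(missing, peer_have_lists):
        # Count how many connected peers have each missing piece and pick
        # the rarest, spreading requests away from the initial seeder.
        counts = {i: sum(i in have for have in peer_have_lists.values())
                  for i in missing}
        available = {i: c for i, c in counts.items() if c > 0}
        return min(available, key=available.get) if available else None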

Bill of Materials

Product                                       Vendor   Unit Price   Quantity   Total
Raspberry Pi Model B+                         Newark   $25.00       2          $50.00
TP-LINK 5-Port Fast Ethernet Desktop Switch   Amazon   $9.95        1          $9.95
micro SD Card                                 Amazon   $4.99        1          $4.99
Power supply                                  Newark   $10.33       2          $20.66
                                                                    Total      $85.60

Contact

Sean Carroll (swc63@cornell.edu)

James Talmage (jmt329@cornell.edu)
