Bit -Torrent Client

Introduction

In this assignment, you are asked to develop URTorrent, a simplified version of the original BitTorrent protocol. The official BitTorrent specification is located at the bittorrent.org website. We have selectively included information pertinent to this assignment. The program must be written in python. The idea of URTorrent is simple. Peers collaborate to download files. Peers are discovered via a tracker. Let’s say some client has a file to be shared named UR.mp3. The first thing the client needs to do is create a metainfo file containing information about UR.mp3. The metainfo file usually has the .torrent extension. The precise format of the metainfo file will be described below. Roughly, it contains the file name, keywords (if any), and the URL of a program called a tracker, which is responsible for keeping track of all the peers who are collaborating in downloading UR.mp3.
The original client that had the file to share is called a seeder. Peers who do not have the entire file yet are called leechers. There might be multiple seeders for the same file, because once a leecher has downloaded the entire file, it will become a seeder. To start a URTorrent service, a host goes through the following steps:

  • Start running a tracker (or, more likely, have one running already).
  • Generate a metainfo (.torrent) file using the complete file to be served and the URL of the tracker.
  • Distribute the metainfo file.
  • Start URTorrent (the ‘original’ uploader) which already has the complete file.
  • Upload the requested chunks to the peers.

To start downloading, a user does the following:

  • Run URTorrent. Pass the port number (on which URTorrent listens) and the name of the metainfo file.
  • URTorrent contacts the tracker and gets a list of peers
  • URTorrent downloads chunks of the file from various peers and uploads chunks that are already available locally. 
  • URTorrent waits for the download to complete.

The Details

Creating the metainfo file

In the wild, you will typically have to upload the metainfo file to some webserver, allowing others to search for it and download it. In this assignment, we will assume that every peer has the metainfo file. We will create the metainfo file for UR.mp3 and give it to each participating peer. You can create the metainfo file using a command line utility that Bram Cohen wrote in Python a while back, named btmakemetafile. Here’s a tar file containing the utility (and supporting files). Download and untar it. The complete source tree may be found at: BitTornado. You will get a directory named MMT containing the utility. To create a metafile for UR.mp3, you can type something like this:

btmakemetafile http://127.0.0.1:6969/announce UR.mp3

The first argument http://127.0.0.1:6969/announce is the URL of the tracker, which basically consists of the IP address and the port number that the tracker is running and listening on. Note that the tracker will run the HTTP protocol, and hence the http:// in the beginning. The second argument is the file name. 
This will generate a file called UR.mp3.torrent. Make sure to include the port number in the tracker url if it isn’t 80. This command may take a while to scan over the whole file while hashing it. The /announce path is special and hard-coded into the tracker. Make sure to use the domain or ip your tracker is on instead of my.tracker. BencodingBencoding is used for creating the metainfo (.torrent) file from the original file. You may use this bendecoder written in C by Mike Frysinger.

Also feel free to write your own.

All data in a metainfo file is bencoded. The content of a metainfo file (the file ending in “.torrent”) is a bencoded dictionary, containing the keys listed below. All character string values are UTF-8 encoded.

  • announce: The announce URL of the tracker (string)
  • info: a dictionary that describes the file(s) of the torrent. There are two possible forms: one for the case of a ‘single-file’ torrent with no directory structure, and one for the case of a ‘multi-file’ torrent. For our assignment, all our torrents need to be ‘single file’ torrent. 
  • a few other optional keys such as, creation date, comment, created by, and encoding. When you read the metainfo file, ignore these keys.

The info dictionary contains the following fields:

  • length: length of the file in bytes (integer)
  • name: the filename. This is purely advisory. (string) 
  • piece length: number of bytes in each piece (integer) piece length is almost always a power of two, most commonly 2^18 = 256 KB = 262144 bytes. The most popular sizes are 256 kB, 512 kB, and 1 MB.
  • pieces: string consisting of the concatenation of all 20-byte SHA1 hash values, one per piece (byte string, i.e., not url-encoded) 

To compute SHA1 hashes (which helps ensure piece integrity, i.e., that the data has not been corrupted), you can use openssl’s crypto library. You need to link with -lcrypto to compile. Here is a sample program. 

#include 

int main()
{  
  const char str[] = "Original String";
  unsigned char hash[SHA_DIGEST_LENGTH]; // == 20

  SHA1(str, sizeof(str) - 1, hash);

  // do some stuff with the hash

  return 0;
}

Source: stackoverflow.

The Tracker 

Tracker GET requests have the following keys:

  • info_hash: urlencoded 20-byte SHA1 hash of the value of the info key from the metainfo file. Note that the value will be a bencoded dictionary, given the definition of the info key above. 
  • peer_id: urlencoded 20-byte string used as a unique ID for the client, generated by the client at startup. This is allowed to be any value and may be binary data. 
  • port: The port number that the client is listening on. 
  • uploaded: The total number of bytes uploaded (since the client sent the ‘started’ event to the tracker) in base ten ASCII. 
  • downloaded: The total number of bytes downloaded (since the client sent the ‘started’ event to the tracker) in base ten ASCII. 
  • left: The number of bytes this client still has to download, encoded in base ten ASCII. 
  • compact: Setting this to 1 indicates that the client accepts a compact response. The peer list is replaced by a peers string with 6 bytes per peer. The first four bytes are the host (in network byte order), the last two bytes are the port (again in network byte order). It should be noted that some trackers only support compact responses (for saving bandwidth) and either refuse requests without “compact=1” or simply send a compact response unless the request contains “compact=0” (in which case they will refuse the request.) 
  • event: If specified, must be one of startedcompletedstopped, (or empty which is the same as not being specified). If not specified, then this request is one performed at regular intervals.

Note that all binary data in the URL (particularly info_hash and peer_id) must be properly escaped. This means any byte not in the set 0-9, a-z, A-Z, ‘.’, ‘-‘, ‘_’ and ‘~’, must be encoded using the “%nn” format, where nn is the hexadecimal value of the byte. (See RFC1738 for details.) This is called the urlencoding of the data. A request from the (original) seeder looks something like this (all on one line though, I separated the request into 3 lines for easy reading and printing):

GET /announce?info_hash=_tWL%26%BD%C4%BDsEn%FD%7E1%2CJ3%40s%1B&
peer_id=M3-4-2--5ffd511f4079&port=6881&key=585b8345&uploaded=0&downloaded=0&
left=0&compact=1&event=started HTTP/1.1

A request from a leecher looks something like this (again, all on one line):

key:xxxxxxxxxxxxxxxx

GET /announce?info_hash=_tWL%26%BD%C4%BDsEn%FD%7E1%2CJ3%40s%1B&
peer_id=M3-4-2--f8c7df8ee1b9&port=6882&key=a6fa9e7e&uploaded=0&downloaded=0
&left=90112&compact=1&event=started HTTP/1.1

Note: 90112 is the size of UR.mp3. A GET request typically consists of the GET request line (as shown above) and an empty line. The last two characters of an HTTP request line must be \r\n. The “empty” line must consist of two characters \r\n. Those are “carriage-return” and “line-feed” characters. The tracker responds with “text/plain” document consisting of a bencoded dictionary with the following keys:

  • failure reason: If present, then no other keys may be present. The value is a human-readable error message as to why the request failed (string). 
  • warning message: (new, optional) Similar to failure reason, but the response still gets processed normally. The warning message is shown just like an error. 
  • interval: Interval in seconds that the client should wait between sending regular requests to the tracker 
  • min interval: (optional) Minimum announce interval. If present clients must not reannounce more frequently than this. 
  • tracker id: A string that the client should send back on its next announcements. If absent and a previous announce sent a tracker id, do not discard the old value; keep using it. 
  • complete: number of peers with the entire file, i.e. seeders (integer) 
  • incomplete: number of non-seeder peers, aka “leechers” (integer) 
  • peers: (dictionary model) The value is a list of dictionaries, each with the following keys:
    • peer id: peer’s self-selected ID, as described above for the tracker request (string) 
    • ip: peer’s IP address either IPv6 (hexed) or IPv4 (dotted quad) or DNS name (string) 
    • port: peer’s port number (integer)
  • peers: (binary model) Instead of using the dictionary model described above, the peers value may be a string consisting of multiples of 6 bytes. First 4 bytes are the IP address and last 2 bytes are the port number. All in the network (big endian) notation. 

Clients may send a request to the tracker more often than the specified interval, if an event occurs (i.e. stopped or completed) or if the client needs to learn about more peers. However, it is considered a bad practice to “hammer” on a tracker to get multiple peers.A reply from the tracker may look something like this:

HTTP/1.0 200 OK
Content-Length: 27
Content-Type: text/plain
Pragma: no-cache

d8:intervali1800e5:peers0:e

In general, the reply from the tracker follows the format specified here. You should print out the first line of the response. If the status code (e.g. the number 200 above) is anything other than 2xx, then something is wrong, don’t read and parse the content. The content of the reply is everything after the blank line. 
Building your own tracker from the scratch would be fun.

But you can use open source implementation of Bittorrent-tracker https://github.com/webtorrent/bittorrent-tracker.
Install Node https://nodejs.org/en/ and the user npm to install the tracker. (Use sudo or and Admin terminal)

sudo npm install -g bittorrent-tracker

Once you’ve install the bittorrent-tracker start it by running

$ bittorrent-tracker -p 6969
HTTP tracker: http://localhost:6969/announce
UDP tracker: udp://0.0.0.0:6969
UDP6 tracker: udp://localhost:6969
WebSocket tracker: ws://localhost:6969
Tracker stats: http://localhost:6969/stats

Open the stats page in your browser http://localhost:6969/stats to see stats on your bit torrent client.

Or you can use the a popular public tracker written by Dirk Engling. Follow the build instruction. Note: if you didn’t do anything special (like having a persistent-connection request along with the GET line), the tracker will close the connection right after sending the reply. Don’t be surprised! You should try telnet localhost 6969 after running opentracker, and then cut-and-paste a sample request shown in the previous section to see the tracker’s response.

The Peer Wire Protocol

Once the tracker is running peers can start collaborating on downloading this file. To get to know other peers, a client talks to the tracker using the Tracker HTTP Protocol, which is simply a tiny subset of the HTTP protocol. The tracker maintains and updates information about the torrent or the swarm, which is the set of participating peers. Every shared file corresponds to a swarm. A new peer can join the swarm by informing and querying the tracker. 

Once knowing each other, peers can talk directly to each other using the Peer Wire Protocol over TCP to exchange pieces of UR.mp3. A swarm can be large, containing any where from tens to thousands of peers. Thus, a client only connects to a subset of those. 
The Peer Wire Protocol (PWP) is only conceptually simple: a client requests pieces of the file from other peers until it has all pieces in the entire file. For efficiency, PWP uses a simple strategy called rarest first in which, given the current list of missing pieces, the client requests a rarest one first.  


To solve the free rider problem, where a peer only downloads the pieces and rarely uploads any piece, PWP uses choke algorithms. A client gives pieces to good guys and “chokes” guys who haven’t given the client much. However, to avoid peers choking each other to death, especially in the beginning, a client also randomly unchokes some (potentially) bad guy too. This is called optimistic unchoking. 


Recent studies have shown that rarest first and choke algorithms are pretty good at solving both the free rider problem and the download efficiency problem. However, these mechanisms are far from perfect and there are a lot of potentials for further research and new protocol design. I hope some of you will get interested in this problem and start experimenting with new ideas. You could be the next Bram Cohen! 
The peer protocol facilitates the exchange of pieces as described in the metainfo file. To request a piece, a client will actually send multiple requests for parts of the piece, called blocks. A block size is typically divisible by the piece size. It is recommended to set the default block size to be 16KB =  214 = 16384 bytes. Feel free to set the default block size to be equal to the piece size if you want. However, since the last piece may not have the full size, some block requests will inevitably have to be of small sizes.

A client must maintain state information for each connection that it has with a remote peer:

  • choked: Whether or not the remote peer has choked this client. When a peer chokes the client, it is a notification that no requests will be answered until the client is unchoked. The client should not attempt to send requests for blocks, and it should consider all pending (unanswered) requests to be discarded by the remote peer. 
  • interested: Whether or not the remote peer is interested in something this client has to offer. This is a notification that the remote peer will begin requesting blocks when the client unchokes them. 

Note that this also implies that the client will also need to keep track of whether or not it is interested in the remote peer, and if it has the remote peer choked or unchoked. So, the real list looks something like this:

  • am_choking: this client is choking the peer 
  • am_interested: this client is interested in the peer 
  • peer_choking: peer is choking this client 
  • peer_interested: peer is interested in this client 

Client connections start out as “choked” and “not interested”. In other words:

  • am_choking = 1 
  • am_interested = 0 
  • peer_choking = 1 
  • peer_interested = 0 

A block is downloaded by the client when the client is interested in a peer and that peer is not choking the client. A block is uploaded by a client when the client is not choking a peer, and that peer is interested in the client. It is important for the client to keep its peers informed as to whether or not it is interested in them. This state information should be kept up-to-date with each peer even when the client is choked. This will allow peers to know if the client will begin downloading when it is unchoked (and vice-versa).Data type

Unless specified otherwise, all integers in the peer wire protocol are encoded as four byte big-endian values. This includes the length prefix on all messages that come after the handshake.

Message flow 

The peer wire protocol consists of an initial 2-way handshake. After that, peers communicate via an exchange of length-prefixed messages. The length-prefix is an integer as described above.

Handshake

The handshake is a required message and must be the first message transmitted by the client. It is (49+len(pstr)) bytes long.

handshake: (pstrlen)(pstr)(reserved)(info_hash)(peer_id)

  • pstrlen: string length of <pstr>, as a single raw byte 
  • pstr: string identifier of the protocol. 
    • In the URTorrent protocol, pstrlen = 18, and pstr = “URTorrent protocol”.
  • reserved: eight (8) reserved bytes. All current implementations use all zeros. Each bit in these bytes can be used to change the behavior of the protocol. An email from Bram suggests that trailing bits should be used first, so that leading bits may be used to change the meaning of trailing bits.
  • info_hash: 20-byte SHA1 hash of the info key in the metainfo file. This is the same info_hash that is transmitted in tracker requests. 
  • peer_id: 20-byte string used as a unique ID for the client 

The initiator of a connection is expected to transmit their handshake immediately. The recipient may wait for the initiator’s handshake, if it is capable of serving multiple torrents simultaneously (torrents are uniquely identified by their info_hash). However, the recipient must respond as soon as it sees the info_hash part of the handshake. If a client receives a handshake with an info_hash that it is not currently serving, then the client must drop the connection.

If the initiator of the connection receives a handshake in which the peer_id does not match the expected peer_id, then the initiator is expected to drop the connection. Note that the initiator presumably received the peer information from the tracker, which includes the peer_id that was registered by the peer. The peer_id from the tracker and in the handshake are expected to match. After handshaking, peers transmit other messages to initiate data transfer. Keepalives are messages of length zero. For URTorrent, you are expected to send a keepalive message every two minutes. 

All of the remaining messages in the protocol take the form of (length prefix)(message ID)(payload)>. The length prefix is a four-byte big-endian value. The message ID is a single decimal byte. The payload is message dependent. 

keep-alive:

The keep-alive message is a message with zero bytes, specified with the length prefix set to zero. There is no message ID and no payload. Peers may close a connection if they receive no messages (keep-alive or any other message) for a certain period of time, so a keep-alive message must be sent to maintain the connection alive if no command have been sent for a given amount of time. This amount of time is generally two minutes.The possible message are values are:

Message Typelength prefix (4 byte)message ID (1 byte)payload (message dependent)Note
choke10N/A 
unchoke11N/A 
intrested12N/A 
not interested13N/A 
have54(piece index) 
bitfield1 + B5(bitfield)B is the length of the bit field
request136(index)(begin)(length) 
piece9 + L7(index)(begin)(block)L is the length of the block
cancel138(index)(begin)(length)
  • index: integer specifying the zero-based piece index 
  • begin: integer specifying the zero-based byte offset within the piece 
  • length: integer specifying the requested length. 
  • block: block of data, which is a subset of the piece specified by index. 

choke, unchoke, interested, not interested:

These messages are fixed length and have no payload.

have:

The have message is fixed length. The payload is the zero-based index of a piece that has just been successfully downloaded and verified via the SHA1 hash.

bitfield: 

The bitfield message may only be sent immediately after the handshaking sequence is completed, and before any other messages are sent. The bitfield needs to be sent if a client has pieces to share.

The bitfield message is variable length, where B is the length of the bitfield. The payload is a bitfield representing the pieces that have been successfully downloaded. The high bit in the first byte corresponds to piece index 0. Bits that are cleared indicated a missing piece, and set bits indicate a valid and available piece. Spare bits at the end are set to zero. A bitfield of the wrong length is considered an error. Clients should drop the connection if they receive bitfields that are not of the correct size, or if the bitfield has any of the spare bits set.

request:

The request message is fixed length, and is used to request a block. The payload concatenates index, begin, and length. The length of the payload is 12 byte (3 integers)

piece:

The piece message is variable length, where L is the length of the block. The payload concatenates index, begin, and block. The length of the payload is 12 byte (3 integers)

cancel:

The cancel message is fixed length, and is used to cancel block requests. The payload is identical to that of the “request” message. It is typically used during “End Game”. For URTorrent, you can ignore this message. 

Algorithms

Piece downloading strategy 

Download pieces in rarest first order. The client can determine this by keeping the initial bitfield from each peer, and updating it with every have message. Then, the client can download the pieces that appear least frequently in these peer bitfields. Note that any Rarest First strategy should include randomization among at least several of the least common pieces, as having many clients all attempting to jump on the same “least common” piece would be counter productive. 

Choking and Optimistic Unchoking

The choke algorithm was introduced to guarantee a reasonable level of upload and download reciprocation. As a consequence, free riders, i.e., peers that never upload, should be penalized. For the sake of clarity, we describe without loss of generality the choke algorithm from the point of view of the local peer. In this section, interested always means interested in the local peer, and choked always means choked by the local peer.
The choke algorithm differs in leecher and seed states. We describe first the choke algorithm in leecher state. At most 4 remote peers can be unchoked and interested at the same time. Peers are unchoked using the following policy.

  1. Every 10 seconds, the interested remote peers are ordered according to their download rate to the local peer and the 3 fastest peers are unchoked.
  2. Every 30 seconds, one additional interested remote peer is unchoked at random. We call this random unchoke the optimistic unchoke.

In the following, we call the three peers unchoked in step 1 the regular unchoked (RU) peers, and the peer unchoked in step 2 the optimistic unchoked (OU) peer. The optimistic unchoke peer selection has two purposes. It allows to evaluate the download capacity of new peers in the peer set, and it allows to bootstrap new peers that do not have any piece to share by giving them their first piece.

In seed state, the choke algorithm is the same as that in leecher state except that in seed state, the ordering performed in step 1 is based on upload rates from the local peer.

Your implementation

Your program needs to handle only a single torrent file. Your urtorrent is supposed to implement all of the above requirements, however. The sequence of tasks looks something like this:

  • Download and compile the opentracker program. Run it.
  • Create the .torrent file for UR.mp3 (or any reasonably sized file). We will test your program with files up to 40MB in size, but not more. For conserving memory, you should store information on the disk rather than storing in RAM. 
  • Create several directories, perhaps name them UR1, UR2, UR3, UR4, and so on. Then put UR.mp3 in one of them and put a copy of urtorrent in each of them. Thus, the copy of urtorrent running inside directory 1 will be the original seeder. Other copies of urtorrent may write temporary files inside their own local directory. Clean up the temp files after the downloading is done, or if the user types quit
  • urtorrent takes two parameters: the first argument is the port number that other peers can connect to, and the second argument is the  .torrent file name. For example,urtorrent UR.mp3.torrent 6881
  • Note that the tracker information is contained in the metainfo file. Upon starting, a client checks the local directory to see if the file is there already. (Check the file name and file size.) If the file is there, the client is running in the seeder state. Otherwise, it is running in the leecher state. The client then issues a GET request to the tracker right away. If the tracker is not running, the client reports an error and quits. 
  • At this point, if the seeder was already started, or if other peers have downloaded some pieces, the current client can start requesting blocks and start downloading. It should fully implement the rarest first and choke algorithms. If you run only 5 peers in total, choke algorithms should not take effect, and thus the peers download blocks/pieces from each other until they are completed. The rarest first algorithm should still be in place, though. I personally would recommend writing a function which tells you which piece(s) to request, given the current list of pieces that the peers have. Initially, the function would just return any missing piece (i.e. without the rarest first algorithm). Once this works, you can change the function to return a random rarest piece.

Your program should understand the following commands. Users can interact with the program by means of these commands:

metainfo

This command asks the client to show the information about the metainfo file. The output format looks something like this:

urtorrent> metainfo
	IP/port    : 127.0.1.1/9999
	ID         : bcd914c766d969a772823815fdc2737b2c8384bf
	metainfo file : Dan Hill & Rique Franks - Sometimes When We Touch.mp3.torrent
	info hash     : 4a060f199e5dc28ff2c3294f34561e2da423bf0b
	file name     : Dan Hill & Rique Franks - Sometimes When We Touch.mp3
	piece length  : 262144
	file size     : 1007616 (3 * [piece length] + 221184)
	announce URL  : http://192.168.0.1:9999/announce
	pieces' hashes:
	0  064b493d90b6811f22e0457aa7f50e9c70b84285
	1  d17cb90e50ca06a651a84f88fde0ecfb22a90cca
	2  20e82d045341032645ebe27eed38103329281175
	3  568c8a0599a7c1e2b3c70d8b8c960134653d497a

	urtorrent>     
			

Everything is in the metainfo file or available locally (IP/port), except for the info hash value which you have to compute. (This has to be computed anyhow in order to send tracker requests.) The value of info hash is the SHA1 hash of the value of the info key in the metainfo dictionary.

Your program should handle errors graciously. If the input .torrent file does not exist or if there were some errors in reading the file, for example, your program should not crash. You can assume that the metainfo file (that we will test) is not greater than 8KB = 8192 bytes. If the given metainfo file is larger than that, you can quit the program and complain (to the screen). In the wild, metainfo files are often kept small, not more than 50KB.

announce

This command asks the client to send a GET request to the tracker. When the response comes back, update the information to be used later. It also prints out the status line of the response (the first line which looks something like HTTP1.0 200 OK). The output may look like this:

urtorrent> announce
	Tracker responded: HTTP/1.0 200 OK
	complete | downloaded | incomplete | interval | min interval |
	---------------------------------------------------------------
	1        | 0          | 0          | 1686     | 843          |
	---------------------------------------------------------------
	Peer List :
	IP               | Port 
	-------------------------------
	127.0.0.1        | 9999
	... Tracker closed connection
			

(I suggest you store the dictionary in some be_node. Remember to be_free the node the next time an announce is called. The be_node is used in processing the command trackerinfo.)

trackerinfo

Prints the information contained in the latest tracker response. This is a subroutine which should have been called when announce is typed. So, the output is just part of the (last sucessful) announce output, without the status line:

urtorrent>trackerinfo
	complete | downloaded | incomplete | interval | min interval |
	---------------------------------------------------------------
	1        | 0          | 0          | 1686     | 843          |
	---------------------------------------------------------------
	Peer List (self included):
		IP               | Port
		-------------------------------
		127.0.0.1        | 9999
		127.0.0.1        | 9998
		127.0.0.1        | 9997
			

show

Prints the status of all current peer connections, including the choked/interested states and upload/download rates. The format looks something like this:

urtorrent> show
		ID | IP address | Status | Bitfield         | Down/s   | Up/s     | 
		--------------------------------------------------------------------
		0 | 127.0.0.1  | 0101   | 1111111111111111 | 0        |  59271    
		1 | 127.0.0.1  | 0101   | 1111111111111111 | 0        |  59271    
			

Down/s is the number of bytes downloaded per second on that connection. The status field has 4 bits, corresponding to the status of that connection: iam_choking (peer),  iam_interested (in peer), peer_choking (me),peer_interested (in me).

status

Prints the status of the current  download: which pieces have been downloaded, which pieces are missing, the number of downloaded, uploaded, and remaining bytes. The format is something like this:

 urtorrent> status
		Downloaded | Uploaded | Left | My bit field 
		------------------------------------------------
		1007616    | 0        | 0    | 1111111111111111
			

or

urtorrent> status
		Downloaded | Uploaded | Left    | My bit field 
		---------------------------------------------------
		0          | 0        | 1007616 | 0000000000000000
			

Submission

You need to submit a tar file containing your source files, a makefile, and a README file.

Grading Guideline

Grading guidelines will evolve as the project progresses. We are hoping this will be an interactive process. Please also keep an eye out for project updates. 

Acknowledgements

This assignment has been adapted from Dr. Sandhya Dwarkadas adaptation of Dr. Tamal Biswas adaption from a project developed by Professor Hung Ngo at University at Buffalo.

%d bloggers like this: