Architecture
~~~~~~~~~~~~
1. World creator
The simulator operates on a map of nodes, ISPs and countries that together represent the internet, with special attention given to network speed and latency. The "world" must be reproducible, so that others can analyse problematic issues that happen in one world but not in others. For this reason the world is created from a 512-bit (64-byte) seed for a pseudo-random generator.
The seed is used to create a "random" set of bits that decide the various parameters of the world. For easier implementation and to save memory, all numbers in the world are represented as powers of 2. Thus the word "random" in this document never means truly random; it means derived from the seeded set of bits.
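The seeded bit source described above could be sketched as follows. This is only an illustration: the document does not name a generator, so SHA-512 in counter mode is an assumption, and the names SeededBits, take and power_of_two are hypothetical.

```python
import hashlib


class SeededBits:
    """Deterministic bit source derived from a 64-byte world seed (sketch)."""

    def __init__(self, seed: bytes) -> None:
        if len(seed) != 64:
            raise ValueError("world seed must be exactly 64 bytes (512 bits)")
        self._seed = seed
        self._counter = 0    # index of the next hash block to generate
        self._pool = 0       # unread bits, kept as one integer
        self._pool_bits = 0  # number of unread bits in the pool

    def _refill(self) -> None:
        # Assumption: SHA-512 in counter mode; the document names no generator.
        block = hashlib.sha512(self._seed + self._counter.to_bytes(8, "big")).digest()
        self._counter += 1
        self._pool = (self._pool << 512) | int.from_bytes(block, "big")
        self._pool_bits += 512

    def take(self, nbits: int) -> int:
        """Return the next nbits bits as an unsigned integer."""
        while self._pool_bits < nbits:
            self._refill()
        self._pool_bits -= nbits
        value = self._pool >> self._pool_bits
        self._pool &= (1 << self._pool_bits) - 1
        return value

    def power_of_two(self, exponent_bits: int) -> int:
        """Draw an exponent of exponent_bits bits and return 2**exponent."""
        return 1 << self.take(exponent_bits)
```

Two instances created from the same seed return the same sequence of values, which is the property the world creator relies on.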
Initial parameters of the world are:
* number of countries
* number of ISPs
* number of nodes
* node speed dispersion (16 numbers); the numbers must sum to 100 (meaning 100%)
* number of executors
To simplify implementation while preserving the network link effects we seek, we implement the "world" in the following way:
* Each node receives a 32-bit number as its "IP" address
* The "IP" is constructed from bit fields, where:
- the first 8 bits are the country number
- the second 8 bits are the ISP number
- the next 4 bits specify the node "link" type (see below)
- the remaining 12 bits specify the node number
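The bit-field layout above can be packed and unpacked with shifts and masks. The sketch below assumes that "first" means the most significant bits; the function names pack_ip and unpack_ip are hypothetical.

```python
from typing import Tuple

# Field widths, from most significant to least significant (assumed order).
COUNTRY_BITS, ISP_BITS, LINK_BITS, NODE_BITS = 8, 8, 4, 12


def pack_ip(country: int, isp: int, link_type: int, node: int) -> int:
    """Build the 32-bit "IP" from its four fields."""
    fields = (
        (country, COUNTRY_BITS),
        (isp, ISP_BITS),
        (link_type, LINK_BITS),
        (node, NODE_BITS),
    )
    ip = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        ip = (ip << width) | value
    return ip


def unpack_ip(ip: int) -> Tuple[int, int, int, int]:
    """Split a 32-bit "IP" into (country, isp, link_type, node)."""
    node = ip & 0xFFF
    link_type = (ip >> 12) & 0xF
    isp = (ip >> 16) & 0xFF
    country = (ip >> 24) & 0xFF
    return country, isp, link_type, node
```

For example, country 1, ISP 2, link type 3 and node 4 pack to 0x01023004.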
Link types define the link parameters of a node: up/down speed and latency. Internally, speed is specified in bytes/second and latency in microseconds. Here is a preliminary list of link types, written as down speed/up speed in KB per second, then latency in ms:
* 0 - 56k dial-up: 7KB/4.5KB, 140ms
* 1 - ISDN: 16KB/16KB, 10ms
* 2 - 3G, 4G: 300KB/300KB, 120ms
* 3 - satellite modem: 75KB/64KB, 700ms
* 4 - ADSL1: 200KB/8KB, 80ms
* 5 - ADSL2: 3000KB/120KB, 120ms
* 15 - ethernet or colo: unlimited/unlimited, 1ms
This arrangement makes it very simple to calculate the maximum link speed and latency between two nodes just by looking at their two "IP" numbers. For now, we can disregard cross-country and cross-ISP network speeds and hard-code the inter-network latencies to these values:
* between different countries: 200ms
* between different ISPs of the same country: 30ms
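A minimal sketch of that calculation follows. Several points are assumptions, not statements from this document: the path speed is taken as the smaller of the sender's up speed and the receiver's down speed, the path latency is the sum of both link latencies plus the inter-network latency, nodes of the same ISP get no extra latency, and 1 KB is 1000 bytes. The names Link, LINKS and path_params are hypothetical, and only a few link types are listed.

```python
from typing import NamedTuple, Optional, Tuple

KB = 1000  # assumption: 1 KB = 1000 bytes


class Link(NamedTuple):
    down: Optional[int]  # bytes/second, None means unlimited
    up: Optional[int]    # bytes/second, None means unlimited
    latency_us: int      # microseconds


# A few entries from the preliminary link type list.
LINKS = {
    0: Link(7 * KB, 4500, 140_000),     # 56k dial-up
    1: Link(16 * KB, 16 * KB, 10_000),  # ISDN
    4: Link(200 * KB, 8 * KB, 80_000),  # ADSL1
}

DIFFERENT_COUNTRY_US = 200_000  # 200ms between different countries
DIFFERENT_ISP_US = 30_000       # 30ms between ISPs of the same country


def path_params(src_ip: int, dst_ip: int) -> Tuple[Optional[int], int]:
    """Return (max speed in bytes/second or None, latency in microseconds)."""
    src = LINKS[(src_ip >> 12) & 0xF]
    dst = LINKS[(dst_ip >> 12) & 0xF]
    # Assumption: the bottleneck is sender upload vs receiver download.
    limits = [s for s in (src.up, dst.down) if s is not None]
    speed = min(limits) if limits else None
    if src_ip >> 24 != dst_ip >> 24:
        extra = DIFFERENT_COUNTRY_US
    elif src_ip >> 16 != dst_ip >> 16:
        extra = DIFFERENT_ISP_US
    else:
        extra = 0  # assumption: no extra latency inside one ISP
    return speed, src.latency_us + dst.latency_us + extra
```

No lookup of the node itself is needed: everything comes from the link type, country and ISP fields of the two "IP" numbers.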
The simulator works by coordinating many computers using "master" and "starter" roles.
TBD: what is chunk, function to create nodes in chunk
One computer runs the master process. As it does not require much CPU, the same computer is expected to run a starter process as well. The master process performs the operations below:
* Await connections from starters, probably until a predefined number of starters has connected
* Once all starters are "registered", send each starter the list of starters with their TCP/IP details, assign each executor its number, send the world initiation parameters, and await a completion answer.
* Coordinate simulation of a given millisecond (await simulation completion reports from all starters before starting simulation of the next ms)
Each computer (besides the one that runs the "master") runs the simulator in "starter" mode, which performs the operations below:
* Connect to the "master" and report the number of CPUs available
* Receive the list of other starters and executors TBD: receives list of chunks executors on this node are responsible for
* Connect to the other starters
* Receive instructions on world creation TBD: it is part of chunks distribution function
* Spawn an executor for each CPU on the machine
* Maintain "routing" of messages between executors; no tracking is done here TBD: use chunks function to create the routing table
Executor is the workhorse process that runs on a single CPU:
* It "creates" the nodes it is responsible for using the Node API. TBD: using the chunking function
* It gets the command to start simulation from its starter and reports completion
2. The Simulation
Simulation granularity is 1 millisecond. Once all executors have reported readiness, the simulation may start. Each simulated millisecond is called an epoch, numbered from 0. The master acts as the synchronization point, issuing the command to start simulation of a given epoch and awaiting completion signals from each starter (which in turn receives completion signals from its executors).
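The lockstep behaviour of the master can be sketched as a small piece of bookkeeping. Networking is left out, and the names EpochBarrier and report_done are hypothetical; the sketch only shows that an epoch advances once every starter has reported completion.

```python
from typing import Iterable, Set


class EpochBarrier:
    """Master-side bookkeeping for lockstep epochs (sketch, no networking)."""

    def __init__(self, starter_ids: Iterable[str]) -> None:
        self.starters: Set[str] = set(starter_ids)
        self.epoch = 0                     # epoch currently being simulated
        self.pending = set(self.starters)  # starters that have not reported yet

    def report_done(self, starter_id: str, epoch: int) -> bool:
        """Record one completion signal; return True if the epoch advanced."""
        if starter_id not in self.starters:
            raise KeyError(f"unknown starter {starter_id!r}")
        if epoch != self.epoch:
            raise ValueError(f"report for epoch {epoch}, expected {self.epoch}")
        self.pending.discard(starter_id)
        if self.pending:
            return False
        # Every starter reported: the master may start the next epoch.
        self.epoch += 1
        self.pending = set(self.starters)
        return True
```

A starter would apply the same logic one level down, collecting completion signals from its executors before reporting to the master.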
The master uses a simulation scenario file to issue "tasks" to executors. The scenario language allows operators like these:
* Start A nodes of link type B as bootstrapping nodes TBD: what is going to happen here?
* Start A nodes of link type B with a list of IPs of the bootstrap nodes
* Populate A nodes of link type B with C GB of content, give me hash of the content TBD: what is going to happen, especially regarding content hash
* Make A nodes of link type B interested in content with hash D
* TBD: Get indication on p2p network performance regarding previous commands
The "pseudo-random" selection of nodes in a scenario uses the same random seed as world creation, which makes worlds and scenarios replayable with identical results.
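As an illustration of seeded selection for an operator such as "start A nodes of link type B", the sketch below uses Python's random.Random as a stand-in for the world generator, which is an assumption. The function name select_nodes is hypothetical.

```python
import random
from typing import List, Sequence


def select_nodes(seed: bytes, ips: Sequence[int], link_type: int, count: int) -> List[int]:
    """Pick count node "IP" numbers of one link type, reproducibly."""
    # Sort the candidates so the result does not depend on input order.
    candidates = sorted(ip for ip in ips if (ip >> 12) & 0xF == link_type)
    # Assumption: random.Random stands in for the world's own generator.
    rng = random.Random(seed)
    # Raises ValueError when fewer than count candidates exist.
    return rng.sample(candidates, count)
```

Running the same scenario step with the same seed and the same world yields the same nodes, regardless of which executor enumerates them first.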
3. Node API
Nodes interact with the tested code through a library-like API:
* Initialize.
* Register with bootstrap nodes at the given "IP" numbers.
* Create A GB of content. Returns hash of the content.
* Be interested in getting content A (content identifying hash).
* Are there any unsatisfied interests? (returns yes or no)
* Receive and process transmission from IP A: (data).
* Want to send a transmission? (returns list of transmissions to send)
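One way to picture the API above is as an abstract interface that the tested code implements. All method names, parameter names and types below are hypothetical; the document only lists the operations.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class Transmission:
    peer_ip: int  # "IP" of the target node
    data: bytes   # payload to deliver


class Node(ABC):
    """Hypothetical shape of the Node API; one method per listed operation."""

    @abstractmethod
    def initialize(self) -> None:
        """Prepare the node before the simulation starts."""

    @abstractmethod
    def register(self, bootstrap_ips: List[int]) -> None:
        """Register with bootstrap nodes at the given "IP" numbers."""

    @abstractmethod
    def create_content(self, size_gb: int) -> bytes:
        """Create size_gb GB of content and return its hash."""

    @abstractmethod
    def want_content(self, content_hash: bytes) -> None:
        """Become interested in the content identified by content_hash."""

    @abstractmethod
    def has_unsatisfied_interests(self) -> bool:
        """Report whether any requested content is still missing."""

    @abstractmethod
    def receive(self, from_ip: int, data: bytes) -> None:
        """Process a transmission delivered from the node at from_ip."""

    @abstractmethod
    def pending_transmissions(self) -> List[Transmission]:
        """Return the transmissions this node wants to send."""
```

The executor would call receive and pending_transmissions as transmissions are delivered and collected.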
A transmission is the process of transferring data of a specified length from one node to another. A transmission is delivered to the target node once it is fully "sent" by the sending node and fully "received" by the target node. Sending and receiving progress is tracked every epoch by the sending and receiving executor respectively. A transmission starts by sending the data in full and creating a transmission ID (TID). At each passing epoch, the sending executor updates its tracking data structure with the progress of the transmission, taking link capacity into account. It then sends a transmission progress report (TPR) for each transmission to the target executor. For simplicity, back-throttling is not simulated at this time.
Similarly, each receiving executor accepts the transmission data and sets it aside. It then updates its transmission progress data structure on each incoming TPR. Once the final TPR is received, the transmission data is passed to the node.
The executor tracks the delivery timing of transmissions. For each transmission, the sending executor performs these actions:
* Send the complete transmission data, with a unique ID, to the target executor. Compress the transmission data, as it is likely to contain many zeroes.
* At each epoch do below:
- For each transmission "in flight", update the transmission progress counters and send a TPR to the target executor
- Once a transmission is complete, send the final TPR to the target executor and delete the transmission on the source node
The receiving executor performs these actions:
* Receive the transmission, store the data aside (in compressed state), and create a transmission progress data structure
* At each epoch do below:
- For each incoming TPR, update the transmission progress counters
- Once the counters show that a transmission has been delivered, pass the data to the node.
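The per-epoch bookkeeping on the sending side can be sketched as below. The document does not say how concurrent transmissions share one link, so this sketch assumes the per-epoch byte budget is consumed in TID order. The names OutgoingTransmission and advance_epoch, and the TPR tuple layout, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class OutgoingTransmission:
    tid: int       # transmission ID
    length: int    # total bytes to send
    sent: int = 0  # bytes accounted as sent so far


def advance_epoch(
    in_flight: Dict[int, OutgoingTransmission], budget: int
) -> List[Tuple[int, int, bool]]:
    """Advance one epoch; return TPRs as (tid, bytes_sent, is_final)."""
    reports = []
    # Assumption: the link budget for this epoch is spent in TID order.
    for tid in sorted(in_flight):
        t = in_flight[tid]
        step = min(budget, t.length - t.sent)
        t.sent += step
        budget -= step
        final = t.sent == t.length
        reports.append((tid, t.sent, final))
        if final:
            # A completed transmission is deleted on the source side.
            del in_flight[tid]
    return reports
```

Here budget is the number of bytes the sender's link can carry in one epoch, for example the up speed in bytes/second divided by 1000 for a 1 millisecond epoch. The receiving executor would apply each report to its own counters and hand the stored data to the node when it sees a report marked final.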
The tested code should be aware that it is running in a simulator environment. To minimise data usage, on a request to create X GB of content it is advised to fill only the first 256 bytes with pseudorandom data (to get a unique hash) and fill the rest with zeroes. Similarly, it should minimise the amount of disk used for storing the data.
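The advice above, combined with the compression of transmission data, can be illustrated with a short sketch. The choice of SHAKE-256 for the pseudorandom head, SHA-256 for the content hash and zlib for compression are all assumptions, and the example uses 1 MiB rather than gigabytes to stay small. The names make_content and content_hash are hypothetical.

```python
import hashlib
import zlib

HEAD_SIZE = 256  # bytes of pseudorandom data at the start of the content


def make_content(size: int, seed: bytes) -> bytes:
    """Return size bytes: a pseudorandom head followed by zeroes."""
    # Assumption: SHAKE-256 of a per-content seed provides the head bytes.
    head = hashlib.shake_256(seed).digest(min(size, HEAD_SIZE))
    return head + bytes(size - len(head))


def content_hash(content: bytes) -> str:
    """Assumption: SHA-256 is used as the content identifying hash."""
    return hashlib.sha256(content).hexdigest()


content = make_content(1 << 20, b"node-42")
packed = zlib.compress(content)  # the long run of zeroes compresses well
```

Because only the head differs between contents, each content still gets its own hash while the compressed form stays a small fraction of the nominal size.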