Commit
Add more detailed descriptions and correct sample configurations
Showing 8 changed files with 140 additions and 72 deletions.
@@ -1,5 +1,111 @@
# ns3-rdma
NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer switch
# NS-3 simulator for RDMA
This is an NS-3 simulator for RDMA over Converged Ethernet v2 (RoCEv2). It includes implementations of DCQCN, TIMELY, PFC, ECN and a Broadcom shared buffer switch.

# Note
TIMELY module has not been merged into this yet. We are working on merging it. We will also add descriptions for this project soon.
It is based on NS-3 version 3.17 and has been ported to the Visual Studio environment, as explained [here](https://www.nsnam.org/wiki/Ns-3_on_Visual_Studio_2012).

## Note
The TIMELY module has not been merged into this yet. We are working on merging it.

## Quick Start
### Build
To compile it out of the box, you need Visual Studio.
People have successfully built it with the *free* version,
which can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=48146).
Open windows/ns-3-dev/ns-3-dev.sln and build the whole solution.

You may also try building it with the original Makefile, etc. We did this a while back, but you will probably need to edit a few things to make it work now.

### Run
The binary will be generated at windows/ns-3-dev/x64/Release/main.exe.
We include a sample configuration file at windows/ns-3-dev/x64/Release/mix/config.txt.
Execute main.exe in windows/ns-3-dev/x64/Release/:
```
cd windows\ns-3-dev\x64\Release\
main.exe mix\config.txt
```

It runs a 2:1 incast at 40Gbps for 1 second. Please allow a few minutes for it to finish.
The trace will be generated at mix/mix.tr, as defined by mix/config.txt.

There are quite a few options in mix/config.txt, and we will gradually add documentation for them.
In the meantime, you can check the code (project "main", source files, "third.cc") to see how these options are parsed.
You can also raise issues if you have any questions.
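For a quick look at the options outside the simulator, here is a hypothetical Python helper that reads `mix/config.txt`. It assumes only the format visible in the sample file (one `KEY value` pair per line, blank lines as separators) and is not a port of the parser in third.cc. Values stay strings because some of them, such as `RATE_AI 40Mb/s`, carry units.

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Parse 'KEY value' lines into a dict (hypothetical helper, not third.cc)."""
    options: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines only separate groups of options
        parts = line.split(None, 1)
        if len(parts) != 2:
            raise ValueError(f"line {lineno}: expected 'KEY value', got {raw!r}")
        key, value = parts
        options[key] = value.strip()  # in this helper, a later duplicate key wins
    return options


sample = """
ENABLE_QCN 1
USE_DYNAMIC_PFC_THRESHOLD 1

TOPOLOGY_FILE mix/topology.txt
TRACE_OUTPUT_FILE mix/mix.tr
SIMULATOR_STOP_TIME 3.01
RATE_AI 40Mb/s
"""

opts = parse_config(sample)
print(opts["TRACE_OUTPUT_FILE"])            # mix/mix.tr
print(float(opts["SIMULATOR_STOP_TIME"]))   # 3.01
```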
## What did we add exactly?

**point-to-point/model/qbb-net-device.cc** and all other qbb-* files:

DCQCN and PFC implementation.
It also includes go-back-to-N and go-back-to-0, which handle packet drops due to corruption.
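A toy model of the cost difference between the two recovery modes, under our assumption that go-back-to-N resumes from the first lost packet while go-back-to-0 restarts the message from its first packet. The function name and the lose-once model are illustrative only; the model also ignores the in-flight packets behind the lost one, which a real go-back-N sender would retransmit as well.

```python
from typing import Set


def packets_sent(num_packets: int, lost_once: Set[int], go_back_to_zero: bool) -> int:
    """Count transmissions until all num_packets are delivered in order.

    Toy model: each sequence number in lost_once is corrupted on its first
    transmission only, and the sender learns of the loss immediately.
    """
    pending_losses = set(lost_once)
    seq = 0      # next sequence number to transmit
    sent = 0
    while seq < num_packets:
        sent += 1
        if seq in pending_losses:
            pending_losses.discard(seq)
            # go-back-to-0 restarts the message; go-back-to-N resends the lost packet
            seq = 0 if go_back_to_zero else seq
            continue
        seq += 1
    return sent


print(packets_sent(10, {5}, go_back_to_zero=False))  # 11
print(packets_sent(10, {5}, go_back_to_zero=True))   # 16
```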

In 2013, we found a very basic NS-3 PFC implementation and built on it.
We can no longer find the original repository.

**network/model/broadcom-node.cc** and **.h**:

This implements a Broadcom ASIC switch model, which
mostly performs buffer-threshold-related operations. These include deciding
whether PFC should be triggered, whether ECN should be marked, whether the buffer is so full that packets should
be dropped, etc. It supports both static and dynamic thresholds for PFC.

*Disclaimer: this module is based purely on the authors' personal understanding of Broadcom ASICs. It does not reflect any official confirmation from either Microsoft or Broadcom.*
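The threshold decisions described above can be sketched as follows. This assumes a RED-style ECN profile driven by `KMIN`, `KMAX` and `PMAX` from config.txt (no marking below KMIN, probability rising linearly to PMAX at KMAX, always marking above KMAX), and the classic dynamic-threshold rule for PFC, where an ingress queue may hold at most `alpha` times the currently free shared buffer. The function names, `alpha` and the buffer sizes are hypothetical; queue lengths only need to share the unit of KMIN/KMAX. A static threshold (`USE_DYNAMIC_PFC_THRESHOLD 0`) would replace `dynamic_pfc_threshold` with a constant.

```python
def ecn_mark_probability(queue: float, kmin: float, kmax: float, pmax: float) -> float:
    """RED-style marking profile (assumed meaning of KMIN/KMAX/PMAX)."""
    if queue <= kmin:
        return 0.0                      # below KMIN: never mark
    if queue > kmax:
        return 1.0                      # above KMAX: always mark
    # between KMIN and KMAX: probability rises linearly from 0 to PMAX
    return pmax * (queue - kmin) / (kmax - kmin)


def dynamic_pfc_threshold(shared_total: int, shared_used: int, alpha: float) -> float:
    """Dynamic threshold: a fraction alpha of the currently free shared buffer."""
    return alpha * max(shared_total - shared_used, 0)


def should_send_pause(ingress_bytes: int, threshold: float) -> bool:
    """Send a PFC PAUSE for this ingress queue once it exceeds its threshold."""
    return ingress_bytes > threshold


print(ecn_mark_probability(520, 40, 1000, 1.0))           # 0.5
print(dynamic_pfc_threshold(1_000_000, 600_000, 0.5))     # 200000.0
```

As the shared buffer fills, the dynamic threshold shrinks, so the same ingress backlog triggers PAUSE earlier under heavier load.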

**network/utils/broadcom-egress-queue.cc** and **.h**:

This is the actual MMU that buffers packets.
It also includes the switch scheduler: when the upper layer asks for a packet to send, it
decides which queue to dequeue from. Strategies such as strict priority and round robin are supported.
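A minimal sketch of the two scheduling strategies, with queues that PFC has paused being skipped. It assumes queue 0 has the highest strict priority; the class and method names are illustrative and are not the broadcom-egress-queue API.

```python
from collections import deque
from typing import Deque, FrozenSet, List, Optional


class EgressScheduler:
    """Toy egress scheduler over per-priority FIFO queues."""

    def __init__(self, num_queues: int, strict_priority: bool) -> None:
        self.queues: List[Deque[str]] = [deque() for _ in range(num_queues)]
        self.strict_priority = strict_priority
        self._rr_next = 0  # round robin: the queue to try first on the next dequeue

    def enqueue(self, queue_index: int, packet: str) -> None:
        self.queues[queue_index].append(packet)

    def dequeue(self, paused: FrozenSet[int] = frozenset()) -> Optional[str]:
        """Return the next packet to send, skipping PFC-paused queues."""
        n = len(self.queues)
        if self.strict_priority:
            for i, q in enumerate(self.queues):   # queue 0 is tried first
                if q and i not in paused:
                    return q.popleft()
            return None
        for offset in range(n):
            i = (self._rr_next + offset) % n
            if self.queues[i] and i not in paused:
                self._rr_next = (i + 1) % n       # next time, start after this queue
                return self.queues[i].popleft()
        return None


rr = EgressScheduler(3, strict_priority=False)
for q, p in [(0, "a1"), (0, "a2"), (1, "b1"), (2, "c1")]:
    rr.enqueue(q, p)
print([rr.dequeue() for _ in range(4)])  # ['a1', 'b1', 'c1', 'a2']
```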

**applications/model/udp-echo-client.cc**:

We implement the RDMA client here, which aligns
with the fact that RoCEv2 includes a UDP header. In particular, the original UDP client has trouble
when PFC pauses the link: it keeps sending packets at line rate, so it soon
builds up a huge queue and runs out of memory. Here we throttle the sending rate when the client gets
pushed back by PFC.
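The effect of throttling can be seen in a one-packet-per-step toy model: the application offers one packet per step (line rate), and the link drains one packet per step unless PFC has paused it. The simplest throttle, assumed here, is to stop generating packets while the link is paused; the actual client may instead lower its rate. The function name is hypothetical.

```python
from typing import Set


def peak_backlog(steps: int, paused_steps: Set[int], throttle: bool) -> int:
    """Largest number of packets waiting at the sender during the run."""
    backlog = 0
    peak = 0
    for t in range(steps):
        paused = t in paused_steps
        if not (throttle and paused):
            backlog += 1          # application hands one packet to the NIC
        if not paused and backlog > 0:
            backlog -= 1          # link sends one packet when not paused
        peak = max(peak, backlog)
    return peak


pause = set(range(10, 60))   # PFC pauses the link for 50 steps
print(peak_backlog(100, pause, throttle=False))  # 50
print(peak_backlog(100, pause, throttle=True))   # 0
```

In the unthrottled run the backlog never drains after the pause ends, because the offered load already equals the line rate; every further pause only adds to it.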

**internet/model/seq-ts-header.cc** and **.h**:

We didn't implement the full InfiniBand
header. What we really need is just the sequence number (for detecting corruption
drops, and for understanding the throughput) and the timestamp (required by TIMELY).
This is where we encode this information into packets.
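A hypothetical byte layout for such a header, packed with Python's `struct` module: a 32-bit sequence number followed by a 64-bit timestamp in nanoseconds, big-endian (network byte order). The real seq-ts-header may use different field widths or additional fields; the function names are illustrative.

```python
import struct
from typing import Tuple

# Hypothetical layout: uint32 sequence number, uint64 timestamp (ns), network byte order.
SEQ_TS_FORMAT = "!IQ"
SEQ_TS_SIZE = struct.calcsize(SEQ_TS_FORMAT)  # 12 bytes; "!" means no padding


def encode_seq_ts(seq: int, ts_ns: int) -> bytes:
    return struct.pack(SEQ_TS_FORMAT, seq, ts_ns)


def decode_seq_ts(data: bytes) -> Tuple[int, int]:
    # unpack_from reads only the header, so any payload after it is ignored
    seq, ts_ns = struct.unpack_from(SEQ_TS_FORMAT, data)
    return seq, ts_ns


def rtt_ns(echoed_ts_ns: int, now_ns: int) -> int:
    """Delay sample of the kind TIMELY needs: now minus the echoed send time."""
    return now_ns - echoed_ts_ns


packet = encode_seq_ts(7, 123456789) + b"payload"
print(decode_seq_ts(packet))  # (7, 123456789)
```

On the receive side, a sequence number larger than the next expected one indicates that packets in between were dropped.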

**main/third.cc**:

The main() function.

There may be other edits here and there; in particular, trace generation is scattered
among various network stacks. But the above are the major ones.

## Q&A

**Q: Why did you port it to Windows?**

A: This is a Microsoft project. Visual Studio, including the free version, works well.

**Q: Fine. What if I want to run it on Linux, and do not want to spend time changing the build process?**

A: You can build it using Visual Studio and run the .exe using WINE. We have tested WINE 1.6.2 and it works well.

**Q: I don't understand ... (some part of the code or configuration)**

A: Raise issues on GitHub, so that your questions can also help others. If you really do
not want others to know you are working on this, you can email [email protected]

**Q: What papers should I cite, if I also publish?**

A: Below are the ones you should definitely check. They are ranked from most to
least relevant, although all of them are quite relevant:

*ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY*, CoNEXT'16 (this project is released with this paper; we ask that you at least cite this paper if you use this code.)

*Congestion Control for Large-scale RDMA Deployments*, SIGCOMM'15 (DCQCN)

*TIMELY: RTT-based Congestion Control for the Datacenter*, SIGCOMM'15 (TIMELY)

*RDMA over Commodity Ethernet at Scale*, SIGCOMM'16 (discusses go-back-to-N)

*Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them*, HotNets'16 (PFC deadlock analysis; directly used this simulator.)
This file was deleted.
This file was deleted.
28 changes: 17 additions & 11 deletions
windows/ns-3-dev/mix/config.txt → windows/ns-3-dev/x64/Release/mix/config.txt
@@ -1,33 +1,39 @@
ENABLE_QCN 1
USE_DYNAMIC_PFC_THRESHOLD 1
PACKET_LEVEL_ECMP 0
FLOW_LEVEL_ECMP 0
FLOW_LEVEL_ECMP 1

PAUSE_TIME 5
PACKET_PAYLOAD_SIZE 1000

TOPOLOGY_FILE C:\ns-3-win2\windows\ns-3-dev\mix\topology.txt
FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow.txt
TCP_FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow_tcp.txt
TRACE_FILE C:\ns-3-win2\windows\ns-3-dev\mix\trace.txt
TRACE_OUTPUT_FILE Z:\mix.tr
TOPOLOGY_FILE mix/topology.txt
FLOW_FILE mix/flow.txt
TCP_FLOW_FILE mix/flow_tcp_0.txt
TRACE_FILE mix/trace.txt
TRACE_OUTPUT_FILE mix/mix.tr

SEND_IN_CHUNKS 0
APP_START_TIME 1.0
APP_STOP_TIME 10.0
SIMULATOR_STOP_TIME 2.05
SIMULATOR_STOP_TIME 3.01

CNP_INTERVAL 50
ALPHA_RESUME_INTERVAL 55
NP_SAMPLING_INTERVAL 0
CLAMP_TARGET_RATE 1
CLAMP_TARGET_RATE_AFTER_TIMER 0
RP_TIMER 60
BYTE_COUNTER 300000
BYTE_COUNTER 300000000
DCTCP_GAIN 0.00390625
KMAX 1000
KMIN 40
PMAX 1.0
KMAX 1000
KMIN 40
PMAX 1.0
FAST_RECOVERY_TIMES 5
RATE_AI 40Mb/s
RATE_HAI 200Mb/s

ERROR_RATE_PER_LINK 0.0000
L2_CHUNK_SIZE 4000
L2_WAIT_FOR_ACK 0
L2_ACK_INTERVAL 256
L2_BACK_TO_ZERO 0
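Several of the parameters above (DCTCP_GAIN, FAST_RECOVERY_TIMES, RATE_AI) appear to map onto the DCQCN reaction-point algorithm from the SIGCOMM'15 DCQCN paper. Below is a simplified Python sketch of that algorithm, assuming DCTCP_GAIN is the alpha gain g and RATE_AI is the additive-increase step. It always sets the target rate to the current rate on a CNP (our guess at `CLAMP_TARGET_RATE 1`), models only timer-driven rate increases (no byte counter, no hyper increase with RATE_HAI), and is not the qbb-net-device code. Rates are in Gbps.

```python
from dataclasses import dataclass


@dataclass
class DcqcnRp:
    line_rate: float                 # Gbps
    g: float = 1 / 256               # DCTCP_GAIN 0.00390625 in the sample config
    rate_ai: float = 0.04            # RATE_AI 40Mb/s, expressed in Gbps
    fast_recovery_times: int = 5     # FAST_RECOVERY_TIMES
    alpha: float = 1.0
    rc: float = 0.0                  # current sending rate
    rt: float = 0.0                  # target rate
    increase_events: int = 0         # rate-increase events since the last CNP

    def __post_init__(self) -> None:
        self.rc = self.rt = self.line_rate   # flows start at line rate

    def on_cnp(self) -> None:
        """Rate decrease when a congestion notification packet arrives."""
        self.rt = self.rc
        self.rc *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g
        self.increase_events = 0

    def on_alpha_timer(self) -> None:
        """No CNP during the alpha-update interval: decay alpha."""
        self.alpha *= 1 - self.g

    def on_increase_timer(self) -> None:
        """Fast recovery for the first few events, then additive increase."""
        if self.increase_events >= self.fast_recovery_times:
            self.rt = min(self.rt + self.rate_ai, self.line_rate)  # capped at line rate
        self.rc = (self.rt + self.rc) / 2
        self.increase_events += 1


rp = DcqcnRp(line_rate=40.0)
rp.on_cnp()
print(rp.rc)   # 20.0
for _ in range(5):
    rp.on_increase_timer()
print(rp.rc)   # 39.375
```

Fast recovery halves the gap to the target rate on each event, which is why five events recover most of the rate lost to a single CNP.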
6 changes: 2 additions & 4 deletions
windows/ns-3-dev/mix/flow.txt → windows/ns-3-dev/x64/Release/mix/flow.txt
@@ -1,9 +1,7 @@
4
2
2 1 3 10000000 2.0 9.5
3 1 3 10000000 2.0 9.5
4 1 3 10000000 2.0 9.5
5 1 3 10000000 2.0 9.5


First line: flow #
src dst pg packet#
src dst priority packet# start_time end_time
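A hypothetical reader for the flow-file format documented in the new flow.txt: a flow count, then one `src dst priority packet# start_time end_time` line per flow. It assumes that lines after the declared count, like the notes at the bottom of the sample, are ignored; `Flow` and `parse_flow_file` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flow:
    src: int
    dst: int
    priority: int
    packets: int
    start_time: float
    end_time: float


def parse_flow_file(text: str) -> List[Flow]:
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0])
    flows = []
    for ln in lines[1:1 + count]:   # anything after the declared count is ignored
        src, dst, pg, pkts, start, end = ln.split()
        flows.append(Flow(int(src), int(dst), int(pg), int(pkts), float(start), float(end)))
    if len(flows) != count:
        raise ValueError(f"declared {count} flows, found {len(flows)}")
    return flows


sample = """2
2 1 3 10000000 2.0 9.5
3 1 3 10000000 2.0 9.5

First line: flow #
src dst priority packet# start_time end_time
"""

flows = parse_flow_file(sample)
print(len(flows), {f.dst for f in flows})   # 2 {1}
```

Both flows target node 1, which is the 2:1 incast described in the Quick Start.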
File renamed without changes.
@@ -0,0 +1,11 @@
4 1 3
0
0 1 40Gbps 0.001ms 0
0 2 40Gbps 0.001ms 0
0 3 40Gbps 0.001ms 0

First line: total node #, switch node #, link #
Second line: switch node IDs...
src0 dst0 rate delay error_rate
src1 dst1 rate delay error_rate
...
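A hypothetical reader for this topology format. It assumes the second line lists exactly the switch node IDs, that rates use the `Gbps`/`Mbps`/`Kbps`/`bps` suffixes seen in the sample, and that notes after the link list are ignored; delays are kept as strings. All names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

# Checked in order, so "Gbps", "Mbps" and "Kbps" match before the bare "bps".
RATE_UNITS = [("Gbps", 10**9), ("Mbps", 10**6), ("Kbps", 10**3), ("bps", 1)]


def rate_to_bps(rate: str) -> float:
    for suffix, scale in RATE_UNITS:
        if rate.endswith(suffix):
            return float(rate[: -len(suffix)]) * scale
    raise ValueError(f"unknown rate unit in {rate!r}")


@dataclass
class Link:
    src: int
    dst: int
    rate_bps: float
    delay: str          # e.g. "0.001ms", kept verbatim
    error_rate: float


def parse_topology(text: str) -> Tuple[int, Set[int], List[Link]]:
    lines = [ln.split() for ln in text.splitlines() if ln.strip()]
    total_nodes, num_switches, num_links = (int(x) for x in lines[0])
    switches = {int(x) for x in lines[1]}
    if len(switches) != num_switches:
        raise ValueError("second line must list exactly the switch node IDs")
    links = []
    for fields in lines[2:2 + num_links]:   # notes after the link list are ignored
        src, dst, rate, delay, err = fields
        if not (0 <= int(src) < total_nodes and 0 <= int(dst) < total_nodes):
            raise ValueError(f"link {src}-{dst} references an unknown node")
        links.append(Link(int(src), int(dst), rate_to_bps(rate), delay, float(err)))
    if len(links) != num_links:
        raise ValueError(f"declared {num_links} links, found {len(links)}")
    return total_nodes, switches, links


sample = """4 1 3
0
0 1 40Gbps 0.001ms 0
0 2 40Gbps 0.001ms 0
0 3 40Gbps 0.001ms 0

First line: total node #, switch node #, link #
Second line: switch node IDs...
"""

total, switches, links = parse_topology(sample)
print(sorted(set(range(total)) - switches))   # [1, 2, 3]
print(links[0].rate_bps)                      # 40000000000.0
```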
File renamed without changes.