Add more detailed descriptions and correct sample configurations
bobzhuyb committed Oct 25, 2016
1 parent c414c6b commit 8f81493
Showing 8 changed files with 140 additions and 72 deletions.
114 changes: 110 additions & 4 deletions README.md
@@ -1,5 +1,111 @@
# ns3-rdma
NS3 simulator for RDMA over Converged Ethernet v2 (RoCEv2), including the implementation of DCQCN, TIMELY, PFC, ECN and shared buffer switch
# NS-3 simulator for RDMA
This is an NS-3 simulator for RDMA over Converged Ethernet v2 (RoCEv2). It includes implementations of DCQCN, TIMELY, PFC, ECN and a Broadcom shared buffer switch.

# Note
TIMELY module has not been merged into this yet. We are working on merging it. We will also add descriptions for this project soon.
It is based on NS-3 version 3.17 and has been ported to the Visual Studio environment, as explained [here](https://www.nsnam.org/wiki/Ns-3_on_Visual_Studio_2012).

## Note
TIMELY module has not been merged into this yet. We are working on merging it.

## Quick Start

### Build
To compile it out of the box, you need Visual Studio.
People have successfully built it with the *free* version,
which can be downloaded [here](https://www.microsoft.com/en-us/download/details.aspx?id=48146).
Open windows/ns-3-dev/ns-3-dev.sln and build the whole solution.

You may try building it with the original Makefile, etc. We did this a while back, but you will probably need to edit a few things to make it work now.

### Run
The binary will be generated at windows/ns-3-dev/x64/Release/main.exe.
We include a sample configuration file at windows/ns-3-dev/x64/Release/mix/config.txt.
Execute main.exe in windows/ns-3-dev/x64/Release/:
```
cd windows\ns-3-dev\x64\Release\
main.exe mix\config.txt
```

It runs a 2:1 incast at 40Gbps for 1 second. Please allow a few minutes for it to finish.
The trace will be generated at mix/mix.tr, as defined by mix/config.txt.

There are quite a few options in mix/config.txt, and we will gradually add documentation for them.
In the meantime, you can check the code in project "main", under source files, "third.cc",
to see how these options are parsed.
You can also raise issues if you have any questions.
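Each line of mix/config.txt is a `KEY value` pair. As a rough illustration of reading such a file (this is not the actual parser in third.cc, and `ParseConfig` is a hypothetical name), the options can be collected into a map of strings and converted to numbers where needed:

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: reads "KEY value" lines (as in mix/config.txt) into a map.
// Blank lines are skipped, and a repeated key keeps its last value.
std::map<std::string, std::string> ParseConfig(std::istream& in) {
  std::map<std::string, std::string> options;
  std::string line;
  while (std::getline(in, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // Windows line endings
    std::istringstream fields(line);
    std::string key, value;
    if (!(fields >> key)) continue;          // blank line
    std::getline(fields >> std::ws, value);  // rest of the line, e.g. "40Mb/s"
    options[key] = value;
  }
  return options;
}
```

Numeric options can then be converted with `std::stod` or `std::stoi`; values that carry units, such as `40Mb/s`, need their own handling.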

## What did we add exactly?

**point-to-point/model/qbb-net-device.cc** and all other qbb-* files:

DCQCN and PFC implementation.
It also includes go-back-to-N and go-back-to-0, which handle packet drops caused by corruption.
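Go-back-to-N and go-back-to-0 differ in where the sender resumes after a loss: go-back-to-N resends from the first missing packet, while go-back-to-0 restarts the message from its first packet. The toy receiver/sender pair below illustrates this under our own assumptions (per-packet sequence numbers, and a NACK that carries the next expected sequence number). The type names are hypothetical, and this is not the code in qbb-net-device.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Receiver: accepts packets strictly in order. Returns the sequence number to
// NACK when a gap shows that an earlier packet was lost.
struct InOrderReceiver {
  uint32_t expected = 0;
  std::optional<uint32_t> OnPacket(uint32_t seq) {
    if (seq == expected) {  // in order: accept
      ++expected;
      return std::nullopt;
    }
    if (seq > expected) return expected;  // gap: discard and ask for `expected`
    return std::nullopt;                  // duplicate of an accepted packet
  }
};

enum class Recovery { kGoBackN, kGoBackZero };

// Sender: decides where to resume after a NACK.
struct GoBackSender {
  Recovery mode;
  uint32_t next_seq = 0;
  void OnNack(uint32_t expected) {
    // Go-back-to-N resends from the first missing packet; go-back-to-0
    // restarts the message from its first packet.
    next_seq = (mode == Recovery::kGoBackN) ? expected : 0;
  }
};
```

Go-back-to-N wastes less bandwidth after a drop; go-back-to-0 is simpler but retransmits everything already delivered for that message.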

In 2013, we obtained a very basic NS-3 PFC implementation from elsewhere and built on it.
We can no longer find the original repository.
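DCQCN's rate control is tuned by options in mix/config.txt such as DCTCP_GAIN, FAST_RECOVERY_TIMES, RATE_AI and RATE_HAI. A simplified sketch of the reaction-point update rules from the DCQCN paper (SIGCOMM'15) might look as follows. It is not the code in qbb-net-device.cc: as an assumption, the separate rate-increase timer and byte counter are merged into one "increase event" counter, and hyper increase is assumed to begin after twice FAST_RECOVERY_TIMES events. The struct name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>

// Simplified DCQCN reaction point; rates are in Gbps.
// Assumption: the rate-increase timer and byte counter are merged into a
// single "increase event" counter, and hyper increase begins after
// 2 * fast_recovery_times events.
struct DcqcnRateSketch {
  double line_rate, g, rate_ai, rate_hai;
  int fast_recovery_times;
  double rc, rt;       // current rate and target rate
  double alpha = 1.0;  // congestion estimate
  int increase_events = 0;

  DcqcnRateSketch(double line, double gain, double ai, double hai, int f)
      : line_rate(line), g(gain), rate_ai(ai), rate_hai(hai),
        fast_recovery_times(f), rc(line), rt(line) {
    assert(gain > 0.0 && gain <= 1.0);
  }

  // A CNP arrived: remember the current rate as the target, then cut the rate.
  void OnCnp() {
    rt = rc;
    rc *= 1.0 - alpha / 2.0;
    alpha = (1.0 - g) * alpha + g;
    increase_events = 0;
  }

  // Alpha timer expired without a CNP: decay the congestion estimate.
  void OnAlphaTimer() { alpha *= 1.0 - g; }

  // Rate-increase event: fast recovery first, then additive, then hyper increase.
  void OnIncreaseEvent() {
    ++increase_events;
    if (increase_events > 2 * fast_recovery_times) {
      rt += rate_hai;  // hyper increase
    } else if (increase_events > fast_recovery_times) {
      rt += rate_ai;   // additive increase
    }                  // otherwise fast recovery: target unchanged
    rt = std::min(rt, line_rate);
    rc = (rt + rc) / 2.0;
  }
};
```

With the sample values (40Gbps links, DCTCP_GAIN 0.00390625, RATE_AI 40Mb/s, RATE_HAI 200Mb/s, FAST_RECOVERY_TIMES 5), this sketch would be constructed as `DcqcnRateSketch(40.0, 0.00390625, 0.04, 0.2, 5)`.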

**network/model/broadcom-node.cc** and **.h**:

This implements a Broadcom ASIC switch model, which
mostly performs buffer-threshold-related operations. These include deciding
whether PFC should be triggered, whether ECN should be marked, whether the buffer is too full so that packets should
be dropped, etc. It supports both static and dynamic thresholds for PFC.
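Two of these decisions can be sketched as small functions. The PFC check below uses the classic dynamic-threshold rule (threshold = alpha × free shared buffer), and the ECN check uses the RED-style profile that DCQCN configures through KMIN, KMAX and PMAX in mix/config.txt. Both are simplified illustrations that assume a single shared pool and one consistent unit for queue lengths and thresholds; the function names are hypothetical, and this is not the logic in broadcom-node.cc.

```cpp
#include <cassert>
#include <cstdint>

// PFC: pause the upstream when an ingress queue exceeds its threshold.
// Dynamic threshold: alpha * (shared buffer size - bytes currently used).
bool ShouldPause(uint64_t ingress_bytes, uint64_t buffer_size,
                 uint64_t used_bytes, double alpha,
                 bool use_dynamic, uint64_t static_threshold) {
  assert(used_bytes <= buffer_size);
  double threshold = use_dynamic
      ? alpha * static_cast<double>(buffer_size - used_bytes)
      : static_cast<double>(static_threshold);
  return static_cast<double>(ingress_bytes) > threshold;
}

// ECN: RED-style marking probability from the egress queue length.
double EcnMarkProbability(double queue, double kmin, double kmax, double pmax) {
  assert(kmax > kmin);
  if (queue <= kmin) return 0.0;
  if (queue > kmax) return 1.0;
  return pmax * (queue - kmin) / (kmax - kmin);  // linear ramp up to pmax
}
```

The dynamic threshold shrinks as the shared buffer fills, so a few congested ingress queues cannot monopolize the buffer.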

*Disclaimer: this module is based purely on the authors' personal understanding of Broadcom ASICs. It does not reflect any official confirmation from either Microsoft or Broadcom.*

**network/utils/broadcom-egress-queue.cc** and **.h**:

This is the MMU that actually buffers packets.
It also includes the switch scheduler, i.e., when the upper layer asks for a packet to send, it
decides which queue to dequeue. Strategies such as strict priority and round robin are supported.
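A minimal sketch of the two selection strategies, assuming queue 0 has the highest priority and that queues paused by PFC are skipped. The names are hypothetical, and this is not the code in broadcom-egress-queue.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct QueueState {
  std::size_t packets = 0;  // packets waiting in this queue
  bool paused = false;      // paused by PFC from downstream
};

// Strict priority: lowest-index queue that has packets and is not paused.
// Returns -1 if no queue is eligible.
int PickStrictPriority(const std::vector<QueueState>& q) {
  for (std::size_t i = 0; i < q.size(); ++i)
    if (q[i].packets > 0 && !q[i].paused) return static_cast<int>(i);
  return -1;
}

// Round robin: first eligible queue after the one served last time
// (pass -1 before anything has been served). Returns -1 if none is eligible.
int PickRoundRobin(const std::vector<QueueState>& q, int last_served) {
  const int n = static_cast<int>(q.size());
  assert(last_served >= -1 && last_served < n);
  for (int step = 1; step <= n; ++step) {
    int i = (last_served + step) % n;
    if (q[i].packets > 0 && !q[i].paused) return i;
  }
  return -1;
}
```

Strict priority can starve low-priority queues under sustained load; round robin trades that for equal service among eligible queues.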

**applications/model/udp-echo-client.cc**:

We implement the RDMA client here, which aligns
with the fact that RoCEv2 includes a UDP header. In particular, the original UDP client has trouble
when PFC pauses the link: it keeps sending packets at line rate, soon
builds up a huge queue, and runs out of memory. Here we throttle the sending rate when it gets
pushed back by PFC.
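The effect of ignoring PFC back-pressure can be shown with a toy slot-based model: a client that injects one packet per slot regardless of PAUSE grows the NIC backlog for as long as the link is paused, whereas a client that holds off while paused does not. The model and `PeakBacklog` are hypothetical illustrations, not the logic in udp-echo-client.cc.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model: `paused[t]` says whether PFC pauses the link in slot t.
// Each slot the client may inject one packet into the NIC queue, and the link
// transmits one queued packet in every slot where it is not paused.
// Returns the largest NIC backlog observed at the end of any slot.
std::size_t PeakBacklog(const std::vector<bool>& paused, bool throttle_on_pause) {
  std::size_t backlog = 0, peak = 0;
  for (bool is_paused : paused) {
    if (!(throttle_on_pause && is_paused)) ++backlog;  // client injects a packet
    if (!is_paused && backlog > 0) --backlog;          // link sends one packet
    if (backlog > peak) peak = backlog;
  }
  return peak;
}
```

In the unthrottled case the backlog grows linearly with the pause duration, which is how memory runs out over a long simulation.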

**internet/model/seq-ts-header.cc** and **.h**:

We did not implement the full InfiniBand
header. What we really need is just the sequence number (for detecting corruption
drops, and also for helping us understand the throughput) and the timestamp (required by TIMELY).
This is where we encode this information into packets.
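Carrying just these two fields takes 12 bytes if the sequence number is 32 bits and the timestamp 64 bits, which are the field widths of the stock ns-3 SeqTsHeader; the modified header in this repository may differ. The sketch below packs and unpacks such a layout in network byte order. `EncodeSeqTs` and `DecodeSeqTs` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Packs a 32-bit sequence number and a 64-bit timestamp big-endian.
std::array<uint8_t, 12> EncodeSeqTs(uint32_t seq, uint64_t ts) {
  std::array<uint8_t, 12> buf{};
  for (int i = 0; i < 4; ++i) buf[i] = static_cast<uint8_t>(seq >> (24 - 8 * i));
  for (int i = 0; i < 8; ++i) buf[4 + i] = static_cast<uint8_t>(ts >> (56 - 8 * i));
  return buf;
}

// Inverse of EncodeSeqTs.
void DecodeSeqTs(const std::array<uint8_t, 12>& buf, uint32_t& seq, uint64_t& ts) {
  seq = 0;
  ts = 0;
  for (int i = 0; i < 4; ++i) seq = (seq << 8) | buf[i];
  for (int i = 0; i < 8; ++i) ts = (ts << 8) | buf[4 + i];
}
```

A receiver can compare the decoded sequence number with the next expected one to detect a corruption drop, and subtract the timestamp from the arrival time to obtain the delay that TIMELY needs.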

**main/third.cc**:

The main() function.

There are other edits here and there; in particular, trace generation is scattered
among various network stacks. But the above are the major ones.

## Q&A

**Q: Why do you port it to Windows?**

A: This is a Microsoft project. Visual Studio, including the free version, works well.

**Q: Fine. What if I want to run it on Linux, and do not want to spend time changing the build process?**

A: You can build it using Visual Studio and run the .exe using WINE. We have tested WINE 1.6.2 and it works well.

**Q: I don't understand ... (some part of the code or configuration)**

A: Raise issues on GitHub, so that your questions can also help others. If you really do
not want others to know you are working on this, you can email [email protected]

**Q: What papers should I cite, if I also publish?**

A: Below are the ones you should definitely check. They are ranked from most to
least relevant, although all of them are quite relevant:

*ECN or Delay: Lessons Learnt from Analysis of DCQCN and TIMELY*, CoNEXT'16 (this project is released with this paper; if you use this code, we ask you to cite at least this paper)

*Congestion Control for Large-scale RDMA Deployments*, SIGCOMM'15 (DCQCN)

*TIMELY: RTT-based Congestion Control for the Datacenter*, SIGCOMM'15 (TIMELY)

*RDMA over Commodity Ethernet at Scale*, SIGCOMM'16 (discussed go-back-to-N)

*Deadlocks in Datacenter Networks: Why Do They Form, and How to Avoid Them*, HotNets'16 (PFC deadlock analysis; directly used this simulator)
24 changes: 0 additions & 24 deletions windows/ns-3-dev/mix/flow_tcp.txt

This file was deleted.

29 changes: 0 additions & 29 deletions windows/ns-3-dev/mix/topology.txt

This file was deleted.

@@ -1,33 +1,39 @@
ENABLE_QCN 1
USE_DYNAMIC_PFC_THRESHOLD 1
PACKET_LEVEL_ECMP 0
FLOW_LEVEL_ECMP 0
FLOW_LEVEL_ECMP 1

PAUSE_TIME 5
PACKET_PAYLOAD_SIZE 1000

TOPOLOGY_FILE C:\ns-3-win2\windows\ns-3-dev\mix\topology.txt
FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow.txt
TCP_FLOW_FILE C:\ns-3-win2\windows\ns-3-dev\mix\flow_tcp.txt
TRACE_FILE C:\ns-3-win2\windows\ns-3-dev\mix\trace.txt
TRACE_OUTPUT_FILE Z:\mix.tr
TOPOLOGY_FILE mix/topology.txt
FLOW_FILE mix/flow.txt
TCP_FLOW_FILE mix/flow_tcp_0.txt
TRACE_FILE mix/trace.txt
TRACE_OUTPUT_FILE mix/mix.tr

SEND_IN_CHUNKS 0
APP_START_TIME 1.0
APP_STOP_TIME 10.0
SIMULATOR_STOP_TIME 2.05
SIMULATOR_STOP_TIME 3.01

CNP_INTERVAL 50
ALPHA_RESUME_INTERVAL 55
NP_SAMPLING_INTERVAL 0
CLAMP_TARGET_RATE 1
CLAMP_TARGET_RATE_AFTER_TIMER 0
RP_TIMER 60
BYTE_COUNTER 300000
BYTE_COUNTER 300000000
DCTCP_GAIN 0.00390625
KMAX 1000
KMIN 40
PMAX 1.0
KMAX 1000
KMIN 40
PMAX 1.0
FAST_RECOVERY_TIMES 5
RATE_AI 40Mb/s
RATE_HAI 200Mb/s

ERROR_RATE_PER_LINK 0.0000
L2_CHUNK_SIZE 4000
L2_WAIT_FOR_ACK 0
L2_ACK_INTERVAL 256
L2_BACK_TO_ZERO 0
@@ -1,9 +1,7 @@
4
2
2 1 3 10000000 2.0 9.5
3 1 3 10000000 2.0 9.5
4 1 3 10000000 2.0 9.5
5 1 3 10000000 2.0 9.5


First line: flow #
src dst pg packet#
src dst priority packet# start_time end_time
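In the updated format, each flow line has six whitespace-separated fields: src, dst, priority, packet count, start time and end time. A sketch of parsing one such line follows, assuming the times are in seconds; `FlowSpec` and `ParseFlowLine` are hypothetical names, not taken from third.cc.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <sstream>
#include <string>

struct FlowSpec {
  uint32_t src, dst, priority;
  uint64_t packets;
  double start_s, end_s;  // assumed to be seconds
};

// Parses "src dst priority packet# start_time end_time".
std::optional<FlowSpec> ParseFlowLine(const std::string& line) {
  std::istringstream in(line);
  FlowSpec f{};
  if (!(in >> f.src >> f.dst >> f.priority >> f.packets >> f.start_s >> f.end_s))
    return std::nullopt;                       // missing or non-numeric field
  if (f.end_s < f.start_s) return std::nullopt;  // reject inverted time ranges
  return f;
}
```

For example, `2 1 3 10000000 2.0 9.5` describes a flow from node 2 to node 1 at priority 3 with 10,000,000 packets, active from 2.0 to 9.5.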
File renamed without changes.
11 changes: 11 additions & 0 deletions windows/ns-3-dev/x64/Release/mix/topology.txt
@@ -0,0 +1,11 @@
4 1 3
0
0 1 40Gbps 0.001ms 0
0 2 40Gbps 0.001ms 0
0 3 40Gbps 0.001ms 0

First line: total node #, switch node #, link #
Second line: switch node IDs...
src0 dst0 rate delay error_rate
src1 dst1 rate delay error_rate
...
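Following these format notes, a topology file can be read with a few stream extractions. The sketch below uses hypothetical names and is not the reader in third.cc. It also checks that switch IDs and link endpoints fall within the declared node count; any text after the last link line, such as the format notes, is left unread.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

struct TopoLink {
  int src = 0, dst = 0;
  std::string rate, delay;  // kept as text, e.g. "40Gbps", "0.001ms"
  double error_rate = 0.0;
};

struct Topology {
  int nodes = 0;
  std::vector<int> switches;
  std::vector<TopoLink> links;
};

// Reads "node# switch# link#", the switch IDs, then link# lines of
// "src dst rate delay error_rate". Returns false on malformed input.
bool ReadTopology(std::istream& in, Topology& t) {
  int switch_count = 0, link_count = 0;
  if (!(in >> t.nodes >> switch_count >> link_count)) return false;
  if (t.nodes <= 0 || switch_count < 0 || link_count < 0) return false;
  auto valid = [&t](int id) { return id >= 0 && id < t.nodes; };
  t.switches.assign(static_cast<std::size_t>(switch_count), 0);
  for (int& id : t.switches)
    if (!(in >> id) || !valid(id)) return false;
  t.links.clear();
  for (int i = 0; i < link_count; ++i) {
    TopoLink l;
    if (!(in >> l.src >> l.dst >> l.rate >> l.delay >> l.error_rate)) return false;
    if (!valid(l.src) || !valid(l.dst)) return false;
    t.links.push_back(l);
  }
  return true;
}
```

On the sample file, this yields 4 nodes, switch 0, and three 40Gbps links from the switch to nodes 1, 2 and 3, i.e. a single-switch star.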
File renamed without changes.
