forked from netvirt/udt4
-
Notifications
You must be signed in to change notification settings - Fork 0
/
draft-gg-udt-xx.txt
1062 lines (737 loc) · 42 KB
/
draft-gg-udt-xx.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
Internet Engineering Task Force Yunhong Gu
Internet Draft University of Illinois at Chicago
Intended status: Informational April 12, 2010
Expires: October 12, 2010
UDT: UDP-based Data Transfer Protocol
draft-gg-udt-03.txt
Status of this Memo
This Internet-Draft is submitted to IETF in full conformance with the
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet-
Drafts.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt.
The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html.
This Internet-Draft will expire on October 15, 2010.
Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.
Abstract
This document describes UDT, or the UDP based Data Transfer protocol.
UDT is designed to be an alternative data transfer protocol for the
situations when TCP does not work well. One of the most common cases,
and also the original motivation of UDT, is to overcome TCP's
Yunhong Gu Expires - October 12, 2010 [Page 1]
UDT April 12, 2010
inefficiency in high bandwidth-delay product (BDP) networks. Another
important target use scenario is to allow networking researchers,
students, and application developers to easily implement and deploy
new data transfer algorithms and protocols. Furthermore, UDT can also
be used to better support firewall traversing.
UDT is completely built on top of UDP. However, UDT is connection
oriented, unicast, and duplex. It supports both reliable data
streaming and partial reliable messaging. The congestion control
module is an open framework that can be used to implement and/or
deploy different control algorithms. UDT also has a native/default
control algorithm based on AIMD rate control.
Yunhong Gu Expires - October 12, 2010 [Page 2]
UDT April 12, 2010
Table of Contents
1. Introduction...................................................4
2. Packet Structures..............................................5
3. UDP Multiplexer................................................8
4. Timers.........................................................8
5. Connection Setup and shutdown..................................9
5.1 Client/Server Connection Setup............................10
5.2 Rendezvous Connection Setup...............................10
5.3 Shutdown..................................................11
6. Data Sending and Receiving....................................11
6.1 The Sender's Algorithm....................................11
6.2 The Receiver's Algorithm..................................12
6.3 Flow Control..............................................15
6.4 Loss Information Compression Scheme.......................15
7. Configurable Congestion Control (CCC).........................15
7.1 CCC Interface.............................................15
7.2 UDT's Native Control Algorithm............................16
Security Considerations..........................................18
Normative References.............................................18
Informative References...........................................18
Author's Addresses...............................................19
Yunhong Gu Expires - October 12, 2010 [Page 3]
UDT April 12, 2010
1. Introduction
The Transmission Control Protocol (TCP) [RFC5681] has been very
successful and greatly contributes to the popularity of today's
Internet. Today TCP still contributes the majority of the traffic on
the Internet.
However, TCP is not perfect and it is not designed for every specific
applications. In the last several years, with the rapid advance of
optical networks and rich Internet applications, TCP has been found
inefficient as the network bandwidth-delay product (BDP) increases.
Its AIMD (additive increase multiplicative decrease) algorithm
reduces the TCP congestion window drastically but fails to recover it
to the available bandwidth quickly. Theoretical flow level analysis
has shown that TCP becomes more vulnerable to packet loss as the BDP
increases higher [LM97].
To overcome the TCP's inefficiency problem over high speed wide area
networks is the original motivation of UDT. Although there are new
TCP variants deployed today (for example, BiC TCP [XHR04] on Linux
and Compound TCP [TS06] on Windows), certain problems still exist.
For example, none of the new TCP variants address RTT unfairness, the
situation that connections with shorter RTT consume more bandwidth.
Moreover, as the Internet continues to evolve, new challenges and
requirements to the transport protocol will always emerge.
Researchers need a platform to rapidly develop and test new
algorithms and protocols. Network researchers and students can use
UDT to easily implement their ideas on transport protocols, in
particular congestion control algorithms, and conduct experiments
over real networks.
Finally, there are other situations when UDT can be found more
helpful than TCP. For example, UDP-based protocol is usually easier
for punching NAT firewalls. For another example, TCP's congestion
control and reliability control is not desirable in certain
applications of VOIP, wireless communication, etc. Application
developers can use (with or without modification) UDT to suit their
requirements.
Due to all those reasons and motivations described above, we believe
that it is necessary to design a well defined and developed UDP-based
data transfer protocol.
As its name suggest, UDT is built solely on the top of UDP [RFC768].
Both data and control packets are transferred using UDP. UDT is
connection-oriented in order to easily maintain congestion control,
reliability, and security. It is a unicast protocol while multicast
is not considered here. Finally, data can be transferred over UDT in
Yunhong Gu Expires - October 12, 2010 [Page 4]
UDT April 12, 2010
duplex.
UDT supports both reliable data streaming and partial reliable
messaging. The data streaming semantics is similar to that of TCP,
while the messaging semantics can be regarded as a subset of SCTP
[RFC4960].
This document defines UDT's protocol specification. The detailed
description and performance analysis can be found in [GG07], and a
fully functional reference implementation can be found at [UDT].
2. Packet Structures
UDT has two kinds of packets: the data packets and the control
packets. They are distinguished by the 1st bit (flag bit) of the
packet header.
The data packet header structure is as following.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0| Packet Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|FF |O| Message Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time Stamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Socket ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The data packet header starts with 0. Packet sequence number uses the
following 31 bits after the flag bit. UDT uses packet based
sequencing, i.e., the sequence number is increased by 1 for each sent
data packet in the order of packet sending. Sequence number is
wrapped after it is increased to the maximum number (2^31 - 1).
The next 32-bit field in the header is for the messaging. The first
two bits "FF" flags the position of the packet is a message. "10" is
the first packet, "01" is the last one, "11" is the only packet, and
"00" is any packets in the middle. The third bit "O" means if the
message should be delivered in order (1) or not (0). A message to be
delivered in order requires that all previous messages must be either
delivered or dropped. The rest 29 bits is the message number, similar
to packet sequence number (but independent). A UDT message may
contain multiple UDT packets.
Following are the 32-bit time stamp when the packet is sent and the
destination socket ID. The time stamp is a relative value starting
Yunhong Gu Expires - October 12, 2010 [Page 5]
UDT April 12, 2010
from the time when the connection is set up. The time stamp
information is not required by UDT or its native control algorithm.
It is included only in case that a user defined control algorithm may
require the information (See Section 6).
The Destination ID is used for UDP multiplexer. Multiple UDT socket
can be bound on the same UDP port and this UDT socket ID is used to
differentiate the UDT connections.
If the flag bit of a UDT packet is 1, then it is a control packet and
parsed according to the following structure.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| Type | Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| | Additional Info |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time Stamp |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Destination Socket ID |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
~ Control Information Field ~
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
There are 8 types of control packets in UDT and the type information
is put in bit field 1 - 15 of the header. The contents of the
following fields depend on the packet type. The first 128 bits must
exist in the packet header, whereas there may be an empty control
information field, depending on the packet type.
Particularly, UDT uses sub-sequencing for ACK packet. Each ACK packet
is assigned a unique increasing 16-bit sequence number, which is
independent of the data packet sequence number. The ACK sequence
number uses bits 32 - 63 ("Additional Info") in the control packet
header. The ACK sequence number ranges from 0 to (2^31 - 1).
TYPE 0x0: Protocol Connection Handshake
Additional Info: Undefined
Control Info:
1) 32 bits: UDT version
2) 32 bits: Socket Type (STREAM or DGRAM)
3) 32 bits: initial packet sequence number
4) 32 bits: maximum packet size (including UDP/IP headers)
5) 32 bits: maximum flow window size
6) 32 bits: connection type (regular or rendezvous)
Yunhong Gu Expires - October 12, 2010 [Page 6]
UDT April 12, 2010
7) 32 bits: socket ID
8) 32 bits: SYN cookie
9) 128 bits: the IP address of the peer's UDP socket
TYPE 0x1: Keep-alive
Additional Info: Undefined
Control Info: None
TYPE 0x2: Acknowledgement (ACK)
Additional Info: ACK sequence number
Control Info:
1) 32 bits: The packet sequence number to which all the
previous packets have been received (excluding)
[The following fields are optional]
2) 32 bits: RTT (in microseconds)
3) 32 bits: RTT variance
4) 32 bits: Available buffer size (in bytes)
5) 32 bits: Packets receiving rate (in number of packets
per second)
6) 32 bits: Estimated link capacity (in number of packets
per second)
TYPE 0x3: Negative Acknowledgement (NAK)
Additional Info: Undefined
Control Info:
1) 32 bits integer array of compressed loss information
(see section 3.9).
TYPE 0x4: Unused
TYPE 0x5: Shutdown
Additional Info: Undefined
Control Info: None
TYPE 0x6: Acknowledgement of Acknowledgement (ACK2)
Additional Info: ACK sequence number
Control Info: None
TYPE 0x7: Message Drop Request:
Additional Info: Message ID
Control Info:
1) 32 bits: First sequence number in the message
2) 32 bits: Last sequence number in the message
TYPE 0x7FFF: Explained by bits 16 - 31, reserved for user defined
Control Packet
Finally, Time Stamp and Destination Socket ID also exist in the
control packets.
Yunhong Gu Expires - October 12, 2010 [Page 7]
UDT April 12, 2010
3. UDP Multiplexer
A UDP multiplexer is used to handle concurrent UDT connections
sharing the same UDP port. The multiplexer dispatch incoming UDT
packets to the corresponding UDT sockets according to the destination
socket ID in the packet header.
One multiplexer is used for all UDT connections bound to the same UDP
port. That is, UDT sockets on different UDP port will be handled by
different multiplexers.
A multiplexer maintains two queues. The sending queue includes the
sockets with at least one packet scheduled for sending. The UDT
sockets in the sending queue are ordered by the next packet sending
time. A high performance timer is maintained by the sending queue and
when it is time for the first socket in the queue to send its packet,
the packet will be sent and the socket will be removed. If there are
more packets for that socket to be sent, the socket will be re-
inserted to the queue.
The receiving queue reads incoming packets and dispatches them to the
corresponding sockets. If the destination ID is 0, the packet will be
sent to the listening socket (if there is any), or to a socket that
is in rendezvous connection phase. (See Section 5.)
Similar to the sending queue, the receiving queue also maintains a
list of sockets waiting for incoming packets. The receiving queue
scans the list to check if any timer expires for each socket every
SYN (SYN = 0.01 second, defined in Section 4).
4. Timers
UDT uses four timers to trigger different periodical events. Each
event has its own period and they are all independent. They use the
system time as origins and should process wrapping if the system time
wraps.
For a certain periodical event E in UDT, suppose the time variable is
ET and its period is p. If E is set or reset at system time t0 (ET =
t0), then at any time t1, (t1 - ET >= p) is the condition to check if
E should be triggered.
The four timers are ACK, NAK, EXP and SND. SND is used in the sender
only for rate-based packet sending (see Section 6.1), whereas the
other three are used in the receiver only.
ACK is used to trigger an acknowledgement (ACK). Its period is set by
the congestion control module. However, UDT will send an ACK no
Yunhong Gu Expires - October 12, 2010 [Page 8]
UDT April 12, 2010
longer than every 0.01 second, even though the congestion control
does not need timer-based ACK. Here, 0.01 second is defined as the
SYN time, or synchronization time, and it affects many of the other
timers used in UDT.
NAK is used to trigger a negative acknowledgement (NAK). Its period
is dynamically updated to 4 * RTT_+ RTTVar + SYN, where RTTVar is the
variance of RTT samples.
EXP is used to trigger data packets retransmission and maintain
connection status. Its period is dynamically updated to N * (4 * RTT
+ RTTVar + SYN), where N is the number of continuous timeouts. To
avoid unnecessary timeout, a minimum threshold (e.g., 0.5 second)
should be used in the implementation.
The recommended granularity of their periods is microseconds.
However, accurate time keeping is not necessary, except for SND.
In the rest of this document, a name of a time variable will be used
to represent the associated event, the variable itself, or the value
of its period, depending on the context. For example, ACK can mean
either the ACK event or the value of ACK period.
5. Connection Setup and shutdown
UDT supports two different connection setup methods, the traditional
client/server mode and the rendezvous mode. In the latter mode, both
UDT sockets connect to each other at (approximately) the same time.
The UDT client (in rendezvous mode, both peer are clients) sends a
handshake request (type 0 control packet) to the server or the peer
side. The handshake packet has the following information (suppose UDT
socket A sends this handshake to B):
1) UDT version: this value is for compatibility purpose. The
current version is 4.
2) Socket Type: STREAM (0) or DGRAM (1).
3) Initial Sequence Number: It is the sequence number for the first
data packet that A will send out. This should be a random value.
4) Packet Size: the maximum size of a data packet (including all
headers). This is usually the value of MTU.
5) Maximum Flow Window Size: This value may not be necessary;
however, it is needed in the current reference implementation.
6) Connection Type. This information is used to differential the
connection setup modes and request/response.
7) Socket ID. The client UDT socket ID.
8) Cookie. This is a cookie value used to avoid SYN flooding attack
[RFC4987].
9) Peer IP address: B's IP address.
Yunhong Gu Expires - October 12, 2010 [Page 9]
UDT April 12, 2010
5.1 Client/Server Connection Setup
One UDT entity starts first as the server (listener). The server
accepts and processes incoming connection request, and creates new
UDT socket for each new connection.
A client that wants to connect to the server will send a handshake
packet first. The client should keep on sending the handshake packet
every constant interval until it receives a response handshake from
the server or a timeout timer expires.
When the server first receives the connection request from a client,
it generates a cookie value according to the client address and a
secret key and sends it back to the client. The client must then send
back the same cookie to the server.
The server, when receiving a handshake packet and the correct cookie,
compares the packet size and maximum window size with its own values
and set its own values as the smaller ones. The result values are
also sent back to the client by a response handshake packet, together
with the server's version and initial sequence number. The server is
ready for sending/receiving data right after this step is finished.
However, it must send back response packet as long as it receives any
further handshakes from the same client.
The client can start sending/receiving data once it gets a response
handshake packet from the server. Further response handshake
messages, if received any, should be omitted.
The connection type from the client should be set to 1 and the
response from the server should be set to -1.
The client should also check if the response is from the server that
the original request was sent to.
5.2 Rendezvous Connection Setup
In this mode, both clients send a connect request to each other at
the same time. The initial connection type is set to 0. Once a peer
receives a connection request, it sends back a response. If the
connection type is 0, then the response sends back -1; if the
connection type is -1, then the response sends back -2; No response
will be sent for -2 request.
The rendezvous peer does the same check on the handshake messages
(version, packet size, window size, etc.) as described in Section
5.1. In addition, the peer only process the connection request from
the address it has sent a connection request to. Finally, rendezvous
connection should be rejected by a regular UDT server (listener).
Yunhong Gu Expires - October 12, 2010 [Page 10]
UDT April 12, 2010
A peer initializes the connection when it receives -1 response.
The rendezvous connection setup is useful when both peers are behind
firewalls. It can also provide better security and usability when a
listening server is not desirable.
5.3 Shutdown
If one of the connected UDT entities is being closed, it will send a
shutdown message to the peer side. The peer side, after received this
message, will also be closed. This shutdown message, delivered using
UDP, is only sent once and not guaranteed to be received. If the
message is not received, the peer side will be closed after 16
continuous EXP timeout (see section 3.5). However, the total timeout
value should be between a minimum threshold and a maximum threshold.
In our reference implementation, we use 3 seconds and 30 seconds,
respectively.
6. Data Sending and Receiving
Each UDT entity has two logical parts: the sender and the receiver.
The sender sends (and retransmits) application data according to the
flow control and congestion control. The receiver receives both data
packets and control packets, and sends out control packets according
to the received packets and the timers.
The receiver is responsible for triggering and processing all control
events, including congestion control and reliability control, and
their related mechanisms.
UDT always tries to pack application data into fixed size packets
(the maximum packet size negotiated during connection setup), unless
there is not enough data to be sent.
We explained the rationale of some of the UDT data sending/receiving
schemes in [GHG04b].
6.1 The Sender's Algorithm
Data Structures and Variables:
1) Sender's Loss List: The sender's loss list is used to store the
sequence numbers of the lost packets fed back by the receiver
through NAK packets or inserted in a timeout event. The numbers
are stored in increasing order.
Data Sending Algorithm:
1) If the sender's loss list is not empty, retransmit the first
Yunhong Gu Expires - October 12, 2010 [Page 11]
UDT April 12, 2010
packet in the list and remove it from the list. Go to 5).
2) In messaging mode, if the packets has been the loss list for a
time more than the application specified TTL (time-to-live), send
a message drop request and remove all related packets from the
loss list. Go to 1).
3) Wait until there is application data to be sent.
4)
a. If the number of unacknowledged packets exceeds the
flow/congestion window size, wait until an ACK comes. Go to
1).
b. Pack a new data packet and send it out.
5) If the sequence number of the current packet is 16n, where n is an
integer, go to 2).
6) Wait (SND - t) time, where SND is the inter-packet interval
updated by congestion control and t is the total time used by step
1 to step 5. Go to 1).
6.2 The Receiver's Algorithm
Data Structures and Variables:
1) Receiver's Loss List: It is a list of tuples whose values include:
the sequence numbers of detected lost data packets, the latest
feedback time of each tuple, and a parameter k that is the number
of times each one has been fed back in NAK. Values are stored in
the increasing order of packet sequence numbers.
2) ACK History Window: A circular array of each sent ACK and the time
it is sent out. The most recent value will overwrite the oldest
one if no more free space in the array.
3) PKT History Window: A circular array that records the arrival time
of each data packet.
4) Packet Pair Window: A circular array that records the time
interval between each probing packet pair.
5) LRSN: A variable to record the largest received data packet
sequence number. LRSN is initialized to the initial sequence
number minus 1.
6) ExpCount: A variable to record number of continuous EXP time-out
events.
Data Receiving Algorithm:
1) Query the system time to check if ACK, NAK, or EXP timer has
expired. If there is any, process the event (as described below
in this section) and reset the associated time variables. For
ACK, also check the ACK packet interval.
2) Start time bounded UDP receiving. If no packet arrives, go to 1).
1) Reset the ExpCount to 1. If there is no unacknowledged data
packet, or if this is an ACK or NAK control packet, reset the EXP
timer.
3) Check the flag bit of the packet header. If it is a control
Yunhong Gu Expires - October 12, 2010 [Page 12]
UDT April 12, 2010
packet, process it according to its type and go to 1).
4) If the sequence number of the current data packet is 16n + 1,
where n is an integer, record the time interval between this
packet and the last data packet in the Packet Pair Window.
5) Record the packet arrival time in PKT History Window.
6)
a. If the sequence number of the current data packet is greater
than LRSN + 1, put all the sequence numbers between (but
excluding) these two values into the receiver's loss list and
send them to the sender in an NAK packet.
b. If the sequence number is less than LRSN, remove it from the
receiver's loss list.
7) Update LRSN. Go to 1).
ACK Event Processing:
1) Find the sequence number prior to which all the packets have been
received by the receiver (ACK number) according to the following
rule: if the receiver's loss list is empty, the ACK number is LRSN
+ 1; otherwise it is the smallest sequence number in the
receiver's loss list.
2) If (a) the ACK number equals to the largest ACK number ever
acknowledged by ACK2, or (b) it is equal to the ACK number in the
last ACK and the time interval between this two ACK packets is
less than 2 RTTs, stop (do not send this ACK).
3) Assign this ACK a unique increasing ACK sequence number. Pack the
ACK packet with RTT, RTT Variance, and flow window size (available
receiver buffer size). If this ACK is not triggered by ACK timers,
send out this ACK and stop.
4) Calculate the packet arrival speed according to the following
algorithm:
Calculate the median value of the last 16 packet arrival
intervals (AI) using the values stored in PKT History Window.
In these 16 values, remove those either greater than AI*8 or
less than AI/8. If more than 8 values are left, calculate the
average of the left values AI', and the packet arrival speed is
1/AI' (number of packets per second). Otherwise, return 0.
5) Calculate the estimated link capacity according to the following
algorithm:
Calculate the median value of the last 16 packet pair
intervals (PI) using the values in Packet Pair Window, and the
link capacity is 1/PI (number of packets per second).
6) Pack the packet arrival speed and estimated link capacity into the
ACK packet and send it out.
7) Record the ACK sequence number, ACK number and the departure time
of this ACK in the ACK History Window.
NAK Event Processing:
Search the receiver's loss list, find out all those sequence numbers
whose last feedback time is k*RTT before, where k is initialized as 2
Yunhong Gu Expires - October 12, 2010 [Page 13]
UDT April 12, 2010
and increased by 1 each time the number is fed back. Compress
(according to section 6.4) and send these numbers back to the sender
in an NAK packet.
EXP Event Processing:
1) Put all the unacknowledged packets into the sender's loss list.
2) If (ExpCount > 16) and at least 3 seconds has elapsed since that
last time when ExpCount is reset to 1, or, 3 minutes has elapsed,
close the UDT connection and exit.
3) If the sender's loss list is empty, send a keep-alive packet to
the peer side.
4) Increase ExpCount by 1.
On ACK packet received:
1) Update the largest acknowledged sequence number.
2) Send back an ACK2 with the same ACK sequence number in this ACK.
3) Update RTT and RTTVar.
4) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.
5) Update flow window size.
6) If this is a Light ACK, stop.
7) Update packet arrival rate: A = (A * 7 + a) / 8, where a is the
value carried in the ACK.
8) Update estimated link capacity: B = (B * 7 + b) / 8, where b is
the value carried in the ACK.
9) Update sender's buffer (by releasing the buffer that has been
acknowledged).
10) Update sender's loss list (by removing all those that has been
acknowledged).
On NAK packet received:
1) Add all sequence numbers carried in the NAK into the sender's loss
list.
2) Update the SND period by rate control (see section 3.6).
3) Reset the EXP time variable.
On ACK2 packet received:
1) Locate the related ACK in the ACK History Window according to the
ACK sequence number in this ACK2.
2) Update the largest ACK number ever been acknowledged.
3) Calculate new rtt according to the ACK2 arrival time and the ACK
departure time, and update the RTT value as: RTT = (RTT * 7 +
rtt) / 8.
4) Update RTTVar by: RTTVar = (RTTVar * 3 + abs(RTT - rtt)) / 4.
5) Update both ACK and NAK period to 4 * RTT + RTTVar + SYN.
On message drop request received:
1) Tag all packets belong to the message in the receiver buffer so
that they will not be read.
2) Remove all corresponding packets in the receiver's loss list.
Yunhong Gu Expires - October 12, 2010 [Page 14]
UDT April 12, 2010
On Keep-alive packet received:
Do nothing.
On Handshake/Shutdown packet received:
See Section 5.
6.3 Flow Control
The flow control window size is 16 initially.
On ACK packet received:
The flow window size is updated to the receiver's available buffer
size.
6.4 Loss Information Compression Scheme
The loss information carried in an NAK packet is an array of 32-bit
integers. If an integer in the array is a normal sequence number (1st
bit is 0), it means that the packet with this sequence number is
lost; if the 1st bit is 1, it means all the packets starting from
(including) this number to (including) the next number in the array
(whose 1st bit must be 0) are lost.
For example, the following information carried in an NAK:
0x00000002, 0x80000006, 0x0000000B, 0x0000000E
means packets with sequence number 2, 6, 7, 8, 9, 10, 11, and 14 are
lost.
7. Configurable Congestion Control (CCC)
The congestion control in UDT is an open framework so that user-
defined control algorithm can be easily implemented and switched.
Particularly, the native control algorithm is also implemented by
this framework.
The user-defined algorithm may redefine several control routines to
read and adjust several UDT parameters. The routines will be called
when certain event occurs. For example, when an ACK is received, the
control algorithm may increase the congestion window size.
7.1 CCC Interface
UDT allow users to access two congestion control parameters: the
congestion window size and the inter-packet sending interval. Users
may adjust these two parameters to realize window-based control,
rate-based control, or a hybrid approach.
In addition, the following parameters should also be exposed.
Yunhong Gu Expires - October 12, 2010 [Page 15]
UDT April 12, 2010
1) RTT
2) Maximum Segment/Packet Size
3) Estimated Bandwidth
4) The latest packet sequence number that has been sent so far
5) Packet arriving rate at the receiver side
A UDT implementation may expose additional parameters as well. This
information can be used in user-defined congestion control algorithms
to adjust the packet sending rate.
The following control events can be redefined via CCC (e.g., by a
callback function).
1) init: when the UDT socket is connected.
2) close: when the UDT socket is closed.
3) onACK: when ACK is received.
4) onLOSS: when NACK is received.
5) onTimeout: when timeout occurs.
6) onPktSent: when a data packet is sent.
7) onPktRecv: when a data packet is received.
Users can also adjust the following parameters in the user-defined
control algorithms.
1) ACK interval: An ACK may be sent every fixed number of packets.
User may define this interval. If this value is -1, then it means
no ACK will be sent based on packet interval.
2) ACK Timer: An ACK will also be sent every fixed time interval.
This is mandatory in UDT. The maximum and default ACK time
interval is SYN.
3) RTO: UDT uses 4 * RTT + RTTVar to compute RTO. Users may redefine
this.
Detailed description and discussion of UDT/CCC can be found in
[GG05].
7.2 UDT's Native Control Algorithm
UDT has a native and default control algorithm, which will be used if
no user-defined algorithm is implemented and configured. The native
UDT algorithm should be implemented using CCC.
UDT's native algorithm is a hybrid congestion control algorithm,
hence it adjusts both the congestion window size and the inter-packet
interval. The native algorithm uses timer-based ACK and the ACK
interval is SYN.
The initial congestion window size is 16 packets and the initial
inter-packet interval is 0. The algorithm start with Slow Start phase
Yunhong Gu Expires - October 12, 2010 [Page 16]
UDT April 12, 2010
until the first ACK or NAK arrives.
On ACK packet received:
1) If the current status is in the slow start phase, set the
congestion window size to the product of packet arrival rate and
(RTT + SYN). Slow Start ends. Stop.
2) Set the congestion window size (CWND) to: CWND = A * (RTT + SYN) +
16.
3) The number of sent packets to be increased in the next SYN period
(inc) is calculated as:
if (B <= C)
inc = 1/PS;
else
inc = max(10^(ceil(log10((B-C)*PS*8))) * Beta/PS, 1/PS);
where B is the estimated link capacity and C is the current
sending speed. All are counted as packets per second. PS is the
fixed size of UDT packet counted in bytes. Beta is a constant
value of 0.0000015.
4) The SND period is updated as:
SND = (SND * SYN) / (SND * inc + SYN).
These four parameters are used in rate decrease, and their initial
values are in the parentheses: AvgNAKNum (1), NAKCount (1),
DecCount(1), LastDecSeq (initial sequence number - 1).
We define a congestion period as the period between two NAKs in which
the first biggest lost packet sequence number is greater than the
LastDecSeq, which is the biggest sequence number when last time the
packet sending rate is decreased.
AvgNAKNum is the average number of NAKs in a congestion period.
NAKCount is the current number of NAKs in the current period.
On NAK packet received:
1) If it is in slow start phase, set inter-packet interval to
1/recvrate. Slow start ends. Stop.
2) If this NAK starts a new congestion period, increase inter-packet
interval (snd) to snd = snd * 1.125; Update AvgNAKNum, reset
NAKCount to 1, and compute DecRandom to a random (average
distribution) number between 1 and AvgNAKNum. Update LastDecSeq.
Stop.
3) If DecCount <= 5, and NAKCount == DecCount * DecRandom:
a. Update SND period: SND = SND * 1.125;
b. Increase DecCount by 1;
c. Record the current largest sent sequence number (LastDecSeq).
The native UDT control algorithm is designed for bulk data transfer
over high BDP networks. [GHG04a]
Yunhong Gu Expires - October 12, 2010 [Page 17]
UDT April 12, 2010
Security Considerations
UDT's security mechanism is similar to that of TCP. Most of TCP's
approach to counter security attack should also be implemented in
UDT.
IANA Considerations
This document has no actions for IANA.
Normative References
[RFC768] J. Postel, User Datagram Protocol, Aug. 1980.
Informative References
[RFC4987] W. Eddy, TCP SYN Flooding Attacks and Common Mitigations.
[GG07] Yunhong Gu and Robert L. Grossman, UDT: UDP-based Data
Transfer for High-Speed Wide Area Networks, Computer Networks
(Elsevier). Volume 51, Issue 7. May 2007.
[GG05] Yunhong Gu and Robert L. Grossman, Supporting Configurable
Congestion Control in Data Transport Services, SC 2005, Nov 12 -
18, Seattle, WA, USA.
[GHG04b] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, Experiences
in Design and Implementation of a High Performance Transport
Protocol, SC 2004, Nov 6 - 12, Pittsburgh, PA, USA.
[GHG04a] Yunhong Gu, Xinwei Hong, and Robert L. Grossman, An Analysis
of AIMD Algorithms with Decreasing Increases, First Workshop on
Networks for Grid Applications (Gridnets 2004), Oct. 29, San Jose,
CA, USA.
[LM97] T. V. Lakshman and U. Madhow, The Performance of TCP/IP for
Networks with High Bandwidth-Delay Products and Random Loss,
IEEE/ACM Trans. on Networking, vol. 5 no 3, July 1997, pp. 336-
350.
[RFC5681] Allman, M., Paxson, V. and E. Blanton, TCP Congestion
Control, September 2009.
[RFC4960] R. Stewart, Ed. Stream Control Transmission Protocol.
September 2007.
[TS06] K. Tan, Jingmin Song, Qian Zhang, Murari Sridharan, A Compound
TCP Approach for High-speed and Long Distance Networks, in IEEE
Infocom, April 2006, Barcelona, Spain.
Yunhong Gu Expires - October 12, 2010 [Page 18]
UDT April 12, 2010
[UDT] UDT: UDP-based Data Transfer, URL http://udt.sf.net.
[XHR04] Lisong Xu, Khaled Harfoush, and Injong Rhee, Binary Increase
Congestion Control for Fast Long-Distance Networks, INFOCOM 2004.