tcp的vegas算法是基于delay的一个拥塞控制算法,所谓基于delay也就是说窗口的变化只和RTT的变化相关。而传统的基于丢包的算法是窗口的变化和丢包相关.

先来看原理,paper地址在这里(94年提出来的),基本上linux的实现就是按照paper来实现的,注意Vegas它是第一个基于delay的拥塞算法.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.9587&rep=rep1&type=pdf

它的主要思想是估计一段时间能够发送的数据量,然后和最终发送的数据量比较。如果预测要发送的数据没有被发送,那么就会被认为可能出现拥塞状况,如果这个状态持久,那么就减慢发送速度,并且这个算法不仅作用于拥塞避免状态,而且还作用于slow start状态。

不过vegas的缺点也是很明显.那就是他会被欺骗,也就是说本身这个正向的延迟就是比它期待的高,比如在tcp中,有可能正向反向做的不是相同的路径,那么当反向有拥塞的时候,就有问题了。也就是数据包ack返回给发送端的就是延迟的。此时就会导致Vegas降低拥塞窗口。这个问题就是基于延迟的拥塞算法的一个陷阱。不过在linux下vegas算法被打开,只有是正常状态才会被打开,而只要遇到异常(丢包/快重传..)就会使用经典的newreno算法.

并且如果连接都是Vegas算法,那么这些连接就是公平的,而如果有些是,有些不是,那么就不是公平的了,因此经典的tcp发送者是会尝试填满网络中的队列,而Vegas是尝试着保持队列为空。因此最终就会导致使用经典tcp拥塞算法的,发送的数据包越来越多,而Vegas的就会越来越慢。

那么如何来统计数据量呢,在linux下是这样做的,首先会有一个采样周期(不小于2个RTT),然后根据这段采样的最小RTT和全局的最小RTT进行比较,从而来得到对应的窗口是否应当增长或者减小以及增加/减小多少.

接下来来看代码,首先来看核心的结构体,注释很详细:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
  
struct vegas {

u32 beg_snd_nxt; /\* right edge during last RTT \*/

u32 beg_snd_una; /\* left edge during last RTT \*/

u32 beg_snd_cwnd; /\* saves the size of the cwnd \*/

u8 doing_vegas_now;/\* if true, do vegas for this RTT \*/

u16 cntRTT; /\* # of RTTs measured within last RTT \*/

u32 minRTT; /\* min of RTTs measured within last RTT (in usec) \*/

u32 baseRTT; /\* the min of all Vegas RTT measurements seen (in usec) \*/

};

然后来看对应的RTT采集函数.很简单的函数,主要就是更新baseRTT/minRTT/cntRTT这三个值,其中minRTT和cntRTT在其他的地方还有更新的(因为他们每个采样周期都会被重置),后续我们会看到,:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
  
void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, s32 rtt_us)

{

struct vegas *vegas = inet_csk_ca(sk);

u32 vrtt;

if (rtt_us < 0)

return;

/\* Never allow zero rtt or baseRTT \*/

vrtt = rtt_us + 1;

/\* Filter to find propagation delay: \*/

if (vrtt < vegas->baseRTT)

vegas->baseRTT = vrtt;

/* Find the min RTT during the last RTT to find

* the current prop. delay + queuing delay:

*/

vegas->minRTT = min(vegas->minRTT, vrtt);

vegas->cntRTT++;

}

然后注意一下doing_vegas_now这个域,这个域表示是否需要vegas算法,而这个域的设置在set_state回调中,通过这个域,可以说混合了基于延迟和丢包的算法(类似微软的CTCP):

1
2
3
4
5
6
7
8
9
10
11
12
13
14
  
void tcp_vegas_state(struct sock *sk, u8 ca_state)

{

if (ca_state == TCP_CA_Open)

vegas_enable(sk);

else

vegas_disable(sk);

}

最后来看核心的算法,它算法是这样子的,由于我们需要采样某段时间(3个RTT)之间的数据发送速度,因此这里就只能在每次采样结束之后才会去使用vegas算法(否则使用newreno),并且只有good ack才有效(ack大于我们上次我们采样结束时保存的snd_nxt,否则还是运行new_reno算法).也就是通过这段时间的RTT变化,来推算下段时间的窗口变化.

然后来看根据采样值如何来计算的,这里通过计算这段采样中的发送速度,然后用这个速度来乘全局最小的RTT,这个值就是期待的拥塞窗口值,然后再用类似方法来计算对应的窗口变化差值(通过RTT的变化),这个差值就是算法的核心,通过这个diff来决定拥塞窗口如何变化。

来看代码如何实现。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
  
//拥塞避免时使用

static int alpha = 2;

static int beta = 4;

//slow start时使用

static int gamma = 1;

static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)

{

struct tcp_sock *tp = tcp_sk(sk);

struct vegas *vegas = inet_csk_ca(sk);

if (!vegas->doing_vegas_now) {

tcp_reno_cong_avoid(sk, ack, in_flight);

return;

}

//判断ack,也就是上次采样标记之后的数据全部ack之后,才会进入vegas算法.

if (after(ack, vegas->beg_snd_nxt)) {

/\* Do the Vegas once-per-RTT cwnd adjustment. \*/

/* Save the extent of the current window so we can use this

* at the end of the next RTT.

*/

vegas->beg_snd_nxt = tp->snd_nxt;

/* We do the Vegas calculations only if we got enough RTT

* samples that we can be reasonably sure that we got

* at least one RTT sample that wasn&#8217;t from a delayed ACK.

* If we only had 2 samples total,

* then that means we&#8217;re getting only 1 ACK per RTT, which

* means they&#8217;re almost certainly delayed ACKs.

* If we have 3 samples, we should be OK.

*/

if (vegas->cntRTT <= 2) {

/* We don&#8217;t have enough RTT samples to do the Vegas

* calculation, so we&#8217;ll behave like Reno.

*/

tcp_reno_cong_avoid(sk, ack, in_flight);

} else {

u32 rtt, diff;

u64 target_cwnd;

/* We have enough RTT samples, so, using the Vegas

* algorithm, we determine if we should increase or

* decrease cwnd, and by how much.

*/

/* Pluck out the RTT we are using for the Vegas

* calculations. This is the min RTT seen during the

* last RTT. Taking the min filters out the effects

* of delayed ACKs, at the cost of noticing congestion

* a bit later.

*/

rtt = vegas->minRTT;

/* Calculate the cwnd we should have, if we weren&#8217;t

* going too fast.

*

* This is:

\* (actual rate in segments) \* baseRTT

*/

//计算下次应该的窗口

target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;

/* Calculate the difference between the window we had,

* and the window we would like to have. This quantity

* is the "Diff" from the Arizona Vegas papers.

*/

//计算窗口的变化差

diff = tp->snd_cwnd * (rtt-vegas->baseRTT) / vegas->baseRTT;

//如果变化太大(适用于slow start),则减小窗口

if (diff > gamma && tp->snd_cwnd <= tp->snd_ssthresh) {

/* Going too fast. Time to slow down

* and switch to congestion avoidance.

*/

/* Set cwnd to match the actual rate

* exactly:

\* cwnd = (actual rate) \* baseRTT

* Then we add 1 because the integer

* truncation robs us of full link

* utilization.

*/

tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);

tp->snd_ssthresh = tcp_vegas_ssthresh(tp);

} else if (tp->snd_cwnd <= tp->snd_ssthresh) {

/\* Slow start. \*/

tcp_slow_start(tp);

} else {

/\* Congestion avoidance. \*/

/* Figure out where we would like cwnd

* to be.

*/

if (diff > beta) {

/* The old window was too fast, so

* we slow down.

*/

//发送太快,因此减速

tp->snd_cwnd&#8211;;

tp->snd_ssthresh

= tcp_vegas_ssthresh(tp);

} else if (diff < alpha) {

/* We don&#8217;t have enough extra packets

* in the network, so speed up.

*/

//网络中包太少,因此加速

tp->snd_cwnd++;

} else {

/* Sending just as fast as we

* should be.

*/

}

}

if (tp->snd_cwnd < 2)

tp->snd_cwnd = 2;

else if (tp->snd_cwnd > tp->snd_cwnd_clamp)

tp->snd_cwnd = tp->snd_cwnd_clamp;

tp->snd_ssthresh = tcp_current_ssthresh(sk);

}

//重置cntRTT和minRTT

/\* Wipe the slate clean for the next RTT. \*/

vegas->cntRTT = 0;

vegas->minRTT = 0x7fffffff;

}

/\* Use normal slow start \*/

else if (tp->snd_cwnd <= tp->snd_ssthresh)

tcp_slow_start(tp);

}