tcp拥塞算法vegas分析

tcp的vegas算法是基于delay的一个拥塞控制算法，所谓基于delay也就是说窗口的变化只和RTT的变化相关。而传统的基于丢包的算法是窗口的变化和丢包相关.

先来看原理，paper地址在这里(94年提出来的),基本上linux的实现就是按照paper来实现的，注意Vegas它是第一个基于delay的拥塞算法.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.9587&rep=rep1&type=pdf

它的主要思想是估计一段时间能够发送的数据量，然后和最终发送的数据量比较。如果预测要发送的数据没有被发送，那么就会被认为可能出现拥塞状况，如果这个状态持久，那么就减慢发送速度，并且这个算法不仅作用于拥塞避免状态，而且还作用于slow start状态。

不过vegas的缺点也是很明显.那就是他会被欺骗，也就是说本身这个正向的延迟就是比它期待的高，比如在tcp中，有可能正向反向做的不是相同的路径，那么当反向有拥塞的时候，就有问题了。也就是数据包ack返回给发送端的就是延迟的。此时就会导致Vegas降低拥塞窗口。这个问题就是基于延迟的拥塞算法的一个陷阱。不过在linux下vegas算法被打开，只有是正常状态才会被打开，而只要遇到异常(丢包/快重传..)就会使用经典的newreno算法.

并且如果连接都是Vegas算法，那么这些连接就是公平的，而如果有些是，有些不是，那么就不是公平的了，因此经典的tcp发送者是会尝试填满网络中的队列，而Vegas是尝试着保持队列为空。因此最终就会导致使用经典tcp拥塞算法的，发送的数据包越来越多，而Vegas的就会越来越慢。

那么如何来统计数据量呢，在linux下是这样做的，首先会有一个采样周期(不小于2个RTT)，然后根据这段采样的最小RTT和全局的最小RTT进行比较，从而来得到对应的窗口是否应当增长或者减小以及增加/减小多少.

接下来来看代码,首先来看核心的结构体，注释很详细:

  
struct vegas {
	  
u32 beg_snd_nxt; /\* right edge during last RTT \*/
	  
u32 beg_snd_una; /\* left edge during last RTT \*/
	  
u32 beg_snd_cwnd; /\* saves the size of the cwnd \*/
	  
u8 doing_vegas_now;/\* if true, do vegas for this RTT \*/
	  
u16 cntRTT; /\* # of RTTs measured within last RTT \*/
	  
u32 minRTT; /\* min of RTTs measured within last RTT (in usec) \*/
	  
u32 baseRTT; /\* the min of all Vegas RTT measurements seen (in usec) \*/
  
};

然后来看对应的RTT采集函数.很简单的函数，主要就是更新baseRTT/minRTT/cntRTT这三个值，其中minRTT和cntRTT在其他的地方还有更新的(因为他们每个采样周期都会被重置)，后续我们会看到，:

  
void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, s32 rtt_us)
  
{
	  
struct vegas *vegas = inet_csk_ca(sk);
	  
u32 vrtt;

if (rtt_us < 0)
		  
return;

/\* Never allow zero rtt or baseRTT \*/
	  
vrtt = rtt_us + 1;

/\* Filter to find propagation delay: \*/
	  
if (vrtt < vegas->baseRTT)
		  
vegas->baseRTT = vrtt;

/* Find the min RTT during the last RTT to find
	   
* the current prop. delay + queuing delay:
	   
*/
	  
vegas->minRTT = min(vegas->minRTT, vrtt);
	  
vegas->cntRTT++;
  
}

然后注意一下doing_vegas_now这个域，这个域表示是否需要vegas算法,而这个域的设置在set_state回调中，通过这个域，可以说混合了基于延迟和丢包的算法(类似微软的CTCP):

  
void tcp_vegas_state(struct sock *sk, u8 ca_state)
  
{

if (ca_state == TCP_CA_Open)
		  
vegas_enable(sk);
	  
else
		  
vegas_disable(sk);
  
}

最后来看核心的算法，它算法是这样子的，由于我们需要采样某段时间(3个RTT)之间的数据发送速度，因此这里就只能在每次采样结束之后才会去使用vegas算法(否则使用newreno)，并且只有good ack才有效(ack大于我们上次我们采样结束时保存的snd_nxt,否则还是运行new_reno算法).也就是通过这段时间的RTT变化，来推算下段时间的窗口变化.

然后来看根据采样值如何来计算的，这里通过计算这段采样中的发送速度，然后用这个速度来乘全局最小的RTT,这个值就是期待的拥塞窗口值，然后再用类似方法来计算对应的窗口变化差值(通过RTT的变化)，这个差值就是算法的核心，通过这个diff来决定拥塞窗口如何变化。

来看代码如何实现。

  
//拥塞避免时使用
  
static int alpha = 2;
  
static int beta = 4;
  
//slow start时使用
  
static int gamma = 1;

static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
  
{
	  
struct tcp_sock *tp = tcp_sk(sk);
	  
struct vegas *vegas = inet_csk_ca(sk);

if (!vegas->doing_vegas_now) {
		  
tcp_reno_cong_avoid(sk, ack, in_flight);
		  
return;
	  
}
  
//判断ack,也就是上次采样标记之后的数据全部ack之后，才会进入vegas算法.

if (after(ack, vegas->beg_snd_nxt)) {
		  
/\* Do the Vegas once-per-RTT cwnd adjustment. \*/

/* Save the extent of the current window so we can use this
		   
* at the end of the next RTT.
		   
*/
		  
vegas->beg_snd_nxt = tp->snd_nxt;

/* We do the Vegas calculations only if we got enough RTT
		   
* samples that we can be reasonably sure that we got
		   
* at least one RTT sample that wasn&#8217;t from a delayed ACK.
		   
* If we only had 2 samples total,
		   
* then that means we&#8217;re getting only 1 ACK per RTT, which
		   
* means they&#8217;re almost certainly delayed ACKs.
		   
* If we have 3 samples, we should be OK.
		   
*/

if (vegas->cntRTT <= 2) {
			  
/* We don&#8217;t have enough RTT samples to do the Vegas
			   
* calculation, so we&#8217;ll behave like Reno.
			   
*/
			  
tcp_reno_cong_avoid(sk, ack, in_flight);
		  
} else {
			  
u32 rtt, diff;
			  
u64 target_cwnd;

/* We have enough RTT samples, so, using the Vegas
			   
* algorithm, we determine if we should increase or
			   
* decrease cwnd, and by how much.
			   
*/

/* Pluck out the RTT we are using for the Vegas
			   
* calculations. This is the min RTT seen during the
			   
* last RTT. Taking the min filters out the effects
			   
* of delayed ACKs, at the cost of noticing congestion
			   
* a bit later.
			   
*/
			  
rtt = vegas->minRTT;

/* Calculate the cwnd we should have, if we weren&#8217;t
			   
* going too fast.
			   
*
			   
* This is:
			   
\* (actual rate in segments) \* baseRTT
			   
*/
  
//计算下次应该的窗口
			  
target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;

/* Calculate the difference between the window we had,
			   
* and the window we would like to have. This quantity
			   
* is the "Diff" from the Arizona Vegas papers.
			   
*/
  
//计算窗口的变化差
			  
diff = tp->snd_cwnd * (rtt-vegas->baseRTT) / vegas->baseRTT;
  
//如果变化太大(适用于slow start)，则减小窗口
			  
if (diff > gamma && tp->snd_cwnd <= tp->snd_ssthresh) {
				  
/* Going too fast. Time to slow down
				   
* and switch to congestion avoidance.
				   
*/

/* Set cwnd to match the actual rate
				   
* exactly:
				   
\* cwnd = (actual rate) \* baseRTT
				   
* Then we add 1 because the integer
				   
* truncation robs us of full link
				   
* utilization.
				   
*/
				  
tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
				  
tp->snd_ssthresh = tcp_vegas_ssthresh(tp);

} else if (tp->snd_cwnd <= tp->snd_ssthresh) {
				  
/\* Slow start. \*/
				  
tcp_slow_start(tp);
			  
} else {
				  
/\* Congestion avoidance. \*/

/* Figure out where we would like cwnd
				   
* to be.
				   
*/
				  
if (diff > beta) {
					  
/* The old window was too fast, so
					   
* we slow down.
					   
*/
  
//发送太快，因此减速
					  
tp->snd_cwnd&#8211;;
					  
tp->snd_ssthresh
						  
= tcp_vegas_ssthresh(tp);
				  
} else if (diff < alpha) {
					  
/* We don&#8217;t have enough extra packets
					   
* in the network, so speed up.
					   
*/
  
//网络中包太少，因此加速
					  
tp->snd_cwnd++;
				  
} else {
					  
/* Sending just as fast as we
					   
* should be.
					   
*/
				  
}
			  
}

if (tp->snd_cwnd < 2)
				  
tp->snd_cwnd = 2;
			  
else if (tp->snd_cwnd > tp->snd_cwnd_clamp)
				  
tp->snd_cwnd = tp->snd_cwnd_clamp;

tp->snd_ssthresh = tcp_current_ssthresh(sk);
		  
}
  
//重置cntRTT和minRTT
		  
/\* Wipe the slate clean for the next RTT. \*/
		  
vegas->cntRTT = 0;
		  
vegas->minRTT = 0x7fffffff;
	  
}
	  
/\* Use normal slow start \*/
	  
else if (tp->snd_cwnd <= tp->snd_ssthresh)
		  
tcp_slow_start(tp);

}