
An analysis of the TCP Vegas congestion algorithm

Original article; when reposting, please credit: reposted from pagefault


TCP Vegas is a delay-based congestion control algorithm, meaning that window changes are driven purely by changes in RTT. Traditional loss-based algorithms, by contrast, change the window in response to packet loss.

Let's start with the theory. The paper (published in 1994) is at the link below, and the Linux implementation follows it essentially as written. Note that Vegas was the first delay-based congestion algorithm.

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.100.9587&rep=rep1&type=pdf

The main idea is to estimate how much data could be sent over a period of time and compare that with how much was actually sent. If data that was expected to go out did not, the path is assumed to be getting congested, and if that state persists the sending rate is slowed down. The algorithm applies not only during congestion avoidance but also during slow start.
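In the paper's terms this comparison is done on rates. A small worked example (the numbers here are invented for illustration):

	Expected = cwnd / baseRTT                /* throughput if nothing is queued */
	Actual   = cwnd / rtt                    /* throughput actually achieved */
	Diff     = (Expected - Actual) * baseRTT /* ~ our segments sitting in queues */

	e.g. cwnd = 20 segments, baseRTT = 100 ms, rtt = 110 ms:
	Expected = 200 seg/s, Actual ~ 182 seg/s, Diff ~ 1.8 segments

Vegas then tries to keep Diff between two thresholds: grow the window when Diff falls below the lower one, shrink it when Diff rises above the upper one.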

Vegas's weakness is just as clear: it can be fooled, because the extra delay it reacts to may not come from the forward path at all. In TCP the forward and reverse paths are not necessarily the same, so when the reverse path is congested the ACKs return to the sender late even though the data path is fine, and Vegas will wrongly shrink the congestion window. This is a trap shared by all delay-based congestion algorithms. In Linux, Vegas is therefore only active while the connection is in the normal (TCP_CA_Open) state; as soon as anything abnormal happens (loss, fast retransmit, ...) the classic NewReno algorithm takes over.

Fairness is another issue. If all competing connections run Vegas they share the bottleneck fairly, but if only some do, fairness breaks down: a classic TCP sender keeps pushing until the queues in the network fill up, while Vegas tries to keep them empty, so the flows using the classic loss-based algorithm send more and more while the Vegas flows get slower and slower.

So how is the amount of data measured? In Linux it works like this: there is a sampling period of roughly one RTT (the cwnd adjustment below happens once per RTT), and the minimum RTT observed during that period is compared against the global minimum RTT to decide whether the window should grow or shrink, and by how much.

Now for the code. First the core structure; the comments are self-explanatory:

struct vegas {
	u32	beg_snd_nxt;	/* right edge during last RTT */
	u32	beg_snd_una;	/* left edge  during last RTT */
	u32	beg_snd_cwnd;	/* saves the size of the cwnd */
	u8	doing_vegas_now;/* if true, do vegas for this RTT */
	u16	cntRTT;		/* # of RTTs measured within last RTT */
	u32	minRTT;		/* min of RTTs measured within last RTT (in usec) */
	u32	baseRTT;	/* the min of all Vegas RTT measurements seen (in usec) */
};
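The structure lives in the congestion-control private area of the socket (icsk_ca_priv), which is what inet_csk_ca() returns. One hook the post does not show is the init callback; as a reference sketch from tcp_vegas.c, it simply seeds baseRTT with a huge value so that the very first RTT sample becomes the minimum:

static void tcp_vegas_init(struct sock *sk)
{
	struct vegas *vegas = inet_csk_ca(sk);

	/* any real RTT sample will be smaller than this */
	vegas->baseRTT = 0x7fffffff;
	vegas_enable(sk);
}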

Next, the RTT sampling function. It is very simple: it just updates baseRTT, minRTT and cntRTT. minRTT and cntRTT are also updated elsewhere, because they are reset at the end of every sampling period, as we will see below:

void tcp_vegas_pkts_acked(struct sock *sk, u32 cnt, s32 rtt_us)
{
	struct vegas *vegas = inet_csk_ca(sk);
	u32 vrtt;

	if (rtt_us < 0)
		return;

	/* Never allow zero rtt or baseRTT */
	vrtt = rtt_us + 1;

	/* Filter to find propagation delay: */
	if (vrtt < vegas->baseRTT)
		vegas->baseRTT = vrtt;

	/* Find the min RTT during the last RTT to find
	 * the current prop. delay + queuing delay:
	 */
	vegas->minRTT = min(vegas->minRTT, vrtt);
	vegas->cntRTT++;
}
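This is the pkts_acked hook of tcp_congestion_ops: the stack calls it from ACK processing with the RTT it measured for the newly ACKed segments, in microseconds. A negative rtt_us means no usable sample was available (for instance when the ACKed segment was a retransmission), hence the early return.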

Now note the doing_vegas_now field, which says whether Vegas should run at all. It is set from the set_state callback, and through it the implementation effectively mixes a delay-based and a loss-based algorithm (similar in spirit to Microsoft's CTCP):

void tcp_vegas_state(struct sock *sk, u8 ca_state)
{
	if (ca_state == TCP_CA_Open)
		vegas_enable(sk);
	else
		vegas_disable(sk);
}
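vegas_enable() and vegas_disable() themselves are trivial; sketched here from tcp_vegas.c for reference, they toggle the flag and re-arm the sampling state:

static void vegas_enable(struct sock *sk)
{
	const struct tcp_sock *tp = tcp_sk(sk);
	struct vegas *vegas = inet_csk_ca(sk);

	/* Begin taking Vegas samples next time we send something. */
	vegas->doing_vegas_now = 1;

	/* Set the beginning of the next send window. */
	vegas->beg_snd_nxt = tp->snd_nxt;

	vegas->cntRTT = 0;
	vegas->minRTT = 0x7fffffff;
}

static inline void vegas_disable(struct sock *sk)
{
	struct vegas *vegas = inet_csk_ca(sk);

	vegas->doing_vegas_now = 0;
}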

Finally, the core of the algorithm. Since we need to measure how much data was sent over a sampling period, the Vegas calculation can only run when a sample completes (otherwise NewReno is used), and only on a good ACK, that is, an ACK beyond the snd_nxt we saved when the previous sample ended (otherwise, again, NewReno runs). The RTT variation observed over this period is then used to derive the window change for the next one.

As for how the sample is turned into a decision: the code computes the sending rate over the sampling period and multiplies it by the global minimum RTT, which gives the congestion window we ought to have. It then derives a difference value in a similar way from the RTT variation; this diff is the heart of the algorithm and decides how the congestion window changes.
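To make the two formulas in the code concrete (numbers invented for illustration): with snd_cwnd = 20 segments, baseRTT = 100 ms and a sampled minimum rtt = 110 ms,

	target_cwnd = 20 * 100 / 110 = 18        (integer division)
	diff        = 20 * (110 - 100) / 100 = 2

so roughly two of our segments are estimated to be sitting in router queues. With the defaults alpha = 2 and beta = 4 the window would be left as is; diff < alpha would grow it by one, diff > beta would shrink it by one.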

Here is how the code implements this.

// used during congestion avoidance
static int alpha = 2;
static int beta  = 4;
// used during slow start
static int gamma = 1;

static void tcp_vegas_cong_avoid(struct sock *sk, u32 ack, u32 in_flight)
{
	struct tcp_sock *tp = tcp_sk(sk);
	struct vegas *vegas = inet_csk_ca(sk);

	if (!vegas->doing_vegas_now) {
		tcp_reno_cong_avoid(sk, ack, in_flight);
		return;
	}
// Check the ACK: Vegas only runs once all data up to the mark saved at the last sample (beg_snd_nxt) has been ACKed.

	if (after(ack, vegas->beg_snd_nxt)) {
		/* Do the Vegas once-per-RTT cwnd adjustment. */

		/* Save the extent of the current window so we can use this
		 * at the end of the next RTT.
		 */
		vegas->beg_snd_nxt  = tp->snd_nxt;

		/* We do the Vegas calculations only if we got enough RTT
		 * samples that we can be reasonably sure that we got
		 * at least one RTT sample that wasn't from a delayed ACK.
		 * If we only had 2 samples total,
		 * then that means we're getting only 1 ACK per RTT, which
		 * means they're almost certainly delayed ACKs.
		 * If  we have 3 samples, we should be OK.
		 */

		if (vegas->cntRTT <= 2) {
			/* We don't have enough RTT samples to do the Vegas
			 * calculation, so we'll behave like Reno.
			 */
			tcp_reno_cong_avoid(sk, ack, in_flight);
		} else {
			u32 rtt, diff;
			u64 target_cwnd;

			/* We have enough RTT samples, so, using the Vegas
			 * algorithm, we determine if we should increase or
			 * decrease cwnd, and by how much.
			 */

			/* Pluck out the RTT we are using for the Vegas
			 * calculations. This is the min RTT seen during the
			 * last RTT. Taking the min filters out the effects
			 * of delayed ACKs, at the cost of noticing congestion
			 * a bit later.
			 */
			rtt = vegas->minRTT;

			/* Calculate the cwnd we should have, if we weren't
			 * going too fast.
			 *
			 * This is:
			 *     (actual rate in segments) * baseRTT
			 */
// compute the window we ought to have
			target_cwnd = tp->snd_cwnd * vegas->baseRTT / rtt;

			/* Calculate the difference between the window we had,
			 * and the window we would like to have. This quantity
			 * is the "Diff" from the Arizona Vegas papers.
			 */
// compute the window difference: an estimate of how many extra segments are queued in the network
			diff = tp->snd_cwnd * (rtt-vegas->baseRTT) / vegas->baseRTT;
// in slow start: if the difference grows too large, cut the window back
			if (diff > gamma && tp->snd_cwnd <= tp->snd_ssthresh) {
				/* Going too fast. Time to slow down
				 * and switch to congestion avoidance.
				 */

				/* Set cwnd to match the actual rate
				 * exactly:
				 *   cwnd = (actual rate) * baseRTT
				 * Then we add 1 because the integer
				 * truncation robs us of full link
				 * utilization.
				 */
				tp->snd_cwnd = min(tp->snd_cwnd, (u32)target_cwnd+1);
				tp->snd_ssthresh = tcp_vegas_ssthresh(tp);

			} else if (tp->snd_cwnd <= tp->snd_ssthresh) {
				/* Slow start.  */
				tcp_slow_start(tp);
			} else {
				/* Congestion avoidance. */

				/* Figure out where we would like cwnd
				 * to be.
				 */
				if (diff > beta) {
					/* The old window was too fast, so
					 * we slow down.
					 */
// sending too fast, so slow down
					tp->snd_cwnd--;
					tp->snd_ssthresh
						= tcp_vegas_ssthresh(tp);
				} else if (diff < alpha) {
					/* We don't have enough extra packets
					 * in the network, so speed up.
					 */
// too few of our packets in the network, so speed up
					tp->snd_cwnd++;
				} else {
					/* Sending just as fast as we
					 * should be.
					 */
				}
			}

			if (tp->snd_cwnd < 2)
				tp->snd_cwnd = 2;
			else if (tp->snd_cwnd > tp->snd_cwnd_clamp)
				tp->snd_cwnd = tp->snd_cwnd_clamp;

			tp->snd_ssthresh = tcp_current_ssthresh(sk);
		}
// reset cntRTT and minRTT for the next sampling period
		/* Wipe the slate clean for the next RTT. */
		vegas->cntRTT = 0;
		vegas->minRTT = 0x7fffffff;
	}
	/* Use normal slow start */
	else if (tp->snd_cwnd <= tp->snd_ssthresh)
		tcp_slow_start(tp);

}
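tcp_vegas_ssthresh(), used above whenever the window is pulled back, is not shown in the post; as a reference sketch from tcp_vegas.c it is just:

static inline u32 tcp_vegas_ssthresh(struct tcp_sock *tp)
{
	/* never raise ssthresh; cap it just below the current window */
	return min(tp->snd_ssthresh, tp->snd_cwnd - 1);
}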