Today a colleague ran into a problem: a server (Red Hat 5, kernel 2.6.18) printed the following two messages in dmesg:

TCP: too many of orphaned sockets

Out of socket memory

When you see these messages, the first reaction is usually that tcp_mem (/proc/sys/net/ipv4/tcp_mem) needs tuning. But judging from the memory usage at the time, TCP memory had not exceeded tcp_mem at all. So I first went to the latest kernel source, 3.4.4, where the socket-memory warning is emitted:
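As a side note, you can read the current TCP page usage and orphan count from /proc/net/sockstat and compare them against tcp_mem and tcp_max_orphans. A minimal Python sketch of parsing that file's TCP line (the sample values below are made up, not from the server in question):

```python
def parse_sockstat(text):
    """Parse the 'TCP:' line of /proc/net/sockstat into a dict of counters."""
    for line in text.splitlines():
        if line.startswith("TCP:"):
            fields = line.split()[1:]  # drop the "TCP:" prefix
            # fields alternate key/value: inuse N orphan N tw N alloc N mem N
            return dict(zip(fields[0::2], (int(v) for v in fields[1::2])))
    return {}

sample = "TCP: inuse 52 orphan 3 tw 12 alloc 60 mem 45"
stats = parse_sockstat(sample)
print(stats["orphan"], stats["mem"])  # prints: 3 45
```

Note that `mem` is counted in pages (typically 4 KiB each), the same unit as the three tcp_mem thresholds, so the two can be compared directly.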

```c
bool tcp_check_oom(struct sock *sk, int shift)
{
	bool too_many_orphans, out_of_socket_memory;

	too_many_orphans = tcp_too_many_orphans(sk, shift);
	out_of_socket_memory = tcp_out_of_memory(sk);

	if (too_many_orphans && net_ratelimit())
		pr_info("too many orphaned sockets\n");
	if (out_of_socket_memory && net_ratelimit())
		pr_info("out of memory -- consider tuning tcp_mem\n");
	return too_many_orphans || out_of_socket_memory;
}
```

The code above is straightforward: if there are too many orphan sockets, it prints one warning; if socket memory exceeds the limit, it prints a separate, distinct warning.
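The two independent checks can be sketched in a few lines of Python (function and parameter names here are mine, and this ignores the `shift` argument and the details of the real tcp_too_many_orphans()):

```python
def check_oom(orphans, max_orphans, pages_allocated, tcp_mem_hard):
    """Sketch of 3.4's tcp_check_oom(): two independent tests,
    each with its own message."""
    msgs = []
    if orphans > max_orphans:
        msgs.append("too many orphaned sockets")
    if pages_allocated > tcp_mem_hard:
        msgs.append("out of memory -- consider tuning tcp_mem")
    return msgs

# Orphan limit exceeded but memory fine: only the orphan warning fires.
print(check_oom(70000, 65536, 100, 45000))
```

On a 3.x kernel the log message therefore tells you directly which limit was hit.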

So I began to suspect the older kernel behaved differently, and found the corresponding code in 2.6.18:

```c
static int tcp_out_of_resources(struct sock *sk, int do_reset)
{
	struct tcp_sock *tp = tcp_sk(sk);
	int orphans = atomic_read(&tcp_orphan_count);

	/* ... */

	/* note this condition */
	if (orphans >= sysctl_tcp_max_orphans ||
	    (sk->sk_wmem_queued > SOCK_MIN_SNDBUF &&
	     atomic_read(&tcp_memory_allocated) > sysctl_tcp_mem[2])) {
		if (net_ratelimit())
			printk(KERN_INFO "Out of socket memory\n");

		/* Catch exceptional cases, when connection requires reset.
		 *      1. Last segment was sent recently. */
		if ((s32)(tcp_time_stamp - tp->lsndtime) <= TCP_TIMEWAIT_LEN ||
		    /*  2. Window is closed. */
		    (!tp->snd_wnd && !tp->packets_out))
			do_reset = 1;
		if (do_reset)
			tcp_send_active_reset(sk, GFP_ATOMIC);
		tcp_done(sk);
		NET_INC_STATS_BH(LINUX_MIB_TCPABORTONMEMORY);
		return 1;
	}
	return 0;
}
```

And there is the cause: in the 2.6.18 kernel, whether there are too many orphan sockets or socket memory exceeds the limit, the same "Out of socket memory" message is printed. So on a 2.6.18 kernel the message is misleading: seeing "Out of socket memory" does not necessarily mean tcp_mem needs tuning; it may just as well be the orphan-socket limit (tcp_max_orphans) that was hit.
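The difference is easy to see if the 2.6.18 condition is sketched the same way (names and the example thresholds below are illustrative, not taken from the server): a single OR'd test with a single message, so an orphan overflow alone, with memory well under tcp_mem[2], still prints "Out of socket memory".

```python
def out_of_resources_2618(orphans, max_orphans, wmem_queued, min_sndbuf,
                          pages_allocated, tcp_mem_hard):
    """Sketch of 2.6.18's tcp_out_of_resources() condition:
    one OR'd test, one message for both causes."""
    if (orphans >= max_orphans or
            (wmem_queued > min_sndbuf and pages_allocated > tcp_mem_hard)):
        return "Out of socket memory"
    return None

# Orphan limit hit, memory usage tiny: still "Out of socket memory".
print(out_of_resources_2618(70000, 65536, 100, 2048, 100, 45000))
```

In practice this means that on 2.6.18 you have to check /proc/net/sockstat (orphan count vs. tcp_max_orphans, mem vs. tcp_mem[2]) yourself to tell which of the two limits actually triggered the message.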