Socket recv() hangs on large message with MSG_WAITALL

c linux sockets networking tcp

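I have an application that reads large files from a server, and it hangs frequently on a particular machine. It ran successfully under RHEL 5.2 for a long time. We recently upgraded to RHEL 6.1, and it now hangs regularly.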

I have created a test application that reproduces the problem. It hangs approximately 98 times out of 100.

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/param.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>
#include <netdb.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <stdint.h>   /* uint32_t */

int mFD = 0;

void open_socket()
{
  struct addrinfo hints, *res;
  memset(&hints, 0, sizeof(hints));
  hints.ai_socktype = SOCK_STREAM;
  hints.ai_family = AF_INET;

  if (getaddrinfo("localhost", "60000", &hints, &res) != 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  mFD = socket(res->ai_family, res->ai_socktype, res->ai_protocol);

  if (mFD == -1)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  if (connect(mFD, res->ai_addr, res->ai_addrlen) < 0)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  freeaddrinfo(res);
}

void read_message(int size, void* data)
{
  int bytesLeft = size;
  int numRd = 0;

  while (bytesLeft != 0)
  {
    fprintf(stderr, "reading %d bytes\n", bytesLeft);

    /* Replacing MSG_WAITALL with 0 works fine */
    int num = recv(mFD, data, bytesLeft, MSG_WAITALL);

    if (num == 0)
    {
      break;
    }
    else if (num < 0 && errno != EINTR)
    {
      fprintf(stderr, "Exit %d\n", __LINE__);
      exit(1);
    }
    else if (num > 0)
    {
      numRd += num;
      data = (char *)data + num;  /* arithmetic on void* is a GNU extension */
      bytesLeft -= num;
      fprintf(stderr, "read %d bytes - remaining = %d\n", num, bytesLeft);
    }
  }

  fprintf(stderr, "read total of %d bytes\n", numRd);
}

int main(int argc, char **argv)
{
  open_socket();

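  if (argc < 2)
  {
    fprintf(stderr, "usage: %s <bytes>\n", argv[0]);
    exit(1);
  }

  uint32_t raw_len = atoi(argv[1]);
  char *raw = malloc(raw_len);  /* heap instead of a multi-megabyte stack array */

  if (raw == NULL)
  {
    fprintf(stderr, "Exit %d\n", __LINE__);
    exit(1);
  }

  read_message(raw_len, raw);

  free(raw);
  return 0;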
}

In the test, 6,000,000 bytes are sent to the test application, which tries to read the data with a single call to recv().

I would appreciate any suggestions about what I might be doing wrong, or any ways to debug the problem further.

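MSG_WAITALL should block until all the data has been received. From the documentation:

"This flag requests that the operation block until the full request is satisfied."

However, the buffers in the network stack are probably not large enough to hold everything, which is the reason for the error messages on the server. The client network stack simply cannot hold that much data.

The solution is to increase the buffer size (the SO_RCVBUF option to setsockopt), to split the message into smaller pieces, or to receive smaller chunks and put them into your own buffer. The last is what I would recommend.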
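The first option above can be sketched roughly as follows. This is only an illustration: `set_rcvbuf` is a hypothetical helper name, not part of the original program, and it reads the value back with getsockopt because the kernel is free to round or cap the request.

```c
#include <assert.h>
#include <sys/socket.h>

/* Hypothetical helper: request a receive buffer of `bytes` for `fd` and
 * return the size the kernel reports afterwards, or -1 on error. */
int set_rcvbuf(int fd, int bytes)
{
  int granted = 0;
  socklen_t len = sizeof(granted);

  assert(bytes > 0);

  if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
    return -1;

  /* Read the value back: the kernel may round, double, or cap the request. */
  if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
    return -1;

  return granted;
}
```

In the question's program this would presumably be called on mFD after socket() and before connect(). On Linux the request is capped by the net.core.rmem_max setting, so asking for 6,000,000 bytes does not guarantee that much buffer space, which is one reason the third option is the more robust one.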

Edit: I see in your code that you already do what I suggested (reading smaller chunks into your own buffer), so just remove the MSG_WAITALL flag and it should work.

Oh, and when recv returns zero, it means the other end has closed the connection, and you should close your end too.
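A minimal sketch of that approach, assuming a blocking socket: the loop below calls recv with flags set to 0, accumulates partial reads into the caller's buffer, and stops early when recv returns zero. The name `recv_all` is a hypothetical helper for illustration, not something from the original code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: read up to `len` bytes from `fd` into `buf` using
 * plain recv() calls (flags = 0). Returns the number of bytes read, which is
 * less than `len` only if the peer closed the connection, or -1 on error. */
ssize_t recv_all(int fd, void *buf, size_t len)
{
  char *p = buf;
  size_t got = 0;

  assert(buf != NULL);

  while (got < len)
  {
    ssize_t n = recv(fd, p + got, len - got, 0);

    if (n == 0)
      break;                 /* peer closed the connection */
    if (n < 0)
    {
      if (errno == EINTR)
        continue;            /* interrupted by a signal: retry */
      return -1;
    }
    got += (size_t)n;
  }

  return (ssize_t)got;
}
```

A caller can compare the return value with the requested length to tell a complete message from an early close, and should close its own descriptor in the latter case.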

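Consider these two possible rules:

1. The receiver may wait for the sender to send more before receiving what has already been sent.

2. The sender may wait for the receiver to receive what has already been sent before sending more.

We can have either of these rules, but we cannot have both.

Why? Because if the receiver is permitted to wait for the sender, the sender cannot wait for the receiver to receive before sending more, or else we deadlock. If the sender is permitted to wait for the receiver, the receiver cannot wait for the sender to send more before receiving, or else we deadlock.

If both happen at the same time, we deadlock: the sender will not send more until the receiver receives what has already been sent, and the receiver will not receive what has already been sent unless the sender sends more. Boom.

TCP chooses rule 2 (for reasons that should be obvious), so it cannot support rule 1. But in your code you are the receiver, and you are waiting for the sender to send more before you receive what has already been sent. So this will deadlock.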
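Rule 2 can be observed with a small experiment. The sketch below is an illustration under stated assumptions, not a model of the TCP stack: it is meant to be run on a connected stream socket (an AF_UNIX socketpair works as a stand-in for a TCP connection), and both `fill_until_blocked` and `drain` are hypothetical helper names. The first helper sends in non-blocking mode until the kernel refuses more data, which is the point where a blocking sender would start waiting for the receiver; the second plays the receiver and consumes what was queued.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: switch `fd` to non-blocking mode and send until the
 * kernel refuses more data. Returns the bytes queued, or -1 on error. */
ssize_t fill_until_blocked(int fd)
{
  static const char chunk[4096];   /* zero-filled payload */
  ssize_t total = 0;
  int flags = fcntl(fd, F_GETFL, 0);

  if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
    return -1;

  for (;;)
  {
    ssize_t n = send(fd, chunk, sizeof(chunk), 0);

    if (n > 0)
      total += n;
    else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
      break;                       /* a blocking send() would wait here */
    else if (n < 0 && errno == EINTR)
      continue;                    /* interrupted by a signal: retry */
    else
      return -1;
  }

  return total;
}

/* Hypothetical helper: read and discard exactly `n` bytes from `fd`.
 * Returns `n` on success, or -1 on error or early close. */
ssize_t drain(int fd, ssize_t n)
{
  char buf[4096];
  ssize_t left = n;

  assert(n >= 0);

  while (left > 0)
  {
    size_t want = left < (ssize_t)sizeof(buf) ? (size_t)left : sizeof(buf);
    ssize_t r = recv(fd, buf, want, 0);

    if (r < 0 && errno == EINTR)
      continue;
    if (r <= 0)
      return -1;
    left -= r;
  }

  return n;
}
```

Calling fill_until_blocked on one end shows that only a finite amount can be in flight while the peer is not reading; once drain has consumed that data on the other end, the sender can make progress again. A receiver that refuses to consume anything until the entire message has arrived never frees that space, which is the deadlock described above.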

The test application from the question was built and run with the following commands:

>  gcc socket_test.c -o socket_test
>  perl -e 'for (1..6000000){ print "a" }' | nc -l 60000
>  ./socket_test 6000000