Nginx + Gunicorn + Django: high latency


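I'm trying to tune my web service (WS) to support about 20k concurrent users.

No matter which configuration I change, once my test reaches 2 (two) thousand users I still get the same 6-second average response time on every endpoint, along with various 502/504 errors.

Web service stack: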

CloudFlare, Nginx, Gunicorn, Django/DRF, Memcache, Postgres

Here is what I've tried:

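  • Increased the Gunicorn workers from 4 to 10
  • Increased the service (pod) instances from 3 to 10
  • Increased the Gunicorn worker timeout to 120 seconds
  • Increased the Nginx proxy_pass timeout to 120 seconds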

Most endpoints hit the database only once every 100 seconds; the other requests get their data from memcache.
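The Gunicorn changes listed above can be written in a Gunicorn config file, which is plain Python whose module-level variables are read as settings. A minimal sketch: the file name `gunicorn.conf.py` is hypothetical, and the bind address is an assumption (port 9090 is taken from the upstream addresses in the Nginx logs below), not a setting stated in the question.

```python
# gunicorn.conf.py (hypothetical file name): load it with `gunicorn -c gunicorn.conf.py ...`
# Gunicorn reads the module-level variables below as its settings.

# Assumed bind address; port 9090 matches the upstreams in the Nginx logs.
bind = "0.0.0.0:9090"

# Worker processes per pod (the question tried values from 4 to 10).
workers = 10

# Gunicorn's default synchronous worker: each process serves one request at a time.
worker_class = "sync"

# Workers silent for longer than this many seconds are killed and restarted;
# this is what produces "[CRITICAL] WORKER TIMEOUT (pid:...)" in the log.
timeout = 120
```

With `sync` workers, 10 pods × 10 workers allow at most 100 requests to be in progress at once; any further connections wait in the listen queue (Gunicorn's `backlog` setting) or at Nginx. On the Nginx side, the timeouts that apply to `proxy_pass` are set with `proxy_connect_timeout` and `proxy_read_timeout`; `proxy_pass` itself takes no timeout parameter.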

Can someone point out what kind of configuration I should change?

Where should I look for the latency/bottleneck?

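The Gunicorn workers are apparently dying, and I don't understand why, because there is no logic in the WS views: each view should just fetch one query result from memcache and return it.

Nginx logs: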
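The views described here follow the cache-aside pattern: read from the cache, and only on a miss query the database and store the result for about 100 seconds. A minimal pure-Python sketch, assuming a dict-backed `TTLCache` as a stand-in for memcache and a hypothetical `load_from_db` in place of the real Postgres query (in Django the cache object would be `django.core.cache.cache`, which provides `get`, `set` and `get_or_set`):

```python
import time


class TTLCache:
    """In-process stand-in for memcache: each key expires ttl seconds after set()."""

    def __init__(self, clock=time.monotonic):
        self._data = {}
        self._clock = clock  # injectable so expiry can be tested without sleeping

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries behave like a miss
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, self._clock() + ttl)


def cached_fetch(cache, key, loader, ttl=100):
    """Cache-aside read: serve from cache, else call loader() and cache it for ttl seconds.

    Note: a cached value of None is indistinguishable from a miss here.
    """
    value = cache.get(key)
    if value is None:
        value = loader()
        cache.set(key, value, ttl)
    return value


# Usage with a fake clock; load_from_db is a hypothetical placeholder for the Postgres query.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
db_calls = []


def load_from_db():
    db_calls.append(now[0])
    return {"id": 729}


for t in (0.0, 50.0, 99.0, 100.0, 150.0):
    now[0] = t
    cached_fetch(cache, "product:729", load_from_db)

print(db_calls)  # [0.0, 100.0]: only the first request and the one after expiry reach the "database"
```

In this sketch every caller that misses on the same key runs the loader, so right after a key expires several concurrent requests may query Postgres at the same moment.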

latforms/android HTTP/1.1", upstream: "http://10.0.1.17:9090/endpoints/platforms/android", host: "myhost.co"
2018/08/13 23:43:25 [error] 8893#8893: *2809163 upstream timed out (110: Connection timed out) while connecting to upstream, client: 200.211.198.133, server: myhost.co, request: "GET /endpoints/store/products/729 HTTP/1.1", upstream: "http://10.0.1.18:9090/endpoints/store/products/729", host: "myhost.co"
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1" 200 1718 "-" "python-requests/2.18.4" 627 80.840 [production-service-api-80] 10.0.0.112:9090, 10.0.1.13:9090, 10.0.0.113:9090 0, 0, 11150 40.000, 40.000, 0.840 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1" 200 1718 "-" "python-requests/2.18.4" 689 80.857 [production-service-api-80] 10.0.0.112:9090, 10.0.1.12:9090, 10.0.0.113:9090 0, 0, 11150 40.000, 40.000, 0.857 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/home/ HTTP/1.1" 200 10072 "-" "python-requests/2.18.4" 670 80.580 [production-service-api-80] 10.0.1.13:9090, 10.0.1.11:9090, 10.0.0.112:9090 0, 0, 66511 40.001, 40.002, 0.577 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/691/ HTTP/1.1" 200 703 "-" "python-requests/2.18.4" 646 80.486 [production-service-api-80] 10.0.1.8:9090, 10.0.1.13:9090, 10.0.1.12:9090 0, 0, 1968 40.000, 40.000, 0.486 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/5458 HTTP/1.1" 301 0 "-" "python-requests/2.18.4" 678 80.444 [production-service-api-80] 10.0.1.13:9090, 10.0.1.12:9090, 10.0.1.17:9090 0, 0, 0 40.000, 40.002, 0.442 504, 504, 301
....
90, 10.0.1.11:9090, 10.0.1.8:9090 0, 0, 1968 40.000, 40.000, 0.584 504, 504, 200
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/5458/ HTTP/1.1" 200 241 "-" "python-requests/2.18.4" 647 80.709 [production-service-api-80] 10.0.0.113:9090, 10.0.1.8:9090, 10.0.0.112:9090 0, 0, 327 40.001, 40.000, 0.708 504, 504, 200
--
2018/08/13 23:43:25 [error] 8766#8766: *2809243 upstream timed out (110: Connection timed out) while connecting to upstream, client: 200.211.198.133, server: myhost.co, request: "GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1", upstream: "http://10.0.1.13:9090/endpoints/store/categories/?cat_pk=13081", host: "myhost.co"
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/692 HTTP/1.1" 301 0 "-" "python-requests/2.18.4" 677 80.672 [production-service-api-80] 10.0.1.17:9090, 10.0.1.10:9090, 10.0.0.113:9090 0, 0, 0 40.001, 40.001, 0.670 504, 504, 301
200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] "GET /endpoints/store/products/4608/ HTTP/1.1" 200 553 "-" "python-requests/2.18.4" 647 80.591 [production-service-api-80] 10.0.1.11:9090, 10.0.1.17:9090, 10.0.1.8:9090 0, 0, 1090 40.000, 40.003, 0.588 504, 504, 200
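One way to see where the time goes is to pull the per-attempt upstream fields out of these access-log lines. A sketch, assuming each line ends with `$request_length $request_time [$proxy_upstream_name] $upstream_addr $upstream_response_length $upstream_response_time $upstream_status` (this appears to match the ingress-nginx default format, but check it against your own `log_format`):

```python
import re

# Assumed layout of the end of each access-log line (verify against your log_format):
#   <request_length> <request_time> [<upstream_name>] <addrs> <lengths> <times> <statuses>
# Each upstream field is a ", "-separated list with one entry per upstream attempt.
TAIL = re.compile(
    r"(?P<req_len>\d+) (?P<req_time>[\d.]+) \[(?P<upstream>[^\]]+)\] "
    r"(?P<addrs>[^,\s]+(?:, [^,\s]+)*) "
    r"(?P<lengths>\d+(?:, \d+)*) "
    r"(?P<times>[\d.]+(?:, [\d.]+)*) "
    r"(?P<statuses>\d{3}(?:, \d{3})*)$"
)


def parse_upstream_attempts(line):
    """Return (request_time, [(addr, seconds, status), ...]), or None if the line doesn't match."""
    m = TAIL.search(line.rstrip())
    if m is None:
        return None
    addrs = m.group("addrs").split(", ")
    times = [float(t) for t in m.group("times").split(", ")]
    statuses = [int(s) for s in m.group("statuses").split(", ")]
    return float(m.group("req_time")), list(zip(addrs, times, statuses))


line = (
    '200.211.198.133 - [200.211.198.133] - - [13/Aug/2018:23:43:25 +0000] '
    '"GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1" 200 1718 "-" '
    '"python-requests/2.18.4" 627 80.840 [production-service-api-80] '
    '10.0.0.112:9090, 10.0.1.13:9090, 10.0.0.113:9090 0, 0, 11150 '
    '40.000, 40.000, 0.840 504, 504, 200'
)
total, attempts = parse_upstream_attempts(line)
for addr, seconds, status in attempts:
    print(f"{addr:>16}  {seconds:7.3f}s  {status}")
print(f"request_time {total:.3f}s, sum of attempts {sum(s for _, s, _ in attempts):.3f}s")
```

For this line it reports three attempts: two upstreams returned 504 after 40.000 s each and the third returned 200 in 0.840 s, which accounts for the 80.840 s total. Together with the `upstream timed out (110: Connection timed out) while connecting to upstream` errors, this suggests that most of the latency in these lines is spent waiting on upstream connections that time out, not in the Django view that finally answers.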

Gunicorn logs:

{"asctime": "2018-08-13 23:42:55,145", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] \"GET /endpoints/store/products/691/ HTTP/1.1\" 200 1968 \"-\" \"python-requests/2.18.4\""}
{"asctime": "2018-08-13 23:42:55,167", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] \"GET /endpoints/store/products/729 HTTP/1.1\" 301 - \"-\" \"python-requests/2.18.4\""}
[2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:36)
[2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:37)
[2018-08-13 23:42:55 +0000] [382] [INFO] Booting worker with pid: 382
[2018-08-13 23:42:55 +0000] [383] [INFO] Booting worker with pid: 383
{"asctime": "2018-08-13 23:42:55,403", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] \"GET /endpoints/store/products/691/ HTTP/1.1\" 200 1968 \"-\" \"python-requests/2.18.4\""}
....
{"asctime": "2018-08-13 23:42:55,184", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] \"GET /endpoints/store/categories/?cat_pk=13081 HTTP/1.1\" 200 11150 \"-\" \"python-requests/2.18.4\""}
{"asctime": "2018-08-13 23:42:55,262", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] \"GET /endpoints/platforms/android HTTP/1.1\" 200 48 \"-\" \"python-requests/2.18.4\""}
{"asctime": "2018-08-13 23:42:55,439", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:55 +0000] \"GET /endpoints/platforms/android HTTP/1.1\" 200 48 \"-\" \"python-requests/2.18.4\""}
--
[2018-08-13 23:42:56 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:31)
{"asctime": "2018-08-13 23:42:56,689", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:56 +0000] \"GET /endpoints/store/products/729/ HTTP/1.1\" 200 2163 \"-\" \"python-requests/2.18.4\""}
{"asctime": "2018-08-13 23:42:56,799", "name": "gunicorn.access", "levelname": "INFO", "message": "10.0.0.13 - - [13/Aug/2018:23:42:56 +0000] \"GET /endpoints/store/products/5458/ HTTP/1.1\" 200 327 \"-\" \"python-requests/2.18.4\""}
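Each `[CRITICAL] WORKER TIMEOUT (pid:NN)` line means the Gunicorn master killed a worker that had been silent for longer than the configured `timeout`; the `Booting worker with pid: ...` lines that follow are its replacements. To see how often this happens over a whole test run, a small sketch that counts these events per second, assuming the plain-text error-log format shown above (the JSON access-log lines simply do not match and are skipped):

```python
import re
from collections import Counter

# Matches lines like: [2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:36)
TIMEOUT_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) [+-]\d{4}\] \[\d+\] "
    r"\[CRITICAL\] WORKER TIMEOUT \(pid:(?P<pid>\d+)\)"
)


def worker_timeouts_per_second(lines):
    """Count WORKER TIMEOUT events per timestamp (one-second resolution)."""
    counts = Counter()
    for line in lines:
        m = TIMEOUT_RE.match(line)
        if m:
            counts[m.group("ts")] += 1
    return counts


log = """\
[2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:36)
[2018-08-13 23:42:55 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:37)
[2018-08-13 23:42:55 +0000] [382] [INFO] Booting worker with pid: 382
[2018-08-13 23:42:56 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:31)
"""
result = worker_timeouts_per_second(log.splitlines())
print(result)
```

Lining these timestamps up with the Nginx 504s is one way to check whether worker restarts coincide with the upstream timeouts.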

Why aren't you using uWSGI?

To get better performance, do the following:

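  • Reduce the number of database hits in your code
  • Increase the Gunicorn worker count
  • Disable info-level logging in Gunicorn and Nginx

  • If these settings don't work for you, you will have to change your configuration or add server resources.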

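About uWSGI: we tried uWSGI, but the problem persisted, so we switched back to Gunicorn and decided to try to "fine-tune" it. If you can point us to good tuning settings for uWSGI, we're willing to try that too. As for 1 and 2, we've already done them: all queries are cached, and we tried a range of worker counts (3 to 10). As for 3: is disabling logging really good practice?

With a large number of users, info-level logging makes disk I/O a bottleneck: assume every request causes at least one write to disk. Also, unless your code does work that blocks (for example, waiting on I/O), the number of workers should not exceed the number of CPU threads. To see how many database hits each request makes, use a Django profiler. If you have problems with uWSGI, it's better to fix them than to switch to Gunicorn: Gunicorn is easier to set up than uWSGI, but it does not perform as well under a high volume of requests.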
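The answer's suggestions (fewer log writes, worker count tied to CPU threads) could be expressed in a Gunicorn config file roughly as follows. This is a sketch: it follows the answer's rule of no more sync workers than CPU threads, whereas Gunicorn's own documentation suggests `(2 x $num_cores) + 1` as a starting point, so treat the number as something to benchmark. The file name is hypothetical, and the Nginx side of the change (for example `access_log off;`) is not shown.

```python
# gunicorn.conf.py (hypothetical file name): sketch of the answer's suggestions.
# Gunicorn ignores module-level names that are not settings (cpu_threads, docs_suggestion).
import multiprocessing

# CPU threads visible to this process; inside a container this reports the host's
# CPUs, which can be more than the pod's CPU limit.
cpu_threads = multiprocessing.cpu_count()

# The answer's rule: for non-blocking (CPU-bound) sync workers, no more workers than CPU threads.
workers = cpu_threads

# Gunicorn documentation's suggested starting point, kept for comparison only.
docs_suggestion = 2 * cpu_threads + 1

# No per-request access log (None disables it) and only warnings/errors in the error log.
accesslog = None
loglevel = "warning"
```

If the pod's CPU limit is lower than what `cpu_count()` reports, setting `workers` explicitly to match the limit avoids oversubscribing the container.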