I use HAProxy to load-balance my cluster of MQTT brokers. Each MQTT broker can comfortably handle up to 100,000 connections. However, the problem I am facing is that only up to ~30k connections per node get handled. Whenever any node approaches 32k connections, the HAProxy CPU suddenly spikes to 100% and all connections start dropping.
The problem with this is that, for every 30k connections, I have to move to another MQTT broker. How can I increase this to at least 60k connections per MQTT broker node?
My VM: 1 CPU, 2 GB RAM. I have tried increasing the number of CPUs, but I hit the same problem.
My configuration:
bind 0.0.0.0:1883
maxconn 1000000
mode tcp
#sticky session load balancing – new feature
stick-table type string len 32 size 200k expire 30m
stick on req.payload(0,0),mqtt_field_value(connect,client_identifier)
option clitcpka # For TCP keep-alive
option tcplog
timeout client 600s
timeout server 2h
timeout check 5000
server mqtt1 10.20.236.140:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
server mqtt2 10.20.236.142:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
server mqtt3 10.20.236.143:1883 check-send-proxy send-proxy-v2 check inter 10s fall 2 rise 5
I have also tuned the system parameters:
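One thing the snippet above does not show is the global section. The `haproxy -v` output further down reports multi-threading with `default=1`, so on this build extra vCPUs sit idle unless `nbthread` is raised, and the process-wide connection ceiling also lives in `global`. A sketch, with illustrative values that are my assumptions rather than taken from the post:

```
# Illustrative global section (values are assumptions, not from the post)
global
    maxconn 1000000    # process-wide ceiling; a listen-level maxconn cannot exceed it
    nbthread 2         # HAProxy 2.4 defaults to a single thread, so extra vCPUs go unused
    # ulimit-n is normally derived from maxconn automatically
```

Without a `global maxconn`, the effective limit is derived from the process's file-descriptor limit, which may be far below the listen-level 1000000.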
sysctl -w net.core.somaxconn=60000
sysctl -w net.ipv4.tcp_max_syn_backlog=16384
sysctl -w net.core.netdev_max_backlog=16384
sysctl -w net.ipv4.ip_local_port_range='1024 65535'
sysctl -w net.ipv4.tcp_rmem='1024 4096 16777216'
sysctl -w net.ipv4.tcp_wmem='1024 4096 16777216'
modprobe ip_conntrack
sysctl -w net.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_max=1000000
sysctl -w net.netfilter.nf_conntrack_tcp_timeout_time_wait=30
sysctl -w net.ipv4.tcp_max_tw_buckets=1048576
sysctl -w net.ipv4.tcp_fin_timeout=15
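Note that `sysctl -w` settings do not survive a reboot, so if the box has restarted since they were applied, the kernel may be back on its defaults. Persisting them in a drop-in file keeps them across reboots (the path and filename below are my choice, not from the post):

```
# /etc/sysctl.d/90-haproxy-tuning.conf  -- apply with: sysctl --system
net.core.somaxconn = 60000
net.ipv4.tcp_max_syn_backlog = 16384
net.core.netdev_max_backlog = 16384
net.ipv4.ip_local_port_range = 1024 65535
net.ipv4.tcp_max_tw_buckets = 1048576
net.ipv4.tcp_fin_timeout = 15
```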
tee -a /etc/security/limits.conf << EOF
root soft nofile 1048576
root hard nofile 1048576
haproxy soft nproc 1048576
haproxy hard nproc 1048576
EOF
Output of haproxy -v:
HAProxy version 2.4.18-1ppa1~focal 2022/07/27 - https://haproxy.org/
Status: long-term supported branch - will stop receiving fixes around Q2 2026.
Known bugs: http://www.haproxy.org/bugs/bugs-2.4.18.html
Running on: Linux 5.4.0-122-generic #138-Ubuntu SMP Wed Jun 22 15:00:31 UTC 2022 x86_64
Build options :
TARGET = linux-glibc
CPU = generic
CC = cc
CFLAGS = -O2 -g -O2 -fdebug-prefix-map=/build/haproxy-96Se88/haproxy-2.4.18=. -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -Wall -Wextra -Wdeclaration-after-statement -fwrapv -Wno-address-of-packed-member -Wno-unused-label -Wno-sign-compare -Wno-unused-parameter -Wno-clobbered -Wno-missing-field-initializers -Wno-cast-function-type -Wtype-limits -Wshift-negative-value -Wshift-overflow=2 -Wduplicated-cond -Wnull-dereference
OPTIONS = USE_PCRE2=1 USE_PCRE2_JIT=1 USE_OPENSSL=1 USE_LUA=1 USE_SLZ=1 USE_SYSTEMD=1 USE_PROMEX=1
DEBUG =
Feature list : +EPOLL -KQUEUE +NETFILTER -PCRE -PCRE_JIT +PCRE2 +PCRE2_JIT +POLL -PRIVATE_CACHE +THREAD -PTHREAD_PSHARED +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +FUTEX +ACCEPT4 -CLOSEFROM -ZLIB +SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING
Default settings :
bufsize = 16384, maxrewrite = 1024, maxpollevents = 200
Built with multi-threading support (MAX_THREADS=64, default=1).
Built with OpenSSL version : OpenSSL 1.1.1f 31 Mar 2020
Running on OpenSSL version : OpenSSL 1.1.1f 31 Mar 2020
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.3
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with libslz for stateless compression.
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE2 version : 10.34 2019-11-21
PCRE2 library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 9.4.0
Available polling systems :
epoll : pref=300, test result OK
poll : pref=200, test result OK
select : pref=150, test result OK
Total: 3 (3 usable), will use epoll.
Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
h2 : mode=HTTP side=FE|BE mux=H2 flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
fcgi : mode=HTTP side=BE mux=FCGI flags=HTX|HOL_RISK|NO_UPG
<default> : mode=HTTP side=FE|BE mux=H1 flags=HTX
h1 : mode=HTTP side=FE|BE mux=H1 flags=HTX|NO_UPG
<default> : mode=TCP side=FE|BE mux=PASS flags=
none : mode=TCP side=FE|BE mux=PASS flags=NO_UPG
Available services : prometheus-exporter
Available filters :
[SPOE] spoe
[CACHE] cache
[FCGI] fcgi-app
[COMP] compression
[TRACE] trace
Posted on 2022-09-01 13:46:01
I don't see anything in your configuration that explains why you can't get past 30K connections without maxing out the CPU. I am also not sure those tuned system parameters are doing you any good.
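One possibility worth ruling out (my speculation, not something this thread confirms): with send-proxy-v2 every client gets its own HAProxy-to-broker TCP connection, and each of those consumes one local source port toward that backend. If the `ip_local_port_range` change was not actually in effect, the Linux default range caps each (source IP, backend IP, backend port) tuple at roughly 28k ports, which is suspiciously close to the observed ~30k ceiling. A quick back-of-the-envelope check:

```shell
# Number of usable source ports per (src ip, dst ip, dst port) tuple
# for a given net.ipv4.ip_local_port_range.
ports_per_backend() {
    echo $(( $2 - $1 + 1 ))
}

ports_per_backend 32768 60999   # Linux default range -> 28232
ports_per_backend 1024 65535    # the range set in the post -> 64512
```

Watching `ss -s` (or counting established sockets per backend) on the HAProxy box while the CPU spike happens would confirm or rule this out.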
For reference, I have successfully run HAProxy (the vanilla Docker image haproxy:2.4.17) with 2 CPUs and 4 GB of RAM, and it can reach a maxconn of 150K without pegging the CPU.
Beyond the per-instance scaling problem, the other issue you may be running into is the assumption of a single HAProxy in front of the MQTT brokers.
"The problem with this is that, for every 30k connections, I have to move to another MQTT broker."
In my experience, it is better to run multiple HAProxy nodes in front of a single broker. (I'm not sure what your constraints are.) Having multiple HAProxy instances per backend is critical: when you lose an instance, you only lose a fraction of your traffic. (When, not if.) That is the key part here, because when traffic is dropped, the clients will all try to reconnect at the same time. That is how you accidentally DDoS yourself over one lost VM or pod.
At your current vertical scale (CPU and RAM), you get 30K connections per instance. If you set maxconn to 30K, that would be 34 nodes. If you run them as a Deployment of Kubernetes pods, with something like an ELB in front of the HAProxy cluster, this should be easy enough to do. Either way, you need to figure out how to run HAProxy as a cluster without duplicating the backends.
A good rule of thumb: scale out when scaling up stops working.
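The 34-node figure follows from dividing a 1M target (presumably the `maxconn 1000000` in the question) by 30K connections per instance and rounding up:

```shell
# ceil(1000000 / 30000) HAProxy instances needed at 30K connections each
total=1000000
per_node=30000
echo $(( (total + per_node - 1) / per_node ))   # prints 34
```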
https://stackoverflow.com/questions/73539455