我们在生产环境中有一个很大的mysql性能问题,18个“前端”服务器将mysql查询发送到一个惟一的mysql服务器。我们在每个前端服务器上使用Redis做了许多事情,但是仍然有一个重要的工作只在mysql服务器上完成。因此,我们在18台服务器上各有5个进程,向mysql服务器发送大的select查询。where条件对于每个查询都是变化的,因为结果被缓存在每个ad服务器上的redis缓存中,并在where条件相同时重用。因为我们有不同的缓存系统,所以这个查询使用的是SQL_NO_CACHE选项。系统上还有其他的查询,很多小的写和一些大的读和写,但是大部分的负载来自这个查询。查询相当大:它使用17个表,其中一些表有几百万条记录,查询最多可以返回5000条记录,每条记录都相当大。但是,查询经过了很好的优化,在负载较低时运行在0.02s内。在我们通常的系统负载下,查询平均在1.5s内运行。我们在每个广告服务器上处理查询队列,并监视查询在执行前平均停留在队列中的时间。通常,查询在队列中停留约2分钟。但是最近,随着流量的增加,在增加了几台服务器之后,平均执行时间增加到了2s,排队时间超过了20分钟。所以我们决定尝试一些微调..。
我们的mysql服务器是一个24核,运行Debian6,有64GRAM和MySQL 5.1 (仅限InnoDB)。由于CPU和RAM没有100%的使用,我们认为可能会有另一个瓶颈。我们注意到服务器上有许多处于time_wait状态的TCP套接字,并认为可能可用的TCP套接字数量是一个瓶颈,因此我们从使用单个IP地址改为对mysql服务器使用5个IP地址。我们还将TCP套接字的生存期从2小时改为2分钟。2小时后,性能得到了很好的提高:查询平均执行时间从2s下降到0.5s,队列时间从20分钟下降到10分钟…。但在一天半之后,执行时间恢复到15秒,排队时间延长到20分钟。为了回到以前的情况,我们已经尝试过逆转所有的变化,但是都没有成功。
我们正在寻找一位具有InnoDB/Debian良好经验的高级DBA,他可以帮助我们解决这个性能问题,然后帮助我们找到瓶颈并优化mysql服务器的调优。
以下是我们的my.cnf文件:
# Example MySQL config file for very large systems.
#
# This is for a large system with memory of 1G-2G where the system runs mainly
# MySQL.
#
# You can copy this file to
# /etc/my.cnf to set global options,
# mysql-data-dir/my.cnf to set server-specific options (in this
# installation this directory is @localstatedir@) or
# ~/.my.cnf to set user-specific options.
#
# In this file, you can use all long options that a program supports.
# If you want to know which options a program supports, run the program
# with the "--help" option.
# The following options will be passed to all MySQL clients
[client]
#password = your_password
port = 3306
socket = /var/run/mysqld/mysqld.sock
# The MySQL server
[mysqld]
skip-name-resolve
skip-federated
default_time_zone = 'Europe/Paris'
innodb_locks_unsafe_for_binlog = 1
transaction-isolation=READ-UNCOMMITTED
#innodb_force_recovery = 4
#skip-grant-tables
#skip-networking
#replicate-ignore-db = mysql
user = mysql
port = 3306
old_passwords = false
max_connect_errors = 10000
socket = /var/run/mysqld/mysqld.sock
pid-file = /var/run/mysqld/mysqld.pid
datadir = /var/lib/mysql/prod
#basedir = /usr
tmpdir = /var/tmp
#skip-locking
# LC 28-06-2012 Passage a 300M apres calcul des index des table MyISAM :
# select sum(INDEX_LENGTH/1024/1024) from information_schema.tables where engine="MyISAM";
key_buffer = 300M
max_allowed_packet = 128M
# Modif LC 28-06-2012 car pas assez de tables cachees 80000 -> 160000 -> 40000 -> 20000 -> 1000
# Ajout de table_open_cache le nouveau nom du parametre
#table_cache = 6000
table_open_cache = 24000 # max_connections * 18 ( 18 tables used in targeting query ) + delta
table_definition_cache = 512
sort_buffer_size = 32M
read_buffer_size = 10M
read_rnd_buffer_size = 3M
myisam_sort_buffer_size = 1M
thread_cache_size = 64
query_cache_type = 1
query_cache_limit = 128M
query_cache_size = 512M
query_cache_min_res_unit = 8K
join_buffer_size = 4M
max_connections = 200
low_priority_updates = 1 # JBN : On the slave, updates are more important than reads
concurrent_insert = 2
wait_timeout = 300
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 48
#thread_concurrency = 24
#
#Temporary tables
#
# Modif LC 11-06-2012 car trop de tables temporaires 1024 -> 8192
#tmp_table_size = 1024M
#max_heap_table_size = 1024M
tmp_table_size = 8192M
max_heap_table_size = 8192M
max_tmp_tables = 64
#
# log paths
#
log-error = /var/log/mysql/error.log
slow_query_log_file = /var/log/mysql/mysql-slow.log
slow_query_log = 1
long_query_time = 1
log-queries-not-using-indexes
# This is a master server in a master/slave replication (id=1 + binlogging)
server-id = 1
#log_bin = /var/lib/mysql/mysql-bin.log
expire_logs_days = 1
max_binlog_size = 1G
#binlog_do_db = include_database_name
#binlog_ignore_db = include_database_name
#relay_log = mysql-relay
#relay_log_space_limit = 50G
# Fulltext search
ft_min_word_len = 3
#
# Language variables
#
character-set-server = utf8
collation-server = utf8_general_ci
language = /usr/share/mysql/english
#
# InnoDB parameter
#
ignore_builtin_innodb
plugin-load=innodb=ha_innodb_plugin.so;innodb_trx=ha_innodb_plugin.so;innodb_locks=ha_innodb_plugin.so;innodb_lock_waits=ha_innodb_plugin.so;innodb_cmp=ha_innodb_plugin.so;innodb_cmp_reset=ha_innodb_plugin.so;innodb_cmpmem=ha_innodb_plugin.so;innodb_cmpmem_reset=ha_innodb_plugin.so
innodb_file_format=barracuda
# Uncomment the following if you are using InnoDB tables
#innodb_data_home_dir = @localstatedir@/
innodb_data_file_path = ibdata1:512M:autoextend
innodb_table_locks = OFF
innodb_support_xa = 0
innodb_autoinc_lock_mode = 2 # WARNING : binlog-format = ROW IS MANDATORY
# binlog_format = row
innodb_flush_log_at_trx_commit = 2
#innodb_log_group_home_dir = @localstatedir@/
#innodb_log_arch_dir = @localstatedir@/
# You can set .._buffer_pool_size up to 50 - 80 %
# of RAM but beware of setting memory usage too high
# LC 28-06-2012 innodb_buffer_pool_size 12G -> 24G -> 18G
# KHA 29-06-2012 -> 12G ( 18G too big )
innodb_buffer_pool_size = 48G
innodb_additional_mem_pool_size = 32M
innodb_log_file_size = 128M
innodb_log_buffer_size = 8M
#innodb_flush_log_at_trx_commit = 1
innodb_lock_wait_timeout = 250
innodb_file_per_table
# LC 12-06-2012 innodb_thread_concurrency 64-> 24
innodb_thread_concurrency = 12
#innodb_thread_concurrency = 64
# LC 17-06-2012 Ajout innodb_flush_method=O_DIRECT pour voir l'impact perf au niveau RAID hardware
# A tester
innodb_flush_method=O_DIRECT
[mysqldump]
quick
max_allowed_packet = 16M
# Language variables needed for backup / restores
default-character-set = utf8
[mysql]
no-auto-rehash
# Remove the next comment character if you are not familiar with SQL
#safe-updates
# Language variables needed for backup / restores
default-character-set = utf8
[isamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[myisamchk]
key_buffer = 256M
sort_buffer_size = 256M
read_buffer = 2M
write_buffer = 2M
[mysqlhotcopy]
interactive-timeout
# Cyb tuning
[mysqld]
log-warnings=2 # logs more information to log-error提前谢谢你的帮助。
更新:
我们发现了性能下降的原因:一个合作伙伴开始向我们发送更详细的数据,大大增加了查询返回的行数和检查的行数。为了避免这种情况,我们将旧数据存档,并将其从2个主表中删除,并将限制5000改为限制500。我们的服务有一点退化,但我们可以接受,平均查询响应时间下降到5s,这是可以接受的,并允许我们工作。
现在我们已经了解到,我们的更改并不是性能下降的原因,我们将重新应用它们(更多无害线程和监听多个IP地址的mySQL )。
发布于 2013-11-19 08:32:21
你好,您的MySQL服务器和至少一个前端服务器有munin图吗?在我遇到的情况下,这个问题很少来自TIME_WAIT。在Apache上存在大量连接的问题上,解决方案是创建多个实例。重新编译内核以缩短TIME_WAIT的持续时间,没有什么改变。
关于MySQL配置(但没有看到任何图形或访问服务器,很难说),我不会将query_cache_limit设置得那么高,您是否尝试过将innodb_thread_concurrency设置为8-10,将0和thread_concurrency设置为更低?
服务器交换吗?你把调皮调到0了吗?根据内存的使用情况,可以根据您的数据量进一步增加innodb_buffer_pool_size。尝试将innodb_buffer_pool_instances提高到4或8,这在某些情况下是一个改进。
https://dba.stackexchange.com/questions/53375
复制相似问题