Handling High-Volume Data in MySQL
This article walks through a question-and-answer thread on handling high-volume data in MySQL: the original question, the asker's follow-up details, and the recommended answer. Hopefully it is useful; let's get started.
Problem description
Currently, my application generates about 40 million records per hour. I have created one partition per hour, so that I can easily drop a partition when it is no longer needed, and also use the partition to aggregate data.
When no other queries are running, my aggregation process works fine, but once those queries start, the aggregation code takes more than an hour to complete.
Is there any way in MySQL to run a process so that it is isolated from, and unaffected by, the other queries hitting the database?
Reply to @Rick:
RAM: 32 GB
innodb_buffer_pool_size: 20 GB
SSD: yes
Read types: a mix of GROUP BY and UPDATE statements covering the primary key
I don't want to aggregate every 5 minutes, because that would also generate a large number of rows and is not feasible for the application. I actually keep 5 hourly partitions and aggregate the oldest one; my application needs at least 5 hours of non-aggregated data.
My application does not need ACID guarantees, so I changed the default isolation level to READ-UNCOMMITTED and set autocommit to 0. This made the aggregation code run faster, but inserts suffered and now take more than 2 seconds.
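For reference, the two session-level changes described above can be applied like this (a minimal sketch of standard MySQL syntax; the asker does not show the exact statements used):

```sql
-- Relax isolation for this session only: reads may see uncommitted data,
-- which avoids some locking overhead during the aggregation pass.
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

-- Disable autocommit so many statements can share one explicit transaction.
SET autocommit = 0;
-- ... aggregation statements here ...
COMMIT;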
Here is the profiling information for the aggregation query:
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| QUERY_ID | SEQ | STATE| DURATION | CPU_USER| CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| 50754 |2 | continuing inside routine | 0.000015 |0.000197 |0.000036 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | NULL| NULL | NULL |
| 50754 |3 | checking permissions| 0.000007 |0.000005 |0.000001 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_authorization.cc |809 |
| 50754 |4 | checking permissions| 0.000006 |0.000006 |0.000000 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | check_access | sql_authorization.cc |809 |
| 50754 |5 | Opening tables| 0.000017 |0.000013 |0.000003 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | open_tables | sql_base.cc | 5815 |
| 50754 |6 | init | 0.000260 |0.000400 |0.000073 | 1 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | handle_query | sql_select.cc |128 |
| 50754 |7 | System lock| 0.000011 |0.000009 |0.000001 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_lock_tables | lock.cc |330 |
| 50754 |8 | optimizing | 0.000115 |0.000098 |0.000017 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc |158 |
| 50754 |9 | statistics | 0.001624 |0.003051 |0.000552 | 3 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc |374 |
| 50754 | 10 | preparing | 0.000158 |0.000134 |0.000024 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | optimize | sql_optimizer.cc |482 |
| 50754 | 11 | Sorting result| 0.000009 |0.000007 |0.000001 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | make_tmp_tables_info | sql_select.cc | 3849 |
| 50754 | 12 | executing | 0.000006 |0.000005 |0.000001 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | exec| sql_executor.cc|126 |
| 50754 | 13 | Sending data | 40.298694 | 144.161765 | 12.297466 |596361 | 261826 | 265128 | 2899384 | 0 | 0 | 0 |328 | 0 | exec| sql_executor.cc|202 |
| 50754 | 14 | end | 0.000031 |0.000024 |0.000005 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | handle_query | sql_select.cc |206 |
| 50754 | 15 | query end | 0.000016 |0.000013 |0.000003 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc| 4959 |
| 50754 | 16 | closing tables| 0.000055 |0.000048 |0.000007 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc| 5018 |
| 50754 | 17 | query end | 0.000007 |0.000005 |0.000002 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc| 4959 |
| 50754 | 18 | closing tables| 0.000012 |0.000009 |0.000002 | 0 | 0 |0 | 0 | 0 | 0 | 0 | 0 | 0 | mysql_execute_command | sql_parse.cc| 5018 |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
The aggregation code is shown below. Each run covers about 100 client keys, and roughly 100K client keys are active per hour. (The `',…,'` fragments are placeholders left over from dynamically constructed SQL.)
insert into DB.NETWORK_USAGE_FINAL(clientKey,k1,k2,k3,k4,k5,createdAt)
select clientKey, sum(k1) as k1, sum(k2) as k2, sum(k3) as k3 ,
k4, k5 , "',startTime,'" from DB.NETWORK_USAGE_F1 partition (',partitionKey,')
where clientKey in (',selectedClientKey,')
group by clientKey,k4,k5
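The placeholder fragments suggest the statement is built as a string inside a stored routine (the profile above shows "continuing inside routine"). A sketch of how it might be assembled with a prepared statement follows; the variable names `startTime`, `partitionKey`, and `selectedClientKey` are taken from the placeholders and are assumptions, not confirmed by the source:

```sql
-- Sketch only: assemble and run the dynamic aggregation statement.
SET @sql = CONCAT(
  'INSERT INTO DB.NETWORK_USAGE_FINAL (clientKey,k1,k2,k3,k4,k5,createdAt) ',
  'SELECT clientKey, SUM(k1), SUM(k2), SUM(k3), k4, k5, ''', startTime, ''' ',
  'FROM DB.NETWORK_USAGE_F1 PARTITION (', partitionKey, ') ',
  'WHERE clientKey IN (', selectedClientKey, ') ',
  'GROUP BY clientKey, k4, k5');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
```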
[mysqld_safe]
socket= /var/run/mysqld/mysqld.sock
nice = 0
[mysqld]
#
# * Basic Settings
#
innodb_buffer_pool_size=20G
innodb_buffer_pool_instances=20
max_connections=100
query_cache_size=0
query_cache_type=0
query_cache_limit=2M
innodb_log_file_size=3G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 2000
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
back_log = 1000
tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
innodb_lru_scan_depth=100
table_open_cache=4000
max_allowed_packet=1G
innodb_file_per_table=1
character-set-server = utf8
collation-server = utf8_unicode_ci
event_scheduler = ON
transaction_isolation = READ-COMMITTED
user = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket= /var/run/mysqld/mysqld.sock
port = 3306
basedir = /usr
datadir = /var/lib/mysql
tmpdir= /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking
key_buffer_size = 1G
thread_stack = 128M
thread_cache_size = 8
myisam-recover-options = BACKUP
log_error = /var/log/mysql/error.log
expire_logs_days = 10
max_binlog_size= 100M
SHOW CREATE TABLE output:
CREATE TABLE `NETWORK_USAGE_F1` (
`id` char(15) NOT NULL,
`clientKey` int(11) NOT NULL,
`k4` int(11) NOT NULL,
`k5` char(50) NOT NULL,
`createdAt` datetime NOT NULL,
`partitionKey` int(11) NOT NULL,
`k1` bigint(20) NOT NULL,
`k2` bigint(20) NOT NULL,
`k3` int(11) NOT NULL,
PRIMARY KEY (`id`,`partitionKey`),
KEY `key2` (`clientKey`,`k4`,`k5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY RANGE (partitionKey)
*/
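The partition list is truncated in the original output. With hourly YYYYMMDDHH partition keys, adding and dropping hourly partitions might look like the following sketch (partition names and boundary values are illustrative, not from the source):

```sql
-- Illustrative only: maintain hourly RANGE partitions on partitionKey.
-- Add the next hour's partition ahead of time:
ALTER TABLE NETWORK_USAGE_F1
  ADD PARTITION (PARTITION p2023110113 VALUES LESS THAN (2023110114));

-- Dropping an old hour's data is a cheap metadata operation,
-- far faster than a bulk DELETE:
ALTER TABLE NETWORK_USAGE_F1 DROP PARTITION p2023110107;
```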
Update in reply to @Rick:
The aggregation code runs on 100 client keys at a time (a limit). There are about 100K unique client keys in an hour, but the total row count per hour/partition is about 40 million (so roughly 400 rows per client key).
The id used is only 15 characters long.
Currently I have 5 partitions; the partition key format is YYYYMMDDHH.
MyISAM is not used.
Recommended answer
Let's see the aggregation code and the SHOW CREATE TABLE output.
Other things to try:
Aggregate every 5 minutes instead.
Aggregate "continuously"; that is, each run processes all rows that have arrived since the last aggregation.
Collect rows into a "staging" table, aggregate there, and finally copy the rows into the main table. Here is a discussion of the ping-pong (flip-flop) technique: http://mysql.rjweb.org/doc.php/staging_table
11K rows per second is a lot, but not impossible.
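The staging ping-pong idea above can be sketched as follows. Writers insert into a small staging table; an atomic rename swaps in an empty replacement, and the detached copy is then aggregated without blocking the writers. Table names here are illustrative:

```sql
-- Sketch of the ping-pong staging technique (table names are assumptions).
-- 1. Prepare an empty twin and atomically swap it in for writers:
CREATE TABLE Staging_new LIKE Staging;
RENAME TABLE Staging TO Staging_processing, Staging_new TO Staging;

-- 2. Aggregate the detached copy at leisure; no writer is touching it:
INSERT INTO NETWORK_USAGE_FINAL (clientKey, k1, k2, k3, k4, k5, createdAt)
  SELECT clientKey, SUM(k1), SUM(k2), SUM(k3), k4, k5, NOW()
  FROM Staging_processing
  GROUP BY clientKey, k4, k5;

-- 3. Discard the processed batch:
DROP TABLE Staging_processing;
```

RENAME TABLE is atomic in MySQL, so writers never see a missing staging table during the swap.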
Other questions:
How much RAM do you have?
What is innodb_buffer_pool_size set to?
Do you have SSDs?
What kinds of reads are happening at the same time? (Reads from the big table? Reads from the summary table?)
(re my.cnf)
I suggest limiting each of these to 1% of RAM (unless you have a confirmed reason for a larger value):
tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size = 1G
sort_buffer_size = 512M
max_allowed_packet = 1G
key_buffer_size = 1G -- for MyISAM only
I hope you are not using MyISAM; if you are not, lower this to 50M.
thread_stack = 128M
Yikes! Keep the default!
(Schema)
Use VARCHAR, not CHAR, unless the strings really are fixed length.
Since your table will be huge, use the smallest practical INT sizes.
How many PARTITIONs? Which version of MySQL? (Having too many partitions can itself be a performance drag.)
What is partitionKey set to?
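As a sketch of the shrink-the-types advice above (the specific column choices are assumptions for illustration, not verified against the actual value ranges):

```sql
-- Illustrative only: pick the narrowest types the real value ranges allow.
-- INT is 4 bytes; SMALLINT UNSIGNED (0..65535) is 2; TINYINT UNSIGNED is 1.
-- On a 40M-rows/hour table, every byte saved per row adds up quickly.
ALTER TABLE NETWORK_USAGE_F1
  MODIFY k4 SMALLINT UNSIGNED NOT NULL,  -- only if k4 fits in 0..65535
  MODIFY k5 VARCHAR(50) NOT NULL;        -- CHAR(50) pads; VARCHAR stores actual length
```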
(Other)
Is that a typo, or am I confused by it?
Also, this seems to conflict with 40 million records per hour.