Handling Massive Data Volumes in MySQL

This tutorial looks at how MySQL can handle massive data volumes. It is adapted from a question-and-answer thread found elsewhere, with the original poster's follow-up questions and the recommended answer included. Hopefully it is of some help; let's get started.

Problem Description

Currently my application generates roughly 40 million records per hour. I have created one partition per hour so that I can easily drop a partition when it is no longer needed, and I also use the partitions to aggregate the data.
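For context, hourly RANGE partitions of this kind are typically added and dropped with ALTER TABLE statements along the following lines; the partition names and boundary values here are illustrative, not taken from the poster's setup:

-- Add a partition for the next hour (assuming a YYYYMMDDHH partitionKey)
ALTER TABLE NETWORK_USAGE_F1
  ADD PARTITION (PARTITION p2024010113 VALUES LESS THAN (2024010114));

-- Drop the oldest hourly partition once its data has been aggregated
ALTER TABLE NETWORK_USAGE_F1 DROP PARTITION p2024010108;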

My aggregation procedure runs fine as long as no other queries are hitting the database, but once those queries start, the aggregation code takes more than an hour to finish.

Is there any way in MySQL to run a process that is isolated from, and not slowed down by, the other queries being executed against the database?

Reply to @Rick

    RAM: 32 GB

    innodb_buffer_pool_size: 20 GB

    SSD: yes

    Read type: a mix of GROUP BY and UPDATE queries covered by the primary key

I don't want to aggregate every 5 minutes, because that would also generate a large number of records and the application cannot work that way. I actually keep 5 partitions at a time and aggregate the oldest one each hour; my application needs at least 5 hours of non-aggregated data.

My application does not need ACID-style guarantees, so I changed the default isolation level to READ-UNCOMMITTED and set autocommit to 0. That sped up the aggregation code, but inserts suffered and now take more than 2 seconds.
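A minimal sketch of scoping that change to just the aggregation session (rather than changing the server-wide defaults) is shown below; whether this matches how the poster actually applied it is an assumption:

-- Relax isolation and disable autocommit for this session only
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET autocommit = 0;

-- ... run the aggregation INSERT ... SELECT here ...

COMMIT;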

Update: here is the profiling information for the aggregation query.

+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| QUERY_ID | SEQ | STATE| DURATION  | CPU_USER| CPU_SYSTEM | CONTEXT_VOLUNTARY | CONTEXT_INVOLUNTARY | BLOCK_OPS_IN | BLOCK_OPS_OUT | MESSAGES_SENT | MESSAGES_RECEIVED | PAGE_FAULTS_MAJOR | PAGE_FAULTS_MINOR | SWAPS | SOURCE_FUNCTION | SOURCE_FILE | SOURCE_LINE |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
| 50754 |2 | continuing inside routine |  0.000015 |0.000197 |0.000036 |  1 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | NULL| NULL  |  NULL |
| 50754 |3 | checking permissions|  0.000007 |0.000005 |0.000001 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | check_access | sql_authorization.cc |809 |
| 50754 |4 | checking permissions|  0.000006 |0.000006 |0.000000 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | check_access | sql_authorization.cc |809 |
| 50754 |5 | Opening tables|  0.000017 |0.000013 |0.000003 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | open_tables  | sql_base.cc |  5815 |
| 50754 |6 | init |  0.000260 |0.000400 |0.000073 |  1 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | handle_query | sql_select.cc  |128 |
| 50754 |7 | System lock|  0.000011 |0.000009 |0.000001 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | mysql_lock_tables  | lock.cc  |330 |
| 50754 |8 | optimizing |  0.000115 |0.000098 |0.000017 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | optimize  | sql_optimizer.cc  |158 |
| 50754 |9 | statistics |  0.001624 |0.003051 |0.000552 |  3 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | optimize  | sql_optimizer.cc  |374 |
| 50754 |  10 | preparing  |  0.000158 |0.000134 |0.000024 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | optimize  | sql_optimizer.cc  |482 |
| 50754 |  11 | Sorting result|  0.000009 |0.000007 |0.000001 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | make_tmp_tables_info  | sql_select.cc  |  3849 |
| 50754 |  12 | executing  |  0.000006 |0.000005 |0.000001 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | exec| sql_executor.cc|126 |
| 50754 |  13 | Sending data  | 40.298694 | 144.161765 |  12.297466 |596361 |  261826 | 265128 | 2899384 | 0 |  0 |  0 |328 |  0 | exec| sql_executor.cc|202 |
| 50754 |  14 | end  |  0.000031 |0.000024 |0.000005 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | handle_query | sql_select.cc  |206 |
| 50754 |  15 | query end  |  0.000016 |0.000013 |0.000003 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | mysql_execute_command | sql_parse.cc|  4959 |
| 50754 |  16 | closing tables|  0.000055 |0.000048 |0.000007 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | mysql_execute_command | sql_parse.cc|  5018 |
| 50754 |  17 | query end  |  0.000007 |0.000005 |0.000002 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | mysql_execute_command | sql_parse.cc|  4959 |
| 50754 |  18 | closing tables|  0.000012 |0.000009 |0.000002 |  0 | 0 |0 | 0 | 0 |  0 |  0 |  0 |  0 | mysql_execute_command | sql_parse.cc|  5018 |
+----------+-----+---------------------------+-----------+------------+------------+-------------------+---------------------+--------------+---------------+---------------+-------------------+-------------------+-------------------+-------+-----------------------+----------------------+-------------+
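For reference, a per-stage profile like the one above can be captured with MySQL's session profiler (deprecated but still available); the query ID is whatever SHOW PROFILES reports for the aggregation statement:

SET profiling = 1;
-- run the aggregation query here
SHOW PROFILES;
SELECT * FROM INFORMATION_SCHEMA.PROFILING
WHERE QUERY_ID = 50754   -- substitute the ID reported by SHOW PROFILES
ORDER BY SEQ;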

The aggregation code is shown below. Each run covers roughly 100 client keys, and there are roughly 100K client keys per hour.

insert into DB.NETWORK_USAGE_FINAL(clientKey,k1,k2,k3,k4,k5,createdAt)
select clientKey, sum(k1) as k1, sum(k2) as k2, sum(k3) as k3,
       k4, k5, "',startTime,'"                          -- startTime appears to be spliced in when the SQL string is built
from DB.NETWORK_USAGE_F1 partition (',partitionKey,')   -- partitionKey likewise spliced in
where clientKey in (',selectedClientKey,')              -- a batch of ~100 client keys
group by clientKey,k4,k5
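The ',partitionKey,' and ',selectedClientKey,' fragments suggest the statement is assembled as a string and executed dynamically. A minimal sketch of that pattern, using hypothetical variables startTime, partitionKey and selectedClientKey (for example inside a stored procedure), might look like this:

SET @sql = CONCAT(
  'INSERT INTO DB.NETWORK_USAGE_FINAL (clientKey,k1,k2,k3,k4,k5,createdAt) ',
  'SELECT clientKey, SUM(k1), SUM(k2), SUM(k3), k4, k5, "', startTime, '" ',
  'FROM DB.NETWORK_USAGE_F1 PARTITION (', partitionKey, ') ',
  'WHERE clientKey IN (', selectedClientKey, ') ',
  'GROUP BY clientKey, k4, k5');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;

The poster's my.cnf follows.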

[mysqld_safe]
socket= /var/run/mysqld/mysqld.sock
nice  = 0

[mysqld]
#
# * Basic Settings
#
innodb_buffer_pool_size=20G
innodb_buffer_pool_instances=20
max_connections=100
query_cache_size=0
query_cache_type=0
query_cache_limit=2M
innodb_log_file_size=3G
innodb_read_io_threads = 8
innodb_write_io_threads = 8
innodb_io_capacity = 2000
sql_mode=STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
back_log = 1000
tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
innodb_lru_scan_depth=100
table_open_cache=4000
max_allowed_packet=1G
innodb_file_per_table=1
character-set-server = utf8
collation-server = utf8_unicode_ci
event_scheduler = ON
transaction_isolation = READ-COMMITTED

user  = mysql
pid-file = /var/run/mysqld/mysqld.pid
socket= /var/run/mysqld/mysqld.sock
port  = 3306
basedir  = /usr
datadir  = /var/lib/mysql
tmpdir= /tmp
lc-messages-dir = /usr/share/mysql
skip-external-locking


key_buffer_size  = 1G
thread_stack  = 128M
thread_cache_size = 8

myisam-recover-options  = BACKUP

log_error = /var/log/mysql/error.log

expire_logs_days = 10
max_binlog_size= 100M

SHOW CREATE TABLE output:

CREATE TABLE `NETWORK_USAGE_F1` (
  `id` char(15) NOT NULL,
  `clientKey` int(11) NOT NULL,
  `k4` int(11) NOT NULL,
  `k5` char(50) NOT NULL,
  `createdAt` datetime NOT NULL,
  `partitionKey` int(11) NOT NULL,
  `k1` bigint(20) NOT NULL,
  `k2` bigint(20) NOT NULL,
  `k3` int(11) NOT NULL,
  PRIMARY KEY (`id`,`partitionKey`),
  KEY `key2` (`clientKey`,`k4`,`k5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
/*!50100 PARTITION BY RANGE (partitionKey)
 */

Update in reply to @Rick:

    The aggregation code runs on 100 client keys at a time (that is the limit). There are about 100K unique client keys per hour, but the total row count per hour/partition in the database is about 40 million (so roughly 400 rows per client key).

    The id values are only 15 characters long.

    I currently keep 5 partitions; the partition key format is YYYYMMDDHH.

    MyISAM is not used.

Recommended Answer

Let's see the aggregation code and the SHOW CREATE TABLE output.

Other things to try:

    Aggregate every 5 minutes instead.

    "Aggregate continuously." That is, process all rows that have arrived since the last time the aggregation ran.

    Collect the rows into a "staging" table, then aggregate them, and finally copy the rows into the main table. Here is a discussion of the ping-pong technique (a sketch of the table flip follows this list): http://mysql.rjweb.org/doc.php/staging_table
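A minimal sketch of that staging/ping-pong flip, using hypothetical table names NETWORK_USAGE_STAGING, NETWORK_USAGE_PROCESS and NETWORK_USAGE_STAGING_SPARE (an empty spare copy is assumed to exist), is:

-- Atomically swap the live ingest table for an empty spare; ingestion continues uninterrupted
RENAME TABLE NETWORK_USAGE_STAGING TO NETWORK_USAGE_PROCESS,
             NETWORK_USAGE_STAGING_SPARE TO NETWORK_USAGE_STAGING;

-- Aggregate from the detached copy without touching the live ingest table
INSERT INTO DB.NETWORK_USAGE_FINAL (clientKey, k1, k2, k3, k4, k5, createdAt)
SELECT clientKey, SUM(k1), SUM(k2), SUM(k3), k4, k5, NOW()
FROM NETWORK_USAGE_PROCESS
GROUP BY clientKey, k4, k5;

-- Empty the processed copy and return it to the reserve slot
TRUNCATE TABLE NETWORK_USAGE_PROCESS;
RENAME TABLE NETWORK_USAGE_PROCESS TO NETWORK_USAGE_STAGING_SPARE;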

11K rows per second (40 million rows per hour divided by 3600) is a lot, but not impossible.

Other questions:

    How much RAM do you have?

    What is innodb_buffer_pool_size set to?

    Do you have SSDs?

    What kinds of reads are happening at the same time? (Reads from the big table? Reads from the summary table?)

(re my.cnf)

I recommend limiting each of these to no more than about 1% of RAM (unless you have a specific, well-understood reason for a larger value):

tmp_table_size = 1G
max_heap_table_size = 1G
join_buffer_size=1G
sort_buffer_size=512M
max_allowed_packet=1G
key_buffer_size  = 1G -- for MyISAM only

I hope you are not using MyISAM. If you are not, lower key_buffer_size to 50M.

thread_stack  = 128M

Ouch! Keep the default for that one!
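Put together, a revised fragment of the my.cnf along those lines for a 32 GB server might look like this (values are illustrative, scaled to roughly 1% of RAM or less):

tmp_table_size      = 256M
max_heap_table_size = 256M
join_buffer_size    = 256M
sort_buffer_size    = 256M
max_allowed_packet  = 256M
key_buffer_size     = 50M     # only relevant if MyISAM tables exist
# thread_stack left at the server default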

(Schema)

    Use VARCHAR rather than CHAR unless the strings really are fixed-length (see the sketch after this list).

    Since your tables will be huge, use the smallest INT size that actually fits the data.

    How many PARTITIONs are there? Which version of MySQL? (Too many partitions can itself be a performance drag.)

    What is partitionKey set to?
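As an illustration of those data-type suggestions (not the poster's actual schema), the detail table could be declared roughly as follows, assuming k5 varies in length, k4 has a small value range, and partitionKey is in YYYYMMDDHH form:

CREATE TABLE `NETWORK_USAGE_F1` (
  `id`           char(15)          NOT NULL,  -- genuinely fixed-length, so CHAR is fine
  `clientKey`    int unsigned      NOT NULL,
  `k4`           smallint unsigned NOT NULL,  -- assumption: k4 fits in SMALLINT
  `k5`           varchar(50)       NOT NULL,  -- VARCHAR instead of CHAR(50)
  `createdAt`    datetime          NOT NULL,
  `partitionKey` int unsigned      NOT NULL,  -- YYYYMMDDHH fits in INT UNSIGNED
  `k1`           bigint unsigned   NOT NULL,
  `k2`           bigint unsigned   NOT NULL,
  `k3`           int unsigned      NOT NULL,
  PRIMARY KEY (`id`,`partitionKey`),
  KEY `key2` (`clientKey`,`k4`,`k5`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4
  PARTITION BY RANGE (`partitionKey`) (
    PARTITION p2024010112 VALUES LESS THAN (2024010113)  -- one partition per hour (illustrative)
  );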

(Other)

    Is that a typo, or am I just confused by what it contains?

    Also, this seems to conflict with the figure of 40 million records per hour.

That concludes this tutorial on handling massive data volumes in MySQL. Hopefully this article collected by 趣模板源码网 has been helpful; more technical tutorials can be found by searching the site.