Debian11 (Bullseye)首先使用发行版的MariaDB版本10.5,现在使用来自MariaDB存储库的10.6.7版本。
我无法从遗传学数据库(ensembl homo_sapiens_variation_106_37)的转储中获得一些大表的正确索引:[37]
其中一个表是variation_feature:
CREATE TABLE `variation_feature` (
`variation_feature_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`seq_region_id` int(10) unsigned NOT NULL,
`seq_region_start` int(11) NOT NULL,
`seq_region_end` int(11) NOT NULL,
`seq_region_strand` tinyint(4) NOT NULL,
`variation_id` int(10) unsigned NOT NULL,
`allele_string` varchar(50000) DEFAULT NULL,
`ancestral_allele` varchar(50) DEFAULT NULL,
`variation_name` varchar(255) DEFAULT NULL,
`map_weight` int(11) NOT NULL,
`flags` set('genotyped') DEFAULT NULL,
`source_id` int(10) unsigned NOT NULL,
`consequence_types` set('intergenic_variant','splice_acceptor_variant','splice_donor_variant','stop_lost','coding_sequence_variant','missense_variant','stop_gained','synonymous_variant','frameshift_variant','non_coding_transcript_variant','non_coding_transcript_exon_variant','mature_miRNA_variant','NMD_transcript_variant','5_prime_UTR_variant','3_prime_UTR_variant','incomplete_terminal_codon_variant','intron_variant','splice_region_variant','downstream_gene_variant','upstream_gene_variant','start_lost','stop_retained_variant','inframe_insertion','inframe_deletion','transcript_ablation','transcript_fusion','transcript_amplification','transcript_translocation','TFBS_ablation','TFBS_fusion','TFBS_amplification','TFBS_translocation','regulatory_region_ablation','regulatory_region_fusion','regulatory_region_amplification','regulatory_region_translocation','feature_elongation','feature_truncation','regulatory_region_variant','TF_binding_site_variant','protein_altering_variant','start_retained_variant') NOT NULL DEFAULT 'intergenic_variant',
`variation_set_id` set('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60','61','62','63','64') NOT NULL DEFAULT '',
`class_attrib_id` int(10) unsigned DEFAULT '0',
`somatic` tinyint(1) NOT NULL DEFAULT '0',
`minor_allele` varchar(50) DEFAULT NULL,
`minor_allele_freq` float DEFAULT NULL,
`minor_allele_count` int(10) unsigned DEFAULT NULL,
`alignment_quality` double DEFAULT NULL,
`evidence_attribs` set('367','368','369','370','371','372','418','421','573','585') DEFAULT NULL,
`clinical_significance` set('uncertain significance','not provided','benign','likely benign','likely pathogenic','pathogenic','drug response','histocompatibility','other','confers sensitivity','risk factor','association','protective','affects') DEFAULT NULL,
`display` int(1) DEFAULT '1',
PRIMARY KEY (`variation_feature_id`),
KEY `pos_idx` (`seq_region_id`,`seq_region_start`,`seq_region_end`),
KEY `variation_idx` (`variation_id`),
KEY `variation_set_idx` (`variation_set_id`),
KEY `consequence_type_idx` (`consequence_types`),
KEY `source_idx` (`source_id`)
) ENGINE=MyISAM AUTO_INCREMENT=743963234 DEFAULT CHARSET=latin1;它有700 000 000多条记录,在磁盘上占据:
# ls -lh variation_feature.*
-rw-rw---- 1 mysql mysql 56K Mai 3 09:41 variation_feature.frm
-rw-rw---- 1 mysql mysql 55G Mai 2 20:44 variation_feature.MYD
-rw-rw---- 1 mysql mysql 61G Mai 2 22:27 variation_feature.MYI尽管在导入variation_feature.txt时没有得到任何错误,但一些基本的索引却无法工作。在这种情况下,基于variation_id选择已知的数据行不会返回任何内容。
SELECT *
FROM variation_feature
WHERE variation_id = 617544728;
--> nothing值617544728似乎不在索引中,因为
SELECT variation_id
FROM variation_feature
WHERE variation_id = 617544728;
--> nothing禁用索引并等待长表扫描返回行:
ALTER TABLE variation_feature ALTER INDEX variation_idx IGNORED;
SELECT *
FROM variation_feature
WHERE variation_id = 617544728;
variation_feature_id seq_region_id seq_region_start seq_region_end seq_region_strand variation_id allele_string ancestral_allele variation_name map_weight flags source_id consequence_types variation_set_id class_attrib_id somatic minor_allele minor_allele_freq minor_allele_count alignment_quality evidence_attribs clinical_significance display
-------------------- ------------- ---------------- -------------- ----------------- ------------ ------------- ---------------- -------------- ---------- ------ --------- ----------------- ------------------------------------------------------------------------------------------- --------------- ------- ------------ ----------------- ------------------ ----------------- ------------------------------- --------------------- -------
632092737 27511 230845794 230845794 1 617544728 A/G G rs699 1 <null> 1 missense_variant 2,5,6,9,10,11,12,13,15,16,17,23,24,25,26,30,40,42,43,44,45,47,48,49,50,51,52,53,54,55,56,57 2 false A 0.2949 1477 <null> 368,370,371,372,418,421,573,585 benign,risk factor 1myisamchk没有错误地修复索引,但是索引"variation_idx“不能工作。
DROPping和重新创建一个索引运行时没有出错,但是索引不能工作。
其他索引也没问题。
在这个数据库的另一个基因组中(ensembl homo_sapiens_variation_106_38 -稍大一点):[38]也有同样的问题(在另一台计算机上,但程序安装相同)。有一个不同之处:主键(variation_feature_id)也不起作用。
myisamchk也没有错误地运行,但没有效果。
mysqlcheck ( 10.6版本,运行速度比10.5慢)在第一台计算机上返回:
homo_sapiens_variation_106_37.variation_feature
error : Key in wrong position at page 22405134336
error : Corrupt现在,这是我们知道的,但没有任何修复工具能够真正修复或提示,有什么问题。
我已经在CREATEd上建立了一个关于variation_name的索引:它正在工作。
为了适应巨大的数据库和ensembl的mysql版本,我对mariadb.cnf进行了修改:
[mysqld]
## this was some time ago - because some bug mysql or mariadb didn't took this from the system
default-time-zone = Europe/Berlin
## ensembl has mysql version 5.6
## because of table creation scripts I need compatibility:
show_compatibility_56 = ON
performance_schema
## ensembl writes in DATETIME fields: "0000-00-00 00:00:00"
## the new sql_mode in 5.7 doesn't allow it any more
## SHOW VARIABLES LIKE 'sql_mode' ;
## ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
## so change sql_mode deleting NO_ZERO_IN_DATE,NO_ZERO_DATE
sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
datadir = /mnt/SRVDATA/var/lib/mysql
tmpdir = /mnt/WORK/tmp
[mariadb]
[mariadb-10.6]
## MyISAM for building ensembl homo_sapiens
lower_case_table_names=1
bulk_insert_buffer_size = 1G
myisam_sort_buffer_size = 56G
sort_buffer_size = 56G谢谢你耐心地阅读这一切。
发布于 2022-05-03 19:53:55
sort_buffer_size应限制在内存的1%。你所拥有的巨大设置可能会导致交换,这对性能来说是很糟糕的。myisam_sort_buffer_size也是如此,尽管它可能与这个问题无关。
如果没有其他原因,您应该考虑从MyISAM迁移到InnoDB,以避免索引损坏。但是,它将使磁盘占用(数据+索引)增加到300 of以上。
InnoDB在并发访问大型表方面要好得多。
https://stackoverflow.com/questions/72100831
复制相似问题