首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >如果具有WHERE条件的列有索引,则不会返回任何内容。

如果具有WHERE条件的列有索引,则不会返回任何内容。
EN

Stack Overflow用户
提问于 2022-05-03 14:19:37
回答 1查看 64关注 0票数 0

Debian11 (Bullseye)首先使用发行版的MariaDB版本10.5,现在使用来自MariaDB存储库的10.6.7版本。

我无法从遗传学数据库(ensembl homo_sapiens_variation_106_37)的转储中获得一些大表的正确索引:[37]

其中一个表是variation_feature:

代码语言:javascript
复制
CREATE TABLE `variation_feature` (
  `variation_feature_id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `seq_region_id` int(10) unsigned NOT NULL,
  `seq_region_start` int(11) NOT NULL,
  `seq_region_end` int(11) NOT NULL,
  `seq_region_strand` tinyint(4) NOT NULL,
  `variation_id` int(10) unsigned NOT NULL,
  `allele_string` varchar(50000) DEFAULT NULL,
  `ancestral_allele` varchar(50) DEFAULT NULL,
  `variation_name` varchar(255) DEFAULT NULL,
  `map_weight` int(11) NOT NULL,
  `flags` set('genotyped') DEFAULT NULL,
  `source_id` int(10) unsigned NOT NULL,
  `consequence_types` set('intergenic_variant','splice_acceptor_variant','splice_donor_variant','stop_lost','coding_sequence_variant','missense_variant','stop_gained','synonymous_variant','frameshift_variant','non_coding_transcript_variant','non_coding_transcript_exon_variant','mature_miRNA_variant','NMD_transcript_variant','5_prime_UTR_variant','3_prime_UTR_variant','incomplete_terminal_codon_variant','intron_variant','splice_region_variant','downstream_gene_variant','upstream_gene_variant','start_lost','stop_retained_variant','inframe_insertion','inframe_deletion','transcript_ablation','transcript_fusion','transcript_amplification','transcript_translocation','TFBS_ablation','TFBS_fusion','TFBS_amplification','TFBS_translocation','regulatory_region_ablation','regulatory_region_fusion','regulatory_region_amplification','regulatory_region_translocation','feature_elongation','feature_truncation','regulatory_region_variant','TF_binding_site_variant','protein_altering_variant','start_retained_variant') NOT NULL DEFAULT 'intergenic_variant',
  `variation_set_id` set('1','2','3','4','5','6','7','8','9','10','11','12','13','14','15','16','17','18','19','20','21','22','23','24','25','26','27','28','29','30','31','32','33','34','35','36','37','38','39','40','41','42','43','44','45','46','47','48','49','50','51','52','53','54','55','56','57','58','59','60','61','62','63','64') NOT NULL DEFAULT '',
  `class_attrib_id` int(10) unsigned DEFAULT '0',
  `somatic` tinyint(1) NOT NULL DEFAULT '0',
  `minor_allele` varchar(50) DEFAULT NULL,
  `minor_allele_freq` float DEFAULT NULL,
  `minor_allele_count` int(10) unsigned DEFAULT NULL,
  `alignment_quality` double DEFAULT NULL,
  `evidence_attribs` set('367','368','369','370','371','372','418','421','573','585') DEFAULT NULL,
  `clinical_significance` set('uncertain significance','not provided','benign','likely benign','likely pathogenic','pathogenic','drug response','histocompatibility','other','confers sensitivity','risk factor','association','protective','affects') DEFAULT NULL,
  `display` int(1) DEFAULT '1',
  PRIMARY KEY (`variation_feature_id`),
  KEY `pos_idx` (`seq_region_id`,`seq_region_start`,`seq_region_end`),
  KEY `variation_idx` (`variation_id`),
  KEY `variation_set_idx` (`variation_set_id`),
  KEY `consequence_type_idx` (`consequence_types`),
  KEY `source_idx` (`source_id`)
) ENGINE=MyISAM AUTO_INCREMENT=743963234 DEFAULT CHARSET=latin1;

它有700 000 000多条记录,在磁盘上占据:

代码语言:javascript
复制
# ls -lh variation_feature.*
-rw-rw---- 1 mysql mysql 56K Mai  3 09:41 variation_feature.frm
-rw-rw---- 1 mysql mysql 55G Mai  2 20:44 variation_feature.MYD
-rw-rw---- 1 mysql mysql 61G Mai  2 22:27 variation_feature.MYI

尽管在导入variation_feature.txt时没有得到任何错误,但一些基本的索引却无法工作。在这种情况下,基于variation_id选择已知的数据行不会返回任何内容。

代码语言:javascript
复制
SELECT * 
FROM variation_feature 
WHERE variation_id = 617544728;

--> nothing

值617544728似乎不在索引中,因为

代码语言:javascript
复制
SELECT variation_id 
FROM variation_feature 
WHERE variation_id = 617544728;

--> nothing

禁用索引并等待长表扫描返回行:

代码语言:javascript
复制
ALTER TABLE variation_feature ALTER INDEX variation_idx IGNORED;

SELECT * 
FROM variation_feature 
WHERE variation_id = 617544728;

variation_feature_id  seq_region_id  seq_region_start  seq_region_end  seq_region_strand  variation_id  allele_string  ancestral_allele  variation_name  map_weight  flags   source_id  consequence_types  variation_set_id                                                                             class_attrib_id  somatic  minor_allele  minor_allele_freq  minor_allele_count  alignment_quality  evidence_attribs                 clinical_significance  display
--------------------  -------------  ----------------  --------------  -----------------  ------------  -------------  ----------------  --------------  ----------  ------  ---------  -----------------  -------------------------------------------------------------------------------------------  ---------------  -------  ------------  -----------------  ------------------  -----------------  -------------------------------  ---------------------  -------
           632092737          27511         230845794       230845794                  1     617544728  A/G            G                 rs699                    1  <null>          1  missense_variant   2,5,6,9,10,11,12,13,15,16,17,23,24,25,26,30,40,42,43,44,45,47,48,49,50,51,52,53,54,55,56,57                2    false  A                        0.2949                1477             <null>  368,370,371,372,418,421,573,585  benign,risk factor           1

myisamchk没有错误地修复索引,但是索引"variation_idx“不能工作。

DROPping和重新创建一个索引运行时没有出错,但是索引不能工作。

其他索引也没问题。

在这个数据库的另一个基因组中(ensembl homo_sapiens_variation_106_38 -稍大一点):[38]也有同样的问题(在另一台计算机上,但程序安装相同)。有一个不同之处:主键(variation_feature_id)也不起作用。

myisamchk也没有错误地运行,但没有效果。

mysqlcheck ( 10.6版本,运行速度比10.5慢)在第一台计算机上返回:

代码语言:javascript
复制
homo_sapiens_variation_106_37.variation_feature
error    : Key in wrong position at page 22405134336
error    : Corrupt

现在,这是我们知道的,但没有任何修复工具能够真正修复或提示,有什么问题。

我已经在CREATEd上建立了一个关于variation_name的索引:它正在工作。

为了适应巨大的数据库和ensembl的mysql版本,我对mariadb.cnf进行了修改:

代码语言:javascript
复制
[mysqld]

## this was some time ago - because some bug mysql or mariadb didn't took this from the system
default-time-zone = Europe/Berlin

## ensembl has mysql version 5.6
## because of table creation scripts I need compatibility:
show_compatibility_56 = ON
performance_schema

## ensembl writes in DATETIME fields: "0000-00-00 00:00:00"
## the new sql_mode in 5.7 doesn't allow it any more
## SHOW VARIABLES LIKE 'sql_mode' ;
## ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION
## so change sql_mode deleting NO_ZERO_IN_DATE,NO_ZERO_DATE
sql_mode = ONLY_FULL_GROUP_BY,STRICT_TRANS_TABLES,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION

datadir = /mnt/SRVDATA/var/lib/mysql
tmpdir  = /mnt/WORK/tmp

[mariadb]

[mariadb-10.6]

## MyISAM for building ensembl homo_sapiens
lower_case_table_names=1

bulk_insert_buffer_size = 1G
myisam_sort_buffer_size = 56G
sort_buffer_size = 56G

谢谢你耐心地阅读这一切。

EN

回答 1

Stack Overflow用户

发布于 2022-05-03 19:53:55

sort_buffer_size应限制在内存的1%。你所拥有的巨大设置可能会导致交换,这对性能来说是很糟糕的。myisam_sort_buffer_size也是如此,尽管它可能与这个问题无关。

如果没有其他原因,您应该考虑从MyISAM迁移到InnoDB,以避免索引损坏。但是,它将使磁盘占用(数据+索引)增加到300 of以上。

InnoDB在并发访问大型表方面要好得多。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/72100831

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档