我是新来MongoDB的,所以请耐心听我说。我有一个CSV文件example.csv,如下所示:
Sample,Chromosome,Position,Reference,Mutation,ReadDepth
testfile_snp,chr1,69511,A,G,10
testfile_snp,chr1,924024,C,G,12
testfile_snp,chr1,924533,A,G,13
testfile_snp,chr1,942451,T,C,22
testfile_snp,chr1,946247,G,A,44
testfile_snp,chr1,952421,A,G,32
testfile_snp,chr1,953259,T,C,37
testfile_snp,chr1,953279,T,C,23
testfile_snp,chr1,961945,G,C,40
testfile_snp,chr1,966227,C,G,35我有许多文件,每个文件大约有25k行。我想查询MongoDB中的每一行。在我的数据库中,Sample,Chromosome,Position,Reference,Mutation被索引为compound indexes。我试着四处寻找解决方案,我发现唯一相关的东西是下面的thread。我可以使用以下命令将CSV的格式更改为查询:
gawk -i inplace -F',' '{print "db.TestCollection.find({\"Sample\": \"" $1 "\", \"Chromosome\": \"" $2 "\", \"Position\": " $3 ", \"Reference\": \"" $4 "\", \"Mutation\": \"" $5 "\"})"}' example.csv
sed -i "1s/.*/use TestDatabase/" example.csv
mv example.csv example.js它将输出:
use TestDatabase
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 69511, "Reference": "A", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 924024, "Reference": "C", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 924533, "Reference": "A", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 942451, "Reference": "T", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 946247, "Reference": "G", "Mutation": "A"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 952421, "Reference": "A", "Mutation": "G"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 953259, "Reference": "T", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 953279, "Reference": "T", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 961945, "Reference": "G", "Mutation": "C"})
db.TestCollection.find({"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 966227, "Reference": "C", "Mutation": "G"})然后,我可以使用此文件将其提供给MongoDB:
mongo < example.js目前,这是我到目前为止查询每一行的方式。但是,我发现了另一个可以使用IN操作符执行批量查询的thread。问题是它在给定的所有字段中都表现为OR:
use TestDatabase
db.TestCollection({"Sample": { $in : ["testfile_snp", "sv37213_hg38"] }, "Chromosome": "chr1", "Position": { $in : [69270,182585422]}, "Reference" : {$in : ["A", "C"]}, "Mutation" : {$in : ["G", "T"]} } )会给出:
MongoDB shell version v4.0.8
connecting to: mongodb://127.0.0.1:27017/?gssapiServiceName=mongodb
Implicit session: session { "id" : UUID("fb07f25a-3a4f-4c32-bd4e-70f3c3129435") }
MongoDB server version: 4.0.8
switched to db TestDatabase
{ "_id" : ObjectId("5ca47c1e0953f323b3b9cac5"), "Sample" : "sv37213_hg38", "Chromosome" : "chr1", "Position" : 69270, "Reference" : "A", "Mutation" : "G", "ReadDepth" : 19 }
{ "_id" : ObjectId("5ca47c1e0953f323b3b9e10f"), "Sample" : "sv37213_hg38", "Chromosome" : "chr1", "Position" : 182585422, "Reference" : "C", "Mutation" : "T", "ReadDepth" : 66 }
{ "_id" : ObjectId("5ca47bca0953f323b39019b1"), "Sample" : "test-exome-1_hg38", "Chromosome" : "chr1", "Position" : 69270, "Reference" : "A", "Mutation" : "G", "ReadDepth" : 17 }
bye如您所见,此查询为sv37213_hg38返回2个文档,这不是我所希望的。我只想打印位置182585422。
在mongo中有没有什么功能可以批量查询我的文件的全部内容,或者我必须对每一行都这样做?
发布于 2019-04-04 00:09:49
您可以使用$or,而不是使用$in,只需将最初执行的原始查询逐个放入。
$or: [
{"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 924024, "Reference": "C", "Mutation": "G"}
{"Sample": "testfile_snp", "Chromosome": "chr1", "Position": 924533, "Reference": "A", "Mutation": "G"}
]https://stackoverflow.com/questions/55497069
复制相似问题