
Print values only if they fall within a specific range after comparing two CSV files using Python 3.3

Stack Overflow user
Asked on 2014-03-18 01:03:20

I'm new to programming. I have two CSV files that I'm trying to compare. The first file, snp.csv, looks like this:

chrom   position    ref var gene        var
1       21421       G   T   WASH7P      snp.LOH
1       1251593     T   C   CPSF3L      snp.somatic
6       107474777   -   A   PDSS2       indel.somatic
14      106586168   G   T   ADAM6       snp.LOH

The second file, quad.csv, looks like this:

chrom   Start   End     Sequence
1       21420   21437   GGGACGGGGAGGGTTGGG
1       23058   23078   GGGCTGGGGCGGGGGGAGGG
1       23515   23534   GGGAAGGGACAGGGCAGGG
1       45098   45118   GGGAAAGGGCAGGGCCCGGG
3       1148    1173    GGGCCGGGCAAGGCCGGGTGCAGGG

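I want to compare the two files: where the chrom values match, I want to print only those rows whose position value (in snp.csv) falls between the Start and End values (in quad.csv). So I'm looking for a solution that gives me something like the following (essentially the snp.csv row with the Start, End and Sequence values from quad.csv appended):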

chrom   position    ref var gene    var     Start   End     Sequence
1       21421       G   T   WASH7P  snp.LOH 21420   21437   GGGACGGGGAGGGTTGGG

I searched through some posts and found some interesting answers that helped me a lot, but I'm still running into problems. I'm still learning Python…

This is my script so far. I know something is wrong with how I use the range function... I'm stuck.

import csv

snp_file = open("snp.csv", "r")
quad_file = open("quad.csv", "r")
out_file = open("results.csv", "wb")

snp = csv.reader(snp_file, delimiter='\t')
quad = csv.reader(quad_file, delimiter='\t')
out = csv.reader(out_file, delimiter='\t')



quadlist = [row for row in quad]

for snp_row in snp:
    row = 1
    found = False
    for quad_row in quadlist:
        results_row = snp_row
        if snp_row[0] == quad_row[0]:
            quad_pos = range(quad_row[1], quad_row[2])
            if snp_row[1] in quad_pos:
                results_row.append(quad_row)
                found = True
                break
        row = row + 1
    if not found:
        pass
    print (results_row)



snp.close()
quad.close()
out.close()

1 Answer

Stack Overflow user

Answered on 2014-03-18 04:13:16

from bisect import bisect_right
from collections import defaultdict
import csv

TOO_HIGH = 2147483647   # higher than any actual gene position
SNP_FMT  = "{0:<7} {1:<11} {2:3} {3:3} {4:11} {5:15}".format
QUAD_FMT = " {1:<7} {2:<7} {3}".format

def line_to_quad(line):
    row = line.split()
    return int(row[0]), int(row[1]), int(row[2]), row[3]

def line_to_snp(line):
    row = line.split()
    return int(row[0]), int(row[1]), row[2], row[3], row[4], row[5]

class Quads:
    @classmethod
    def from_file(cls, fname):
        with open(fname) as inf:   # plain text mode; the "U" flag was removed in Python 3.11
            next(inf, None)   # skip header line
            quads = (line_to_quad(line) for line in inf)
            return cls(quads)

    def __init__(self, rows):
        self.chromosomes = defaultdict(list)
        for row in rows:
            self.chromosomes[row[0]].append(row[1:])
        for segs in self.chromosomes.values():
            segs.sort()

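    def find_match(self, chromosome, position):
        # .get avoids adding empty entries to the defaultdict for unknown chromosomes
        segs = self.chromosomes.get(chromosome, [])
        # index of the last segment whose start is <= position (-1 if there is none)
        index = bisect_right(segs, (position, TOO_HIGH, "")) - 1
        if index >= 0:
            seg = segs[index]
            if seg[0] <= position <= seg[1]:
                return (chromosome,) + seg
        return None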

def main():
    quads = Quads.from_file("quad.csv")

    print(  # header
        SNP_FMT("chrom", "position", "ref", "var", "gene", "var") +
        QUAD_FMT("chrom", "Start", "End", "Sequence")
    )

    with open("snp.csv") as inf:
        next(inf, None)   # skip header line
        for line in inf:
            snp = line_to_snp(line)
            quad = quads.find_match(snp[0], snp[1])
            if quad:
                print(SNP_FMT(*snp) + QUAD_FMT(*quad))

if __name__=="__main__":
    main()

This gives us

chrom   position    ref var gene        var             Start   End     Sequence
1       21421       G   T   WASH7P      snp.LOH         21420   21437   GGGACGGGGAGGGTTGGG
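
For small files, a plainer variant that stays close to the script in the question may be easier to follow. The sketch below is not part of the original answer: `match_snps`, `SNP_TEXT` and `QUAD_TEXT` are hypothetical names, and it assumes tab-delimited input, as the question's script does. It addresses what looks like the main problem with the `range` call: `csv.reader` yields strings, so Start, End and position have to be converted with `int` before they can be compared. It also treats End as inclusive, like the answer above, whereas `range(start, end)` would exclude it.

```python
import csv
import io
import sys

# Sample data copied from the question (assumed here to be tab-delimited).
SNP_TEXT = (
    "chrom\tposition\tref\tvar\tgene\tvar\n"
    "1\t21421\tG\tT\tWASH7P\tsnp.LOH\n"
    "1\t1251593\tT\tC\tCPSF3L\tsnp.somatic\n"
    "6\t107474777\t-\tA\tPDSS2\tindel.somatic\n"
    "14\t106586168\tG\tT\tADAM6\tsnp.LOH\n"
)
QUAD_TEXT = (
    "chrom\tStart\tEnd\tSequence\n"
    "1\t21420\t21437\tGGGACGGGGAGGGTTGGG\n"
    "1\t23058\t23078\tGGGCTGGGGCGGGGGGAGGG\n"
    "1\t23515\t23534\tGGGAAGGGACAGGGCAGGG\n"
    "1\t45098\t45118\tGGGAAAGGGCAGGGCCCGGG\n"
    "3\t1148\t1173\tGGGCCGGGCAAGGCCGGGTGCAGGG\n"
)


def match_snps(snp_file, quad_file, delimiter="\t"):
    """Yield each SNP row extended with Start, End and Sequence of the first
    quad row on the same chromosome whose range contains the position."""
    snp_reader = csv.reader(snp_file, delimiter=delimiter)
    quad_reader = csv.reader(quad_file, delimiter=delimiter)
    next(snp_reader, None)    # skip header line
    next(quad_reader, None)   # skip header line

    # csv.reader returns strings, so convert Start and End to int once.
    quads = [(row[0], int(row[1]), int(row[2]), row[3])
             for row in quad_reader if row]

    for snp_row in snp_reader:
        if not snp_row:
            continue          # ignore blank lines
        chrom, position = snp_row[0], int(snp_row[1])
        for q_chrom, start, end, sequence in quads:
            # chained comparison replaces the range() membership test
            if chrom == q_chrom and start <= position <= end:
                yield snp_row + [str(start), str(end), sequence]
                break


if __name__ == "__main__":
    # Output goes through csv.writer (the question opened a csv.reader on it).
    out = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
    for row in match_snps(io.StringIO(SNP_TEXT), io.StringIO(QUAD_TEXT)):
        out.writerow(row)
```

With real files, `open("snp.csv", newline="")` and `open("quad.csv", newline="")` inside a `with` block can be passed in place of the `io.StringIO` objects. This version scans every quad row for every SNP, so the sorted `bisect` lookup in the answer above is likely the better fit for large inputs.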
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/22460648
