
Sum and average values from multiple files with Python

Stack Overflow user

Asked 2014-03-25 18:33:40

1 answer, 232 views, 0 followers, 0 votes

I have a question about Python output.

I have the following 3 files as input data:

File A

abc with-1-rosette-n    2
abc with-1-tyre-n   1
abc with-1-weight-n 2

File B

def with-1-rosette-n 1
def with-1-tyre-n   2
def about-bit-n 1

File C

ghi with-1-rosette-n  2
ghi as+n-produce-v   1
ghi then-damage-v  1

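I first wrote a script that takes the sum of the values (Col 3) for entries whose Col 2 appears in more than one file (the intersection).

That works fine and correctly outputs all lines.

I then tried to modify the script to take the average of the Col 3 values for the Col 2 intersection, and that is where I ran into trouble.

Basically, the script does not output the lines for the intersection.

Script A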

def sumVectors(classA_infile, classB_infile, outfile):

        class_dictA = {}

        with open(classA_infile, "rb") as opened_infile_A:
                for line in opened_infile_A:
                        items = line.split()
                        classA, feat, valuesA = items[:3]
                        class_dictA[feat] = float(valuesA)


        class_dictB = {}

        with open(classB_infile, "rb") as opened_infile_B:
                for line in opened_infile_B:
                        items = line.split()
                        classB, feat, valuesB = items[:3]
                        class_dictB[feat] = float(valuesB)

        with open(outfile, "wb") as output_file:
                for key in class_dictA:
                        if key in class_dictB:
                                weight = (class_dictA[key] + class_dictB[key])/2
                                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
                                print outstring
                        else:
                                weight = class_dictA[key]
                                outstring = "\t".join([classA + "-" +  classB, key, str(weight)])
                output_file.write(outstring + "\n")

                for key in class_dictB:
                        if key not in class_dictA:
                                weight = class_dictB[key]
                                outstring = "\t".join([classA + "-" + classB, key, str(weight)])
                                output_file.write(outstring + "\n")

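When I try to incorporate a third file, I run into a KeyError. Here I am trying to check whether a key in File C is also in Files A and B and, if so, take the average over the three files. In this case I get a KeyError as soon as execution enters the first if block, so I am having trouble working around it.

Below is a sample of the script that handles 3 files.

Script B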

def sumVectors(classA_infile, classB_infile, classC_infile, outfile):

        class_dictA = {}

        with open(classA_infile, "rb") as opened_infile_A:
                for line in opened_infile_A:
                        items = line.split()
                        classA, feat, valuesA = items[:3]
                        class_dictA[feat] = float(valuesA)


        class_dictB = {}

        with open(classB_infile, "rb") as opened_infile_B:
                for line in opened_infile_B:
                        items = line.split()
                        classB, feat, valuesB = items[:3]
                        class_dictB[feat] = float(valuesB)

        class_dictC = {}

        with open(classC_infile, "rb") as opened_infile_C:
                for line in opened_infile_C:
                        items = line.split()
                        classC, feat, valuesC = items[:3]
                        class_dictC[feat] = float(valuesC)

        with open(outfile, "wb") as output_file:
                for key in class_dictC:
                        if key in class_dictA and class_dictB:
                                weight = (class_dictA[key] + class_dictB[key]+ class_dictC[key])/3
                                outstring = "\t".join([classA + "-" +  classB + "-" +  classC, key, str(weight)])
                                print outstring
                        else:
                                weight = class_dictC[key]
                                outstring = "\t".join([classA + "-" +  classB + "-" +  classC,  key, str(weight)])
                                output_file.write(outstring + "\n")

For Script A, the desired output is:

(where we take the average of the elements that Col 2 has in common):

abc-def with-1-rosette-n    1.5
abc-def with-1-tyre-n   1
abc-def with-1-weight-n 2
def with-1-tyre-n   2
def about-bit-n 1

For Script B, the desired output is:

File B (where we take the average of the elements in Col 2 that are common to all 3 files):

abc-def-ghi with-1-rosette-n    1.667
abc-def-ghi with-1-tyre-n   1.5
abc-def-ghi with-1-weight-n 2
abc-def-ghi with-1-rosette-n 1.5
abc-def-ghi about-bit-n 1
abc-def-ghi as+n-produce-v   1
abc-def-ghi then-damage-v  1

Can anyone help me see where I went wrong? I'm not sure what the best way to approach the problem is. Thanks.

mean

python

add

1 Answer

Stack Overflow user

Accepted answer

Answered 2014-03-25 20:07:58

from collections import defaultdict

# Because you are looking for a union of files, we can treat
#  the input data as a simple concatenation of all input files;
# If you were after intersection, we would have to deal with
#  each input file separately.
def chain_from_files(*filenames):
    for fname in filenames:
        with open(fname, "r") as inf:    # text mode, so each line is a str
            for line in inf:
                yield line

# get the key and all related data for each line
def get_item(line):
    row = line.split()
    return row[1], (row[0], int(row[2]))    # <= returns a tuple ('abc', 2)

# iterate through the input,
# collect a list of related values for each key
def collect_items(lines, get_item):
    result = defaultdict(list)
    for line in lines:
        key, value = get_item(line)
        result[key].append(value)
    return result

# make an output-string for each key
# and its list of related values
def show_item(key, values):
    classes, nums = zip(*values)          # <= unpacks the tuples
    classes = '-'.join(sorted(set(classes)))
    average = float(sum(nums)) / len(nums)
    return "{} {} {}\n".format(classes, key, average)

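def main():
    # classA_infile, classB_infile, classC_infile and outputfile are
    # assumed to be defined elsewhere as paths to the input and output files
    lines = chain_from_files(classA_infile, classB_infile, classC_infile)
    data  = collect_items(lines, get_item)

    with open(outputfile, "w") as outf:
        for key, value in data.items():
            outf.write(show_item(key, value))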

if __name__=="__main__":
    main()

This produces the following output:

ghi then-damage-v 1.0
abc-def with-1-tyre-n 1.5
abc-def-ghi with-1-rosette-n 1.66666666667
ghi as+n-produce-v 1.0
abc with-1-weight-n 2.0
def about-bit-n 1.0
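
On the `KeyError` in Script B: the test `key in class_dictA and class_dictB` is parsed by Python as `(key in class_dictA) and class_dictB`, so it checks membership only in the first dictionary and then the truthiness of the second. A key that is in A but missing from B therefore still reaches `class_dictB[key]`. A minimal sketch of the difference, assuming Python 3 and small made-up dictionaries (`dict_a`, `dict_b`, `dict_c` and `average_if_shared` are illustrative names, not taken from the original scripts):

```python
# Illustrative data: "with-1-weight-n" is in dict_a and dict_c but not in dict_b.
dict_a = {"with-1-rosette-n": 2.0, "with-1-weight-n": 2.0}
dict_b = {"with-1-rosette-n": 1.0, "with-1-tyre-n": 2.0}
dict_c = {"with-1-rosette-n": 2.0, "with-1-weight-n": 4.0}

key = "with-1-weight-n"

# Parsed as (key in dict_a) and dict_b: truthy here because dict_b is non-empty.
buggy_test = bool(key in dict_a and dict_b)

# Explicit membership test against each dictionary.
fixed_test = key in dict_a and key in dict_b


def average_if_shared(key, *dicts):
    """Return the mean of key's values if every dict contains key, else None."""
    if all(key in d for d in dicts):
        return sum(d[key] for d in dicts) / len(dicts)
    return None


print(buggy_test, fixed_test)
print(average_if_shared("with-1-rosette-n", dict_a, dict_b, dict_c))
print(average_if_shared(key, dict_a, dict_b, dict_c))
```

With the explicit test, the three-way average is only computed when all three lookups are guaranteed to succeed; keys that fail the test can fall through to whatever handling is wanted for partial matches.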

Votes: 2
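
On the missing lines in Script A: as posted, the `if` branch only prints `outstring`, and the first `output_file.write` call is indented at the level of the `for` statement, so it runs once after the loop finishes instead of once per key. A minimal sketch of the intended loop, assuming Python 3; `merge_average` and its `label` parameter are illustrative names, and the function returns the lines instead of writing a file so that it can be tried directly:

```python
def merge_average(dict_a, dict_b, label):
    """Average values for keys found in both dicts; keep the others unchanged."""
    lines = []
    for key in dict_a:
        if key in dict_b:
            weight = (dict_a[key] + dict_b[key]) / 2
        else:
            weight = dict_a[key]
        # Emit one line per key, inside the loop, for both branches.
        lines.append("\t".join([label, key, str(weight)]))
    for key in dict_b:
        if key not in dict_a:
            lines.append("\t".join([label, key, str(dict_b[key])]))
    return lines


# Values taken from File A and File B in the question.
file_a = {"with-1-rosette-n": 2.0, "with-1-tyre-n": 1.0, "with-1-weight-n": 2.0}
file_b = {"with-1-rosette-n": 1.0, "with-1-tyre-n": 2.0, "about-bit-n": 1.0}

for line in merge_average(file_a, file_b, "abc-def"):
    print(line)
```

Writing each element of the returned list, followed by a newline, to the output file gives one row per feature.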

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/22643389
