I have a question about Python output.
I have the following three files as input data:
File A
abc with-1-rosette-n 2
abc with-1-tyre-n 1
abc with-1-weight-n 2

File B
def with-1-rosette-n 1
def with-1-tyre-n 2
def about-bit-n 1

File C
ghi with-1-rosette-n 2
ghi as+n-produce-v 1
ghi then-damage-v 1

I first tried to write a script that takes the sum of the values (Col 3) for the entries shared in Col 2.
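That worked fine and output every line correctly.
I then tried to modify the script to take the average of the Col 3 values for the Col 2 intersection instead, and that is where I ran into trouble.
Basically, the script does not output the lines for the intersection.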
Script A

    def sumVectors(classA_infile, classB_infile, outfile):
        class_dictA = {}
        with open(classA_infile, "rb") as opened_infile_A:
            for line in opened_infile_A:
                items = line.split()
                classA, feat, valuesA = items[:3]
                class_dictA[feat] = float(valuesA)
        class_dictB = {}
        with open(classB_infile, "rb") as opened_infile_B:
            for line in opened_infile_B:
                items = line.split()
                classB, feat, valuesB = items[:3]
                class_dictB[feat] = float(valuesB)
        with open(outfile, "wb") as output_file:
            for key in class_dictA:
                if key in class_dictB:
                    weight = (class_dictA[key] + class_dictB[key])/2
                    outstring = "\t".join([classA + "-" + classB, key, str(weight)])
                    print outstring
                else:
                    weight = class_dictA[key]
                    outstring = "\t".join([classA + "-" + classB, key, str(weight)])
                    output_file.write(outstring + "\n")
            for key in class_dictB:
                if key not in class_dictA:
                    weight = class_dictB[key]
                    outstring = "\t".join([classA + "-" + classB, key, str(weight)])
                    output_file.write(outstring + "\n")

When I try to incorporate a third file, I run into a KeyError. Here I am trying to check whether a key in file C is also in files A and B and, if so, take the average across all three files. In this case it raises a KeyError as soon as it enters the first if block, so I am struggling to resolve it.
Below is the version of the script that considers three files.
Script B

    def sumVectors(classA_infile, classB_infile, classC_infile, outfile):
        class_dictA = {}
        with open(classA_infile, "rb") as opened_infile_A:
            for line in opened_infile_A:
                items = line.split()
                classA, feat, valuesA = items[:3]
                class_dictA[feat] = float(valuesA)
        class_dictB = {}
        with open(classB_infile, "rb") as opened_infile_B:
            for line in opened_infile_B:
                items = line.split()
                classB, feat, valuesB = items[:3]
                class_dictB[feat] = float(valuesB)
        class_dictC = {}
        with open(classC_infile, "rb") as opened_infile_C:
            for line in opened_infile_C:
                items = line.split()
                classC, feat, valuesC = items[:3]
                class_dictC[feat] = float(valuesC)
        with open(outfile, "wb") as output_file:
            for key in class_dictC:
                if key in class_dictA and class_dictB:
                    weight = (class_dictA[key] + class_dictB[key]+ class_dictC[key])/3
                    outstring = "\t".join([classA + "-" + classB + "-" + classC, key, str(weight)])
                    print outstring
                else:
                    weight = class_dictC[key]
                    outstring = "\t".join([classA + "-" + classB + "-" + classC, key, str(weight)])
                    output_file.write(outstring + "\n")

For Script A, the desired output is:
(where we take the average for the elements shared in Col 2):
abc-def with-1-rosette-n 1.5
abc-def with-1-tyre-n 1
abc-def with-1-weight-n 2
def with-1-tyre-n 2
def about-bit-n 1

For Script B, the desired output is:
File B (where we take the average for the elements in Col 2 that are common to all three files):
abc-def-ghi with-1-rosette-n 1.667
abc-def-ghi with-1-tyre-n 1.5
abc-def-ghi with-1-weight-n 2
abc-def-ghi with-1-rosette-n 1.5
abc-def-ghi about-bit-n 1
abc-def-ghi as+n-produce-v 1
abc-def-ghi then-damage-v 1

Can anyone help me see where I went wrong? I am not sure what the best way to approach the problem is. Thanks.
Posted on 2014-03-25 20:07:58
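A note on the KeyError in Script B, offered as a likely explanation rather than a confirmed diagnosis: Python parses `key in class_dictA and class_dictB` as `(key in class_dictA) and class_dictB`, so the right-hand side only checks that `class_dictB` is non-empty. A key that is in A but missing from B then passes the test, and `class_dictB[key]` raises KeyError. The three sample files happen not to contain such a key, so the toy dictionaries and the helper names `buggy_test` and `intended_test` below are made up purely for illustration.

```python
# Hypothetical toy dictionaries, not the asker's real data.
class_dictA = {"with-1-rosette-n": 2.0, "with-1-weight-n": 2.0}
class_dictB = {"with-1-rosette-n": 1.0}

def buggy_test(key):
    # Parsed as (key in class_dictA) and class_dictB:
    # the right-hand side is only the truthiness of a non-empty dict.
    return bool(key in class_dictA and class_dictB)

def intended_test(key):
    # Membership is checked against each dictionary separately.
    return key in class_dictA and key in class_dictB

key = "with-1-weight-n"  # present in A, absent from B
print(buggy_test(key))     # True, so class_dictB[key] would raise KeyError
print(intended_test(key))  # False
```

Writing the condition as `key in class_dictA and key in class_dictB` keeps the lookup inside the branch safe.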

    from collections import defaultdict
    import sys

    # Because you are looking for a union of files, we can treat
    # the input data as a simple concatenation of all input files;
    # if you were after intersection, we would have to deal with
    # each input file separately.
    def chain_from_files(*filenames):
        for fname in filenames:
            with open(fname, "rb") as inf:
                for line in inf:
                    yield line

    # get the key and all related data for each line
    def get_item(line):
        row = line.split()
        return row[1], (row[0], int(row[2]))    # <= returns a tuple ('abc', 2)

    # iterate through the input,
    # collect a list of related values for each key
    def collect_items(lines, get_item):
        result = defaultdict(list)
        for line in lines:
            key, value = get_item(line)
            result[key].append(value)
        return result

    # make an output-string for each key
    # and its list of related values
    def show_item(key, values):
        classes, nums = zip(*values)    # <= unpacks the tuples
        classes = '-'.join(sorted(set(classes)))
        average = float(sum(nums)) / len(nums)
        return "{} {} {}\n".format(classes, key, average)

    def main(classA_infile, classB_infile, classC_infile, outputfile):
        lines = chain_from_files(classA_infile, classB_infile, classC_infile)
        data = collect_items(lines, get_item)
        with open(outputfile, "wb") as outf:
            for key,value in data.items():
                outf.write(show_item(key, value))

    if __name__=="__main__":
        # assumption: three input paths and one output path
        # are given on the command line, in that order
        main(*sys.argv[1:5])

This gives the output:
ghi then-damage-v 1.0
abc-def with-1-tyre-n 1.5
abc-def-ghi with-1-rosette-n 1.66666666667
ghi as+n-produce-v 1.0
abc with-1-weight-n 2.0
def about-bit-n 1.0

https://stackoverflow.com/questions/22643389
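For readers on Python 3, the same union-and-average idea can be sketched as below. This is an illustrative variant under stated assumptions, not the answerer's code: it reads rows from in-memory lists instead of files, and the helper name `average_by_feature` is hypothetical. The sample rows are the three input files from the question.

```python
from collections import defaultdict

def average_by_feature(lines):
    """Group rows by feature (column 2) and average column 3.

    Returns {feature: (joined_class_names, average)}.
    """
    grouped = defaultdict(list)
    for line in lines:
        cls, feat, value = line.split()[:3]
        grouped[feat].append((cls, float(value)))
    result = {}
    for feat, pairs in grouped.items():
        # Class names of every file that contains this feature.
        classes = "-".join(sorted({cls for cls, _ in pairs}))
        values = [v for _, v in pairs]
        result[feat] = (classes, sum(values) / len(values))
    return result

# Sample rows copied from the question's three input files.
file_a = ["abc with-1-rosette-n 2", "abc with-1-tyre-n 1", "abc with-1-weight-n 2"]
file_b = ["def with-1-rosette-n 1", "def with-1-tyre-n 2", "def about-bit-n 1"]
file_c = ["ghi with-1-rosette-n 2", "ghi as+n-produce-v 1", "ghi then-damage-v 1"]

averages = average_by_feature(file_a + file_b + file_c)
for feat, (classes, avg) in sorted(averages.items()):
    print("{}\t{}\t{:.3f}".format(classes, feat, avg))
```

To read from disk instead, the lines of files opened in text mode (`open(path, "r")`) can be passed in the same way; binary mode yields bytes rather than str in Python 3.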