Python OrderedDict issue

Stack Overflow user
Asked 2015-05-25 20:26:53
3 answers, 476 views, 0 followers, score 1

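I have a CSV file in which every row can be read as a dictionary (the columns are "Location", "MovieDate", "Formatted_Address", "Lat", "Lng"). I want to group the rows by Location and combine the MovieDate values of all rows that share the same Location. I was told to use OrderedDict.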

Sample data:

Location,MovieDate,Formatted_Address,Lat,Lng
    "Edgebrook Park, Chicago ",Jun-7 A League of Their Own,"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672
    "Edgebrook Park, Chicago ","Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

For rows that share the same Location (like the two above), I want to produce output like this, so that no Location is repeated:

 "Edgebrook Park, Chicago ","Jun-7 A League of Their Own Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

What is wrong with my code, which uses OrderedDict to do this?

import csv
from collections import OrderedDict

od = OrderedDict()
with open("MovieDictFormatted.csv") as f, open("MoviesCombined.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        loc, rest = row[0], row[1]
        od.setdefault(loc, []).append(rest)
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc] + vals)

What I end up with looks like this:

['Edgebrook Park, Chicago ', 'Jun-7 A League of Their Own']
['Gage Park, Chicago ', "Jun-9 It's a Mad, Mad, Mad, Mad World"]
['Jefferson Memorial Park, Chicago ', 'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers ']
['Commercial Club Playground, Chicago ', 'Jun-12 Despicable Me 2']

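The problem is that the other columns do not show up in this case. What is the best way to include them? I would also like the MovieDate values to be one long string, like this: 'Jun-12 Monsters University Jul-11 Frozen Aug-8 The Blues Brothers ' rather than: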

'Jun-12 Monsters University ', 'Jul-11 Frozen ', 'Aug-8 The Blues Brothers '

Thanks everyone, much appreciated. I'm new to Python.

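Unfortunately, changing row[0], row[1] to row[0], row[1:] did not give me what I wanted. I only want to combine the values in the second column (MovieDate), not duplicate all the other columns: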

['Jefferson Memorial Park, Chicago ', ['Jun-12 Monsters University ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Jul-11 Frozen ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353'], ['Aug-8 The Blues Brothers ', 'Jefferson Memorial Park, 4822 North Long Avenue, Chicago, IL 60630, USA', '41.76083920000001', '-87.6294353']]

3 answers

Stack Overflow user

Accepted answer

Posted 2015-05-25 21:14:09

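You only need a few changes. Join the columns between the location and the lat/long into one string, and, to get rid of the duplicated lat and long, use them as part of the key as well: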

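import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # key on (Location, Lat, Lng); join the columns in between into one string
        od.setdefault((row[0], row[-2], row[-1]), []).append(" ".join(row[1:-2]))
    wr.writerow(header)
    for loc, vals in od.items():
        # Location first, then the joined strings, then Lat and Lng once
        wr.writerow([loc[0]] + vals + list(loc[1:]))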

Output:

Location,MovieDate,Formatted_Address,Lat,Lng
"Edgebrook Park, Chicago ","Jun-7 A League of Their Own Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA","Jun-9 It's a Mad, Mad, Mad, Mad World Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

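A League of Their Own comes first because its row appears before the Mad, Mad row. row[1:-2] gets everything except the location, lat and long. The lat and long are stored in the key tuple so that they are not written again at the end of each row for every entry.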
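To see the tuple key at work without touching any files, here is a minimal sketch that runs the same grouping over the two sample rows from the question, held in memory. The names `SAMPLE` and `group_by_location` are made up for this illustration and are not part of the answer.

```python
import csv
import io
from collections import OrderedDict

# The two sample rows from the question, held in memory instead of a file.
SAMPLE = (
    'Location,MovieDate,Formatted_Address,Lat,Lng\n'
    '"Edgebrook Park, Chicago ",Jun-7 A League of Their Own,'
    '"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",'
    '41.9998876,-87.7627672\n'
    '"Edgebrook Park, Chicago ","Jun-9 It\'s a Mad, Mad, Mad, Mad World",'
    '"Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",'
    '41.9998876,-87.7627672\n'
)


def group_by_location(text):
    """Group rows under a (Location, Lat, Lng) key, in first-seen order."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    grouped = OrderedDict()
    for row in reader:
        key = (row[0], row[-2], row[-1])
        # everything between Location and Lat becomes one string
        grouped.setdefault(key, []).append(" ".join(row[1:-2]))
    return grouped


grouped = group_by_location(SAMPLE)
for key, values in grouped.items():
    print(key, len(values))
```

Both rows land under a single key, so Lat and Lng are stored once, while the list under that key holds one joined string per original row.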

Using names and unpacking may make it easier to follow:

import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.reader(f)
    wr = csv.writer(out)
    header = next(r)
    for row in r:
        # unpack the five columns by name
        loc, mov, form, lat, lng = row
        od.setdefault((loc, lat, lng), []).append("{} {}".format(mov, form))
    wr.writerow(header)
    for loc, vals in od.items():
        wr.writerow([loc[0]] + vals + list(loc[1:]))

Using csv.DictWriter to keep the five columns:

import csv
from collections import OrderedDict

od = OrderedDict()
with open("data.csv") as f, open("new.csv", "w") as out:
    r = csv.DictReader(f)  # field names are taken from the header row
    wr = csv.DictWriter(out, fieldnames=r.fieldnames)
    wr.writeheader()
    for row in r:
        # the first time a Location is seen, store its columns with an empty MovieDate list
        od.setdefault(row["Location"], dict(Location=row["Location"], Lat=row["Lat"], Lng=row["Lng"],
                                            MovieDate=[], Formatted_Address=row["Formatted_Address"]))
        od[row["Location"]]["MovieDate"].append(row["MovieDate"])
    for loc, vals in od.items():
        # collapse the list of MovieDate values into a single string
        vals["MovieDate"] = ", ".join(vals["MovieDate"])
        wr.writerow(vals)

Output:

"Edgebrook Park, Chicago ","Jun-7 A League of Their Own, Jun-9 It's a Mad, Mad, Mad, Mad World","Edgebrook Park, 6525 North Hiawatha Avenue, Chicago, IL 60646, USA",41.9998876,-87.7627672

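So the five columns are kept, and the "MovieDate" values are joined into a single string. Formatted_Address is always the same for a given Location, so we do not need to update it.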

As it turns out, all we needed to do was concatenate the MovieDate values and remove the duplicate entries for Location, Lat, Lng and Formatted_Address.
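That summary can be sketched as one small function. This is an illustration under stated assumptions, not the answer's own code. It targets Python 3.7+, where a plain dict keeps insertion order. It joins with a single space, as the question asked, and it strips each MovieDate value first so the trailing spaces in the data do not turn into double spaces. The function name `combine_movie_dates`, the `sep` parameter and the "Park A" / "Park B" rows are invented for the demo.

```python
import csv
import io


def combine_movie_dates(src, dst, sep=" "):
    """Write one row per Location, joining its MovieDate values with sep."""
    reader = csv.DictReader(src)  # column names come from the header row
    merged = {}  # plain dicts keep insertion order on Python 3.7+
    for row in reader:
        # the first row seen for a Location supplies the other columns
        entry = merged.setdefault(row["Location"], dict(row, MovieDate=[]))
        entry["MovieDate"].append(row["MovieDate"].strip())
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    for entry in merged.values():
        entry["MovieDate"] = sep.join(entry["MovieDate"])
        writer.writerow(entry)


# Hypothetical rows in the same five-column layout as the question's file.
src = io.StringIO(
    "Location,MovieDate,Formatted_Address,Lat,Lng\n"
    'Park A,Jun-12 Monsters University ,"1 Example St, Chicago",41.0,-87.0\n'
    'Park A,Jul-11 Frozen ,"1 Example St, Chicago",41.0,-87.0\n'
    'Park B,Jun-12 Despicable Me 2,"2 Example St, Chicago",42.0,-88.0\n'
)
dst = io.StringIO()
combine_movie_dates(src, dst)
lines = dst.getvalue().splitlines()
print("\n".join(lines))
```

With real files, the same function could be called with handles opened as `open("data.csv", newline="")` and `open("new.csv", "w", newline="")`, which is how the csv module documentation recommends opening files on Python 3.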

Score: 1

Stack Overflow user

Posted 2015-05-25 20:41:57

Try changing

od.setdefault(loc, []).append(rest) 

to:

od[loc] = od[loc] + ' ' + rest if loc in od else rest

Then write the location and the joined string as two fields:

wr.writerow([loc, vals])
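
The string-accumulation idea in this answer can be checked in isolation. The sketch below is only an illustration: the `rows` list is hypothetical data standing in for the `(row[0], row[1])` pairs read from the CSV file.

```python
from collections import OrderedDict

# Hypothetical (Location, MovieDate) pairs standing in for row[0], row[1].
rows = [
    ("Park A", "Jun-12 Monsters University"),
    ("Park A", "Jul-11 Frozen"),
    ("Park B", "Jun-12 Despicable Me 2"),
]

od = OrderedDict()
for loc, rest in rows:
    # start a new string for a new location, otherwise append with a space
    od[loc] = od[loc] + " " + rest if loc in od else rest

for loc, vals in od.items():
    print([loc, vals])
```

Each value in `od` is a single string here rather than a list, which is why it has to be written as `[loc, vals]`.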
Score: 0

Stack Overflow user

Posted 2015-05-25 22:32:23

Assuming the location is the first item of each row:

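import csv

groups = {}
for row in csv.reader(f):  # f: the open input file from the question
    # row[0] is the location; keep the rest of the row under it
    groups.setdefault(row[0], []).append(row[1:])

For every location, you now have the rest of each of its rows:

wr = csv.writer(out)  # out: the open output file from the question
for key, rows in groups.items():
    for rest in rows:
        wr.writerow([key] + rest)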
Score: -1
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/30445574
