Q: How do I filter a row in a CSV file based on multiple indexes?

Stack Overflow user

Asked 2021-02-02 11:39:00

1 answer, 133 views, 0 followers, score 3

I have a file like this:

#This is TEST-data
2020-09-07T00:00:03.230+02:00,ID-10,3,London,Manchester,London,1,1,1
2020-09-07T00:00:03.230+02:00,ID-10,3,London,London,Manchester,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London1,1
2020-09-07T00:00:03.230+02:00,ID-30,3,Madrid,Sevila,Sevilla,1,1,1
2020-09-07T00:00:03.230+02:00,ID-30,3,Madrid,Sevilla,Madrid,1
2020-09-07T00:00:03.230+02:00,ID-40,2,Madrid,Barcelona,1,1,1,1

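In each row, index 2 (row[2]) gives the number of cities in that row. So the first row has the value 3 at index 2, and its cities are London, Manchester, London.

I want to do the following:

For each row, I need to check whether row[3] and the cities listed after it (as many as the city count says) are present in cities_to_filter.

This is my current code: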

import csv

path = r'c:\data\ELK\Desktop\test_data_countries.txt'

cities_to_filter = ['Sevilla', 'Manchester']

def filter_row(row):
    # amount_of_cities = row[2]
    # Checks every column of the row, not only the city columns.
    condition_1 = any(city in row for city in cities_to_filter)

    return condition_1

with open(path, 'r') as output_file:
    reader = csv.reader(output_file, delimiter=',')
    next(reader)  # skip the '#This is TEST-data' comment line
    for row in reader:
        if filter_row(row):
            print(row)

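My code works fine for this dataset, but it is a little risky because it looks at every column, even the ones I know are not cities. I need my code to check only the city columns, as determined by the number of cities each row contains.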
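To make the risk concrete, here is a minimal sketch using a hypothetical row (not taken from the file above) in which the ID column happens to hold a city name. The whole-row check matches it even though neither of its cities is in cities_to_filter:

```python
cities_to_filter = ['Sevilla', 'Manchester']

def filter_row(row):
    # Same check as in the question: looks at every column of the row.
    return any(city in row for city in cities_to_filter)

# Hypothetical row for illustration: the second column is 'Manchester',
# but it is an ID, not one of the row's cities.
row = ['2020-09-07T00:00:03.230+02:00', 'Manchester', '2', 'London', 'London', '1', '1']

false_positive = filter_row(row)
print(false_positive)  # True, although both cities in the row are London
```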

python

list

indexing

filter

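1 Answer

Stack Overflow user

Accepted answer

Posted 2021-02-02 11:52:21

The city "list" always starts at the same offset, and its length is given by row[2]. So just slice it out and use an any() expression to check whether any of the cities to filter is in it. You could also use set operations, but that is probably overkill: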

import csv

path = r'c:\data\ELK\Desktop\test_data_countries.txt'

cities_to_filter = ['Sevilla', 'Manchester']

def filter_row(row):
    # row[2] holds the number of cities; the cities start at index 3.
    count = int(row[2])
    cities = row[3:3 + count]
    return any(city in cities for city in cities_to_filter)

with open(path, 'r') as input_file:
    reader = csv.reader(input_file, delimiter=',')
    next(reader)  # skip the '#This is TEST-data' comment line
    for row in reader:
        if filter_row(row):
            print(row)

Also, I renamed output_file to input_file, since the file is being read, not written.

Output

['2020-09-07T00:00:03.230+02:00', 'ID-10', '3', 'London', 'Manchester', 'London', '1', '1', '1']
['2020-09-07T00:00:03.230+02:00', 'ID-10', '3', 'London', 'London', 'Manchester', '1', '1']
['2020-09-07T00:00:03.230+02:00', 'ID-30', '3', 'Madrid', 'Sevila', 'Sevilla', '1', '1', '1']
['2020-09-07T00:00:03.230+02:00', 'ID-30', '3', 'Madrid', 'Sevilla', 'Madrid', '1']
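
For comparison, the set-based alternative that the answer mentions only in passing might look like the sketch below. The function name filter_row_with_sets and the in-memory sample (a copy of the question's data read through io.StringIO instead of the file path) are assumptions made here so the example runs on its own; they are not part of the original answer.

```python
import csv
import io

cities_to_filter = {'Sevilla', 'Manchester'}  # a set instead of a list

def filter_row_with_sets(row):
    count = int(row[2])          # number of cities in this row
    cities = row[3:3 + count]    # only the city columns
    # isdisjoint() returns False when at least one city is shared.
    return not cities_to_filter.isdisjoint(cities)

# Copy of the sample data from the question, held in memory.
sample = """#This is TEST-data
2020-09-07T00:00:03.230+02:00,ID-10,3,London,Manchester,London,1,1,1
2020-09-07T00:00:03.230+02:00,ID-10,3,London,London,Manchester,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London,1,1
2020-09-07T00:00:03.230+02:00,ID-20,2,London,London1,1
2020-09-07T00:00:03.230+02:00,ID-30,3,Madrid,Sevila,Sevilla,1,1,1
2020-09-07T00:00:03.230+02:00,ID-30,3,Madrid,Sevilla,Madrid,1
2020-09-07T00:00:03.230+02:00,ID-40,2,Madrid,Barcelona,1,1,1,1
"""

reader = csv.reader(io.StringIO(sample), delimiter=',')
next(reader)  # skip the comment line
matching_ids = [row[1] for row in reader if filter_row_with_sets(row)]
print(matching_ids)  # ['ID-10', 'ID-10', 'ID-30', 'ID-30']
```

This selects the same four rows as the any() version. With only two cities to filter, the difference is mostly stylistic, which is presumably why the answer calls sets overkill.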

Score: 2

Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/66008974
