
Search a CSV and delete the row if a list element exists in it

Stack Overflow user
Asked 2018-04-04 11:47:29
2 answers, 87 views, 0 followers, 2 votes

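I'm very new to Python. I'm trying to import two CSV files with csv.reader, then check whether an element of one exists in a row of the other, and if it does, delete that whole row.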

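I found other questions about similar problems suggesting that a list comprehension is the way to go, but when I loop to check whether the entries of appList exist in machine, all I get back is empty brackets, i.e. [].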

My code so far:

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"))
machine = list(machine)

for app in appList:
     machine = [app for app in machine if app not in machine]
     print(machine)

applist.csv looks like this (it's the list of applications on a standard macOS build):

Adobe Creative Cloud for Enterprise
Adobe Acrobat DC Professional
Adobe Bridge CC
Adobe Extension Manager CC
Adobe Illustrator CC 2015
Adobe InDesign CC 2015
Adobe Photoshop CC 2015
Adobe Media Encoder CC 2015
AirPort Utility 6
App Store
Automator 2
[...]

machine.csv looks like this:

"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"
"Adobe Creative Cloud for Enterprise (Mac)";"Installations";"2018-03-28T10:45:00+01:00";"1";"Not covered";""
"Adobe Acrobat DC Professional (Mac)";"Installations";"2018-03-22T17:08:00+00:00";"0";"No requirement";"Installation included in software bundle"
"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";"Installation included in software bundle"
"Adobe Extension Manager CC (Mac)";"No license required";"";"";"";"Installation included in software bundle"
"Adobe Illustrator CC 2015 (Mac)";"Installations";"2018-03-12T13:41:00+00:00";"0";"No requirement";"Installation included in software bundle"
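Note that csv.reader returns each row as a list of strings rather than a plain string, and that machine.csv is semicolon-separated (the updated code below passes delimiter=";" for that reason). The short sketch below parses a few lines of each sample, with io.StringIO standing in for the real files, to show what the two row shapes look like.

```python
import csv
import io

# A few lines from each sample file; io.StringIO stands in for open(...)
APPLIST_SAMPLE = "Adobe Bridge CC\nApp Store\n"
MACHINE_SAMPLE = (
    '"Application name";"Metric";"Last used";"Requirement";"Entitlement state";"Remark"\n'
    '"Adobe Bridge CC (Mac)";"No license required";"2018-03-12T13:45:00+00:00";"";"";'
    '"Installation included in software bundle"\n'
)

# applist.csv has a single unquoted column, so the default comma delimiter works
app_rows = list(csv.reader(io.StringIO(APPLIST_SAMPLE)))

# machine.csv uses ";" between fields; without delimiter=";" the reader would
# look for commas and these rows would not be split into separate columns
machine_rows = list(csv.reader(io.StringIO(MACHINE_SAMPLE), delimiter=";"))

print(app_rows)             # [['Adobe Bridge CC'], ['App Store']]
print(machine_rows[1][:2])  # ['Adobe Bridge CC (Mac)', 'No license required']
```

Each applist row is a one-element list, while each machine row has six elements, which matters for any comparison using `in`.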

UPDATE:

My current code:

#!/usr/local/bin/python3

import os
import csv

def csv_reader(machine_dir, machine):
    mach_list = list(csv.reader(open(machine_dir + "/" + machine, encoding="ISO-8859-1"), delimiter=";"))
    return mach_list

def main():
    # Get the paths to the csv files
    csvFile = input("drop the app list csv here: ")
    machine_dir = input("drop the machines csv folder here: ")

    # Import appList csv
    app_list = list(csv.reader(open(csvFile, encoding = "ISO-8859-1")))

    # Get list of machine csv
    machines = os.listdir(machine_dir)

    for machine in machines:
        machine_list = csv_reader(machine_dir, machine)

        new_machine = [app for app in app_list if app not in machine_list]

        print(new_machine)



if __name__ == '__main__': main()

I'm currently testing this against a single machine CSV file, and what it returns is not what's left of app_list after subtracting machine_list.
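Independently of the comparison itself, the update's csv_reader opens every file without closing it, builds paths by string concatenation, and will also try to parse anything else os.listdir returns (on macOS that can include a hidden .DS_Store file). A minimal sketch of tidier file handling follows; read_rows and machine_csv_paths are hypothetical helper names, and the ISO-8859-1 encoding is carried over from the question.

```python
import csv
import os


def read_rows(path, delimiter=","):
    """Read a CSV file into a list of rows, closing the file afterwards.

    newline="" is what the csv module documentation recommends for files
    handed to csv.reader.
    """
    with open(path, newline="", encoding="ISO-8859-1") as f:
        return list(csv.reader(f, delimiter=delimiter))


def machine_csv_paths(machine_dir):
    """Yield full paths of the .csv files in machine_dir, skipping anything else."""
    for name in sorted(os.listdir(machine_dir)):
        if name.lower().endswith(".csv"):
            yield os.path.join(machine_dir, name)
```

In main(), the loop would then become `for path in machine_csv_paths(machine_dir):` with `machine_list = read_rows(path, delimiter=";")` as its first line.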


2 answers

Stack Overflow user

Accepted answer

Posted 2018-04-04 11:53:54

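You're using an ordinary for loop and then running a list comprehension inside it, which I don't think is what you need.

In your list comprehension you loop over the values in machine and keep each value only if it is not in machine. Every value drawn from machine is, of course, in machine, so the condition is never true and you always get []. What you actually need is to loop over the values of appList in the comprehension and check whether each one appears in the list machine.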
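The logic error is easy to see on a toy example. The lists below are made up for illustration; which direction you filter in depends on whether you want standard apps missing from a machine or non-standard apps installed on it.

```python
# Toy stand-ins for the two lists (hypothetical values)
app_list = ["App Store", "Automator 2", "Photo Booth"]
machine = ["App Store", "Chess", "Automator 2"]

# Original logic: every app drawn from machine is, by definition, in machine,
# so the condition is always False and the result is always empty
wrong = [app for app in machine if app not in machine]

# Standard apps that are not on this machine
missing_from_machine = [app for app in app_list if app not in machine]

# Apps on this machine that are not in the standard list
extra_on_machine = [app for app in machine if app not in app_list]

print(wrong)                 # []
print(missing_from_machine)  # ['Photo Booth']
print(extra_on_machine)      # ['Chess']
```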

import csv

appList = csv.reader(open('applist.csv', encoding = "ISO-8859-1"))
appList = list(appList)

# machine.csv is semicolon-separated, so pass delimiter=";" as well
machine = csv.reader(open('machine.csv', encoding = "ISO-8859-1"), delimiter=";")
machine = list(machine)

new_machine = [app for app in appList if app not in machine]

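Edit:

If you inspect the lists after opening the files, you'll see they are nested lists: csv.reader returns every row as a list of fields. One solution is to flatten both lists and then use the same list comprehension: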

import csv

appList = csv.reader(open('applist.csv', encoding="ISO-8859-1"))
appList = list(appList)

# machine.csv is semicolon-separated
machine = csv.reader(open('machine.csv', encoding="ISO-8859-1"), delimiter=";")
machine = list(machine)

# Flatten both appList and machine
flat_appList = [item for sublist in appList for item in sublist]
flat_machine = [item for sublist in machine for item in sublist]

new_machine = [app for app in flat_machine if app not in flat_appList]
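Why the un-flattened version fails: `in` compares whole rows, and a one-column row from applist.csv can never equal a multi-column row from machine.csv. The toy rows below (made up for illustration) show the effect; itertools.chain.from_iterable is a standard-library alternative to the nested comprehension for the flattening step.

```python
from itertools import chain

# Toy rows shaped like csv.reader output (values made up for illustration)
app_rows = [["App Store"], ["Chess"]]
machine_rows = [["App Store", "Installations"], ["Chess", "No license required"]]

# Comparing rows with rows: ["App Store"] never equals ["App Store", "Installations"],
# so nothing is filtered out
unflattened = [row for row in machine_rows if row not in app_rows]
print(unflattened)
# [['App Store', 'Installations'], ['Chess', 'No license required']]

# Flattening turns both sides into flat lists of strings
flat_app = list(chain.from_iterable(app_rows))
flat_machine = list(chain.from_iterable(machine_rows))
flattened = [item for item in flat_machine if item not in flat_app]
print(flattened)
# ['Installations', 'No license required']
```

The second result also shows a caveat of flattening machine.csv: every other column (Metric, Remark and so on) ends up in the list too, so comparing only the first field of each machine row is usually closer to what you want.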

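Note: be careful. In your sample CSV files, applist.csv contains, for example, Adobe Creative Cloud for Enterprise, which is not the same string as Adobe Creative Cloud for Enterprise (Mac) in machine.csv, so an exact comparison will never match them.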
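One way to handle that mismatch is to strip the trailing tag before comparing and to match only on the first column of each machine row. This sketch assumes every machine.csv name is the applist.csv name plus a trailing parenthesised tag such as (Mac), which is inferred from the sample rows only; normalize_name and remove_listed_apps are hypothetical helpers, and "Microsoft Word (Mac)" is a made-up row for illustration.

```python
import re

# Assumption: platform tags look like " (Mac)" at the very end of the name
TRAILING_TAG = re.compile(r"\s*\([^()]*\)\s*$")


def normalize_name(name):
    """Drop a trailing '(...)' tag and surrounding whitespace, e.g. ' (Mac)'."""
    return TRAILING_TAG.sub("", name).strip()


def remove_listed_apps(app_rows, machine_rows):
    """Return the machine rows whose first column is not in the app list.

    app_rows: rows from applist.csv (one name per row)
    machine_rows: rows from machine.csv, header row first
    """
    standard = {normalize_name(row[0]) for row in app_rows if row}
    header, *body = machine_rows
    kept = [row for row in body if row and normalize_name(row[0]) not in standard]
    return [header] + kept


apps = [["Adobe Bridge CC"], ["App Store"]]
machine = [
    ["Application name", "Metric"],
    ["Adobe Bridge CC (Mac)", "No license required"],
    ["Microsoft Word (Mac)", "Installations"],  # hypothetical non-standard app
]
print(remove_listed_apps(apps, machine))
# [['Application name', 'Metric'], ['Microsoft Word (Mac)', 'Installations']]
```

An app whose real name ends in parentheses would also lose that part, so the regular expression may need tightening to match only the tags that actually appear in your files.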

Votes: 3

Stack Overflow user

Posted 2018-04-04 12:19:36

Alternatively, you could use pandas (https://pandas.pydata.org/pandas-docs/stable/api.html), assuming neither file contains duplicate rows that you want to keep.

import pandas

app = pandas.read_csv('applist.csv', encoding="ISO-8859-1")
machine = pandas.read_csv('machine.csv', encoding="ISO-8859-1")

# Combine both dataframes into one
# (DataFrame.append was removed in pandas 2.0; pandas.concat does the same job)
dataframe = pandas.concat([app, machine], ignore_index=True)

# Only keep the first of each set of duplicates
# This should give us the machine list (without any of the lines
# duplicated in the applist) plus the full applist
dataframe.drop_duplicates(keep='first', inplace=True)
# Now add the applist again
dataframe = pandas.concat([dataframe, app], ignore_index=True)
# Now drop all the duplicates
# (since the applist was added again, this should drop the entire applist)
dataframe.drop_duplicates(keep=False, inplace=True)
# drop=True discards the old index instead of adding it as a column
dataframe.reset_index(drop=True, inplace=True)

# Now 'dataframe' should be the machine list without any lines from applist
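A shorter pandas route, swapped in here rather than taken from the answer, is to read applist.csv without a header, read machine.csv with sep=";", and keep the machine rows whose base name is not in the app list using Series.isin. Stripping the " (Mac)" suffix with a regex relies on the same naming assumption noted in the first answer; the inline data is a small subset of the question's samples, with io.StringIO standing in for the files (the real files would also need encoding="ISO-8859-1").

```python
import io

import pandas as pd

# io.StringIO stands in for the real files
applist_csv = io.StringIO("Adobe Bridge CC\nApp Store\n")
machine_csv = io.StringIO(
    '"Application name";"Metric"\n'
    '"Adobe Bridge CC (Mac)";"No license required"\n'
    '"Adobe Illustrator CC 2015 (Mac)";"Installations"\n'
)

# applist.csv has no header row, so name its single column explicitly
apps = pd.read_csv(applist_csv, header=None, names=["name"])
machine = pd.read_csv(machine_csv, sep=";")

# Strip a trailing " (Mac)"-style tag before comparing (naming assumption)
base_names = machine["Application name"].str.replace(r"\s*\([^()]*\)\s*$", "", regex=True)

# Keep only the machine rows whose base name is not in the app list
result = machine[~base_names.isin(apps["name"])].reset_index(drop=True)
print(result)
```

Unlike the append-and-deduplicate approach, this compares a single column, so the two files do not need matching columns.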

If the files are relatively small, a loop will take about as long as pandas, but for large files pandas should be much faster.
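For the plain-Python loop, much of the cost on large files comes from `x not in some_list`, which scans the list on every test. Converting the lookup side to a set first makes each membership test constant time on average, a small change to either answer's comprehension. The names below are synthetic.

```python
import timeit

# Synthetic data: 2,000 "standard" names and 2,000 installed names, half overlapping
app_names = [f"App {i}" for i in range(2000)]
machine_names = [f"App {i}" for i in range(1000, 3000)]


def with_list():
    # Each `not in` scans app_names from the start: linear time per lookup
    return [name for name in machine_names if name not in app_names]


app_set = set(app_names)


def with_set():
    # Hash lookup: constant time on average per membership test
    return [name for name in machine_names if name not in app_set]


assert with_list() == with_set()  # same result, same order
print("list:", timeit.timeit(with_list, number=1))
print("set: ", timeit.timeit(with_set, number=1))
```

Only the lookup side becomes a set; the comprehension still walks machine_names in order, so the output keeps the original row order.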

Votes: 1

Page content originally provided by Stack Overflow; translated by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/49650214
