How to use Python 3.7.5 to find rows that are not present in another CSV file

Asked by a Stack Overflow user on 2022-07-18 12:31:14 (2 answers, 90 views, score 0)

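I have a file ua.csv with 2 data rows and a file pr.csv with 4 data rows. I want to find the rows in pr.csv that do not appear in ua.csv, and the output should also state how many extra rows pr.csv has.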

ua.csv:

```
Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Jane Lee|248 Another St.|Boston|US|02130
```
pr.csv:

```
Name|Address|City|Country|Pincode
Jim Smith|123 Any Street|Boston|US|02134
Smoet|coffee shop|finland|Europe|3453335
Jane Lee|248 Another St.|Boston|US|02130
Jack|long street|malasiya|Asia|585858
```

The expected output is:

```
pr.csv has 2 rows extra

Name|Address|City|Country|Pincode
Smoet|coffee shop|finland|Europe|3453335
Jack|long street|malasiya|Asia|585858
```

2 Answers

Answer by a Stack Overflow user, posted 2022-07-18 12:38:29

I think you can use the set data structure:

```python
ua_set = set()
pr_set = set()

# Code to populate the sets reading the csv files (use the `add` method of sets)
...

# Find the difference
diff = pr_set.difference(ua_set)

print(f"pr.csv has {len(diff)} rows extra")

# It would be better to not hardcode the name of the columns in the output
# but getting the info depends on the package you use to read csv files
print("Name|Address|City|Country|Pincode")

for row in diff:
    print(row)
```
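For reference, here is one way the `...` placeholder above could be filled in with the standard `csv` module. This is a sketch, not the answerer's code, and it makes a few assumptions: the files are pipe-delimited as in the question, whitespace around each field is stripped (the sample rows carry trailing spaces), and instead of calling `difference` it filters pr.csv in file order with a set lookup, so the output order is deterministic. The helper names `read_rows`, `extra_rows`, and `report` are hypothetical.

```python
import csv


def read_rows(lines, delimiter="|"):
    """Parse delimited lines into tuples of fields.

    Whitespace around each field is stripped and blank lines are skipped.
    `lines` can be any iterable of strings, e.g. an open file object.
    """
    return [tuple(field.strip() for field in row)
            for row in csv.reader(lines, delimiter=delimiter)
            if row]


def extra_rows(pr_lines, ua_lines, delimiter="|"):
    """Return (header, rows of pr that do not appear in ua), in pr's order."""
    ua_set = set(read_rows(ua_lines, delimiter))  # O(1) membership tests
    header, *pr_rows = read_rows(pr_lines, delimiter)
    return header, [row for row in pr_rows if row not in ua_set]


def report(pr_path="pr.csv", ua_path="ua.csv"):
    """Print the count of extra rows, then the header and the rows themselves."""
    with open(pr_path, newline="") as pr, open(ua_path, newline="") as ua:
        header, rows = extra_rows(pr, ua)
    print(f"{pr_path} has {len(rows)} rows extra")
    print()
    print("|".join(header))
    for row in rows:
        print("|".join(row))
```

Calling `report("pr.csv", "ua.csv")` on the sample files from the question should print the expected output shown there. Taking iterables of lines rather than paths in `extra_rows` keeps the comparison logic easy to test without touching the file system.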

A better solution using the pandas module:

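```python
import pandas as pd

# The files are pipe-delimited, so sep="|" is required; dtype=str keeps
# Pincode values such as 02134 as text instead of turning them into 2134
df_ua = pd.read_csv("ua.csv", sep="|", dtype=str)  # Must modify path to ua.csv
df_pr = pd.read_csv("pr.csv", sep="|", dtype=str)  # Must modify path to pr.csv

# indicator=True adds a "_merge" column; rows marked "left_only"
# exist in pr.csv but not in ua.csv
df_diff = (
    df_pr.merge(df_ua, how="outer", indicator=True)
    .loc[lambda x: x["_merge"] == "left_only"]
    .drop("_merge", axis=1)
)

print(f"pr.csv has {len(df_diff)} rows extra")

print(df_diff)
```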
Score: 0

Answer by a Stack Overflow user, posted 2022-07-19 06:47:07

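```python
import csv

# The files are pipe-delimited, so delimiter='|' (not ',') is needed to split
# each line into its fields. Every row of ua.csv (header included) is stored
# as a tuple in a set, so each membership test below is O(1).
with open('ua.csv', newline='') as ua:
    ua_rows = {tuple(row) for row in csv.reader(ua, delimiter='|')}

# Keep the rows of pr.csv that never appear in ua.csv, in file order
output = []
with open('pr.csv', newline='') as pr:
    for row in csv.reader(pr, delimiter='|'):
        if tuple(row) not in ua_rows:
            output.append(row)

print(f"pr.csv has {len(output)} rows extra")
print(output)
```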
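The approaches above match rows by value only, so a row that appears twice in pr.csv but once in ua.csv is not reported as extra. If duplicates should count as extra rows (an assumption; the question does not say), a `collections.Counter` can track how many copies of each ua.csv row are still available to cancel out an identical pr.csv row. The function name `extra_rows_with_duplicates` is hypothetical; like the sketches above, it assumes pipe-delimited input and strips whitespace around fields.

```python
import csv
from collections import Counter


def extra_rows_with_duplicates(pr_lines, ua_lines, delimiter="|"):
    """Return the data rows of pr that have no unused match in ua.

    Each ua row can cancel out at most one identical pr row, so a row that
    appears twice in pr and once in ua is reported once. The first pr row
    is treated as the header and skipped; pr's original order is kept.
    """
    def parse(lines):
        # Strip whitespace around each field and skip blank lines
        return [tuple(field.strip() for field in row)
                for row in csv.reader(lines, delimiter=delimiter) if row]

    # How many copies of each ua row are still available for matching
    available = Counter(parse(ua_lines))
    extras = []
    for row in parse(pr_lines)[1:]:  # [1:] skips the header row
        if available[row] > 0:
            available[row] -= 1  # consume one matching ua row
        else:
            extras.append(row)
    return extras
```

The arguments can be open file objects (opened with `newline=""`) or plain lists of strings. A `Counter` returns 0 for missing keys without inserting them, which keeps the lookup in the loop simple.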
Score: 0
The original page content is provided by Stack Overflow; translation support is provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/73022341
