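I need to write a program that outputs the rows whose value in column 2 is 'Kashiwa'. The rows are in CSV format and are provided via standard input. I also need to remove commas (,), double quotes ("), line breaks, and other special characters if they appear in the values of the "Name" column. Here is an example of the input: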
2
Kashiwa
Name,Campus,LabName
Shin MORISHIA,Kashiwa,Laboratory of Omics
Kioshi ASAy,Kashiwa,Laboratory of Genome Informatics
Yukihido Tomari,Yayoi,Laboratory of RNA Function
Masao Kanobe ,Kashiwa,Laboratory of Large-Scale Bioinformatics

Here is my code:
#!usr/bin/env python3
import sys
import csv
data = sys.stdin.readlines()
chars = ('$','%','^','*', '\n', '"', "," )
for line in data:
    for c in chars:
        line = ''.join(line.split(c))
reader = csv.reader(data)
next(reader)
next(reader)
print(",".join(next(reader)))
for row in reader:
    if row[1] == 'Kashiwa':
        print(",".join(row))

My program does not seem to remove the special characters from the values in the Name column. How can I do that?
Posted on 2019-05-21 22:52:02
After data = sys.stdin.readlines(), data is a list of strings.
You process it like this:
for line in data:                        # ok line is a variable pointing to a string from data
    for c in chars:                      # ok you process all of your special characters
        line = ''.join(line.split(c))    # line is now a brand new clean string...
    # that you forget at once without changing data!

In any case, a Python string is an immutable object, so you have to change the list so that it contains the new line:
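for i, line in enumerate(data):          # ok line is a variable pointing to a string from data
    for c in chars:                      # ok you process all of your special characters
        line = ''.join(line.split(c))    # line is now a brand new clean string...
    data[i] = line                       # and data uses this new line

Note that this strips the characters from the whole line, including the commas that separate the fields and the trailing line breaks. If you only want to clean the first column, let csv.reader split the rows first; you then do not need to load everything into memory either: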
#!/usr/bin/env python3
import sys
import csv

next(sys.stdin)                        # skip the column-number line
next(sys.stdin)                        # skip the value line
print(next(sys.stdin), end='')         # header line; it already ends with a newline
reader = csv.reader(sys.stdin)
chars = ('$', '%', '^', '*', '\n', '"', ',')
for row in reader:
    line = row[0]
    for c in chars:
        line = ''.join(line.split(c))
    row[0] = line
    if row[1] == 'Kashiwa':
        print(",".join(row))

Posted on 2019-05-21 23:06:27
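I get the impression that you are looking at some University of Tokyo pages. Here is what I came up with. I put the data you gave us into a CSV file to make it easier to read in.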
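import pandas

chars = ['$', '%', '^', '*', '\n', '"', ',']
dataframe = pandas.read_csv("data.csv")
# keep only the rows on the Kashiwa campus; copy() avoids modifying a view
dataframe = dataframe[dataframe.Campus == 'Kashiwa'].copy()
for c in chars:
    # regex=False treats '$', '^' and '*' as literal characters
    dataframe["Name"] = dataframe["Name"].str.replace(c, '', regex=False)
print(dataframe)

I am using pandas here. It reads CSV files quickly and has convenient methods for changing every row at once, which is what the loop over the chars list does for the Name column. The line that filters on dataframe.Campus shows that it is also easy to drop all rows whose lab is not on the Kashiwa campus. I tried it and it works. Hope this helps!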
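If pandas is not available, roughly the same filter-and-clean step can be sketched with the standard library alone. This is only a sketch under a few assumptions: it follows the input format from the question (first line is the 1-based column number, second line is the value to match, then the header and the rows), the helper names clean_name and filter_rows are made up for illustration, and str.translate is swapped in for the per-character loop used above.

```python
import csv
import io

# Characters to strip from the Name column (same set as in the code above).
CHARS = ('$', '%', '^', '*', '\n', '"', ',')
# Translation table that maps every unwanted character to None (delete it).
DELETE_TABLE = str.maketrans('', '', ''.join(CHARS))


def clean_name(name):
    """Return name with every character in CHARS removed (hypothetical helper)."""
    return name.translate(DELETE_TABLE)


def filter_rows(stream):
    """Return the header plus the cleaned rows whose chosen column matches.

    Assumes the question's layout: line 1 is a 1-based column number,
    line 2 is the value to match, line 3 is the CSV header.
    """
    column = int(next(stream)) - 1     # convert to a 0-based index
    wanted = next(stream).strip()      # value to compare against
    reader = csv.reader(stream)        # parses the remaining lines as CSV
    result = [next(reader)]            # header row, kept unchanged
    for row in reader:
        if row and row[column] == wanted:
            row[0] = clean_name(row[0])
            result.append(row)
    return result


# Small in-memory sample standing in for standard input.
sample = io.StringIO(
    "2\n"
    "Kashiwa\n"
    "Name,Campus,LabName\n"
    "Shi$n MORISHIA,Kashiwa,Laboratory of Omics\n"
    "Yukihido Tomari,Yayoi,Laboratory of RNA Function\n"
)
for row in filter_rows(sample):
    print(",".join(row))
```

To run it on real input, pass sys.stdin to filter_rows instead of the in-memory sample. Because the cleaning happens after csv.reader has split the row, a comma or line break inside a quoted Name value is removed without disturbing the other columns.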
The CSV file looks like this:
Name,Campus,LabName
Shi$n MORISHIA,Kashiwa,Laboratory of Omics
Kio%s$hi ASAy,Kashiwa,Laboratory of Genome Informatics
Yuki%hi**do Tomari,Kashiwa,Laboratory of RNA Function
Masao Kanobe ,Kashiwa,Laboratory of Large-Scale Bioinformatics

Here is the output:
Name Campus LabName
0 Shin MORISHIA Kashiwa Laboratory of Omics
1 Kioshi ASAy Kashiwa Laboratory of Genome Informatics
2 Yukihido Tomari Kashiwa Laboratory of RNA Function
3 Masao Kanobe Kashiwa Laboratory of Large-Scale Bioinformatics

https://stackoverflow.com/questions/56240251