我有这个csv文件,只有两个条目。下面是:
Meat One,['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']第一个是标题,第二个是商业标题。
问题在于第二项。
这是我的代码:
import csv
with open('phonebookCOMPK-Directory.csv', "rt") as textfile:
reader = csv.reader(textfile)
for row in reader:
row5 = row[5].replace("[", "").replace("]", "")
listt = [(''.join(row5))]
print (listt[0])它打印:
'Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers'我需要做的是创建一个包含这些单词的列表,然后像这样打印它们,使用for循环分别打印每一项:
Abattoirs
Exporters
Food Delivery
Butchers Retail
Meat Dealers-Retail
Meat Freezer
Meat Packers实际上,我正在尝试重新格式化我当前的csv文件并清理它,这样它就可以更加精确和易懂了。
csv完整的第一行如下:
Meat One,+92-21-111163281,Al Shaheer Corporation,Retailers,2008,"['Abattoirs', 'Exporters', 'Food Delivery', 'Butchers Retail', 'Meat Dealers-Retail', 'Meat Freezer', 'Meat Packers']","[[' Outlets Address : Shop No. Z-10, Station Shopping Complex, MES Market, Malir-Cantt, Karachi. Landmarks : MES Market, Station Shopping Complex City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi. Landmarks : Nadra Chowrangi, Sky Garden, Tipu Sultan Road City : Karachi UAN : +92-21-111163281 '], ["" Outlets Address : Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi. Landmarks : Boat Basin, Jans Broast, Khayaban-e-Roomi City : Karachi UAN : +92-21-111163281 View Map ""], [' Outlets Address : Gulistan-e-Johar, Karachi. Landmarks : Perfume Chowk City : Karachi UAN : +92-21-111163281 '], [' Outlets Address : Tee Emm Mart, Creek Vista Appartments, Khayaban-e-Shaheen, Phase VIII, DHA, Karachi. Landmarks : Creek Vista Appartments, Nueplex Cinema, Tee Emm Mart, The Place City : Karachi Mobile : 0302-8333666 '], [' Outlets Address : Y-Block, DHA, Lahore. Landmarks : Y-Block City : Lahore UAN : +92-42-111163281 '], [' Outlets Address : Adj. PSO, Main Bhittai Road, Jinnah Supermarket, F-7 Markaz, Islamabad. Landmarks : Bhittai Road, Jinnah Super Market, PSO Petrol Pump City : Islamabad UAN : +92-51-111163281 ']]","Agriculture, fishing & Forestry > Farming equipment & services > Abattoirs in Pakistan"
First column is Name
Second column is Number
Third column is Owner
Forth column is Business type
Fifth column is Y.O.E
Sixth column is Business Headings
Seventh column is Outlets (List of lists containing every branch address)
Eighth column is classification没有使用csv.reader的限制,我愿意接受任何可用的技术来清理我的文件。
发布于 2016-06-04 10:15:20
从两个不同的任务来考虑它:
处理脏CSV
一种方法是使用函数deserialize_business()从CSV中的每个传入行中提取结构化的业务信息。这个函数可能很复杂,因为这是任务的本质,但仍然建议将其拆分为包含自包含的更小的函数(如get_outlets()、get_headings()等)。这个函数可以返回一个字典,但取决于你想要什么,它可以是一个命名元组,一个自定义对象,等等。
这个函数将是这个特定CSV数据源的“适配器”。
反序列化函数示例:
def deserialize_business(csv_line):
"""
Distills structured business information from given raw CSV line.
Returns a dictionary like {name, phone, owner,
btype, yoe, headings[], outlets[], category}.
"""
pieces = [piece.strip("[[\"\']] ") for piece in line.strip().split(',')]
name = pieces[0]
phone = pieces[1]
owner = pieces[2]
btype = pieces[3]
yoe = pieces[4]
# after yoe headings begin, until substring Outlets Address
headings = pieces[4:pieces.index("Outlets Address")]
# outlets go from substring Outlets Address until category
outlet_pieces = pieces[pieces.index("Outlets Address"):-1]
# combine each individual outlet information into a string
# and let ``deserialize_outlet()`` deal with that
raw_outlets = ', '.join(outlet_pieces).split("Outlets Address")
outlets = [deserialize_outlet(outlet) for outlet in raw_outlets]
# category is the last piece
category = pieces[-1]
return {
'name': name,
'phone': phone,
'owner': owner,
'btype': btype,
'yoe': yoe,
'headings': headings,
'outlets': outlets,
'category': category,
}调用它的例子:
with open("phonebookCOMPK-Directory.csv") as f:
lineno = 0
for line in f:
lineno += 1
try:
business = deserialize_business(line)
except:
# Bad line formatting?
log.exception(u"Failed to deserialize line #%s!", lineno)
else:
# All is well
store_business(business)存储数据
您将让store_business()函数使用您的数据结构并将其写入某个地方。它可能是另一个结构更好的CSV,可能是多个CSV,一个JSON文件,或者您可以使用SQLite关系数据库工具,因为它已经内置了。
这完全取决于你以后想做什么。
关系示例
在这种情况下,您的数据将被分割为多个表。(我使用的是“table”一词,但它可以是一个CSV文件,尽管您也可以使用SQLite DB,因为Python有内置的。)
列出所有可能的业务标题的表格:
business heading ID, name
1, Abattoirs
2, Exporters
3, Food Delivery
4, Butchers Retail
5, Meat Dealers-Retail
6, Meat Freezer
7, Meat Packers列出所有可能类别的表格:
category ID, parent category, name
1, NULL, "Agriculture, fishing & Forestry"
2, 1, "Farming equipment & services"
3, 2, "Abattoirs in Pakistan"确定企业的表格:
business ID, name, phone, owner, type, yoe, category
1, Meat One, +92-21-111163281, Al Shaheer Corporation, Retailers, 2008, 3表中描述了它们的出口:
business ID, city, address, landmarks, phone
1, Karachi UAN, "Shop 13, Ground Floor, Plot 14-D, Sky Garden, Main Tipu Sultan Road, KDA Scheme No.1, Karachi", "Nadra Chowrangi, Sky Garden, Tipu Sultan Road", +92-21-111163281
1, Karachi UAN, "Near Jan's Broast, Boat Basin, Khayaban-e-Roomi, Block 5, Clifton, Karachi", "Boat Basin, Jans Broast, Khayaban-e-Roomi", +92-21-111163281描述其标题的表格:
business ID, business heading ID
1, 1
1, 2
1, 3
…处理所有这些都需要一个复杂的store_business()函数。如果采用保留数据的关系方式,可能值得研究SQLite和一些ORM框架。
发布于 2016-06-04 08:48:16
您只需替换行:
print(listt[0])有:
print(*listt[0], sep='\n')https://stackoverflow.com/questions/37628184
复制相似问题