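I'm really stuck. My task is to filter a 5,000-record CSV by date to find a specific date range, sort the matches in ascending order, and then pull a field from a different column to build a sentence. I've managed to sort the dates, but I can't work out how to get the word that belongs to each row. Code below: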
#/usr/bin/python3
import csv
import time

def finder():
    with open('sample_data.csv', encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile)
        r = [] # This will hold our ID numbers for rows
        c = [] # This will hold our initial dates that are filtered out from the main csv
        l = [] # This will hold our sorted dates from c
        w = [] # This will hold our words
        sentence = '' #This will be our sentence

        # Filter out created_at dates we don't care about
        def filterDates():
            for row in reader:
                createdOn = float(row['created_at'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates
                if d < '2014-06-22':
                    pass
                else:
                    c.append(d)
        filterDates()

        def sort(c):
            for i in c:
                if i > '2014-06-22' and i < '2014-07-22':
                    l.append(i)
                    l.sort(reverse=False)
                else:
                    pass
        sort(c)

        def findWords(l):
            for row in reader:
                words = row['word']
                for x in range(l):
                    print(words[0])
        findWords(l)
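finder()

I know this code is probably sloppy and all over the place. I took this on as a challenge for a job and thought I could finish it easily, but apparently my Python isn't quite up to par. I haven't used Python's csv module before. I'll say up front that I'm no longer going to apply for the job, but it will drive me crazy if I can't figure this out. I've spent hours trying different things, and my problem is how to get the rows with the right dates and then get their words.
All suggestions and help are appreciated! For my own sanity, I need to figure this out.
Thanks, RDD
Sample data: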
id created_at first_name last_name email gender company currency word drug_brand drug_name drug_company pill_color frequency token keywords
1 1309380645 Stephanie Franklin sfranklin0@sakura.ne.jp Female Latz IDR transitional SUNLEYA Age minimizing sun care AVOBENZONE, OCTINOXATE, OCTISALATE, OXYBENZONE C.F.E.B. Sisley Maroon Yearly ______T______h__e________ _______N__e__z_____p______e_____________d______i______a_____n__ _____h__i__v__e___-_____m___i____n__d__ _____________f ________c_______h__a__________s_.__ _Z________a_____l_____g________o__._ est risus auctor sed tristique in
2 1237178109 Michelle Fowler mfowler1@oracle.com Female Skipstorm EUR flexibility Medulla Arnica Medulla Arnica Uriel Pharmacy Inc. Yellow Once _____ morbi vestibulum velit id
3 1303585711 Betty Barnes bbarnes2@howstuffworks.com Female Skibox IDR workforce Rash Relief Zinc Oxide Dimethicone Touchless Care Concepts LLC Purple Monthly ___ ac est lacinia
4 1231175716 Jerry Rogers jrogers3@canalblog.com Male Cogibox IDR content-based up and up acid controller complete Famotidine, Calcium Carbonate, Magnesium Hydroxide Target Corporation Maroon Daily NIL augue a suscipit nulla elit
5 1236709011 Harry Garrett hgarrett4@mlb.com Male Yotz RUB coherent Vistaril HYDROXYZINE PAMOATE Pfizer Laboratories Div Pfizer Inc Orange Never �_nb_l_ _u___ __olop __ __oq_l _n _unp_p__u_ _od___ po_sn__ op p_s '__l_ _u__s_d_p_ _n_____suo_ '____ __s _olop _nsd_ ___o_ morbi ut odio cras
6 1400030214 Lori Martin lmartin5@apache.org Female Aivee EUR software Fluorouracil Fluorouracil Taro Pharmaceutical Industries Ltd. Pink Daily _ dui vel sem
7 1368791435 Joe Turner jturner6@elpais.com Male Mycat IRR tangible Sulfacetamide Sodium Sulfacetamide Sodium Paddock Laboratories, LLC Aquamarine Often 1;DROP TABLE users nulla facilisi cras non velit
8 1394919241 Ruth Bryant rbryant7@dell.com Female Browsecat IDR incremental Pollens - Trees, Mesquite, Prosopis juliflora Mesquite, Prosopis juliflora Jubilant HollisterStier LLC Aquamarine Weekly ___________ et magnis dis
9 1352948920 Cynthia Lopez clopez8@gov.uk Female Twitterbeat USD Up-sized Ideal Flawless Octinoxate, Titanium Dioxide Avon Products, Inc Red Daily (_�_�___ ___) purus eu magna
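10 1319910259 Phillip Ross pross9@ehow.com Male Buzzshare VEF data-warehouse Serotonin Serotonin BioActive Nutritional Orange Weekly __ vel sem

Okay, after some adjustments with help from Westley White, I was able to get this working! I condensed it into a single nested function, and it does what it's supposed to do! Code below: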
#/usr/bin/python3
import csv
import time

def finder():
    with open('sample_data.csv', 'r', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)

        def dates(reader):
            # Set up variables
            date_range = []
            sentence = []
            # Initiate iteration through CSV
            for row in reader:
                createdOn = float(row['created_at'])
                words = str(row['word'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates
                if d >= '2014-06-22' and d <= '2014-07-22':
                    date_range.append(d)
                    date_range.sort()
                    for word in words:
                        if d in date_range:
                            sentence.append(word)
            print(sentence)
        dates(reader)
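finder()

There's only one problem left. When sentence[] is appended to, it gets one character at a time. I don't know how to keep the letters from the CSV column together as whole words instead of having them split into individual characters. Any ideas?
Thanks!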
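For reference, the character-by-character behavior described here comes from iterating over a string: `for word in words` walks through `words` one character at a time. A minimal sketch, using a made-up value in place of `row['word']`:

```python
# Hypothetical value standing in for row['word'] from a single CSV row
words = "transitional"

by_char = []
for word in words:       # iterating over a str yields single characters
    by_char.append(word)

whole = []
whole.append(words)      # appending the string itself keeps the word intact

print(by_char[:3])  # ['t', 'r', 'a']
print(whole)        # ['transitional']
```

Dropping the inner loop and appending the column value directly is therefore enough to collect whole words.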
Answered on 2017-01-12 06:53:01
I don't know how the data is formatted, but here is my attempt.
import csv
import time

def finder(start_date='2014-06-22', end_date='2014-07-22'):
    """
    :param start_date: Starting date
    :param end_date: Ending date
    """
    def filterDates(reader):
        datelist = []
        for row in reader:
            created_on = float(row['created_at'])
            d = time.strftime('%Y-%m-%d', time.localtime(created_on)) # Converts dates
            # Is between starting and ending dates
            if d >= start_date and d <= end_date:
                # Going to use the created_on value so we dont have to reformat it again
                datelist.append(created_on)
        return datelist

    def findWords(reader, datelist):
        for row in reader:
            if float(row['created_at']) in datelist:
                words = row['word']
                for word in words:
                    print(word)

    with open('sample_data.csv', encoding="utf8") as csvfile:
        # Read every row into a list: a DictReader can only be iterated once
        reader = list(csv.DictReader(csvfile))
    dates = filterDates(reader)
    dates.sort()  # list.sort() sorts in place and returns None
    findWords(reader, dates)

finder('2014-06-22', '2014-07-22')

EDIT: If you want to add each word to a list, use this.
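Add this code outside the loop:
    sentence_list = []
Change
    words = row['word']
to
    word = row['word']
Then change
    for word in words:
        print(word)
to
    sentence_list.append(word)
If you want to use a string instead, add the following outside the loop:
    sentence = ""
Then, where you would print the word, just add it to the sentence:
    # adding a word to the sentence
    sentence = "{} {}".format(sentence, word)
Finally, add this code at the bottom, outside the loop:
    print(sentence)
https://stackoverflow.com/questions/41601871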
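Putting the pieces together, one possible single-pass version is sketched below. It is not the code from the answer above: the function name `build_sentence`, the in-memory sample rows, and the choice to interpret timestamps as UTC (the thread's code uses local time) are all assumptions made for illustration.

```python
import csv
import io
from datetime import datetime, timezone

def build_sentence(csvfile, start_date='2014-06-22', end_date='2014-07-22'):
    """Join the 'word' values whose created_at falls in the range, oldest first."""
    matches = []
    for row in csv.DictReader(csvfile):
        created_on = float(row['created_at'])
        # Assumption: treat the timestamp as UTC so the result is the same on any machine
        d = datetime.fromtimestamp(created_on, tz=timezone.utc).strftime('%Y-%m-%d')
        if start_date <= d <= end_date:
            # Keep timestamp and word together so sorting cannot separate them
            matches.append((created_on, row['word']))
    matches.sort()  # tuples compare by timestamp first
    return ' '.join(word for _, word in matches)

# Made-up rows for illustration; the real file has many more columns
sample = io.StringIO(
    "id,created_at,word\n"
    "1,1404172800,world\n"    # 2014-07-01 UTC
    "2,1309380645,ignored\n"  # 2011, outside the range
    "3,1403568000,hello\n"    # 2014-06-24 UTC
)
print(build_sentence(sample))  # hello world
```

Storing `(timestamp, word)` pairs avoids a second pass over the file and keeps each word attached to its date while sorting. With the real data, an open file object for `sample_data.csv` would be passed in place of `sample`.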