
Need to find words in a CSV file based on a condition

Stack Overflow user
Asked on 2017-01-12 06:28:14
1 answer · 787 views · 0 followers · 0 votes

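I'm really stuck. My task is to filter the dates in a 5,000-record CSV down to a specific date range, sort them in ascending order, and then take a field from a different column to build a sentence. I've managed to sort the dates, but now I don't know how to get the word that belongs to each matching row. Here is the code: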

#!/usr/bin/python3

import csv
import time


def finder():
    with open('sample_data.csv', encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile)
        r = [] # This will hold our ID numbers for rows
        c = [] # This will hold our initial dates that are filtered out from the main csv
        l = [] # This will hold our sorted dates from c
        w = [] # This will hold our words 
        sentence = '' #This will be our sentence

        # Filter out created_at dates we don't care about

        def filterDates():
            for row in reader:
                createdOn = float(row['created_at'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates

                if d < '2014-06-22':
                    pass
                else:
                    c.append(d)

        filterDates()

        def sort(c):
            for i in c:
                if i > '2014-06-22' and i < '2014-07-22':
                    l.append(i)
                    l.sort(reverse=False)
                else:
                    pass

        sort(c)

        def findWords(l):
            for row in reader:
                words = row['word']
                for x in range(l):
                    print(words[0])

        findWords(l)

finder()

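I know this code is probably sloppy and all over the place. It was a coding challenge for a job, and I thought I could finish it easily, but apparently my Python isn't quite up to standard yet. I've never used Python's CSV module before. To be upfront, I'm no longer planning to apply for this job, but if I can't figure this out it will drive me crazy. I've spent hours trying different things; my problem is how to pick out the rows with the right dates and get their words.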

Thanks for any suggestions and help! For my own sanity, I need to figure this out.

Thanks, RDD
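The first version collects the dates into one list and then tries to find the words in a separate pass, so the link between a date and the row it came from is lost. A minimal sketch that keeps each date and word together in a single pass, assuming the column names `created_at` and `word` from the data sample below and interpreting the timestamps as UTC (the question's code uses the machine's local time zone via `time.localtime`). The function name `build_sentence` and the in-memory sample rows are hypothetical:

```python
import csv
import io
from datetime import datetime, timezone


def build_sentence(csvfile, start='2014-06-22', end='2014-07-22'):
    """Return the words of rows whose created_at date falls in [start, end],
    ordered by created_at ascending and joined with spaces."""
    matches = []  # (timestamp, word) pairs keep each word tied to its own row
    for row in csv.DictReader(csvfile):
        created_on = float(row['created_at'])
        # Unix timestamp -> 'YYYY-MM-DD' string, interpreted in UTC (an assumption)
        d = datetime.fromtimestamp(created_on, tz=timezone.utc).strftime('%Y-%m-%d')
        # Zero-padded ISO date strings compare correctly as plain strings
        if start <= d <= end:
            matches.append((created_on, row['word']))
    matches.sort()  # tuples sort by their first item, the timestamp
    return ' '.join(word for _, word in matches)


# Hypothetical in-memory rows standing in for sample_data.csv
sample = io.StringIO(
    "id,created_at,word\n"
    "1,1405000000,world\n"
    "2,1309380645,ignored\n"
    "3,1404000000,hello\n"
)
print(build_sentence(sample))  # hello world
```

With the real file this would be called as `with open('sample_data.csv', newline='', encoding='utf8') as f: print(build_sentence(f))`. Sorting the pairs rather than the bare dates is the design choice that matters: the word travels with its timestamp, so no second lookup is needed.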

Data sample:

id  created_at  first_name  last_name   email   gender  company currency    word    drug_brand  drug_name   drug_company    pill_color  frequency   token   keywords
1   1309380645  Stephanie   Franklin    sfranklin0@sakura.ne.jp Female  Latz    IDR transitional    SUNLEYA Age minimizing sun care AVOBENZONE, OCTINOXATE, OCTISALATE, OXYBENZONE  C.F.E.B. Sisley Maroon  Yearly  ______T______h__e________ _______N__e__z_____p______e_____________d______i______a_____n__ _____h__i__v__e___-_____m___i____n__d__ _____________f ________c_______h__a__________s_.__ _Z________a_____l_____g________o__._   est risus auctor sed tristique in
2   1237178109  Michelle    Fowler  mfowler1@oracle.com Female  Skipstorm   EUR flexibility Medulla Arnica  Medulla Arnica  Uriel Pharmacy Inc. Yellow  Once    _____   morbi vestibulum velit id
3   1303585711  Betty   Barnes  bbarnes2@howstuffworks.com  Female  Skibox  IDR workforce   Rash Relief Zinc Oxide Dimethicone  Touchless Care Concepts LLC Purple  Monthly ___ ac est lacinia
4   1231175716  Jerry   Rogers  jrogers3@canalblog.com  Male    Cogibox IDR content-based   up and up acid controller complete  Famotidine, Calcium Carbonate, Magnesium Hydroxide  Target Corporation  Maroon  Daily   NIL augue a suscipit nulla elit
5   1236709011  Harry   Garrett hgarrett4@mlb.com   Male    Yotz    RUB coherent    Vistaril    HYDROXYZINE PAMOATE Pfizer Laboratories Div Pfizer Inc  Orange  Never   �_nb_l_ _u___ __olop __ __oq_l _n _unp_p__u_ _od___ po_sn__ op p_s '__l_ _u__s_d_p_ _n_____suo_ '____ __s _olop _nsd_ ___o_   morbi ut odio cras
6   1400030214  Lori    Martin  lmartin5@apache.org Female  Aivee   EUR software    Fluorouracil    Fluorouracil    Taro Pharmaceutical Industries Ltd. Pink    Daily   _   dui vel sem
7   1368791435  Joe Turner  jturner6@elpais.com Male    Mycat   IRR tangible    Sulfacetamide Sodium    Sulfacetamide Sodium    Paddock Laboratories, LLC   Aquamarine  Often   1;DROP TABLE users  nulla facilisi cras non velit
8   1394919241  Ruth    Bryant  rbryant7@dell.com   Female  Browsecat   IDR incremental Pollens - Trees, Mesquite, Prosopis juliflora   Mesquite, Prosopis juliflora    Jubilant HollisterStier LLC Aquamarine  Weekly  ___________ et magnis dis
9   1352948920  Cynthia Lopez   clopez8@gov.uk  Female  Twitterbeat USD Up-sized    Ideal Flawless  Octinoxate, Titanium Dioxide    Avon Products, Inc  Red Daily   (_�_�___ ___)   purus eu magna
10  1319910259  Phillip Ross    pross9@ehow.com Male    Buzzshare   VEF data-warehouse  Serotonin   Serotonin   BioActive Nutritional   Orange  Weekly  __  vel sem
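In the sample, `created_at` holds Unix timestamps in seconds. Instead of formatting every row's timestamp as a date string, one option is to convert the two range boundaries to timestamps once and compare numbers. A sketch, assuming the range is meant in UTC and that the whole end date should be included; `day_bounds` is a hypothetical helper name:

```python
import calendar
from datetime import date, timedelta


def day_bounds(start, end):
    """Return (lo, hi) Unix timestamps so that lo <= ts < hi covers every
    second from start 00:00 UTC through the end of the end day."""
    lo = calendar.timegm(date.fromisoformat(start).timetuple())
    # Step one day past the end date so a strict '<' still includes all of it
    hi = calendar.timegm((date.fromisoformat(end) + timedelta(days=1)).timetuple())
    return lo, hi


lo, hi = day_bounds('2014-06-22', '2014-07-22')
print(lo, hi)                  # 1403395200 1406073600
print(lo <= 1309380645 < hi)   # False: row 1 of the sample is from 2011
```

Inside the CSV loop the filter then becomes `if lo <= float(row['created_at']) < hi:`. Using `calendar.timegm` pins the boundaries to UTC, whereas `time.localtime` in the question shifts the cutoff by the local UTC offset.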

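OK, after some tweaks with help from Westley White, I got this working! I condensed it into a single nested function, and it's doing what it's supposed to! Here is the code: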

#!/usr/bin/python3

import csv
import time

def finder():

    with open('sample_data.csv', 'r', encoding='latin-1') as csvfile:
        reader = csv.DictReader(csvfile)
        def dates(reader):
            # Set up variables
            date_range = []
            sentence = []

            # Initiate iteration through CSV
            for row in reader:
                createdOn = float(row['created_at'])
                words = str(row['word'])
                d = time.strftime('%Y-%m-%d', time.localtime(createdOn)) # Converts dates

                if d >= '2014-06-22' and d <= '2014-07-22':
                    date_range.append(d)

                date_range.sort()

                for word in words:
                    if d in date_range:
                        sentence.append(word)

            print(sentence)

        dates(reader)

finder()

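Only one problem remains. When I append to sentence[], it appends one character at a time. I don't know how to get the letters from the CSV column to stay together as whole words instead of ending up as separate letters. Any ideas?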

Thanks!
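In the loop above, `words = str(row['word'])` is a single string such as `'transitional'`, and `for word in words:` walks over that string one character at a time, which is why `sentence` fills up with letters. Appending the field itself inside the date check keeps each word whole, for example `if d >= '2014-06-22' and d <= '2014-07-22': sentence.append(words)`. A small illustration of the difference, using two values from the sample's `word` column:

```python
row_words = ['transitional', 'workforce']  # two values from the sample's word column

# Looping over one string visits it character by character
chars = []
for word in row_words[0]:
    chars.append(word)
print(chars[:3])  # ['t', 'r', 'a']

# Appending each field as a whole keeps the words intact
sentence = []
for value in row_words:
    sentence.append(value)
print(' '.join(sentence))  # transitional workforce
```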


1 answer

Stack Overflow user

Answered on 2017-01-12 06:53:01

I don't know how your data is formatted, but here is my attempt.

import csv
import time

def finder(start_date='2014-06-22', end_date='2014-07-22'):
    """
    :param start_date: Starting date
    :param end_date: Ending date
    """

    def filterDates(reader):
        datelist = []
        for row in reader:
            created_on = float(row['created_at'])
            d = time.strftime('%Y-%m-%d', time.localtime(created_on)) # Converts dates

            # Is between starting and ending dates
            if d >= start_date and d <= end_date:
                # Going to use the created_on value so we don't have to reformat it again
                datelist.append(created_on)
        return datelist

    def findWords(reader, datelist):
        # Rows are visited in file order, so output follows the CSV, not the sorted dates
        for row in reader:
            if float(row['created_at']) in datelist:
                words = row['word']
                # words is a single string, so this loop prints it one character at a time
                for word in words:
                    print(word)

    with open('sample_data.csv', newline='', encoding="utf8") as csvfile:
        reader = csv.DictReader(csvfile)
        dates = filterDates(reader)
        dates.sort()  # sorts in place; list.sort() returns None

        # filterDates consumed the reader, so rewind the file and read it again
        csvfile.seek(0)
        reader = csv.DictReader(csvfile)
        findWords(reader, dates)

finder('2014-06-22', '2014-07-22')
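The answer's `filterDates` and `findWords` both loop over a `csv.DictReader`. A reader wraps the file's line iterator, so once the first loop has consumed it, a second loop over the same reader sees no rows at all. A minimal demonstration with an in-memory file (`io.StringIO` stands in for `sample_data.csv`; the rows are hypothetical), along with two ways around it: rewinding the file and building a fresh reader, or reading the rows into a list once:

```python
import csv
import io

data = io.StringIO("id,created_at,word\n1,1404000000,hello\n2,1405000000,world\n")

reader = csv.DictReader(data)
first_pass = [row['word'] for row in reader]
second_pass = [row['word'] for row in reader]  # same reader, already exhausted
print(first_pass, second_pass)  # ['hello', 'world'] []

# Option 1: rewind the underlying file and create a fresh reader
data.seek(0)
rewound = [row['word'] for row in csv.DictReader(data)]

# Option 2: load the rows once into a list, which can be looped over repeatedly
data.seek(0)
rows = list(csv.DictReader(data))
print(rewound, [row['word'] for row in rows])
```

For a 5,000-row file either option is cheap; the list keeps every row in memory, while rewinding re-reads the file from disk.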

EDIT: If you want to add each word to a list, use this:

Add this outside the loop:

sentence_list = []

Change

words = row['word'] 

to

word = row['word']

Then change

for word in words:      
    print(word)

to

sentence_list.append(word)

If you would rather build a string, add this outside the loop instead:

sentence = ""

Then, where you print each word, add it to the sentence instead:

# adding a Word to the sentence
sentence = "{} {}".format(sentence, word)

Finally, add this at the bottom, outside the loop:

print(sentence)
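One detail in the string version: starting from `sentence = ""`, the first `"{} {}".format(sentence, word)` produces a leading space. Collecting the words into `sentence_list`, as in the list version, and joining them once at the end avoids that. A short comparison, with words taken from the sample's `word` column:

```python
words = ['transitional', 'flexibility', 'workforce']  # values from the sample's word column

# String version: each step puts the old sentence, a space, then the new word
sentence = ""
for word in words:
    sentence = "{} {}".format(sentence, word)
print(repr(sentence))  # ' transitional flexibility workforce'

# List version: append whole words, then join them once at the end
sentence_list = []
for word in words:
    sentence_list.append(word)
print(" ".join(sentence_list))  # transitional flexibility workforce
```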
Votes: 2
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/41601871
