
Q: pandas for loop duplicates rows

Stack Overflow user

Asked on 2020-11-13 06:04:36

1 answer · 22 views · 0 followers · 0 votes

I'm trying to run a for loop over a long DataFrame and count the English and non-English words in each text (each text is a separate row).

+-------+--------+----+
| Index |  Text  | ID |
+-------+--------+----+
|     1 | Text 1 |  1 |
|     2 | Text 2 |  2 |
|     3 | Text 3 |  3 |
+-------+--------+----+

This is my code:

c = 0
for text in df_letters['Text_clean']:
    # Counters
    CTEXT= text
    c +=1
    eng_words = 0
    non_eng_words = 0
    text = " ".join(text.split())
    # For every word in text
    for word in text.split(' '):
      # Check if it is english
      if english_dict.check(word) == True:
        eng_words += 1
      else:
        non_eng_words += 1
    # Print the result
    # NOTE that these results are discarded each new text
    df_letters.at[text, 'eng_words'] = eng_words
    df_letters.at[text, 'non_eng_words'] = non_eng_words
    df_letters.at[text, 'Input'] = CTEXT
    #print('Index: {}; EN: {}; NON-EN: {}'.format(c, eng_words, non_eng_words))

However, instead of getting the same DataFrame with three new columns, like this:

+-------+--------+----+---------+-------------+---------+
| Index |  Text  | ID | English | Non-English |  Input  |
+-------+--------+----+---------+-------------+---------+
|     1 | Text 1 |  1 |       1 |           0 | Text 1  |
|     2 | Text 2 |  2 |       1 |           0 | Text 2  |
|     3 | Text 3 |  3 |       0 |           1 | Text 3  |
+-------+--------+----+---------+-------------+---------+

the DataFrame grows in length, with a new row appended for each text, like this:

+--------+--------+-----+---------+-------------+--------+
| Index  |  Text  | ID  | English | Non-English | Input  |
+--------+--------+-----+---------+-------------+--------+
| 1      | Text 1 | 1   | nan     | nan         | nan    |
| 2      | Text 2 | 2   | nan     | nan         | nan    |
| 3      | Text 3 | 3   | nan     | nan         | nan    |
| Text 1 | nan    | nan | 1       | 0           | Text 1 |
| Text 2 | nan    | nan | 1       | 0           | Text 2 |
| Text 3 | nan    | nan | 0       | 1           | Text 3 |
+--------+--------+-----+---------+-------------+--------+
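The behaviour can be reproduced in isolation. This is a minimal sketch assuming a small three-row frame with integer row labels 1, 2, 3, as in the tables above; the sample data and the `make_frame` helper are illustrative and not part of the question. Assigning through `.at` with a label that is not in the index appends a new row, while assigning with an existing label updates that row in place.

```python
import pandas as pd


def make_frame():
    # Hypothetical sample data: three texts with integer row labels 1, 2, 3.
    return pd.DataFrame(
        {"Text": ["Text 1", "Text 2", "Text 3"], "ID": [1, 2, 3]},
        index=[1, 2, 3],
    )


# What the question's loop does: use the text itself as the row label.
broken = make_frame()
broken.at["Text 1", "eng_words"] = 1  # "Text 1" is not in the index, so a row is appended
print(len(broken))  # 4
print(broken)

# Using the row's existing index label instead.
fixed = make_frame()
fixed.at[1, "eng_words"] = 1  # label 1 exists, so row 1 is updated in place
print(len(fixed))  # 3
print(fixed)
```

In `broken`, the appended "Text 1" row has NaN in `Text` and `ID`, and the original rows have NaN in `eng_words`, which matches the pattern in the second table.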

What am I doing wrong here?

Tags: python, pandas, dataframe

1 answer

Stack Overflow user

Accepted answer

Answered on 2020-11-13 06:17:54

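`DataFrame.at` accesses a single value by its row and column labels. Your DataFrame's index is [1, 2, 3], not [Text 1, Text 2, Text 3], so `df_letters.at[text, ...]` points at rows that do not exist. When you assign to a label that is missing from the index, pandas enlarges the DataFrame with a new row for that label, which is why every text shows up as an extra row with NaN in the original columns. I think the best solution for you is to change the header of your loop to the following: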

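for index, text in df_letters['Text_clean'].items():

`Series.items()` yields `(index label, value)` pairs; its former alias `Series.iteritems()` was removed in pandas 2.0. Then, inside the loop, use `index` wherever you write results back: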

df_letters.at[index, 'eng_words'] = eng_words
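Putting the answer together, a self-contained sketch of the corrected loop might look like the following. It assumes hypothetical sample data and a hypothetical `ToyEnglishDict` class standing in for the question's `english_dict`; any object with a `check(word)` method that returns a bool, such as a spell-checker dictionary, should slot in the same way. The result columns are created before the loop so that `.at` only ever updates existing cells.

```python
import pandas as pd


class ToyEnglishDict:
    """Hypothetical stand-in for `english_dict`: a tiny fixed vocabulary."""

    def __init__(self, words):
        self._words = {w.lower() for w in words}

    def check(self, word):
        return word.lower() in self._words


english_dict = ToyEnglishDict(["the", "cat", "sat", "on", "mat"])

# Hypothetical sample data with integer row labels 1, 2, 3.
df_letters = pd.DataFrame(
    {"Text_clean": ["the cat sat", "le chat  est la", "on the mat"], "ID": [1, 2, 3]},
    index=[1, 2, 3],
)

# Create the result columns up front so every .at call hits an existing cell.
df_letters["eng_words"] = 0
df_letters["non_eng_words"] = 0
df_letters["Input"] = ""

# items() yields (index label, value) pairs, so each result is written back
# to the row it came from instead of to a new row labelled by the text.
for index, text in df_letters["Text_clean"].items():
    words = text.split()  # split() with no argument also collapses repeated whitespace
    eng_words = sum(1 for word in words if english_dict.check(word))
    df_letters.at[index, "eng_words"] = eng_words
    df_letters.at[index, "non_eng_words"] = len(words) - eng_words
    df_letters.at[index, "Input"] = text

print(df_letters)
```

Because `index` comes from the Series itself, the frame keeps its three rows. For long DataFrames, computing the counts for the whole column first (for example with `Series.apply` or a list comprehension) and assigning complete columns at once is usually faster than per-cell `.at` writes, but that is a separate design choice from the fix above.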

Votes: 1

Original page content provided by Stack Overflow; translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/64812469
