
Question: Iterating a pandas Series to append rows that start with a letter to the end of the row that starts with a number

Stack Overflow user

Asked on 2020-09-02 02:22:47

Answers: 1 · Views: 31 · Followers: 0 · Votes: 1

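I'm having some trouble understanding this for loop in Python. Below is a single-column DataFrame as an example. Most of the pandas examples I've found process the whole DataFrame at once, or search for a word and append it to the previous row.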

What I am trying to do (forgive me for trying to sound it out logically):
1- Start at (0, Test) in the Series.
2- Check whether the element at (0, Test) has a number at the first position (index 0). If True, hold (store) it as pre_num_line.
3- Go to the next line down.
4- Check whether the element at (1, Test) has a number at the first position. If False, check whether the first position is a letter.
5- If the first character is a letter, concatenate the current line onto the end of pre_num_line (in this case, the line at (0, Test)).
6- Delete the current row and shift the rows up (or, instead, change the string to NaN and delete all NaN rows at the end of the code). I'm not sure which is easier.
7- Analyze the next row down, at (2, Test), repeating the process from step 2.
8- End the loop when all rows that start with a letter have been appended to pre_num_line.
9- The next row down should start with a number. It becomes the new pre_num_line.
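The nine steps above can be sketched roughly as follows. This is a minimal sketch, not the accepted answer's method: it assumes the sample data from the question, swaps `str.isdigit()` / `str.isalpha()` in for the membership checks against `numbers` / `letters`, and uses the NaN variant of step 6 (mark appended rows, then drop them after the loop) instead of deleting rows mid-iteration. `pre_num_idx` is a hypothetical name for the index of the held pre_num_line.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                'wine', 'bread', '567890 tfs', 'grape']}
df = pd.DataFrame(dat)

pre_num_idx = None  # hypothetical: index label of the held pre_num_line

# list(...) snapshots the (index, value) pairs so writing to df cannot affect the loop.
for idx, line in list(df['Test'].items()):
    if line[0].isdigit():
        # Steps 2 and 9: a row starting with a digit becomes the new pre_num_line.
        pre_num_idx = idx
    elif line[0].isalpha() and pre_num_idx is not None:
        # Step 5: concatenate the letter row onto the end of the held row.
        df.at[pre_num_idx, 'Test'] = df.at[pre_num_idx, 'Test'] + ' ' + line
        # Step 6, NaN variant: mark the row instead of deleting it mid-loop.
        df.at[idx, 'Test'] = np.nan

# After the loop: drop the marked rows and renumber the index from 0.
df = df.dropna(subset=['Test']).reset_index(drop=True)
print(df)
```

Marking rows and dropping them afterwards avoids shifting rows while the loop is still walking over them, which is usually the easier of the two options in step 6.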

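What is listed here is just the beginning of each string; the strings can contain both numbers and letters anywhere. The first position of each row is always either a number or a letter (case-insensitive). Each letter row must be combined (at the end) with the numbered row above it. When processing finishes, only the numbered rows should remain.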
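The constraints stated here can be checked before merging anything. A minimal sketch, assuming "letter" means an ASCII letter of either case and, as an inference from "only the numbered rows should remain", that the first row must start with a digit; `starts_ok` and `first_is_digit` are illustrative names.

```python
import pandas as pd

dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                'wine', 'bread', '567890 tfs', 'grape']}
df = pd.DataFrame(dat)

# True where the string starts with a digit or an ASCII letter of either case.
starts_ok = df['Test'].str.match(r'[0-9A-Za-z]')
# Assumption: the first row must start with a digit, or leading letter rows have nothing to join.
first_is_digit = df['Test'].iloc[0][:1].isdigit()

print(starts_ok.all(), first_is_digit)
# Rows that break the rule, if any.
print(df.loc[~starts_ok, 'Test'].tolist())
```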

import pandas as pd
from pandas import DataFrame, Series


dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh', 'wine', 'bread','567890 tfs', 
       'grape']}
df = pd.DataFrame(dat)

letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
numbers = '0123456789'


#------------------- 
pre_num_lin = None

for line in df.Test:
    if line[0] in numbers:
        pre_num_lin = df['Test']

if line[0] in letters:
    pre_num_lin = pre_num_lin + ' ' + line

#------------------

print(df)



What it should look like at the end:
   Test
0  123456ab coff-4 eat 8 bagle6
1  345678-edh wine bread
2  567890 tfs grape
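As the code appears here, the loop has two problems: `pre_num_lin = df['Test']` stores the whole column instead of the current line, and the `if line[0] in letters:` check sits outside the loop, so it runs only once, for the last line. Nothing is ever written back into `df` either. A minimal corrected sketch keeps the asker's `numbers` / `letters` strings but collects the merged rows in a plain list (`merged`, a hypothetical name) and builds a new DataFrame from it:

```python
import pandas as pd

dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                'wine', 'bread', '567890 tfs', 'grape']}
df = pd.DataFrame(dat)

letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
numbers = '0123456789'

merged = []  # one finished string per row that starts with a number
for line in df['Test']:
    if line[0] in numbers:
        # A numbered row starts a new output row (the new pre_num_line).
        merged.append(line)
    elif line[0] in letters and merged:
        # A letter row is appended to the most recent numbered row.
        merged[-1] = merged[-1] + ' ' + line

result = pd.DataFrame({'Test': merged})
print(result)
```

Letter rows that appear before the first numbered row are silently skipped in this sketch, since there is no pre_num_line to attach them to.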

I appreciate all of your time and knowledge. Please let me know if you have any questions.

pandas

loops

python

1 answer

Stack Overflow user

Accepted answer

Posted on 2020-09-02 02:29:07

Try this:

df.groupby(df['Test'].str[0].str.isnumeric().cumsum())['Test'].agg(' '.join)
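The one-liner returns a Series indexed by group number (1, 2, 3). If the goal is the question's layout, a one-column DataFrame indexed from 0, one option (a sketch, not part of the original answer) is to chain `reset_index(drop=True)` and `to_frame()`:

```python
import pandas as pd

dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                'wine', 'bread', '567890 tfs', 'grape']}
df = pd.DataFrame(dat)

out = (
    df.groupby(df['Test'].str[0].str.isnumeric().cumsum())['Test']
      .agg(' '.join)
      .reset_index(drop=True)  # replace group labels 1, 2, 3 with 0, 1, 2
      .to_frame()              # Series named 'Test' -> one-column DataFrame
)
print(out)
```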

Output:

Test
1    123456ab coff-4 eat 8 bagle6
2           345678-edh wine bread
3                567890 tfs grape
Name: Test, dtype: object

Details:

Use the string accessor with the indexer 0 to get the first character, df['Test'].str[0], which is equivalent to df['Test'].str.get(0) (just less typing).
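A quick check of that equivalence on a few of the sample strings (illustrative only; `first_a` and `first_b` are throwaway names):

```python
import pandas as pd

s = pd.Series(['123456ab', 'coff-4', 'eat 8'])

first_a = s.str[0]      # indexer form
first_b = s.str.get(0)  # explicit method form

print(first_a.tolist())         # ['1', 'c', 'e']
print(first_a.equals(first_b))  # True
```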

Next, use the string accessor's isnumeric method to check whether that character is numeric. This returns a boolean Series.
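A regex-based alternative (swapped in here, not used by the answer) is `Series.str.match(r'\d')`, which tests whether each string starts with a digit and also returns a boolean Series. On the sample data it agrees with `str[0].str.isnumeric()`, though the two can differ for characters such as '½', which `isnumeric()` accepts but `\d` does not match.

```python
import pandas as pd

dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                'wine', 'bread', '567890 tfs', 'grape']}
df = pd.DataFrame(dat)

via_isnumeric = df['Test'].str[0].str.isnumeric()
# str.match anchors at the start of each string, so r'\d' means "first char is a digit".
via_regex = df['Test'].str.match(r'\d')

print(via_regex.tolist())
print(via_isnumeric.tolist() == via_regex.tolist())  # True
```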

df['Test'].str[0].str.isnumeric()

0     True
1    False
2    False
3    False
4     True
5    False
6    False
7     True
8    False
Name: Test, dtype: bool

Now we can use cumsum to create the row groups, as follows:
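`cumsum` treats True as 1 and False as 0, so the running total increases by one exactly at each row that starts with a number, and every following letter row inherits that number. The same arithmetic in plain Python, using `itertools.accumulate` on the boolean values shown above (a sketch for illustration only):

```python
from itertools import accumulate

starts_with_digit = [True, False, False, False, True, False, False, True, False]

# Convert to int first so the output is all ints (accumulate returns the first item unchanged).
group_ids = list(accumulate(int(b) for b in starts_with_digit))
print(group_ids)  # [1, 1, 1, 1, 2, 2, 2, 3, 3]
```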

df['Test'].str[0].str.isnumeric().cumsum()

0    1
1    1
2    1
3    1
4    2
5    2
6    2
7    3
8    3
Name: Test, dtype: int32

Finally, we can group by the generated group Series and aggregate with a string join:
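To see what `agg(' '.join)` receives, one can iterate over the same groupby (a sketch for inspection only; `keys` and `groups` are illustrative names). Each group is a Series of that group's rows in their original order, and `' '.join` concatenates its values with single spaces:

```python
import pandas as pd

dat = {'Test': ['123456ab', 'coff-4', 'eat 8', 'bagle6', '345678-edh',
                'wine', 'bread', '567890 tfs', 'grape']}
df = pd.DataFrame(dat)
keys = df['Test'].str[0].str.isnumeric().cumsum()

groups = {}
for key, grp in df.groupby(keys)['Test']:
    # key is the group number from cumsum; grp holds that group's rows.
    groups[key] = grp.tolist()
    print(key, grp.tolist(), '->', ' '.join(grp))
```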

df.groupby(df['Test'].str[0].str.isnumeric().cumsum())['Test'].agg(' '.join)

Test
1    123456ab coff-4 eat 8 bagle6
2           345678-edh wine bread
3                567890 tfs grape
Name: Test, dtype: object

Votes: 1

The original page content is provided by Stack Overflow. Translation support was provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/63693371
