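I understand how to split a string on the first occurrence of a space. My question is how to split on the second and third occurrences of a space and capture everything before each one.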
import pandas as pd

df = pd.DataFrame({"cid": {0: "cd1", 1: "cd2", 2: "cd3"},
                   "Name": {0: "John Maike Leiws", 1: "Katie Sue Adam", 2: "Tanaka Ubri Kse Suri"}}).set_index(['cid'])
                     Name
cid
cd1      John Maike Leiws
cd2        Katie Sue Adam
cd3  Tanaka Ubri Kse Suri
df['split_one'] = df.Name.str.split().str[0]

Expected output:
                     Name  split_one    split_two       split_three
cid
cd1      John Maike Leiws       John   John Maike  John Maike Leiws
cd2        Katie Sue Adam      Katie    Katie Sue    Katie Sue Adam
cd3  Tanaka Ubri Kse Suri     Tanaka  Tanaka Ubri   Tanaka Ubri Kse

Posted on 2022-02-18 14:19:38
Use indexing with the str accessor to take the first words of each split list, then combine them with Series.str.join:
s = df.Name.str.split()
df['split_one'] = s.str[0]
df['split_two'] = s.str[:2].str.join(' ')
df['split_three'] = s.str[:3].str.join(' ')
print(df)
                     Name  split_one    split_two       split_three
cid
cd1      John Maike Leiws       John   John Maike  John Maike Leiws
cd2        Katie Sue Adam      Katie    Katie Sue    Katie Sue Adam
cd3  Tanaka Ubri Kse Suri     Tanaka  Tanaka Ubri   Tanaka Ubri Kse

Posted on 2022-02-18 14:23:39
A simple approach with regex is to use nested capture groups:
df['Name'].str.extract(r'(((\S+)\s\S+)\s\S+)')

Output:
                    0            1       2
cid
cd1  John Maike Leiws   John Maike    John
cd2    Katie Sue Adam    Katie Sue   Katie
cd3   Tanaka Ubri Kse  Tanaka Ubri  Tanaka

To add these as new columns, just reverse the column order:
df[['split_one', 'split_two', 'split_three']] = df['Name'].str.extract(r'(((\S+)\s\S+)\s\S+)').iloc[:, ::-1]

Output:
                     Name  split_one    split_two       split_three
cid
cd1      John Maike Leiws       John   John Maike  John Maike Leiws
cd2        Katie Sue Adam      Katie    Katie Sue    Katie Sue Adam
cd3  Tanaka Ubri Kse Suri     Tanaka  Tanaka Ubri   Tanaka Ubri Kse

Posted on 2022-02-18 14:36:11
I'm not sure whether you want a general solution or just something simple. Here is a simple approach:
df = pd.DataFrame({"cid": {0: "cd1", 1: "cd2", 2: "cd3"},
                   "Name": {0: "John Maike Leiws", 1: "Katie Sue Adam", 2: "Tanaka Ubri Kse Suri"}}).set_index(['cid'])
s = df.Name.str.split().str
df['split_one'] = s[0]
df['split_two'] = s[0] + ' ' + s[1]
df['split_three'] = s[0] + ' ' + s[1] + ' ' + s[2]

https://stackoverflow.com/questions/71174953
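One thing the answers above do not show is how each approach behaves when a name has fewer than three words. The sketch below is not from the original thread. It assumes a made-up sample (the names `Ann Lee` and `Cher` are hypothetical) and runs the slice-and-join, nested-capture-group, and concatenation approaches side by side so the difference is visible.

```python
import pandas as pd

# Hypothetical sample (not from the thread): the last two names
# have fewer than three words.
df = pd.DataFrame(
    {"Name": ["John Maike Leiws", "Ann Lee", "Cher"]},
    index=pd.Index(["cd1", "cd2", "cd3"], name="cid"),
)

words = df["Name"].str.split()

# 1) Slice the word lists and join: short names keep whatever words exist.
joined = words.str[:3].str.join(" ")

# 2) Nested capture groups: the pattern requires three words, so short
#    names do not match and every group comes back as NaN.
extracted = df["Name"].str.extract(r"(((\S+)\s\S+)\s\S+)")

# 3) Concatenate individual words: a missing word is NaN, and the NaN
#    propagates through the concatenation.
concatenated = words.str[0] + " " + words.str[1] + " " + words.str[2]

print(joined)
print(extracted)
print(concatenated)
```

With this sample, the slice-and-join version returns the words that are available for the short names, while the regex and concatenation versions give NaN for them. Which behavior fits depends on how incomplete names should be treated.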