I have the following dataframe:
import pandas as pd
data = {'URL': ['https://weibo.com/6402575118/Iy0zjtMNZ', 'https://weibo.com/6402575118/Hellothere', 'https://weibo.com/6402575118/hithere']}
df = pd.DataFrame(data, columns=['URL'])

I want to keep everything up to the last "/" and only the first 4 characters after it, so that:
data = {'URL': ['https://weibo.com/6402575118/Iy0z', 'https://weibo.com/6402575118/Hell', 'https://weibo.com/6402575118/hith']}
df = pd.DataFrame(data, columns=['URL'])

How can I do this?
I know how to split a string and take the first part, i.e.
df['URL'] = df['URL'].str.split("/").str[0]
but I'm not sure how to impose the occurrence condition.
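For reference, splitting on "/" turns the occurrence condition into a list index: .str[0] is the text before the 1st "/", and .str[4] is the text after the 4th "/" (the empty string at index 1 comes from the "//" in "https://"). A minimal sketch on the sample data above:

```python
import pandas as pd

data = {'URL': ['https://weibo.com/6402575118/Iy0zjtMNZ',
                'https://weibo.com/6402575118/Hellothere',
                'https://weibo.com/6402575118/hithere']}
df = pd.DataFrame(data, columns=['URL'])

# Each URL becomes a list of pieces; '' comes from the '//' in 'https://'
parts = df['URL'].str.split('/')
print(parts.iloc[0])  # ['https:', '', 'weibo.com', '6402575118', 'Iy0zjtMNZ']

# .str[0] is the text before the 1st '/', i.e. 'https:' (what the attempt returns)
before_first = parts.str[0]

# .str[4] is the text after the 4th '/'; .str[:4] keeps its first 4 characters
after_fourth = parts.str[4].str[:4]
print(after_fourth.tolist())  # ['Iy0z', 'Hell', 'hith']
```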
Posted on 2020-10-13 19:11:10
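Just change how you split. Split on the run of word characters that is preceded (via a lookbehind) by a digit, the special character / and 4 alphanumerics, then select the first string in the resulting list: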
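To see what the lookbehind pattern matches on its own, here is a small sketch with the standard re module. Because the pattern matches exactly the characters to drop, deleting the match with str.replace(..., regex=True) gives the same result; that replace variant is a swapped-in alternative, not part of the answer.

```python
import re

import pandas as pd

# Same pattern as the answer, written as a raw string. The lookbehind needs a
# digit, '/', and 4 word characters right before the match; \w+ then matches
# the rest of the last segment, i.e. the part that gets cut off.
PATTERN = r'(?<=\d\/\w{4})\w+'

url = 'https://weibo.com/6402575118/Iy0zjtMNZ'
match = re.search(PATTERN, url)
print(match.group())  # jtMNZ
pieces = re.split(PATTERN, url)
print(pieces)         # ['https://weibo.com/6402575118/Iy0z', '']

# Alternative: remove the matched characters instead of splitting on them
df = pd.DataFrame({'URL': [url,
                           'https://weibo.com/6402575118/Hellothere',
                           'https://weibo.com/6402575118/hithere']})
trimmed = df['URL'].str.replace(PATTERN, '', regex=True)
print(trimmed.tolist())
```

If the last segment has 4 or fewer characters there is no match, so both the split and the replace leave that URL unchanged.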
df['URL'] = df['URL'].str.split(r'(?<=\d\/\w{4})\w+').str[0]
URL
0 https://weibo.com/6402575118/Iy0z
1 https://weibo.com/6402575118/Hell
2 https://weibo.com/6402575118/hith

Posted on 2020-10-13 18:58:11
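If you need to truncate after the 4th occurrence of /, use Series.str.split with n=4, join the first 4 pieces back together with str.join, and append the first 4 characters of the 5th piece with str.cat: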
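The same split/join/cat steps can be wrapped in a small function whose parameters are the occurrence to cut at (n) and how many characters to keep after it (k), which makes the occurrence condition explicit. This is only a sketch; keep_after_nth_slash is a hypothetical name, not a pandas API.

```python
import pandas as pd


def keep_after_nth_slash(urls: pd.Series, n: int, k: int) -> pd.Series:
    """Hypothetical helper: keep the text before the n-th '/', then '/',
    then at most k characters of what follows it."""
    # Split at most n times, so element n holds everything after the n-th '/'
    parts = urls.str.split('/', n=n)
    # Re-join the first n pieces to rebuild the prefix
    prefix = parts.str[:n].str.join('/')
    # Append the first k characters of the remainder, separated by '/'.
    # Rows with fewer than n slashes have no element n and come out as NaN.
    return prefix.str.cat(parts.str[n].str[:k], sep='/')


df = pd.DataFrame({'URL': ['https://weibo.com/6402575118/Iy0zjtMNZ',
                           'https://weibo.com/6402575118/Hellothere',
                           'https://weibo.com/6402575118/hithere']})

print(keep_after_nth_slash(df['URL'], n=4, k=4).tolist())
print(keep_after_nth_slash(df['URL'], n=3, k=5).tolist())
```

With n=4, k=4 this reproduces the desired output; with n=3, k=5 every row becomes 'https://weibo.com/64025'.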
s = df['URL'].str.split("/", n=4)
df['URL'] = s.str[:4].str.join('/').str.cat(s.str[4].str[:4], sep='/')
print(df)
URL
0 https://weibo.com/6402575118/Iy0z
1 https://weibo.com/6402575118/Hell
2 https://weibo.com/6402575118/hith

Another idea is to split once from the right with rsplit, so only the last / matters:
s = df['URL'].str.rsplit("/", n=1)
df['URL'] = s.str[0].str.cat(s.str[-1].str[:4], sep='/')
print(df)
URL
0 https://weibo.com/6402575118/Iy0z
1 https://weibo.com/6402575118/Hell
2 https://weibo.com/6402575118/hith

https://stackoverflow.com/questions/64333847