
Need to split a column, but split only removes the character

Stack Overflow user
Asked 2021-12-27 14:59:55
2 answers · 39 views · 0 followers · 0 votes

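Good morning. Below are the first 20 rows of my df, followed by my code.

When I try to split on "<" to strip the strong tags from the links, split only removes the character, and split('<')[0] raises a KeyError.

Is there a way to make this work?

The first link I want: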

http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux

    0
0   <a class="back" href="http://africa.espn.com/college-sports/football/recruiting/rankings">Back to Ranking Index</a>
1   <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux" name=""></a>
2   <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux"><strong>Kayvon Thibodeaux</strong></a>
3   <a href="http://insider.espn.com/college-sports/football/recruiting/player/evaluation/_/id/222687/kayvon-thibodeaux">Scouts Report</a>
4   <a href="http://africa.espn.com/college-sports/football/recruiting/playerrankings/_/view/rn300/sort/rank/class/2019"><img border="0" class="floatleft" src="https://a.espncdn.com/i/recruiting/logos/2012/sml/rn-300_sml.png" title="ESPN 300"/></a>
5   <a href="http://africa.espn.com/college-sports/football/recruiting/school/_/id/2483/class/2019/oregon-ducks"><img class="valign-logo" src="https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/2483.png?w=110&amp;h=110&amp;transparent=true" style="width: 50px"/></a>
6   <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/226752/nolan-smith" name=""></a>
7   <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/226752/nolan-smith"><strong>Nolan Smith</strong></a>
8   <a href="http://insider.espn.com/college-sports/football/recruiting/player/evaluation/_/id/226752/nolan-smith">Scouts Report</a>
9   <a href="http://africa.espn.com/college-sports/football/recruiting/playerrankings/_/view/rn300/sort/rank/class/2019"><img border="0" class="floatleft" src="https://a.espncdn.com/i/recruiting/logos/2012/sml/rn-300_sml.png" title="ESPN 300"/></a>
10  <a href="http://africa.espn.com/college-sports/football/recruiting/school/_/id/61/class/2019/georgia-bulldogs"><img class="valign-logo" src="https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/61.png?w=110&amp;h=110&amp;transparent=true" style="width: 50px"/></a>
11  <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/216987/kenyon-green" name=""></a>
12  <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/216987/kenyon-green"><strong>Kenyon Green</strong></a>
13  <a href="http://insider.espn.com/college-sports/football/recruiting/player/evaluation/_/id/216987/kenyon-green">Scouts Report</a>
14  <a href="http://africa.espn.com/college-sports/football/recruiting/playerrankings/_/view/rn300/sort/rank/class/2019"><img border="0" class="floatleft" src="https://a.espncdn.com/i/recruiting/logos/2012/sml/rn-300_sml.png" title="ESPN 300"/></a>
15  <a href="http://africa.espn.com/college-sports/football/recruiting/school/_/id/245/class/2019/texas-aggies"><img class="valign-logo" src="https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/245.png?w=110&amp;h=110&amp;transparent=true" style="width: 50px"/></a>
16  <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222156/evan-neal" name=""></a>
17  <a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222156/evan-neal"><strong>Evan Neal</strong></a>
18  <a href="http://insider.espn.com/college-sports/football/recruiting/player/evaluation/_/id/222156/evan-neal">Scouts Report</a>
19  <a href="http://africa.espn.com/college-sports/football/recruiting/playerrankings/_/view/rn300/sort/rank/class/2019"><img border="0" class="floatleft" src="https://a.espncdn.com/i/recruiting/logos/2012/sml/rn-300_sml.png" title="ESPN 300"/></a>
20  <a href="http://africa.espn.com/college-sports/football/recruiting/school/_/id/333/class/2019/alabama-crimson-tide"><img class="valign-logo" src="https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/333.png?w=110&amp;h=110&amp;transparent=true" style="width: 50px"/></a>



#players.to_excel('Player_Links.xlsx')
players = pd.read_excel('Player_Links.xlsx')
players['Links'] = players.iloc[:,1]
players = players[players['Links'].str.contains('strong')]
players['Links'] = players['Links'].str.replace('<a href="','')
players['Links'] = players['Links'].str.split('<')
print(players)

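2 Answers

Stack Overflow user

Accepted answer

Posted 2021-12-27 15:14:29

Filter the DataFrame to keep only the rows that contain a <strong> tag, then use BeautifulSoup to parse the HTML. Apply it inside a lambda function: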

from bs4 import BeautifulSoup
import pandas as pd


df = pd.DataFrame( [   
['<a class="back" href="http://africa.espn.com/college-sports/football/recruiting/rankings">Back to Ranking Index</a>'],
['<a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux" name=""></a>'],
['<a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux"><strong>Kayvon Thibodeaux</strong></a>'],
['<a href="http://insider.espn.com/college-sports/football/recruiting/player/evaluation/_/id/222687/kayvon-thibodeaux">Scouts Report</a>'],
['<a href="http://africa.espn.com/college-sports/football/recruiting/playerrankings/_/view/rn300/sort/rank/class/2019"><img border="0" class="floatleft" src="https://a.espncdn.com/i/recruiting/logos/2012/sml/rn-300_sml.png" title="ESPN 300"/></a>'],
['<a href="http://africa.espn.com/college-sports/football/recruiting/school/_/id/2483/class/2019/oregon-ducks"><img class="valign-logo" src="https://a.espncdn.com/combiner/i?img=/i/teamlogos/ncaa/500/2483.png?w=110&amp;h=110&amp;transparent=true" style="width: 50px"/></a>'],
['<a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/226752/nolan-smith" name=""></a>']],
    columns=[0])

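# keep only the rows whose anchor wraps a <strong> tag; .copy() avoids SettingWithCopyWarning
df_filter = df[df[0].str.contains('<strong>')].copy()

# parse each HTML snippet and read the href attribute of its <a> tag
df_filter[0] = df_filter[0].apply(lambda row: BeautifulSoup(row, 'html.parser').find('a')['href'])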

Output:

For the sample set used above, this leaves us with:

print(df_filter.to_string())
                                                                                                0
2  http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux
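
As a side note on the original attempt, here is a minimal sketch of why `split('<')[0]` raised a KeyError and how the `.str` accessor avoids it. It assumes the column is named `Links`, as in the question's code, and uses two rows copied from the question's data. After the `contains` filter the row labelled 0 no longer exists, so `[0]` on the Series is a failed label lookup rather than "the first piece of each list". Splitting on the closing quote instead of `<` is a choice made here so that the trailing `">` does not remain in the result.

```python
import pandas as pd

# Two rows copied from the question's data; the column name "Links" follows the question's code.
players = pd.DataFrame({
    "Links": [
        '<a class="back" href="http://africa.espn.com/college-sports/football/recruiting/rankings">Back to Ranking Index</a>',
        '<a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux"><strong>Kayvon Thibodeaux</strong></a>',
    ]
})

# Filtering keeps the original index labels, so label 0 is gone afterwards.
players = players[players["Links"].str.contains("<strong>")].copy()

# regex=False makes str.replace treat the pattern as a literal string.
stripped = players["Links"].str.replace('<a href="', "", regex=False)

# .str.split returns one list per row; .str[0] takes the first piece of each list.
# A plain [0] would look up index label 0 instead, which raises KeyError here.
players["Links"] = stripped.str.split('"').str[0]

first_link = players["Links"].iloc[0]
print(first_link)
```

Using `.iloc[0]` at the end reads the first remaining row by position, which works regardless of which index labels survived the filter.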
Votes: 2

Stack Overflow user

Posted 2021-12-27 15:34:42

You can also do the whole thing with a regular expression:

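import pandas as pd

players = pd.read_excel('Player_Links.xlsx')
players['Links'] = players.iloc[:,1]
# capture everything from "http:" up to the "> that precedes <strong>
regex = r"(http:.*)\">.*<strong>"
# findall returns one list of matches per row (empty when the regex misses)
players = players.Links.str.findall(regex)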
# only keep the rows for which the regex hit
players = players[players.apply(lambda li: len(li) == 1)]
# flatten the list
players = players.apply(lambda li: li[0])
print(players)
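
A possible variation on the same idea, offered as a sketch rather than as this answer's method: `Series.str.extract` with a single capture group returns the match directly and NaN where the pattern misses, which replaces the findall, filter, and flatten steps. The pattern below is an assumption chosen for this data (it anchors on `href="` and stops at the closing quote), and the three sample rows are copied from the question.

```python
import pandas as pd

# Three rows copied from the question's data; only the middle one wraps a <strong> tag.
links = pd.Series([
    '<a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux" name=""></a>',
    '<a href="http://africa.espn.com/college-sports/football/recruiting/player/_/id/222687/kayvon-thibodeaux"><strong>Kayvon Thibodeaux</strong></a>',
    '<a href="http://insider.espn.com/college-sports/football/recruiting/player/evaluation/_/id/222687/kayvon-thibodeaux">Scouts Report</a>',
])

# Capture the href value only when the anchor's content starts with <strong>.
# [^"]+ stops at the closing quote, so no greedy backtracking is involved.
pattern = r'href="([^"]+)"><strong>'

# expand=False returns a Series for a single group; rows without a match become NaN.
urls = links.str.extract(pattern, expand=False).dropna()

print(urls.to_string())
```

The original index labels are preserved, so the result can still be aligned with the source rows if needed.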
Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/70497103
