Q: Get tables from Wiki that match specific text

Stack Overflow user

Asked 2020-12-05 17:57:02


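I'm new to Python and BeautifulSoup, and I've been trying this for hours...

First, I want to extract all the table data with "general election" in the title from the following link:

https://en.wikipedia.org/wiki/Carlow%E2%80%93Kilkenny_(D%C3%A1il_constituency)

I do have another dataframe with the name of each table (e.g. "1961 general election", "1965 general election"), but I'd like to confirm that each table is one I need by searching it for "general election".

Then I want to get all the names that are in bold (which indicates that they won), and finally I want another list of the "Count 1" (sometimes just "Count") column in its original order, which I want to compare against the bold list. I haven't even looked at that part yet, as I haven't got past the first hurdle.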

import requests
from bs4 import BeautifulSoup

url = "https://en.wikipedia.org/wiki/Carlow%E2%80%93Kilkenny_(D%C3%A1il_constituency)"
res = requests.get(url)
soup = BeautifulSoup(res.content,'lxml')
my_tables = soup.find_all("table", {"class":"wikitable"})
for table in my_tables:
    rows = table.find_all('tr', text="general election")
    print(rows)

Any help with this would be greatly appreciated...

python

beautifulsoup

wiki

1 Answer

Stack Overflow user

Accepted answer

Posted 2020-12-06 00:26:07

This page takes a bit of finessing, but it can be done:

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

req = requests.get('https://en.wikipedia.org/wiki/Carlow%E2%80%93Kilkenny_(D%C3%A1il_constituency)')
soup = bs(req.text,'lxml')
#first - select all the tables on the page
tables = soup.select('table.wikitable')

for table in tables:        
    ttr = table.select('tbody tr')
    #next, filter out any table that doesn't involve general elections
    if "general election" in ttr[0].text: 
        #clean up the rows
        s_ttr = ttr[1].text.replace('\n','xxx').strip()
        #find and clean up column headings
        columns = [col.strip() for col in s_ttr.split('xxx') if len(col.strip())>0 ]
        rows = [] #initialize a list to house the table rows
        for c in ttr[2:]:
            #from here, start processing each row and loading it into the list
            row = [a.text.strip() if len(a.text.strip())>0 else 'NA' for a in c.select('td')]
            #drop a leading empty cell; the length check avoids an IndexError on rows without <td> cells
            if len(row)>0 and row[0]=="NA":
                row=row[1:]
            if len(row)>0:
                rows.append(row)
        #load the whole thing into a dataframe
        df = pd.DataFrame(rows,columns=columns)
        print(df)

The output should be all of the general election tables on the page.
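As a side note on why the loop in the question prints empty lists: when `find_all` is given a tag name together with `text=` (called `string=` in newer Beautiful Soup releases), it compares the tag's `.string` against the whole value, so a row whose title is "1961 general election ..." never equals "general election". A substring test on `get_text()` is one way around that. The sketch below is only an illustration: `SAMPLE_HTML` is made-up markup standing in for the real page, `general_election_tables` is a hypothetical helper name, and `html.parser` is used instead of `lxml` just to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the Wikipedia markup (not the real page).
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th colspan="3"><a href="#">1961 general election</a>: sample constituency</th></tr>
  <tr><th>Party</th><th>Candidate</th><th>Count 1</th></tr>
  <tr><td>Party A</td><td><b>Candidate One</b></td><td>9,000</td></tr>
</table>
<table class="wikitable">
  <tr><th colspan="3">1956 by-election: sample constituency</th></tr>
  <tr><th>Party</th><th>Candidate</th><th>Count 1</th></tr>
  <tr><td>Party B</td><td><b>Candidate Two</b></td><td>7,000</td></tr>
</table>
"""


def general_election_tables(soup, needle="general election"):
    """Return the wikitable elements whose first row mentions `needle`."""
    matches = []
    for table in soup.find_all("table", class_="wikitable"):
        first_row = table.find("tr")
        # get_text() joins the text of all descendants, so a substring test
        # works even when the title is split across <a> and plain text nodes.
        if first_row is not None and needle in first_row.get_text().lower():
            matches.append(table)
    return matches


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Whole-string matching on <tr>, as in the question: no row qualifies.
exact = soup.find_all("tr", string="general election")

# Substring matching on the first row of each table.
titles = [t.find("tr").get_text(strip=True) for t in general_election_tables(soup)]
print(exact)
print(titles)
```

On the real page, the same helper would be called on the `soup` built from the `requests` response; whether the title always sits in the first row is an assumption carried over from the answer's `ttr[0]` check.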

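The question also asks for the names in bold and for the "Count 1" (sometimes "Count") column, which the accepted answer does not cover. A minimal sketch of one possible approach follows. It rests on several assumptions: `SAMPLE_HTML` is invented markup with made-up names and numbers, `winners_and_first_count` is a hypothetical helper, bold names are assumed to be rendered as `<b>` tags, and the data cells are assumed to line up one-to-one with the header cells. The real page may have extra leading cells (the answer's code drops one), in which case the column index would need adjusting.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup; names and numbers are made up.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th colspan="3">1961 general election (sample)</th></tr>
  <tr><th>Party</th><th>Candidate</th><th>Count 1</th></tr>
  <tr><td>Party A</td><td><b>Candidate One</b></td><td>9,000</td></tr>
  <tr><td>Party B</td><td>Candidate Two</td><td>8,500</td></tr>
  <tr><td>Party A</td><td><b>Candidate Three</b></td><td>7,200</td></tr>
</table>
"""


def winners_and_first_count(table):
    """Return (bold names, first-count values) in the table's row order."""
    rows = table.find_all("tr")
    headers = [th.get_text(strip=True) for th in rows[1].find_all("th")]
    # The question notes the column is sometimes "Count 1" and sometimes "Count".
    # next() raises StopIteration if neither heading is present.
    count_idx = next(i for i, h in enumerate(headers) if h in ("Count 1", "Count"))
    winners, first_counts = [], []
    for row in rows[2:]:
        cells = row.find_all("td")
        if len(cells) <= count_idx:
            continue  # skip spacer or summary rows with too few cells
        bold = row.find("b")
        if bold is not None:
            winners.append(bold.get_text(strip=True))
        first_counts.append(cells[count_idx].get_text(strip=True))
    return winners, first_counts


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
winners, first_counts = winners_and_first_count(soup.find("table", class_="wikitable"))
print(winners)
print(first_counts)
```

Both lists keep the order of the rows in the table, so they can be compared against each other or against the separate dataframe of table names mentioned in the question.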

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/65155732
