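I am using Python to scrape the names of the members of the U.S. Congress from Ballotpedia (https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress). My current code gives me all four columns from each of the two tables (Senate and House). Here is my current code: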
import requests
from bs4 import BeautifulSoup
import pandas as pd
urls = ['https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress']
all_tables = pd.read_html(urls[0])
senators = all_tables[3]
house_members = all_tables[6]
congress = pd.concat([senators, house_members])  # DataFrame.append was removed in pandas 2.0
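congress.to_csv('3-New Congressmen.csv')
Obviously, I have been experimenting with lines 7-10, but I have not found a way to get only the legislators' names. I am only interested in the Name column.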
Am I wrong to ignore the inspect feature on the Ballotpedia page? Or do I need an extra line of code to specify the column I want? Any help is greatly appreciated!
Posted on 2021-04-02 04:39:30
To get only the legislators' names, you can do the following:
import pandas as pd
url = "https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress"
dfs = pd.read_html(url)
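# Selecting a single column by label returns a Series with just the names.
senators = dfs[3]["Name"]
house_members = dfs[6]["Name"]
# Stack both Series and write them without the row index.
pd.concat([senators, house_members]).to_csv("out.csv", index=False)
This creates out.csv: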
0 Richard Shelby
1 Tommy Tuberville
2 Lisa Murkowski
3 Daniel S. Sullivan
4 Mark Kelly
5 Kyrsten Sinema
6 John Boozman
7 Tom Cotton
8 Dianne Feinstein
9 Alex Padilla
10 Michael Bennet
...
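
The positions 3 and 6 depend on the page layout at the time of writing, so they may shift if Ballotpedia adds or removes a table. As a minimal sketch of one alternative, you could pick the tables by the column they contain instead of by position. The helper collect_names and the sample rows below are hypothetical and only stand in for the list returned by pd.read_html(url); on the real page, other tables could also have a "Name" column, so check the result before relying on it.

```python
import pandas as pd


def collect_names(tables, column="Name"):
    """Concatenate `column` from every table that has it (hypothetical helper)."""
    # Keep only the tables that actually contain the requested column.
    matching = [t[column] for t in tables if column in t.columns]
    if not matching:
        raise ValueError(f"no table has a column named {column!r}")
    # ignore_index=True renumbers the rows 0..n-1 across the combined tables.
    return pd.concat(matching, ignore_index=True)


# Made-up stand-in tables; in practice this would be pd.read_html(url).
tables = [
    pd.DataFrame({"Unrelated": ["x", "y"]}),
    pd.DataFrame({"Office": ["Senate seat A", "Senate seat B"],
                  "Name": ["Senator One", "Senator Two"]}),
    pd.DataFrame({"Office": ["House seat A"],
                  "Name": ["Representative One"]}),
]

names = collect_names(tables)
# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = names.to_csv(index=False)
print(csv_text)
```

Passing a file name, as in names.to_csv("out.csv", index=False), writes the same content to disk.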

https://stackoverflow.com/questions/66911239