I have written the code below to scrape the tables from http://acuratings.conservative.org/acu-federal-legislative-ratings/?year1=1975&chamber=11&state1=0&sortable=1. The goal is to save all of the tables into a single DataFrame.
import pandas as pd
import time
from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

# Launch Chrome using a driver downloaded by webdriver_manager
acu_browser = webdriver.Chrome(ChromeDriverManager().install())
acu_browser.get('http://acuratings.conservative.org/acu-federal-legislative-ratings/?year1=1975&chamber=11&state1=0&sortable=1')
time.sleep(10)  # give the page time to finish rendering
acu_html = acu_browser.page_source
acu_tables = pd.read_html(acu_html)  # list of DataFrames, one per <table>
acu_tables = pd.concat(acu_tables)

However, the last line raises the following error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-16e0df40412a> in <module>
13 acu_html = acu_browser.page_source
14 acu_tables = pd.read_html(acu_html)
---> 15 acu_tables = pd.concat(acu_tables)
16
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in concat(objs, axis, join, ignore_index, keys, levels, names, verify_integrity, sort, copy)
282 )
283
--> 284 return op.get_result()
285
286
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/reshape/concat.py in get_result(self)
490 obj_labels = mgr.axes[ax]
491 if not new_labels.equals(obj_labels):
--> 492 indexers[ax] = obj_labels.reindex(new_labels)[1]
493
494 mgrs_indexers.append((obj._data, indexers))
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in reindex(self, target, method, level, limit, tolerance)
2423 else:
2424 # hopefully?
-> 2425 target = MultiIndex.from_tuples(target)
2426
2427 if (
/usr/local/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/multi.py in from_tuples(cls, tuples, sortorder, names)
487 tuples = tuples._values
488
--> 489 arrays = list(lib.tuples_to_object_array(tuples).T)
490 elif isinstance(tuples, list):
491 arrays = list(lib.to_object_array_tuples(tuples).T)
pandas/_libs/lib.pyx in pandas._libs.lib.tuples_to_object_array()
TypeError: Expected tuple, got str

Any help would be greatly appreciated.
Posted on 2020-07-01 12:16:17
So far I do not have a good answer.
A simple way to work around the problem is to do something like the following:
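accumulator_df = acu_tables[1]
for i in range(2, len(acu_tables)):
    accumulator_df = pd.concat((accumulator_df, acu_tables[i]), ignore_index=True)

However, this does not work directly. Because the column names differ from table to table, the rows are not lined up correctly when concatenated.
Since all the tables have 35 columns, one workaround is to simply rename the columns to a fixed set of names and then concatenate.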
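A minimal sketch of that renaming workaround follows. The helper name `concat_with_fixed_columns`, the placeholder column names `col_0`, `col_1`, ..., and the two toy frames are all hypothetical and only stand in for the scraped tables. Skipping tables whose width differs from the expected one is an extra safeguard added here, not part of the original suggestion. The toy data also assumes, based only on the traceback, that some tables have MultiIndex columns while others have plain string columns; that has not been checked against the page.

```python
import pandas as pd


def concat_with_fixed_columns(tables, n_cols=35):
    """Rename every table's columns to col_0..col_{n_cols-1}, then stack them."""
    fixed_names = [f"col_{i}" for i in range(n_cols)]  # hypothetical names
    renamed = []
    for table in tables:
        if table.shape[1] != n_cols:
            continue  # assumption: ignore tables that lack the expected width
        table = table.copy()
        # Assigning a flat list replaces any header, including a MultiIndex
        table.columns = fixed_names
        renamed.append(table)
    return pd.concat(renamed, ignore_index=True)


# Toy stand-ins for the scraped tables: one flat header, one MultiIndex header
flat = pd.DataFrame([[1, 2, 3]], columns=["a", "b", "c"])
multi = pd.DataFrame(
    [[4, 5, 6]],
    columns=pd.MultiIndex.from_tuples([("x", "a"), ("x", "b"), ("y", "c")]),
)

combined = concat_with_fixed_columns([flat, multi], n_cols=3)
print(combined)
```

With the real data this would presumably be called as `concat_with_fixed_columns(acu_tables[1:])`, keeping the default of 35 columns and starting from index 1 as in the loop above. The original headers are discarded, so meaningful names would have to be assigned afterwards.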
https://stackoverflow.com/questions/62668219