I am trying to scrape the table from "https://www.nseindia.com/companies-listing/corporate-filings-event-calendar?days=7days" and get it as Python output.
import requests
from bs4 import BeautifulSoup
url = 'https://www.nseindia.com/companies-listing/corporate-filings-event-calendar?days=7days'
headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.117 Safari/537.36'}
response = requests.get(url, headers=headers)
print(response)
soup = BeautifulSoup(response.text, 'lxml')
print(soup)
data_array = soup.find(id='table-wrap my-3 borderSet maxHeight-900 scrollWrap').get_text().strip().split(":")
type(data_array)
The output prints the HTML markup instead of the table.
Regards, karthi
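Posted on 2020-10-27 11:27:19
If you want the table, there is a download link on the page. It provides the data as a CSV file, so you do not need any code. Why not just use that?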
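If the CSV route fits, reading the downloaded file takes only a few lines. A minimal sketch, assuming the file has been saved by hand; the function name `read_event_rows` and the sample column names are made up for illustration and are not the actual NSE headers:

```python
import csv
import io


def read_event_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    # skipinitialspace drops the blank that often follows each comma.
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to parse
    header = [name.strip() for name in header]
    # Blank lines come back as empty lists and are skipped.
    return [
        dict(zip(header, (cell.strip() for cell in row)))
        for row in reader
        if row
    ]


# Made-up sample standing in for the contents of the downloaded file.
sample = "SYMBOL, COMPANY, DATE\nABC, Example Ltd, 27-Oct-2020\n"
rows = read_event_rows(sample)
print(rows[0]["SYMBOL"])
```

For a real file, the text could be read with `open(path, newline="", encoding="utf-8")` and passed to the same function; the path is whatever name the download was saved under.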
Posted on 2020-11-19 21:35:30
This code returns every table as a list, using an XPath locator (table_locator) to find data_table:
# Method body from a page-object class: self.find_element and table_locator are defined elsewhere (not shown).
data_table = self.find_element(table_locator).get_attribute('innerHTML').replace('<th></th>', '')
soup = BeautifulSoup(data_table, 'lxml')
data_rows = soup.find_all('tr')
# Cell text per row; rows without <td> cells (header rows) are filtered out.
rows_values_scrape = [[td.getText() for td in data_rows[i].findAll('td')]
                      for i, v in enumerate(data_rows)]
rows_values = [x for x in rows_values_scrape if x]
# Header text per row; columns[0] holds the column names, later entries hold row titles.
columns_scrape = [[td.getText() for td in data_rows[i].findAll('th')]
                  for i, v in enumerate(data_rows)]
columns = [x for x in columns_scrape if x]
table = []
if columns[1:] != []:
    for i, r in enumerate(columns[1:]):
        table.append([f'column: {columns[0][j]}, row_title: {columns[1:][i][0]}, cell: {rows_values[i][j]}' for j, c in enumerate(columns[0])])
else:
    table = [f'column: {columns[0][j]}, cell: {rows_values[0][j]}' for j, c in enumerate(columns[0]) if columns[1:] == []]
return table
https://stackoverflow.com/questions/64547638
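For a plain table with one header row, the pairing of header cells with row cells in the answer above can also be tried on an HTML string directly. A minimal sketch that uses the standard library's `html.parser` in place of BeautifulSoup, so it runs with no extra installs; `TableRows`, `table_to_dicts` and the sample markup are made up for illustration, and the sketch assumes the table HTML is already in hand as a string:

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def table_to_dicts(html):
    """Treat the first row as the header and map each later row onto it."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up markup; the real page's columns are not reproduced here.
sample = """
<table>
  <tr><th>Symbol</th><th>Purpose</th></tr>
  <tr><td>ABC</td><td>Results</td></tr>
  <tr><td>XYZ</td><td>Dividend</td></tr>
</table>
"""
records = table_to_dicts(sample)
print(records)
```

Returning one dict per row keeps each cell attached to its column name, which is the same information the formatted strings in the answer carry, in a form that is easier to filter or write out afterwards.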