Posted on 2020-07-23 16:23:55
To get the value of the "SIC" row, you can use the following example (you also need to specify a valid User-Agent):
import requests
from bs4 import BeautifulSoup
url = 'https://sec.report/CIK/1418076'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
print(soup.find('td', text="SIC").find_next('td').text)
Prints:
7129: Other Business Financing Companies Investors, Not Elsewhere Classified 6799
EDIT: Changed the parser to lxml to parse the HTML correctly:
import requests
from bs4 import BeautifulSoup
url = 'https://sec.report/CIK/1002771'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'lxml')
print(soup.find('td', text="SIC").find_next('td').text)
Prints:
1121: Distillery Products Industry Pharmaceutical Preparations 2834

Posted on 2020-07-23 16:25:05
Try the following code:
import requests
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 '}
r = requests.get('https://sec.report/CIK/1418076', headers=headers)
soup = BeautifulSoup(r.content, 'lxml')
sic = soup.select_one('.table:nth-child(5) tr~ tr+ tr td:nth-child(2)')
print(sic.text)
Output:
7129: Other Business Financing Companies Investors, Not Elsewhere Classified 6799
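Both snippets above raise an AttributeError if the "SIC" cell is not found, for example when the parser mis-reads the page or the layout changes. A minimal sketch of a more defensive lookup follows. It runs on a small inline HTML sample so it works offline. The sample markup, its values, and the helper name extract_row_value are assumptions made for illustration, not the real sec.report page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the page markup (NOT the real sec.report HTML).
SAMPLE_HTML = """
<table class="table">
  <tr><td>Company Name</td><td>Example Corp</td></tr>
  <tr><td>SIC</td><td>6799: Investors, Not Elsewhere Classified</td></tr>
</table>
"""


def extract_row_value(html, label, parser="html.parser"):
    """Return the text of the cell after the <td> whose text equals `label`, or None."""
    soup = BeautifulSoup(html, parser)
    # `string=` is the current name of the older `text=` argument used above.
    label_cell = soup.find("td", string=label)
    if label_cell is None:
        return None  # label not present: report it instead of raising
    value_cell = label_cell.find_next("td")
    return value_cell.get_text(strip=True) if value_cell is not None else None


print(extract_row_value(SAMPLE_HTML, "SIC"))  # value from the sample table
print(extract_row_value(SAMPLE_HTML, "CIK"))  # None, the sample has no such row
```

For the live page, you would pass requests.get(url, headers=headers).content as html and, as in the edit above, 'lxml' as parser if the built-in parser does not handle the markup.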
https://stackoverflow.com/questions/63058466