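I'm trying to pull data from Ancestry.com with Python 3, using BeautifulSoup and MechanicalSoup, but I'm running into problems when trying to log in. Here is Ancestry's HTML for the form: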
<form action="#" id="signInForm" method="post" class="form formLarge" onsubmit="return false" novalidate="novalidate" data-ui-id="ui1591467547206308">
<div class="ancGrid">
<div class="ancCol ancColRow w100">
<label id="usernameLabel" for="username" data-error-0="Required" data-error-1="Please enter a minimum of 5 characters for the username/email" data-error-2="Username/email contains invalid characters">
Email or Username
</label>
<input tabindex="1" aria-required="true" class="success required" id="username" maxlength="64" name="username" placeholder="Email Address or Username" type="text" value="" autocorrect="off" autocapitalize="off">
</div>
<div class="ancCol ancColRow w100">
<label id="passwordLabel" for="password" data-error-0="Required" data-error-1="Please enter a minimum of 5 characters for the password" data-error-2="Password contains invalid characters">
Password
</label>
[...]
from urllib.request import urlopen
from bs4 import BeautifulSoup
# specify the url
quote_page = 'https://www.ancestry.com/account/signin?'
# query the website and return the html to the variable 'page'
page = urlopen(quote_page)
# parse the html using Beautiful Soup and store it in the variable 'soup'
soup = BeautifulSoup(page, 'html.parser')
len(soup.find_all('form'))
#Out: 1
When I call browser.select_form('form[action="#"]'), I get the error LinkNotFoundError. My code:
#import urllib.request
#import time
#pip install beautifulsoup4
#from bs4 import BeautifulSoup
#%pip install mechanicalsoup
import mechanicalsoup
browser = mechanicalsoup.StatefulBrowser()
browser.open('https://www.ancestry.com/account/signin?')
print(browser.get_url())
#browser.select_form('')
###action="#" id="signInForm"
#browser.select_form('form[action="#" id="signInForm"]')
#browser.select_form('form[action="#"]') #gives LinkNotFound error
browser.select_form('form[action=""]')
browser['username']='USERNAME'
browser['password']='PASSWORD'
browser.submit_selected()
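print(browser.get_url())
I've seen a lot of support for using mechanize, but it doesn't seem to work with Python 3, and I don't know how to check whether Ancestry.com uses JavaScript, since I can't select the first form. I'm a beginner, so please assume I know nothing; I won't be offended. (I haven't found a tutorial that covers action='#', because that query returns very few results.)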
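(This person used a different strategy to log in to Ancestry, but the site has been updated since the code was posted at https://github.com/freeseek/getmydnamatches/blob/master/getmyancestrydna.py, and his code is a bit too advanced for my level.)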
Posted on 2020-06-08 21:08:28
Consider taking a look at this: https://requests.readthedocs.io/projects/requests-html/en/latest/
It is very user-friendly and has JavaScript support.
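Before switching to a JavaScript-capable client, it may help to check what the server actually sends. The sketch below is only an illustration and uses nothing beyond the standard library: it lists the attributes of every form in an HTML string and flags forms that carry an onsubmit handler, which usually means a script handles the submission rather than a plain POST. The names FormLister, list_forms, looks_script_driven and SAMPLE_HTML are made up for this example, and the sample input is just the opening tag of the form quoted in the question, not a live download.

```python
from html.parser import HTMLParser


class FormLister(HTMLParser):
    """Collect the attributes of every <form> tag in an HTML document."""

    def __init__(self):
        super().__init__()
        self.forms = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # attrs is a list of (name, value) pairs; a dict is easier to query.
            self.forms.append(dict(attrs))


def list_forms(html):
    """Return one attribute dict per <form> found in the given HTML string."""
    parser = FormLister()
    parser.feed(html)
    return parser.forms


def looks_script_driven(form_attrs):
    """Heuristic only: an onsubmit handler suggests JavaScript handles the submit."""
    return "onsubmit" in form_attrs


# Sample input (assumption): the opening tag of the form quoted in the question.
SAMPLE_HTML = (
    '<form action="#" id="signInForm" method="post" '
    'onsubmit="return false" novalidate="novalidate"></form>'
)

if __name__ == "__main__":
    for attrs in list_forms(SAMPLE_HTML):
        print(attrs.get("id"), attrs.get("action"), looks_script_driven(attrs))
```

To try this against the real page, one option is to pass in the .text of the response returned by browser.open(...). If no form with id="signInForm" shows up in that output, the form is presumably added to the page by a script after loading, which would be consistent with select_form raising LinkNotFoundError, although that is a guess rather than something confirmed for Ancestry.com.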
https://stackoverflow.com/questions/62271156