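This is an odd situation where PowerShell's Invoke-WebRequest works as expected but Python requests does not.
I am trying to scrape an e-commerce website with Python. Part of the scrape tests whether an item can be added to the cart. Using Chrome DevTools (F12), I was able to extract the following PowerShell script.
Step 1 - Request a customer session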
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
"Accept"="application/json, text/plain, */*"
"Cache-Control"="no-cache"
"DNT"="1"
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
"Origin"="https://www.hermes.com"
"Sec-Fetch-Site"="same-site"
"Sec-Fetch-Mode"="cors"
"Sec-Fetch-Dest"="empty"
"Referer"="https://www.hermes.com/"
"Accept-Encoding"="gzip, deflate, br"
"Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} | Select-Object -Expand RawContent
This response gives me an "ECOM_SESS" cookie along with a number of other cookies.
I then pass the ECOM_SESS cookie on to the next step.
Step 2 - Add to cart
$session = New-Object Microsoft.PowerShell.Commands.WebRequestSession
$session.UserAgent = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36"
$session.Cookies.Add((New-Object System.Net.Cookie("ECOM_SESS", "XXXXXXXXXXXXXXXX", "/", ".hermes.com")))
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method "POST" `
-WebSession $session `
-Headers @{
"sec-ch-ua"="`" Not A;Brand`";v=`"99`", `"Chromium`";v=`"99`", `"Google Chrome`";v=`"99`""
"Accept"="application/json, text/plain, */*"
"DNT"="1"
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
"Origin"="https://www.hermes.com"
"Sec-Fetch-Site"="same-site"
"Sec-Fetch-Mode"="cors"
"Sec-Fetch-Dest"="empty"
"Referer"="https://www.hermes.com/"
"Accept-Encoding"="gzip, deflate, br"
"Accept-Language"="en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5"
} `
-ContentType "application/json" `
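-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"
With the PowerShell script above, the process works fine and I get a response from each of the two steps. Note that this is a rotating IP proxy that refreshes the IP on every request to avoid bot detection.
However, when I try to integrate this into Python code, I run into a captcha requirement at step 2 no matter which proxy server I use.
Here is the relevant Python code: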
from __future__ import print_function
import bs4
import requests
from requests.cookies import RequestsCookieJar
import jsons

def main():
    url1 = "https://bck.hermes.com/customer-session?locale=de_de"
    url2 = "https://bck.hermes.com/add-to-cart"
    proxies1 = {
        "http": "xxxxxxxxxxxxxxxxxx"
    }
    headers1 = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
        'Accept': 'application/json, text/plain, */*',
        'Cache-Control': 'no-cache',
        'DNT': '1',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'Origin': 'https://www.hermes.com',
        'Sec-Fetch-Site': 'same-site',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Dest': 'document',
        'Referer': 'https://www.hermes.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    headers2 = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
        'Accept': 'application/json, text/plain, */*',
        'DNT': '1',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'Origin': 'https://www.hermes.com',
        'Sec-Fetch-Site': 'same-site',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Dest': 'empty',
        'Referer': 'https://www.hermes.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'en-US,en;q=0.9,ja;q=0.8,zh-CN;q=0.7,zh-TW;q=0.6,zh;q=0.5'
    }
    body2 = {"locale": "de_de", "items": [{"category": "direct", "sku": "H079082CCAC"}]}

    # Step 1
    f = requests.get(url1, headers=headers1, proxies=proxies1)
    print(f"1Response Body: {f.text}\n")
    ECOM_SESS = f.cookies['ECOM_SESS']
    cookieJar = RequestsCookieJar()
    cookieJar.set('ECOM_SESS', ECOM_SESS, domain='.hermes.com', path='/')

    # Step 2
    g = requests.post(url2, headers=headers2, cookies=cookieJar, proxies=proxies1, json=body2)
    print(f"2Response Body: {g.text}\n")

if __name__ == '__main__':
    main()
When I run this Python code, step 1 returns the cookie needed for step 2 without any problem, but step 2 always produces a captcha response.
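I am curious about the difference between PowerShell's Invoke-WebRequest and Python's requests, because the former must be doing something fundamentally different to avoid the captcha entirely, while the latter always gets hit by it.
I would appreciate any thoughts and insights. Thanks!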
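Posted on 2022-03-26 09:14:12
I am not sure exactly which part of the request triggers the site's bot protection, but based on this, you might have luck using:
requests.request("POST", url2, headers=headers2, cookies=cookieJar, proxies=proxies1, json=body2)
Alternatively, you could try urllib3 instead of requests.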
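Before switching libraries, one difference may be worth ruling out. requests picks a proxy from the `proxies` mapping by the scheme of the target URL, and the mapping in the question only has an "http" key while both URLs are https. If that is really the case in the actual code, the Python requests may not be going through the rotating proxy at all. The sketch below is a simplified, standard-library check of that condition; `covers_scheme` is a hypothetical helper, not part of requests, and it ignores host-specific keys and proxy environment variables, which requests can also honor.

```python
from urllib.parse import urlsplit


def covers_scheme(url, proxies):
    """Return True if the proxies mapping has an entry for the URL's scheme.

    Simplified assumption: only plain scheme keys ("http", "https") and the
    catch-all "all" key are considered.
    """
    scheme = urlsplit(url).scheme
    return scheme in proxies or "all" in proxies


url2 = "https://bck.hermes.com/add-to-cart"

# Mapping shaped like the one in the question (placeholder proxy address).
proxies_from_question = {"http": "http://proxyaddress"}

# Same proxy registered for both schemes.
proxies_both = {"http": "http://proxyaddress", "https": "http://proxyaddress"}

print(covers_scheme(url2, proxies_from_question))  # False
print(covers_scheme(url2, proxies_both))           # True
```

If the first check prints False for your real mapping, adding an "https" entry that points at the same proxy is a cheap thing to try before anything else.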
Here is your PowerShell script, also simplified into a single excerpt.
$secPasswd=ConvertTo-SecureString "password" -AsPlainText -Force
$myCreds=New-Object System.Management.Automation.PSCredential -ArgumentList "username",$secPasswd
$headers = @{
"sec-ch-ua"='" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"'
"DNT"="1"
"sec-ch-ua-mobile"="?0"
"sec-ch-ua-platform"="`"Windows`""
"Origin"="https://www.hermes.com"
"Sec-Fetch-Site"="same-site"
"Sec-Fetch-Mode"="cors"
"Sec-Fetch-Dest"="empty"
"Referer"="https://www.hermes.com/"
}
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/customer-session?locale=de_de" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-SessionVariable session `
-Headers $headers
Invoke-WebRequest -UseBasicParsing -Uri "https://bck.hermes.com/add-to-cart" `
-Proxy 'http://proxyaddress' `
-ProxyCredential $mycreds `
-Method POST `
-WebSession $session `
-Headers $headers `
-ContentType "application/json" `
-Body "{`"locale`":`"de_de`",`"items`":[{`"category`":`"direct`",`"sku`":`"H079082CCAC`"}]}"
https://stackoverflow.com/questions/71625104
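The simplified PowerShell script sends one set of headers on both requests, whereas the Python code in the question uses two dictionaries that are not identical to what PowerShell sends. A small sketch for spotting such differences follows; `diff_headers` is a hypothetical helper, and the two dictionaries are shortened excerpts of the step 1 headers from the question. Whether the differing header has anything to do with the captcha is not established.

```python
def diff_headers(a, b):
    """Return {name: (value_in_a, value_in_b)} for headers whose values differ.

    Header names are compared case-insensitively, as in HTTP.
    """
    a_norm = {k.lower(): v for k, v in a.items()}
    b_norm = {k.lower(): v for k, v in b.items()}
    return {
        name: (a_norm.get(name), b_norm.get(name))
        for name in sorted(a_norm.keys() | b_norm.keys())
        if a_norm.get(name) != b_norm.get(name)
    }


# Excerpt of the step 1 headers in the PowerShell script.
powershell_step1 = {
    "Sec-Fetch-Dest": "empty",
    "Sec-Fetch-Mode": "cors",
    "Cache-Control": "no-cache",
}

# Excerpt of headers1 in the Python code.
python_headers1 = {
    "Sec-Fetch-Dest": "document",
    "Sec-Fetch-Mode": "cors",
    "Cache-Control": "no-cache",
}

print(diff_headers(powershell_step1, python_headers1))
# {'sec-fetch-dest': ('empty', 'document')}
```

Running the same comparison on the full dictionaries is a quick way to confirm that the Python requests match the working PowerShell ones header for header.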