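I'm trying to scrape images from a website. I managed to get the image links, but when I download the images with wget I keep getting HTTP Error 403: ModSecurity Action.
Here is my code: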
from selenium import webdriver
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import wget
import time
import os

driver = webdriver.Chrome('chromedriver-path')
url = "https://www.ancuong.com/vi/san-pham/san-pham-chinh/van-mfc--cac-loai-van-phu-melamine/melamine-phu-tren-mdf-melamine-mdf/page-woodgrain.html"
driver.get(url)

# Scroll down in small steps so the lazily loaded images get a chance to load.
n = 0
while n <= 1500:
    driver.execute_script("window.scrollTo(0, {})".format(n))
    n += 200
    time.sleep(0.1)

# Wait until the loaded images (class "load-done") are visible.
images = WebDriverWait(driver, 60).until(
    EC.visibility_of_all_elements_located((By.CLASS_NAME, 'load-done'))
)

# Collect the src attribute of every image.
imgLinks = []
for image in images:
    imgLink = image.get_attribute('src')
    imgLinks.append(imgLink)
print(imgLinks)
time.sleep(1)
driver.quit()

# Create the output folder next to the script's working directory.
path = os.path.join(os.getcwd(), "an-cuong-images")
os.makedirs(path, exist_ok=True)

# Download each image; this is the call that fails with HTTP 403.
for counter, imgLink in enumerate(imgLinks):
    save_as = os.path.join(path, "an-cuong-plywood" + str(counter) + '.jpg')
    wget.download(imgLink, save_as)

This is the error I get:
File "D:\Jobs\dream\scrape info\scrape image- python\selenium_crawling.py", line 43, in <module>
wget.download(imgLink, save_as)
File "C:\Users\My Lap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\wget.py", line 526, in download
(tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 239, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
return opener.open(url, data, timeout)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
response = meth(req, response)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
response = self.parent.error(
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
return self._call_chain(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
result = func(*args)
File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: ModSecurity Action
How can I fix this? Thanks in advance for your help!
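Posted on 2021-07-09 00:58:00
ModSecurity is an open-source, cross-platform web application firewall (WAF) module. https://modsecurity.org/about.html
So whenever you see 403 (ModSecurity Action), it means the ModSecurity firewall has blocked the request. A common cause is:
Passing a URL as a parameter
Here, you are passing a custom-generated URL as a parameter. It is safe, because you created it yourself, but in the firewall's rule book this is a classic example of payload injection. To get around it, try a different approach that does not pass the URL as a parameter.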
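One possible "different approach", offered only as a sketch: fetch each image with urllib.request directly and send browser-like headers instead of calling wget.download. This assumes the firewall is rejecting the default Python-urllib User-Agent, which is a frequently reported trigger for this kind of 403 but is not confirmed for this site. The names BROWSER_UA, build_request and download_image are hypothetical and exist only for illustration, and the User-Agent string is just an example value.

```python
import shutil
import urllib.request

# Assumption: any realistic browser User-Agent string; this one is illustrative.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0 Safari/537.36"
)


def build_request(url, referer=None, user_agent=BROWSER_UA):
    """Build a Request that carries browser-like headers."""
    headers = {"User-Agent": user_agent}
    if referer:
        # Some sites also check where the image request came from.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def download_image(url, save_as, referer=None, timeout=30):
    """Stream the response body to save_as and return the path."""
    req = build_request(url, referer=referer)
    with urllib.request.urlopen(req, timeout=timeout) as resp, \
            open(save_as, "wb") as out:
        shutil.copyfileobj(resp, out)
    return save_as
```

In the question's download loop, the call wget.download(imgLink, save_as) could then be replaced with download_image(imgLink, save_as, referer=url). If the server still answers 403, the block is probably triggered by something other than the headers, and the site's terms of use or an official API are worth checking.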
https://stackoverflow.com/questions/68298543