
Selenium crawling: HTTP Error 403: ModSecurity Action when using wget

Stack Overflow user
Asked 2021-07-08 16:57:54
1 answer, 90 views, 0 followers, 1 vote

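I am trying to scrape images from a website. I managed to get the image links, but when I download the images with wget I keep getting HTTP Error 403: ModSecurity Action.

Here is my code: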

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
import wget
import time
import os
import urllib.request

driver = webdriver.Chrome('chromedriver-path')
url = ("https://www.ancuong.com/vi/san-pham/san-pham-chinh/van-mfc--cac-loai-van-phu-melamine/melamine-phu-tren-mdf-melamine-mdf/page-woodgrain.html")
driver.get(url)

n = 0
while n <= 1500:
    driver.execute_script("window.scrollTo(0, {})".format(n))
    n+=200
    time.sleep(0.1)

images = WebDriverWait(driver, 60).until(
            EC.visibility_of_all_elements_located((By.CLASS_NAME, 'load-done'))
        )

imgLinks = []
for image in images:
  imgLink = image.get_attribute('src')
  imgLinks.append(imgLink)

print(imgLinks)
time.sleep(1)
driver.quit()

path = os.getcwd()
path = os.path.join(path, "an-cuong-images")
os.mkdir(path)
counter = 0
for imgLink in imgLinks:
    save_as = os.path.join(path, "an-cuong-plywood" + str(counter) + '.jpg')
    wget.download(imgLink, save_as)
    counter += 1

The error I get is:

    File "D:\Jobs\dream\scrape info\scrape image- python\selenium_crawling.py", line 43, in <module>
    wget.download(imgLink, save_as)
  File "C:\Users\My Lap\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.9_qbz5n2kfra8p0\LocalCache\local-packages\Python39\site-packages\wget.py", line 526, in download
    (tmpfile, headers) = ulib.urlretrieve(binurl, tmpfile, callback)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 239, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 214, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 523, in open
    response = meth(req, response)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 632, in http_response
    response = self.parent.error(
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 561, in error
    return self._call_chain(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 494, in _call_chain
    result = func(*args)
  File "C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.1520.0_x64__qbz5n2kfra8p0\lib\urllib\request.py", line 641, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: ModSecurity Action

How can I solve this? Thanks in advance for your help!


1 Answer

Stack Overflow user

Answered 2021-07-09 00:58:00

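ModSecurity is an open-source, cross-platform web application firewall (WAF) module. https://modsecurity.org/about.html

So whenever you see a 403 (ModSecurity Action), it means the ModSecurity firewall has blocked the request. Common causes are: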

  • Posting a URL as a parameter
  • JavaScript attribute violations
  • Malicious payload injection
  • Any other cross-site scripting (XSS) attempt
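
Before changing the download logic, it can help to confirm that a failure really is this kind of firewall block and not some other HTTP error. The following is a minimal sketch, not part of the original answer. The helper name `is_modsecurity_block` is hypothetical, and the check assumes the server reports the block in the reason phrase, as in the traceback above ("ModSecurity Action"). The sample error is built by hand so the snippet runs without network access.

```python
import email.message
import urllib.error


def is_modsecurity_block(error):
    """Return True if an HTTPError looks like a ModSecurity 403 block.

    Hypothetical helper: it only inspects the status code and the
    reason phrase, which is an assumption about how the server reports
    the block.
    """
    reason = str(error.reason or "")
    return error.code == 403 and "modsecurity" in reason.lower()


# Build an error by hand (same code and reason as in the traceback)
# so that this sketch does not need to contact any server.
sample = urllib.error.HTTPError(
    "https://example.invalid/image.jpg",  # placeholder URL, not a real image
    403,
    "ModSecurity Action",
    email.message.Message(),
    None,
)
print(is_modsecurity_block(sample))
```

In a real script, the same check would go inside an `except urllib.error.HTTPError as error:` clause around the download call.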

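Here, you are passing a custom-generated URL (safe, because you know you created it!) as an argument; in the rule book, this is a classic example of payload injection. To get around it, try a different approach that does not use a URL as an argument.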
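
The answer does not show what a different approach might look like. One option that is often tried is sketched below, under an assumption the source does not confirm: that the firewall rule reacts to the default client signature sent by `urllib` (which `wget.download` uses internally, as the traceback shows) and not to the URL itself. The names `BROWSER_UA`, `build_image_request` and `download_image` are hypothetical, and the User-Agent string is only an example value. Whether this gets past the rule depends entirely on the server's configuration.

```python
import shutil
import urllib.request

# Assumption: an example desktop-browser User-Agent string. It is not
# taken from the original post; any current browser string could be used.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) "
    "Chrome/120.0.0.0 Safari/537.36"
)


def build_image_request(url, referer=None):
    """Build a request that carries browser-like headers."""
    headers = {"User-Agent": BROWSER_UA}
    if referer:
        # Some servers also check which page the image was requested from.
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


def download_image(url, save_as, referer=None):
    """Stream the response body to save_as and return the path."""
    request = build_image_request(url, referer)
    with urllib.request.urlopen(request, timeout=30) as response:
        with open(save_as, "wb") as out_file:
            shutil.copyfileobj(response, out_file)
    return save_as
```

In the question's loop, this would stand in for the `wget.download(imgLink, save_as)` call, for example `download_image(imgLink, save_as, referer=url)`.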

Votes: 0
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/68298543
