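I am trying to scrape several links that contain event information. I am rotating paid proxies and user agents generated by the UserAgent library. Imperva, which requires a US IP, is so sensitive that it does not even let my own browser through if I use a free US proxy!
I asked this question in a Discord channel. Someone contacted me and said he could bypass Imperva, but he could not tell me how, because he did not want me to become a competitor in the ticketing market.
Apart from user agents and proxies, I also tried to mimic the request headers of a successful browser request, but it did not work. I only get 405 and 403. I want to scrape the events section, but I cannot even get a 200 response from any of the 27 links I have (I have added some of them below).
How do you think Imperva can be bypassed with Scrapy or requests? Could you also recommend an academic resource I could learn from to improve my scraping skills?
Some of the links I am trying to scrape: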
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
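https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=

My spider code, which consists of a class that imports proxies from a file and the spider itself. I add the proxy as a meta value, as described in the Scrapy documentation. I use a download delay: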
import random

import pandas as pd
import scrapy
from random_user_agent.user_agent import UserAgent
from scrapy import Request


class ProxyFunctions:
    (...)


class AlexSpider(scrapy.Spider):
    name = 'alex'
    s = ProxyFunctions()
    s.prox_list_fixer()  # cleans up the txt file holding the proxies and writes a new txt
    proxies = s.imp_proxies()

    def __init__(self, *args, **kwargs):
        # Let scrapy.Spider run its own initialisation first.
        super().__init__(*args, **kwargs)
        self.root = "https://partnercarrier.com"
        self.start_url = "https://partnercarrier.com/PA/"
        #self.initial_links = self.imp_links()  # to be used once all links are added from the file
        user_agent_rotator = UserAgent(software_names=['chrome'], operating_systems=['windows', 'linux'])
        self.user_agents = user_agent_rotator.get_user_agents()
        #self.root_link = "https://www.google.com"
        self.UA_rand = random.choice(self.user_agents)['user_agent']  # User Agent set
        #self.UA_LIST = open("/home/draco/docs/scraping/scrapyyy/thomas/USER_AGENTS.txt","r")  # manual UA importation from text

    # picks a random proxy from the proxy list in the file
    def imp_randp(self, path="/home/draco/docs/scraping/scrapyyy/thomas/proxies.txt"):
        with open(path) as PROXIES:
            lines = PROXIES.readlines()
        return random.choice(lines).strip()

    # reads the links from the file
    def imp_links(self, path="/home/draco/docs/scraping/Selenium/inputs.csv"):
        x = pd.read_csv(path)
        links = x['Url']
        links = [i for i in links]
        return links

    def start_requests(self):
        print("INITIAL REQUEST")
        links = self.imp_links()
        for link in links:
            print(f"---INFO: Requesting page=> {link}")
            proxy = random.choice(self.proxies)
            #print("---INFO: Using proxy => ", proxy)
            # NOTE: h mimics browser headers but is never passed to Request below.
            h = {
                'User-Agent': random.choice(self.user_agents)['user_agent'],
                'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
                'Accept-Encoding': 'gzip, deflate, br',
                'Accept-Language': 'tr-TR,tr;q=0.9,en-US;q=0.8,en;q=0.7',
                'Cache-Control': 'max-age=0',
                'Connection': 'keep-alive',
                'Host': link.split("/")[2],
                'Sec-Fetch-Dest': 'document',
                'Upgrade-Insecure-Requests': '1',
                'Sec-Fetch-Mode': 'navigate',
                'sec-ch-ua-platform': '"Linux"',
                'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
            }
            # NOTE: this body is sent with a GET request (Request defaults to GET).
            b = 'groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode='
            yield Request(
                url=link,
                callback=self.parse_gen,
                headers={"user-agent": random.choice(self.user_agents)['user_agent']},
                meta={"proxy": proxy},
                body=b,
                dont_filter=True,
            )

    def parse_gen(self, response):
        print("---INFO: General parser opened. PARSER1")

My terminal output:
draco@draco:~/docs/scraping/scrapyyy/upwork$ scrapy crawl alex
https://umasstix.evenue.net
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Scrapy 2.5.1 started (bot: upwork)
2022-03-20 20:23:01 [scrapy.utils.log] INFO: Versions: lxml 4.5.0.0, libxml2 2.9.10, cssselect 1.1.0, parsel 1.6.0, w3lib 1.22.0, Twisted 21.7.0, Python 3.8.10 (default, Nov 26 2021, 20:14:08) - [GCC 9.3.0], pyOpenSSL 22.0.0 (OpenSSL 1.1.1m 14 Dec 2021), cryptography 36.0.1, Platform Linux-5.13.0-35-generic-x86_64-with-glibc2.29
2022-03-20 20:23:01 [scrapy.utils.log] DEBUG: Using reactor: twisted.internet.epollreactor.EPollReactor
2022-03-20 20:23:01 [scrapy.crawler] INFO: Overridden settings:
{'AUTOTHROTTLE_ENABLED': True,
'BOT_NAME': 'upwork',
'CONCURRENT_REQUESTS_PER_DOMAIN': 14,
'HTTPCACHE_ENABLED': True,
'NEWSPIDER_MODULE': 'upwork.spiders',
'SPIDER_MODULES': ['upwork.spiders']}
2022-03-20 20:23:01 [scrapy.extensions.telnet] INFO: Telnet Password: 7f185fdb1347847f
2022-03-20 20:23:01 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.memusage.MemoryUsage',
'scrapy.extensions.logstats.LogStats',
'scrapy.extensions.throttle.AutoThrottle']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats',
'scrapy.downloadermiddlewares.httpcache.HttpCacheMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2022-03-20 20:23:05 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2022-03-20 20:23:05 [scrapy.core.engine] INFO: Spider opened
2022-03-20 20:23:05 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2022-03-20 20:23:05 [scrapy.extensions.httpcache] DEBUG: Using filesystem cache storage in /home/draco/docs/scraping/scrapyyy/upwork/.scrapy/httpcache
2022-03-20 20:23:05 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6024
INITIAL REQUEST
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (403) <GET https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=> (referer: None) ['cached']
---INFO: Requesting page=> https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=
2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=> (referer: None) ['cached']
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=FOR&linkID=tktldr&shopperContext=&caller=appList&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=AMP&linkID=tktldr&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://budweisergardens.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-labatt&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://pplcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-allentown&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ynottix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=global-odu&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://csutickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=WC&linkID=csuwc&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://tsongascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=C&linkID=global-lowell&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://wellsfargocenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-wachovia&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://stridebankcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EGS&linkID=global-enid&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://cureinsurancearena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-sovereign&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ticketstar.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=RCCO&linkID=pmi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://hyveetix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONC&linkID=global-iowa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://portland5.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=pcpa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://selectyourtickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=PP&linkID=rgp&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://ictickets.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=ICI&linkID=nampa&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
---INFO: General parser opened. PARSER1
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xlcenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=XL&linkID=global-hartford&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://tdplace.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS&linkID=ottawa67&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://liacourascenter.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=global-temple&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://libertyfirstcreditunionarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EVENTS.1&linkID=global-ralston&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://semo.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=twsemo&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://treventscomplex.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CO&linkID=global-bud&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://xtreamarena.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CON&linkID=coralville-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:05 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <403 https://enmaxcentre.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=EC&linkID=lethbridge-multi&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://ticketatlantic.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=CONCERT&linkID=halifax&shopperContext=&caller=&appCode=>: HTTP status code is not handled or not allowed
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Closing spider (finished)
2022-03-20 20:23:06 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 15189,
'downloader/request_count': 27,
'downloader/request_method_count/GET': 27,
'downloader/response_bytes': 304575,
'downloader/response_count': 27,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/403': 16,
'downloader/response_status_count/405': 10,
'elapsed_time_seconds': 0.444587,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2022, 3, 20, 17, 23, 6, 67887),
'httpcache/hit': 27,
'httperror/response_ignored_count': 26,
'httperror/response_ignored_status_count/403': 16,
'httperror/response_ignored_status_count/405': 10,
'log_count/DEBUG': 28,
'log_count/INFO': 36,
'memusage/max': 126562304,
'memusage/startup': 126562304,
'response_received_count': 27,
'scheduler/dequeued': 27,
'scheduler/dequeued/memory': 27,
'scheduler/enqueued': 27,
'scheduler/enqueued/memory': 27,
'start_time': datetime.datetime(2022, 3, 20, 17, 23, 5, 623300)}
2022-03-20 20:23:06 [scrapy.core.engine] INFO: Spider closed (finished)

Answered on 2022-04-14 19:39:46
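I bypassed Imperva by using a real Chrome browser, a browser extension to automate the process, and US mobile proxies. Imperva checked the following conditions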
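A side observation on the log in the question, separate from the browser-based approach: every "Crawled" line ends with ['cached'] and the stats report 'httpcache/hit': 27 with HTTPCACHE_ENABLED set to True, so that particular run appears to have replayed stored responses rather than contacted the sites. If so, changing proxies or headers between runs would not change the status codes. The sketch below is one way to check this from a saved log; the function name summarize_crawl_log and the sample lines are illustrative assumptions, not part of the original post.

```python
import re
from collections import Counter

# Matches Scrapy's "Crawled (<status>) <METHOD url>" debug lines.
CRAWLED = re.compile(r"Crawled \((\d{3})\) <(\w+) (\S+)>")


def summarize_crawl_log(lines):
    """Return (status code counts, number of responses served from the HTTP cache)."""
    statuses = Counter()
    cached = 0
    for line in lines:
        match = CRAWLED.search(line)
        if match is None:
            continue  # not a "Crawled" line
        statuses[int(match.group(1))] += 1
        # Scrapy appends the response flags, e.g. ['cached'], at the end of the line.
        if line.rstrip().endswith("['cached']"):
            cached += 1
    return statuses, cached


if __name__ == "__main__":
    # Two lines copied from the log above, used here only as sample input.
    sample = [
        "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://ticketleader.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=COL&linkID=tktldr&shopperContext=&caller=appList&appCode=> (referer: None) ['cached']",
        "2022-03-20 20:23:05 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://umasstix.evenue.net/cgi-bin/ncommerce3/SEGetGroupList?groupCode=MCCON&linkID=umass&shopperContext=&caller=&appCode=> (referer: None) ['cached']",
    ]
    statuses, cached = summarize_crawl_log(sample)
    print(dict(statuses), "cached:", cached)
```

If the cached count equals the total, Scrapy's HTTPCACHE_ENABLED = False or HTTPCACHE_IGNORE_HTTP_CODES = [403, 405] settings would force fresh requests on the next run. This only makes the test results meaningful; it does not by itself change how the site responds.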
https://stackoverflow.com/questions/71549042