
Logging into Quora with Scrapy
Stack Overflow user
Asked on 2016-04-24 20:19:48
1 answer · 426 views · 0 followers · Score 2

I am trying to log into Quora with Scrapy, but without success: the server returns a 400 or 500 status code depending on the form data I send.

I captured the form data with Chrome's developer tools:

General
Request URL:https://www.quora.com/webnode2/server_call_POST?__instart__
Request Method:POST
Status Code:200
Remote Address:103.243.14.60:443

Form Data
json:{"args":[],"kwargs":{"email":"1liusai253@163.com","password":"XXXX","passwordless":1}}
formkey:750febacf08976a47c82f3e10af83305
postkey:dab46d0df2014d1568ead6b2fbad7297
window_id:dep3300-2420196009402604566
referring_controller:index
referring_action:index
_lm_transaction_id:0.2598935768985011
_lm_window_id:dep3300-2420196009402604566
__vcon_json:["Vn03YsuKFZvHV9"]
__vcon_method:do_login
__e2e_action_id:ee1qmp1iit
js_init:{}
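Note that the `json` field in the capture above is itself a JSON-encoded string, not a nested object: the browser serialized the `args`/`kwargs` structure before submitting it. Scrapy's `formdata` expects plain string values, so a nested dict would have to be serialized the same way. A minimal sketch (credentials are placeholders):

```python
import json

# formdata values must be plain strings; serialize the nested
# structure exactly the way the browser did before POSTing it.
payload = json.dumps({
    "args": [],
    "kwargs": {"email": "xxx", "password": "xxx", "passwordless": 1},
})
# payload is now a single string suitable as the "json" form field
```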

Below is my code, an ordinary Scrapy flow. I think the problem is in the form data. Can anyone help?

import scrapy
import re

class QuestionsSpider(scrapy.Spider):
    name = 'questions'
    domain = 'https://www.quora.com'
    headers = {
            "Accept": "application/json, text/javascript, */*; q=0.01",
            "Accept-Language": "zh-Hans-CN,zh-Hans;q=0.8,en-US;q=0.5,en;q=0.3",
            "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/49.0.2623.108 Chrome/49.0.2623.108 Safari/537.36",
            "Accept-Encoding": "gzip, deflate",
            "Host": "www.quora.com",
            "Connection": "Keep-Alive",
            "content-type":"application/x-www-form-urlencoded"
        }

    def __init__(self, login_url = None):
        self.login_url = 'https://www.quora.com/webnode2/server_call_POST?__instart__' # Here is the login URL of Quora

    def start_requests(self):
        body = response.body
        formkey_patt = re.compile(r'.*?"formkey".*?"(.*?)".*?', re.S)
        formkey = re.findall(formkey_patt, body)[0]
        postkey_patt = re.compile(r'.*?"postkey".*?"(.*?)".*?', re.S)
        postkey = re.findall(postkey_patt, body)[0]
        window_id_patt = re.compile(r'.*?window_id.*?"(.*?)".*?', re.S)
        window_id = re.findall(window_id_patt, body)[0]

        referring_controller = 'index'
        referring_action = 'index'
        __vcon_method = 'do_login'

        yield scrapy.Request(
            url = self.domain,
            headers = self.headers,
            meta = {'cookiejar': 1},
            callback = self.start_login
        )

    def start_login(self, response):
        yield scrapy.FormRequest.from_response(
            response,
            url = self.login_url,
            meta = {'cookiejar': response.meta['cookiejar']},
            headers = self.headers,
            formdata = {
                "json": {"args": [], "kwargs": {"email": "xxxx", "password": "xxx"}},
                "formkey": formkey,
                "postkey": postkey,
                "window_id": window_id,
                "referring_controller": referring_controller,
                "referring_action": referring_action,
                "__vcon_method": __vcon_method,
                "__e2e_action_id": "ee1qmp1iit"
            },
            callback = self.after_login
        )

    def after_login(self, response):
        print response.body

1 Answer

Stack Overflow user

Answered on 2016-04-25 04:41:52

You are not setting or sending formkey, postkey, window_id, etc. That is why you should get them from the response. In other words, you need to use FormRequest.from_response().
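The flow the answer describes can be sketched as: request the login page first, extract the tokens from the response body inside the callback, and only then build the login request. The key names below come from the question's captured form data; how Quora embeds them in the page is an assumption and may have changed, so treat this as a sketch rather than working login code.

```python
import json
import re

def extract_login_tokens(html):
    """Pull formkey/postkey/window_id out of the page source.

    Key names are taken from the question's form-data dump; the
    '"key": "value"' layout in the HTML is an assumption.
    """
    tokens = {}
    for key in ("formkey", "postkey", "window_id"):
        m = re.search(r'"%s"\s*:\s*"([^"]+)"' % key, html)
        if m:
            tokens[key] = m.group(1)
    return tokens

# Inside the spider, the flow would then look like (sketch):
#
# def start_requests(self):
#     yield scrapy.Request(self.domain, callback=self.start_login)
#
# def start_login(self, response):
#     tokens = extract_login_tokens(response.text)
#     yield scrapy.FormRequest(
#         self.login_url,
#         formdata={
#             "json": json.dumps({"args": [], "kwargs": {
#                 "email": "xxx", "password": "xxx"}}),
#             **tokens,
#         },
#         callback=self.after_login)
```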

Score 0
Original page content provided by Stack Overflow; translation supported by Tencent Cloud's engine.
Original link: https://stackoverflow.com/questions/36823009
