Question: Scraping points from CTF sites

Code Review user

Asked on 2016-09-07 15:16:15

1 answer, 159 views, 0 followers, 7 votes

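I am fairly new to classes. This script uses one.

I am pretty sure this is not the correct way to do it, but I also do not know what the correct way is.

For example, you can call the hackthissite.org function on a crawler instance that was created with the certified_secure URL.

Do I have too many comments? Are they too verbose?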

#!/usr/bin/env python

import bs4
import requests

users = ['user1', 'user2']
certified_secure_url = 'https://www.certifiedsecure.com/profile?alias='
hack_this_site_url = 'https://www.hackthissite.org/user/view/'


# this function takes a string as input and outputs a list with all the integers in the string
def get_num(string):

    # get the numbers from string
    lst = ''.join([x if x.isdigit() else ' ' for x in string]).split()

    # change to list of ints instead of strings
    new_lst = []
    for item in lst:
        new_lst.append(int(item))

    return new_lst


class Crawler(object):
    def __init__(self, url):
        self.url = url

    # retrieve data from site and
    def get_site_data(self, user):
        request = requests.get(self.url + user)
        return bs4.BeautifulSoup(request.text, 'lxml')

    def certified_secure(self, user):
        experience = self.get_site_data(user).select('.level_progress_details')[0].getText()
        # get the points from the string
        return get_num(experience)[1]

    def hack_this_site(self, user):
        experience = self.get_site_data(user).select('.blight-td')[1].getText()
        return get_num(experience)[0]


# make two instances to crawl
cs = Crawler(certified_secure_url)
hts = Crawler(hack_this_site_url)

for user in users:
    print cs.certified_secure(user)
    print hts.hack_this_site(user)

Tags: python, object-oriented, python-2.x, beautifulsoup

1 answer

Code Review user

Accepted answer

Posted on 2016-09-07 20:00:00

You can use a list comprehension in get_num:

def get_num(string):
    """this function takes a string as input and outputs a list with all the integers in the string"""
    # get the numbers from string
    numbers = ''.join(x if x.isdigit() else ' ' for x in string).split()

    # change to list of ints instead of strings
    return [int(number) for number in numbers]
    # return map(int, numbers)  # Alternative

Also note that join can take a generator expression, so there is no need to build a list first. I also chose more descriptive variable names here.
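As a quick check of the point about join, here is a small sketch comparing the list and generator-expression forms. The sample string is made up for illustration and is not taken from either site.

```python
def get_num(string):
    """Return a list of all the integers found in string."""
    # Replace every non-digit with a space, then split on whitespace.
    numbers = ''.join(x if x.isdigit() else ' ' for x in string).split()
    return [int(number) for number in numbers]


# Hypothetical sample text; the real pages may be worded differently.
sample = 'Level 3, 120 of 500 points'

# Same result whether join receives a list or a generator expression.
with_list = ''.join([x if x.isdigit() else ' ' for x in sample]).split()
with_generator = ''.join(x if x.isdigit() else ' ' for x in sample).split()

print(with_list == with_generator)
print(get_num(sample))
```

The last line prints [3, 120, 500], which is why the callers index into the result to pick out the number they want.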

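You should also give your functions (and classes) a docstring, as I did above: a triple-quoted string placed as the first line of the function body. You can access it interactively via help(function_name), and many documentation build tools use it.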
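To illustrate, a short sketch of where the docstring ends up and how to read it back. The docstring wording here is my own shortened version.

```python
def get_num(string):
    """Return a list of all the integers found in string."""
    numbers = ''.join(x if x.isdigit() else ' ' for x in string).split()
    return [int(number) for number in numbers]


# The docstring is stored on the function object itself...
print(get_num.__doc__)

# ...and help() shows it together with the function signature.
help(get_num)
```

A comment placed above the def line, as in the question, is not picked up by either of these.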

It also seems a bit too manual to have to know which method to call depending on the URL. Your crawler can decide for itself:

class Crawler(object):
    sites = {"hackthissite.org": ('.blight-td', 1, 0),
             "certifiedsecure.com": ('.level_progress_details', 0, 1)}

    def __init__(self, url):
        self.url = url
        self.options = self.get_sites_options(Crawler.sites)

    def get_sites_options(self, sites):
        for site, options in sites.items():
            if site in self.url:
                return options

    def get_site_data(self, user):
        """retrieve data from site and"""
        request = requests.get(self.url + user)
        return bs4.BeautifulSoup(request.text, 'lxml')

    def get_experience(self, user):
        select_str, index, out_index = self.options
        experience = self.get_site_data(user).select(select_str)[index].getText()
        return get_num(experience)[out_index]
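One thing to watch with this design is what happens when a URL matches none of the known sites. Below is a minimal sketch of the lookup as a standalone function, assuming the domain keys are meant to be matched as substrings of the crawler URL. The name find_site_options and the ValueError are my own additions for illustration, not part of the class above.

```python
# Per-site settings: (CSS selector, index of the matched element,
# index of the wanted number within that element's text).
SITES = {
    'hackthissite.org': ('.blight-td', 1, 0),
    'certifiedsecure.com': ('.level_progress_details', 0, 1),
}


def find_site_options(url, sites=SITES):
    """Return the options of the first site whose domain occurs in url."""
    for site, options in sites.items():
        # The domain is the shorter string, so test "site in url".
        if site in url:
            return options
    # Hypothetical choice: fail loudly instead of silently returning None.
    raise ValueError('no known site matches %r' % url)


print(find_site_options('https://www.hackthissite.org/user/view/'))
print(find_site_options('https://www.certifiedsecure.com/profile?alias='))
```

With a lookup like this, both crawler instances can be driven through the same get_experience(user) call, and an unsupported URL is reported when the instance is created rather than when the options tuple is unpacked.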

Votes: 3

Original page content provided by Code Review.

Original link:

https://codereview.stackexchange.com/questions/140753
