This code fetches a website and downloads every .jpg image on the page. It only supports sites whose <img> elements have a src attribute containing a .jpg link.
import random
import urllib.request
import requests
from bs4 import BeautifulSoup

def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    raw_text = r'links.txt'
    with open(raw_text, 'w') as fw:
        for link in soup.findAll('img'):
            image_links = link.get('src')
            if '.jpg' in image_links:
                for i in image_links.split("\\n"):
                    fw.write(i + '\n')
    num_lines = sum(1 for line in open('links.txt'))
    if num_lines == 0:
        print("There is 0 photo in this web page.")
    elif num_lines == 1:
        print("There is", num_lines, "photo in this web page:")
    else:
        print("There are", num_lines, "photos in this web page:")
    k = 0
    while k <= (num_lines - 1):
        name = random.randrange(1, 1000)
        fullName = str(name) + ".jpg"
        with open('links.txt', 'r') as f:
            lines = f.readlines()[k]
            urllib.request.urlretrieve(lines, fullName)
            print(lines + fullName + '\n')
        k += 1
Download_Image_from_Web("https://pixabay.com")

Posted on 2017-04-29 16:25:55
This part is extremely inefficient:
k = 0
while k <= (num_lines - 1):
    name = random.randrange(1, 1000)
    fullName = str(name) + ".jpg"
    with open('links.txt', 'r') as f:
        lines = f.readlines()[k]
        urllib.request.urlretrieve(lines, fullName)
        print(lines + fullName + '\n')
    k += 1
It re-reads the same file num_lines times, once per download!
By the way, do you really need to write the list of URLs to a file at all? Why not just keep them in a list? Even if you do want the URLs in a file, you can keep them in an in-memory list and only write the file, never reading it back.
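The in-memory approach could be sketched like this; collect_jpg_links is a hypothetical helper name (not from the original code), and it takes the page HTML directly so parsing is separated from the network fetch:

```python
from bs4 import BeautifulSoup

def collect_jpg_links(html):
    # Parse the page once and keep the .jpg links in a plain list,
    # so no intermediate links.txt file is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [img.get('src')
            for img in soup.find_all('img')
            if img.get('src') and '.jpg' in img.get('src')]
```

The returned list can then be iterated directly for downloading, or written out to links.txt in a single pass if a file is still wanted.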
Rather than putting all the code into one function that performs several tasks, organize your program into smaller functions, each with a single responsibility.
Python has a well-defined set of coding conventions in PEP 8, many of which are violated here. I suggest reading that document and following it as much as possible.
Posted on 2017-04-29 18:20:28
How about the following?
import random
import requests
from bs4 import BeautifulSoup

# got from http://stackoverflow.com/a/16696317
def download_file(url):
    local_filename = url.split('/')[-1]
    print("Downloading {} ---> {}".format(url, local_filename))
    # NOTE the stream=True parameter
    r = requests.get(url, stream=True)
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    return local_filename

def Download_Image_from_Web(url):
    source_code = requests.get(url)
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "html.parser")
    for link in soup.findAll('img'):
        image_links = link.get('src')
        if not image_links.startswith('http'):
            image_links = url + '/' + image_links
        download_file(image_links)
Download_Image_from_Web("https://pixabay.com")

https://codereview.stackexchange.com/questions/162123
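One caveat with the answer's URL handling: the manual url + '/' + image_links concatenation mangles protocol-relative links such as //cdn.example.com/a.jpg. The standard library's urljoin resolves these cases correctly; absolutize is a hypothetical helper name used only for this sketch:

```python
from urllib.parse import urljoin

def absolutize(page_url, src):
    # urljoin resolves relative paths, root-relative paths, and
    # protocol-relative links against the page URL.
    return urljoin(page_url, src)
```

For example, absolutize("https://pixabay.com/page", "/img/a.jpg") yields "https://pixabay.com/img/a.jpg" rather than a doubled-up path.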