Q: How do I download all the MP3 URLs on a web page as MP3 files with Python?

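Stack Overflow user

Asked 2019-12-31 03:20:26

3 answers, 12.6K views, 0 followers, 6 votes

I'm trying to learn Python and have been writing a piece of code to download all the Bible MP3 files from my church's website, which has a list of MP3 hyperlinks such as:

Chapter 1, Chapter 2, Chapter 3, Chapter 4, Chapter 5, and so on. (Reference link)

After running my code, I managed to get all the MP3 URLs printed in the shell, but I can't seem to download them at all.

Here is my code: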

import requests
import urllib.request
import re
from bs4 import BeautifulSoup

r = requests.get('https://ghalliance.org/resource/bible-reading')
soup = BeautifulSoup(r.content, 'html.parser')

for a in soup.find_all('a', href=re.compile(r'http.*\.mp3')):
    print(a['href'])

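I did try using wget, but I can't seem to get wget to work on my machine, which runs VSCode with Python 3.8.1 64-bit or conda 3.7.4. I checked both conda and cmd, and they show that wget is on my system. I even downloaded wget.exe manually into my system32 directory, but whenever I try to run

wget.download(url)

I always get an error message, something like wget has no attribute 'download'.

I've read a few beginner tutorials on downloading simple images with selenium, wget, and Beautiful Soup, but I can't seem to combine their approaches to solve my particular problem. I'm still very new to programming in general, so I apologize for asking such a basic question.

But now that I have all the MP3 URLs, my question is: how do I download them using Python?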

python

python-3.x

python-requests

download

mp3

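3 Answers

Stack Overflow user

Accepted answer

Posted 2019-12-31 12:52:04

Please note:

To download multiple files from the same host, you should use requests.Session() to keep the TCP connection alive instead of repeatedly opening and closing a socket.
You should use stream=True so that the response body can be written to disk in chunks, which helps avoid corrupted downloads.
Before writing the content, you should check the response status using .status_code.
Also, are you aware that you have missed two file names? They are Chiv Keeb 22mp3 and Cov Thawjtswj 01mp3, where the extension should be .mp3. Their links lack the dot, so a pattern that requires \.mp3 skips them.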
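As an illustration of the last note, the extension repair can be isolated in a small helper. This is only a sketch: the name ensure_mp3_extension is hypothetical, and appending .mp3 to names that do not end in mp3 at all is an assumption rather than something the answer specifies.

```python
def ensure_mp3_extension(name):
    """Return name with a proper '.mp3' suffix (hypothetical helper)."""
    lowered = name.lower()
    if lowered.endswith(".mp3"):
        # Already well formed, e.g. "Chiv Keeb 01.mp3"
        return name
    if lowered.endswith("mp3"):
        # Dot is missing, e.g. "Chiv Keeb 22mp3": re-insert it
        return name[:-3] + ".mp3"
    # Assumption: anything else simply gets the extension appended
    return name + ".mp3"
```

Keeping the rule in one function makes it easy to check against the two malformed names before any download starts.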

Here is the correct code to achieve your goal.

import re

import requests
from bs4 import BeautifulSoup

r = requests.get("https://ghalliance.org/resource/bible-reading/")
soup = BeautifulSoup(r.text, 'html.parser')

# Reuse one session so the connection to the host is kept alive
with requests.Session() as req:
    for item in soup.select("#playlist"):
        # href=True skips anchors that have no href attribute
        for link in item.find_all("a", href=True):
            href = link["href"]
            # The file name is everything after the last slash
            name = re.search(r"([^/]+$)", href).group()
            # Two links end in "mp3" without the dot; repair the extension
            if name[-4:-3] != '.':
                name = name[:-3] + '.mp3'
            print(f"Downloading File {name}")
            download = req.get(href)
            # Only write the file when the server answered 200 OK
            if download.status_code == 200:
                with open(name, 'wb') as f:
                    f.write(download.content)
            else:
                print(f"Download Failed For File {name}")
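The notes in this answer recommend stream=True, but the code above reads the whole body through download.content. A minimal sketch of a streamed variant follows. The helper name download_file and its chunk_size and timeout defaults are hypothetical, and session is assumed to be a requests.Session (or anything with a compatible get method).

```python
def download_file(session, url, dest, chunk_size=8192, timeout=30):
    """Stream url to the file dest via session; return the bytes written.

    Hypothetical helper: session is assumed to be a requests.Session.
    """
    written = 0
    # stream=True defers the body so it can be read piece by piece
    with session.get(url, stream=True, timeout=timeout) as resp:
        # Raise an exception for 4xx/5xx instead of saving an error page
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
                    written += len(chunk)
    return written
```

Inside the loop this could stand in for the req.get(href) call and the status check, for example download_file(req, href, name) wrapped in a try/except for requests.HTTPError.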

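Votes: 1

Stack Overflow user

Posted 2019-12-31 03:58:45

Since you are already using the requests library, you can also use requests to download the MP3s (or any file).

For example, suppose you want to download the file at the URL https://test.ghalliance.org/resources//bible_reading/audio/Chiv Keeb 01.mp3

doc = requests.get("https://test.ghalliance.org/resources//bible_reading/audio/Chiv%20Keeb%2001.mp3")

If the download succeeds, the MP3 content is stored in doc.content; you then need to open a file and write the data to it.

with open('myfile.mp3', 'wb') as f:
    f.write(doc.content)

At this point you have a file named "myfile.mp3", but you may want to save it under the same name that appears in the URL.

Let's extract the file name from the URL.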

filename = a['href'][a['href'].rfind("/")+1:]
with open(filename, 'wb') as f:
        f.write(doc.content)
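The slice above keeps any percent-escapes (such as %20) and any query string in the file name. One possible alternative, sketched here with the standard library only, decodes the path before taking its last segment. The names filename_from_url and default are hypothetical.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download.mp3"):
    """Return the decoded last path segment of url (hypothetical helper)."""
    # urlsplit separates the path from the query string and fragment
    path = urlsplit(url).path
    # Decode %20 and friends first, then keep only the final segment
    name = posixpath.basename(unquote(path))
    # A URL ending in "/" has no file name, so fall back to the default
    return name or default
```

With the example URL from this answer, the helper yields "Chiv Keeb 01.mp3" instead of "Chiv%20Keeb%2001.mp3".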

Now, let's put everything together.

import requests
import urllib.request
import re
from bs4 import BeautifulSoup

r = requests.get('https://ghalliance.org/resource/bible-reading')
soup = BeautifulSoup(r.content, 'html.parser')

for a in soup.find_all('a', href=re.compile(r'http.*\.mp3')):
    filename = a['href'][a['href'].rfind("/")+1:]
    doc = requests.get(a['href'])
    with open(filename, 'wb') as f:
        f.write(doc.content)

Votes: 8

Stack Overflow user

Posted 2019-12-31 04:27:09

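import requests
import urllib.request
import re
from bs4 import BeautifulSoup

i = 0  # running counter appended to each file name to keep names unique
r = requests.get('https://ghalliance.org/resource/bible-reading')
soup = BeautifulSoup(r.content, 'html.parser')
for a in soup.find_all('a', href=re.compile(r'http.*\.mp3')):
    i = i + 1
    url = a['href']
    # Second whitespace-separated token of the URL, used as the name stem.
    # This assumes the href contains spaces (e.g. ".../Chiv Keeb 01.mp3");
    # an href without spaces would raise IndexError here.
    file = url.split()[1]
    urllib.request.urlretrieve(url, f"{file}_{i}.mp3")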

With urllib.request.urlretrieve(url, filename=None) you can copy a network object denoted by a URL to a local file.
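urlretrieve returns a (filename, headers) tuple. Recent Python versions reject request URLs that contain raw spaces, so if the links on the page are not percent-encoded it may help to quote the path first. The following is a sketch under that assumption: quote_url_path and retrieve are hypothetical helper names, and any % already in the path is assumed to be part of a valid escape.

```python
import urllib.request
from urllib.parse import quote, urlsplit, urlunsplit


def quote_url_path(url):
    """Percent-encode unsafe characters in the path of url (hypothetical helper)."""
    parts = urlsplit(url)
    # Keep "/" and existing "%" escapes; encode spaces and other unsafe chars
    safe_path = quote(parts.path, safe="/%")
    return urlunsplit(parts._replace(path=safe_path))


def retrieve(url, dest):
    """Download url to dest with urlretrieve and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(quote_url_path(url), dest)
    return local_path
```

Only the path is re-encoded here; the scheme, host, and query string are passed through unchanged.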

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/59539194
