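I'm trying to download and save lecture videos from a website. I've managed to download the files, but they won't play in my media player. Here is the code I'm using: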
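from bs4 import BeautifulSoup
import re
import urllib2

snippet = open('Python/SNA Page Source Revised.txt', 'r')
soup = BeautifulSoup(snippet)
links = [link.get('href') for link in soup.find_all('a')]

videos = []
for link in links:
    # link.get('href') returns None for <a> tags without an href, so skip those
    if link and re.search('.*mp4.*', link):
        videos.append(link)

vidNum = 1
for video in videos:
    f = urllib2.urlopen(video)
    # str() is needed: concatenating a str and an int raises TypeError
    with open('Data Analysis/Social Network Analysis/Video ' + str(vidNum) + '.mp4', 'wb') as code:
        code.write(f.read())
    vidNum += 1

Everything seems fine, but when I try to play one of the videos I get the error: "Python (v2.7) requires a plugin to be installed to play the following type of media file: text/html decoder". Also, if I download a video manually from the site, the file is about 22.8 MB, but the files my script saves are only 7.8 KB.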
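The "text/html decoder" message and the tiny file size suggest the saved file is not a video at all. One quick check is to look at the first bytes of a saved file. The sketch below is written for Python 3; the function name `sniff_download` is hypothetical, and it assumes the common layout in which an MP4 file begins with an `ftyp` box whose type code sits at byte offset 4 (a heuristic, not a full format parser):

```python
def sniff_download(head: bytes) -> str:
    """Guess what a downloaded file really contains from its first bytes.

    Heuristic only: MP4 (ISO base media) files usually start with an 'ftyp'
    box, whose 4-byte type code sits at offset 4. HTML pages usually start
    with a doctype or an <html> tag, possibly after some whitespace.
    """
    if head[4:8] == b"ftyp":
        return "mp4"
    text = head.lstrip().lower()  # drop leading ASCII whitespace, ignore case
    if text.startswith(b"<!doctype html") or text.startswith(b"<html"):
        return "html"
    return "unknown"
```

To use it, read the first few dozen bytes of a saved file (for example `open("Video 1.mp4", "rb").read(64)`) and pass them in. A result of `"html"` means the server sent a web page, such as a login or error page, instead of the video.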
Is there something wrong with the way I'm downloading the files? Any help would be greatly appreciated.
Also: I'm using Python v2.7 on Ubuntu 12.04 LTS.
**** EDIT ****
Here is the code I'm now using, based on the answers I received:
import requests

r = requests.get('https://class.coursera.org/sna-003/lecture/download.mp4?lecture_id=2', auth=('myUsername', 'myPassword'))
with open('Data Analysis/TestFile.mp4', 'wb') as fd:
    fd.write(r.content)

Here is the output of r.content:
<!DOCTYPE html>
<html itemtype="http://schema.org" xmlns:fb="http://ogp.me/ns/fb#"><head><meta content="IE=Edge,chrome=IE7" http-equiv="X-UA-Compatible"/><meta content="!" name="fragment"/><meta content="NOODP" name="robots"/><meta charset="utf-8"/><meta content="Coursera" property="og:title"/><meta content="website" property="og:type"/><meta content="http://s3.amazonaws.com/coursera/media/Coursera_Computer_Narrow.png" property="og:image"/><meta content="https://www.coursera.org/" property="og:url"/><meta content="Coursera" property="og:site_name"/><meta content="en_US" property="og:locale"/><meta content="Take free online classes from 80+ top universities and organizations. Coursera is a social entrepreneurship company partnering with Stanford University, Yale University, Princeton University and others around the world to offer courses online for anyone to take, for free. We believe in connecting people to a great education so that anyone around the world can learn without limits." property="og:description"/><meta content="727836538,4807654" property="fb:admins"/><meta content="274998519252278" property="fb:app_id"/><meta content="Take free online classes from 80+ top universities and organizations. Coursera is a social entrepreneurship company partnering with Stanford University, Yale University, Princeton University and others around the world to offer courses online for anyone to take, for free. We believe in connecting people to a great education so that anyone around the world can learn without limits." name="description"/><meta content="http://s3.amazonaws.com/coursera/media/Coursera_Computer_Narrow.png" name="image"/><meta content="app-id=736535961" name="apple-itunes-app"/><script>window.onerror = function(message, url, lineNum) {
// First check the URL and line number of the error
url = url || window.location.href;
// 99% of the time, errors without line numbers arent due to our code,
// they are due to third party plugins and browser extensions
if (lineNum === undefined || lineNum == null) return;
// Now figure out the actual error message
// If it's an event, as triggered in several browsers
if (message.target && message.type) {
message = message.type;
}
if (!message.indexOf) {
message = 'Non-string, non-event error: ' + (typeof message);
}
var errorDescrip = {
message: message,
script: url,
line: lineNum,
url: document.URL
}
var err = {
key: 'page.error.javascript',
value: errorDescrip
}
window._204 = window._204 || [];
window._204.push(err);
window._gaq = window._gaq || [];
window._gaq.push(err);
}</script><title>Coursera.org</title><link href="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/css/home.css" rel="stylesheet" type="text/css"/><link href="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/pages/auth/css/auth.css" rel="stylesheet" type="text/css"/><script data-baseurl="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/" id="_mobile">(function(el) {
// Override certian behaviour if the page is for our mobile app.
// TODO(priya) Remove this conditional behaviour once I want to push this behaviour
// for regular authentication pages on mobile/smaller screens as well.
// Currently I'm keeping existing behaviour same and only adding mobile specific
// layouts ot /mobilesignup page (which is what isMobileApp = true signifies).
if ("false" == "true") {
var head = document.getElementsByTagName('head')[0];
// Add viewport meta tag
var viewport = document.querySelector('meta[name=viewport]');
var viewportContent = 'width=device-width, initial-scale=1.0, user-scalable=no';
if (!viewport) {
viewport = document.createElement('meta');
viewport.setAttribute('name', 'viewport');
head.appendChild(viewport);
}
viewport.setAttribute('content', viewportContent);
// Add responsive css
var link = document.createElement('link');
link.rel = 'stylesheet';
link.type = 'text/css';
link.href = el.getAttribute("data-baseurl") + "pages/auth/css/auth_responsive.css";
head.appendChild(link);
}
})(document.getElementById("_mobile"));
</script></head><body><div id="fb-root"></div><div id="origami"><div style="position:absolute;top:0px;left:0px;width:100%;height:100%;background:#f5f5f5;padding-top:5%;"><div id="coursera-loading-nojs" style="text-align:center; margin-bottom:10px;display:none;">Please use a <a href="/browsers">modern browser </a> with JavaScript enabled to use Coursera.</div><div><span id="coursera-loading-js" style="display: none; padding-left:45%">loading <img src="https://d2wvvaown1ul17.cloudfront.net/site-static/images/icons/loading.gif"/></span></div><noscript><div style="text-align:center; margin-bottom:10px;">Please use a <a href="/browsers">modern browser </a> with JavaScript enabled to use Coursera.</div></noscript></div></div><!--[if gte IE 8]><script>document.getElementById("coursera-loading-js").style.display = 'block';</script><![endif]-->
<!--[if lte IE 7]><script>document.getElementById("coursera-loading-nojs").style.display = 'block';
window._204 = window._204 || [];
window._gaq = window._gaq || [];
window._gaq.push(
['_setAccount', 'UA-28377374-1'],
['_setDomainName', window.location.hostname],
['_setAllowLinker', true],
['_trackPageview', window.location.pathname]);
window._204.push(
['client', 'home'],
{key:"pageview", value:window.location.pathname});
</script><script src="https://eventing.coursera.org/204.min.js"></script><script src="https://ssl.google-analytics.com/ga.js"></script><![endif]-->
<!--[if !IE]> --><script>document.getElementById("coursera-loading-js").style.display = 'block';</script><!-- <![endif]--><script src="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/js/core/require.js" type="text/javascript"></script><script data-baseurl="https://d1rlkby5e91r2j.cloudfront.net/e47434615f57601f9b9ccaf255a589e8550d328d/" data-debug="0" data-locale="" data-timestamp="1386838999742" data-version="e47434615f57601f9b9ccaf255a589e8550d328d" id="_require" type="text/javascript">if(document.getElementById("coursera-loading-js").style.display == 'block') {
(function(el) {
// prevent throw
require.onError = function(err) {
window._204 = window._204 || [];
window._204.push({key: 'requireErr', value: err});
};
define("pages/auth/authConfig",
function() {
return {"coursera_url": "https://www.coursera.org/",
"environment": "production"};
}
);
require.config({
enforceDefine: false,
waitSeconds: 14,
baseUrl: el.getAttribute("data-baseurl"),
urlArgs: el.getAttribute("data-debug") == "1" ? "v=" + el.getAttribute("data-timestamp") : "",
shim: {
"underscore": {
exports: '_'
},
"backbone": {
deps: ['underscore', 'jquery'],
exports: 'Backbone'
}
},
paths: {
"jquery": "js/core/jquery",
"underscore": "js/core/underscore",
"backbone": "js/core/backbone",
"i18n": "js/core/i18n._t"
},
callback: function() {
require(["pages/auth/routes"]); // bootup coursera
},
config: {
i18n: {
locale: (window.localStorage ? localStorage.getItem("locale") : '') || el.getAttribute("data-locale")
}
}
});
})(document.getElementById("_require"));
}</script><script type="text/javascript">define("pages/home/models/user.json", [], function(){
return null;
});
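</script></body></html>
I find this strange, though, because it looks like the site's source code. Yet when I look at r.url, I get a real page that I can load in my browser, and it prompts me to save or view the video. Even when I pass in the new URL I got from r.url (which I assume contains my cookie information), I still get the same content. I don't understand where I'm going wrong.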
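This pattern is consistent with the server redirecting requests that lack a valid session to an HTML page: the HTTP client follows the redirect silently, so what gets saved is that page rather than the video. (In `requests`, the final URL after redirects is `r.url` and the intermediate responses are in `r.history`.) The self-contained Python 3 sketch below reproduces the pattern with a throwaway local server; the paths `/download.mp4` and `/login` and the class name `FakeSite` are made up for illustration and are not Coursera's real behaviour:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class FakeSite(BaseHTTPRequestHandler):
    """Hypothetical site: the video URL redirects anonymous users to a login page."""

    def do_GET(self):
        if self.path.startswith("/download.mp4"):
            # No session cookie was sent, so redirect to the login page
            self.send_response(302)
            self.send_header("Location", "/login")
            self.send_header("Content-Length", "0")
            self.end_headers()
        else:
            body = b"<!DOCTYPE html><html><title>Login</title></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def fetch(url):
    """Follow redirects and return (final URL, content type, body)."""
    # ProxyHandler({}) ignores any proxy environment variables for this demo
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url) as resp:
        return resp.geturl(), resp.headers.get_content_type(), resp.read()


server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0: pick any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]

final_url, ctype, body = fetch(base + "/download.mp4?lecture_id=2")
server.shutdown()
server.server_close()
# The final URL ends in /login and the content type is text/html
print(final_url, ctype, len(body))
```

Checking the final URL and the `Content-Type` before writing anything to disk makes this kind of failure visible immediately instead of at playback time.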
Posted 2015-02-06 20:52:13
First, download and install the requests package.
Then use the following code:
import requests

def downloadfile(name, url):
    name = name + ".mp4"
    # Pass the url variable, not the literal string 'url';
    # stream=True makes iter_content fetch the body in chunks
    r = requests.get(url, stream=True)
    print("****Connected****")
    with open(name, 'wb') as f:
        print("Downloading.....")
        for chunk in r.iter_content(chunk_size=255):
            if chunk:  # filter out keep-alive new chunks
                f.write(chunk)
    print("Done")

Posted 2013-12-21 21:25:31
You need a valid cookie so that you don't end up downloading the login page.
Here is how to set a cookie with urllib2:
import urllib2

opener = urllib2.build_opener()
opener.addheaders.append(('Cookie', 'cookiename=cookievalue'))
f = opener.open("http://example.com/")

You can also use cookielib to get more browser-like behavior during the login process and obtain the right cookie for downloading your video.
Another option is requests, an HTTP library similar to urllib2 that makes it easier to automate the login process.
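For readers on Python 3: `urllib2` was split into `urllib.request` and related modules, and Python 2's `cookielib` became `http.cookiejar`. The sketch below is an illustration, not part of the original answer, and the helper name `make_cookie_opener` is hypothetical. It builds an opener backed by a `CookieJar`, so cookies set by the server are remembered and sent back automatically instead of being pasted in by hand. The login request itself is site-specific and is only described in a comment:

```python
import http.cookiejar
import urllib.request


def make_cookie_opener():
    """Return an opener plus the CookieJar it reads from and writes to.

    Set-Cookie headers from every response are stored in the jar, and
    matching cookies are sent back on later requests made through the same
    opener, roughly what a browser does between logging in and downloading.
    """
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


opener, jar = make_cookie_opener()
# Hypothetical flow (not executed here): submit the site's login form with
# opener.open(...), then fetch the video URL with the same opener so the
# session cookie is reused. Inspect what was stored with:
#     for cookie in jar: print(cookie.name, cookie.value)
```

`requests.Session` provides the same persistence if you prefer `requests`.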
Posted 2013-12-22 00:59:55
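I would first save the file as .html instead of .mp4, so you can be 100% sure it isn't a login page, an error page, or some other junk. Some sites require cookies, a specific user agent (to block bots, scrapers, and automated vulnerability scanners), a referrer, and so on.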
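Following that advice, here is a Python 3 sketch using only the standard library. It picks the extension from the response's `Content-Type` header, so an HTML login or error page ends up as an `.html` file you can open and read. The function name `download` and the assumption that any non-HTML response is the video (saved as `.mp4`) are illustrative choices, not part of the answer:

```python
import shutil
import urllib.request


def download(url, base_name, chunk_size=64 * 1024):
    """Stream url to disk and choose the file extension from Content-Type.

    A response served as text/html (login page, error page, ...) is saved as
    <base_name>.html so it can be inspected; anything else is assumed to be
    the video and saved as <base_name>.mp4. Returns (path, content_type).
    """
    with urllib.request.urlopen(url) as resp:
        ctype = resp.info().get_content_type()
        path = base_name + (".html" if ctype == "text/html" else ".mp4")
        with open(path, "wb") as out:
            # Copy in fixed-size chunks instead of reading the whole body at once
            shutil.copyfileobj(resp, out, chunk_size)
    return path, ctype
```

If every lecture comes back as `.html`, open one in a browser: it will show which page the server actually returned.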
Personally, I use Tamper Data or Live HTTP Headers to check what my program is actually sending while I debug it.
If you're getting a CloudFront response, you're probably not handling the cookie, user agent, or referrer correctly.
I just checked the link, and there is also a cookie {csrf_token=toNQOP7stgOREzrDcbPc} that you will definitely need in order to get past the login page.
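A Python 3 sketch of attaching such a cookie, together with a browser-like User-Agent and a Referer, to a single request. The helper name `build_request`, the User-Agent string, and the referring URL are placeholders. The token value is the one quoted in the answer, but real tokens differ per session, and the site's session cookies from logging in are presumably needed as well:

```python
import urllib.request


def build_request(url, cookies, user_agent, referer=None):
    """Attach the headers a site may check before serving a file.

    cookies: dict of cookie name -> value, joined into one Cookie header.
    """
    headers = {
        "User-Agent": user_agent,
        "Cookie": "; ".join("%s=%s" % (k, v) for k, v in cookies.items()),
    }
    if referer:
        headers["Referer"] = referer
    return urllib.request.Request(url, headers=headers)


req = build_request(
    "https://class.coursera.org/sna-003/lecture/download.mp4?lecture_id=2",
    {"csrf_token": "toNQOP7stgOREzrDcbPc"},  # value from the answer; yours will differ
    "Mozilla/5.0",  # placeholder browser-like User-Agent
    referer="https://class.coursera.org/sna-003/lecture",  # hypothetical referring page
)
# urllib.request.urlopen(req) would send these headers (no network call made here)
```

Note that `urllib.request.Request` normalizes header names with `str.capitalize()`, so the User-Agent is read back with `req.get_header("User-agent")`.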
https://stackoverflow.com/questions/20723538