Not any more.

The Heroku Acceptable Use Policy states in <a href="https://www.heroku.com/policy/aup#prohibited_actions" rel="nofollow">Prohibited Actions p.21</a> that a crawler must:

<ul>
<li>identify itself via a unique User Agent</li>
<li>obey robots.txt (including <a href="https://en.wikipedia.org/wiki/Robots_exclusion_standard#Crawl-delay_directive" rel="nofollow">crawl-delay directive</a>)</li>
<li>not be used as an "open proxy" (this requirement stems from p.20)</li>
</ul>
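The first two requirements can be sketched in a few lines. This is only an illustration in Python using the standard library's <code>urllib.robotparser</code>; the same idea applies to a Ruby crawler. The User-Agent string, the <code>example.com</code> URLs, the sample robots.txt, and the one-second fallback delay are all assumptions made up for the example, not values taken from Heroku's policy.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical User-Agent: choose one that is unique to your crawler
# and, ideally, points to a page describing it.
USER_AGENT = "my-heroku-crawler/0.1 (+https://example.com/bot-info)"


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that was already downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def fetch_plan(parser: RobotFileParser, url: str, default_delay: float = 1.0):
    """Return (allowed, delay_seconds) for a URL under the parsed rules."""
    allowed = parser.can_fetch(USER_AGENT, url)
    delay = parser.crawl_delay(USER_AGENT)  # None if no Crawl-delay applies
    return allowed, float(delay) if delay is not None else default_delay


# Sample robots.txt content, invented for this sketch.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Disallow: /private/
"""

if __name__ == "__main__":
    rules = load_rules(SAMPLE_ROBOTS_TXT)
    print(fetch_plan(rules, "https://example.com/public/page"))
    print(fetch_plan(rules, "https://example.com/private/page"))
```

In a real crawler you would send <code>USER_AGENT</code> as the <code>User-Agent</code> header on every request, skip any URL for which <code>allowed</code> is false, and wait <code>delay_seconds</code> between requests to the same host.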

NB! A free instance must not run for more than 18 hours a day.

I don't have any experience with using web crawlers in Heroku (I would actually be interested in reading about that!). But here are my points:

<ol>
<li>This is its <a href="https://policy.heroku.com/aup#prohibited_content" rel="nofollow">prohibited content</a>. Illegal activity is prohibited (duh) and since some sites "prohibit" web crawlers and screen scrapers (such as <a href="http://www.imdb.com/help/show_article?conditions" rel="nofollow">IMDb</a>), that could be considered illegal. But let's ignore this for now.</li>
<li>These are its <a href="https://policy.heroku.com/aup#prohibited_actions" rel="nofollow">prohibited actions</a>. The following is prohibited:

<blockquote>
 data mining any web property (including Heroku) to find email addresses or other user account information;
</blockquote></li>
<li><a href="https://policy.heroku.com/aup#quota" rel="nofollow">These</a> are its usage limits:

<ul>
<li>Network Bandwidth: 2TB/month - Soft</li>
<li>Shared DB processing: Max 200msec per second CPU time - Soft</li>
<li>Dyno RAM usage: 512MB - Hard</li>
<li>Slug Size: 200MB - Hard</li>
<li>Request Length: 30 seconds - Hard</li>
</ul></li>
<li>In its <a href="https://policy.heroku.com/tos" rel="nofollow">TOS</a>, point 2.5., it's explained:

<blockquote>
 Repeated exceeding of the hard or soft usage limits may lead to termination of your account.
</blockquote></li>
</ol>

Emphasis is mine. Heroku gives each app 750 dyno hours. As long as you don't abuse Heroku's services and don't use it to gather personal info, I believe you're in the clear. I suggest:

<ol>
<li>Throttle your web crawler. Just as you should rate-limit API requests, you should have the common courtesy to limit the speed of your crawler.</li>
<li>Keep an eye on your dyno hours. You can do so <a href="https://api.heroku.com/invoices/current" rel="nofollow">here</a>.</li>
</ol>
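Both suggestions can be sketched roughly as follows. This is a minimal Python illustration, not a tested production throttle: the <code>Throttle</code> class and <code>dyno_hours</code> helper are hypothetical names, the two-second interval is arbitrary, and treating the 750 dyno hours as a monthly allowance is an assumption.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # injectable so the class can be tested
        self._sleep = sleep
        self._last = None     # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def dyno_hours(hours_per_day, days, dynos=1):
    """Dyno hours consumed by `dynos` dynos running `hours_per_day` for `days`."""
    return hours_per_day * days * dynos


if __name__ == "__main__":
    throttle = Throttle(min_interval=2.0)  # at most one request every 2 seconds
    for url in ["https://example.com/a", "https://example.com/b"]:
        throttle.wait()
        print("would fetch", url)  # replace with the real request

    # One dyno running around the clock for a 31-day month,
    # compared with the 750-hour figure (assumed here to be monthly).
    print(dyno_hours(24, 31), "of 750 dyno hours")
```

Calling <code>wait()</code> before every request keeps the crawl rate bounded no matter how fast pages are parsed. The arithmetic at the end shows that a single always-on dyno stays under 750 hours in a month, while a second dyno would not.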

Does anybody have experience coding web crawlers with gems such as anemone and deploying them to Heroku for their own personal use? Would such a continuously running program violate any of Heroku's TOA/TOS?

Experience with web crawlers on Heroku
