
I'm not really sure about the command-line option. However, you could write your spider like this:

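<pre><code>from scrapy.spider import BaseSpider  # Scrapy 0.x; newer versions use scrapy.Spider

class MySpider(BaseSpider):

    name = 'my_spider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

        # Value of "-a start_url=..." from the command line
        self.start_urls = [kwargs.get('start_url')]
</code></pre>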

And start it like this:
<code>scrapy crawl my_spider -a start_url="http://some_url"</code>
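To make the link between <code>-a</code> and <code>kwargs</code> concrete, here is a minimal, Scrapy-free sketch. It assumes that Scrapy turns each <code>-a NAME=VALUE</code> option into a <code>NAME: VALUE</code> entry and passes those to the spider's <code>__init__</code> as keyword arguments. <code>spider_args_to_kwargs</code> and <code>FakeSpider</code> are hypothetical names used only for illustration, not part of Scrapy.

```python
# Illustrative sketch (not Scrapy's own source): how "-a NAME=VALUE"
# pairs could end up as keyword arguments of a spider's __init__.


def spider_args_to_kwargs(arg_list):
    """Turn ["start_url=http://some_url", ...] into a dict.

    Splits on the first "=" only, so URLs whose query strings
    contain "=" (e.g. "?page=2") survive intact.
    """
    kwargs = {}
    for item in arg_list:
        name, sep, value = item.partition("=")
        if not sep:
            raise ValueError("expected NAME=VALUE, got %r" % item)
        kwargs[name] = value
    return kwargs


class FakeSpider(object):
    """Hypothetical stand-in for MySpider so the sketch runs without Scrapy."""

    def __init__(self, *args, **kwargs):
        start_url = kwargs.get("start_url")
        # Avoid start_urls == [None] when the argument is omitted.
        self.start_urls = [start_url] if start_url else []


kwargs = spider_args_to_kwargs(["start_url=http://some_url/list?page=2"])
spider = FakeSpider(**kwargs)
print(spider.start_urls)  # ['http://some_url/list?page=2']
```

The guard in <code>FakeSpider</code> is the one behavioral difference from the spider above: without it, running the spider without <code>-a start_url=...</code> leaves <code>[None]</code> in <code>start_urls</code>.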


An even easier way to allow multiple URL arguments than what Peter suggested is to pass them as a single string with the URLs separated by commas, like this:

<pre><code>-a start_urls="http://example1.com,http://example2.com"
</code></pre>

In the spider you would then simply split the string on ',' to get a list of URLs:

<pre><code>self.start_urls = kwargs.get('start_urls').split(',')
</code></pre>
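A slightly more defensive version of that split, as a minimal sketch. <code>parse_start_urls</code> is a hypothetical helper name. It returns an empty list when the argument is missing (plain <code>kwargs.get('start_urls').split(',')</code> raises <code>AttributeError</code> on <code>None</code>), strips stray spaces, and ignores empty entries such as those left by a trailing comma.

```python
def parse_start_urls(value):
    """Split a comma-separated "-a start_urls=..." value into a list of URLs.

    - Returns [] when the argument was not given (value is None),
      instead of failing on None.split().
    - Strips whitespace around each entry and drops empty entries,
      e.g. from a trailing comma.
    """
    if not value:
        return []
    return [url.strip() for url in value.split(",") if url.strip()]


# Inside the spider's __init__ this would be used as:
#     self.start_urls = parse_start_urls(kwargs.get('start_urls'))
print(parse_start_urls("http://example1.com, http://example2.com,"))
# ['http://example1.com', 'http://example2.com']
```

It assumes no individual URL contains a comma; if one might, this scheme needs a different separator.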


Use the <code>scrapy parse</code> command. It parses a URL with your spider, and the URL is passed on the command line:

<pre><code>$ scrapy parse http://www.example.com/ --spider=spider-name
</code></pre>

<a href="http://doc.scrapy.org/en/latest/topics/commands.html#parse" rel="noreferrer">http://doc.scrapy.org/en/latest/topics/commands.html#parse</a>


Sjaak Trekhaak has the right idea; here is how to allow multiple URLs:

<pre><code>import scrapy


class MySpider(scrapy.Spider):
    """
    This spider will try to crawl whatever is passed in `start_urls` which
    should be a comma-separated string of fully qualified URIs.

    Example: -a start_urls=http://localhost,http://example.com
    """
    name = 'my_spider'  # hypothetical name; Scrapy requires every spider to have one

    def __init__(self, name=None, **kwargs):
        if 'start_urls' in kwargs:
            # pop() keeps Spider.__init__ from overwriting start_urls with the raw string
            self.start_urls = kwargs.pop('start_urls').split(',')
        super(MySpider, self).__init__(name, **kwargs)
</code></pre>


This is an extension to <a href="https://stackoverflow.com/a/9682714/1125413">the approach given by Sjaak Trekhaak</a> in this thread. As it stands, that approach only works if you provide exactly one URL. For example, if you want to provide more than one URL like this:

<pre><code>-a start_url=http://url1.com,http://url2.com
</code></pre>

then Scrapy (I'm using the current stable version 0.14.4) will terminate with the following exception:

<pre><code>error: running 'scrapy crawl' with more than one spider is no longer supported
</code></pre>

However, you can work around this problem by using a different argument for each start URL, together with an argument that holds the number of URLs passed. Something like this:

<pre><code>-a start_url1=http://url1.com 
-a start_url2=http://url2.com 
-a urls_num=2
</code></pre>

You can then do the following in your spider:

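<pre><code>from scrapy.spider import BaseSpider  # Scrapy 0.x; newer versions use scrapy.Spider

class MySpider(BaseSpider):

    name = 'my_spider'

    def __init__(self, *args, **kwargs):
        super(MySpider, self).__init__(*args, **kwargs)

        urls_num = int(kwargs.get('urls_num'))

        # Collect start_url1 ... start_urlN; range() stops before its end,
        # so the end must be urls_num + 1 to include the last URL.
        start_urls = []
        for i in range(1, urls_num + 1):
            start_urls.append(kwargs.get('start_url{0}'.format(i)))

        self.start_urls = start_urls
</code></pre>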

This is a somewhat ugly hack, but it works. Of course, it's tedious to write out the command-line arguments by hand for every URL, so it makes sense to wrap the <code>scrapy crawl</code> command in a Python <a href="http://docs.python.org/library/subprocess.html" rel="nofollow noreferrer">subprocess</a> and generate the command-line arguments in a loop.
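A minimal sketch of such a wrapper, assuming the <code>scrapy</code> executable is on your <code>PATH</code> and the spider reads <code>start_url1</code> through <code>start_urlN</code> plus <code>urls_num</code> as described above. <code>build_crawl_command</code> and <code>run_crawl</code> are hypothetical helper names; <code>subprocess.check_call</code> is from the Python standard library.

```python
import subprocess


def build_crawl_command(spider_name, urls):
    """Build the argv for the numbered-argument scheme:
    scrapy crawl NAME -a start_url1=... -a start_url2=... -a urls_num=N
    """
    cmd = ["scrapy", "crawl", spider_name]
    for i, url in enumerate(urls, start=1):
        cmd += ["-a", "start_url{0}={1}".format(i, url)]
    cmd += ["-a", "urls_num={0}".format(len(urls))]
    return cmd


def run_crawl(spider_name, urls):
    """Run the crawl as a child process; raises CalledProcessError on failure."""
    subprocess.check_call(build_crawl_command(spider_name, urls))


if __name__ == "__main__":
    print(build_crawl_command("my_spider", ["http://url1.com", "http://url2.com"]))
    # ['scrapy', 'crawl', 'my_spider', '-a', 'start_url1=http://url1.com',
    #  '-a', 'start_url2=http://url2.com', '-a', 'urls_num=2']
```

Passing the arguments as a list rather than as one shell string means URLs containing <code>&amp;</code> or <code>?</code> need no shell quoting.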

Hope it helps. :)


You can also try this:

<pre><code>$ scrapy view http://www.sitename.com
</code></pre>

It downloads the requested URL and opens it in your browser, showing the page as your Scrapy spider would see it.

I want to use Scrapy for crawling web pages. Is there a way to pass the start URL from the terminal itself?

The <a href="http://doc.scrapy.org/en/0.12/topics/commands.html#std:command-crawl" rel="noreferrer">documentation</a> says that either the name of the spider or a URL can be given, but when I give the URL it throws an error:

The name of my spider is <code>example</code>, but I am giving the URL instead of the spider name (it works fine if I give the spider name).

<blockquote>
 scrapy crawl example.com 
</blockquote>

ERROR:

<blockquote>
 File "/usr/local/lib/python2.7/dist-packages/Scrapy-0.14.1-py2.7.egg/scrapy/spidermanager.py", line 43, in create
 raise KeyError("Spider not found: %s" % spider_name)
 KeyError: 'Spider not found: example.com'
</blockquote>

How can I make Scrapy use my spider on a URL given in the terminal?

How do I give a URL to Scrapy for crawling?
