Question: Scraping web data with BeautifulSoup

Stack Overflow user

Asked on 2017-05-16 17:05:49

Answers: 1 · Views: 448 · Followers: 0 · Votes: 0

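I am trying to scrape the text details of store locations with BeautifulSoup and write them to a CSV file. The 2 stores in Alabama are under one LocationSecContent class, and the 17 stores in Arizona are under another LocationSecContent class. In Georgia, the first store (Airport) is in a single location class inside the LocationSecContent class, and the remaining 4 are in another location class within LocationSecContent. I want to scrape the text and write store details such as name, location, street, phone, fax, hours and every other detail to a CSV file. I am using Firebug in Firefox. Sorry if there are any mistakes; I am a beginner with BeautifulSoup.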

Here is what I have tried:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://freshvites.com/store-locator/')

soup = BeautifulSoup(page.text, 'html.parser')
d={}
for table in soup.find_all("div", {"class":"content freshvites-location"}):
    table
for col in table.find_all("td"):

    LocationSecHdr=col.find_all("div",{'class':'LocationSecHdr'})
    Location=col.find_all("div",{'class':'location'})


dt="LocationSecHdr:%s,Location: %s" %(LocationSecHdr, Location)
zx=BeautifulSoup(dt, "html.parser")

print zx.get_text()

I can't iterate through the rows and scrape the text.

Approach 2:

from bs4 import BeautifulSoup

import requests


page = requests.get('http://freshvites.com/store-locator/')
#print page


soup = BeautifulSoup(page.text, 'html.parser')
#print soup.find_all('a')

for table in soup.find_all("div",{'class':'content freshvites-location'}):
    table


LocationSecHdr=''
LocationSecContent=''
Location=''
LocationTitle=''
Phone=''
Fax=''
HoursTitle=''
HoursContent=''


for col in table.find_all("td"):      
    LocationSecHdr=col.find_all("div",{'class':'LocationSecHdr'})
    #LocationSecContent= col.find_all("div",{'class':'LocactionSecContent'})
    #Location= col.find_all("div",{'class':'location'})
    LocationTitle= col.find_all("div",{'class':'locationTitle'})
    Phone= col.find_all("div",{'class':'Phone'})
    Fax= col.find_all("div",{'class':'Fax'})
    HoursContent=col.find_all("div",{'class':'HoursContent'})

    data="LocationSecHdr: %s, LocationSecContent: %s, Location:%s, LocationTitle : %s, Phone:%s, Fax :%s, HoursContent:%s " %(LocationSecHdr, LocationSecContent, Location, LocationTitle, Phone, Fax, HoursContent)
    zax=BeautifulSoup(data,"html.parser")

print zax.get_text()

With this code I can't get the store addresses, and I don't know how to collect these details as a dictionary.

Tags: python, csv, web-scraping, beautifulsoup

1 Answer

Stack Overflow user

Answered on 2017-05-16 17:39:52

I think I now have enough information to answer your question.

You are searching for the wrong tag/class combination. All of the information for a location is contained in a <div class="location">. Here is an example:

<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br> 
13802 N. 32nd St #11<br> 
Phoenix, AZ 85032<br>
<div class="Phone">&nbsp;</div>
<div class="Fax">877.935.6902</div>
<div class="HoursTitle">Hours:</div>
<div class="HoursContent">9am - 7pm M-F<br> 9am - 6pm Sat<br> 11am - 4pm Sun</div>
</div>

As you can see in the example, there is no <tr> or <td>, so searching for them gets you nothing.
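To check this offline, here is a minimal sketch that parses the sample markup above (no HTTP request), confirms that it contains no `<td>` cells, and turns the `<div class="location">` into a dictionary. The helpers `child_text` and `parse_location` and the dictionary keys are hypothetical names chosen for illustration, and the code assumes the store name, street and city line are bare text directly inside the location div, in that order, as in this sample; other stores on the live page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the answer (one store location).
SAMPLE = """
<div class="location">
<div class="locationTitle">32nd Street &amp; Thunderbird</div>
Fresh Vitamins<br>
13802 N. 32nd St #11<br>
Phoenix, AZ 85032<br>
<div class="Phone">&nbsp;</div>
<div class="Fax">877.935.6902</div>
<div class="HoursTitle">Hours:</div>
<div class="HoursContent">9am - 7pm M-F<br> 9am - 6pm Sat<br> 11am - 4pm Sun</div>
</div>
"""


def child_text(div, class_name, sep=" "):
    """Return the stripped text of the first nested <div> with this class, or ''."""
    child = div.find("div", class_=class_name)
    if child is None:
        return ""
    # strip=True drops whitespace-only pieces (including &nbsp;) and
    # sep joins the pieces that <br> tags split apart.
    return child.get_text(sep, strip=True)


def parse_location(div):
    """Turn one <div class="location"> into a dict (hypothetical key names)."""
    # Assumption: name, street and city/state/zip are bare text nodes that are
    # direct children of the div, in that order, as in the sample above.
    address = [s.strip() for s in div.find_all(string=True, recursive=False)
               if s.strip()]
    return {
        "title": child_text(div, "locationTitle"),
        "name": address[0] if len(address) > 0 else "",
        "street": address[1] if len(address) > 1 else "",
        "city_state_zip": address[2] if len(address) > 2 else "",
        "phone": child_text(div, "Phone"),
        "fax": child_text(div, "Fax"),
        "hours": child_text(div, "HoursContent", sep="; "),
    }


soup = BeautifulSoup(SAMPLE, "html.parser")
# The question's loops search for <td>, but this markup has none.
print(len(soup.find_all("td")))
for div in soup.find_all("div", class_="location"):
    print(parse_location(div))
```

On the live page you would call `parse_location` on each div returned by `soup.find_all("div", {"class": "location"})` and collect the resulting dicts in a list.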

Here is a short Python script I used to find all the locations:

from bs4 import BeautifulSoup
import requests

page = requests.get('http://freshvites.com/store-locator/')

soup = BeautifulSoup(page.content, 'html.parser')

for div in soup.find_all("div", {"class":"location"}):
    print(div)

Now you only need to pull the information you want out of each div; everything you are looking for should be easy to find there.
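For the CSV part of the question, once each location has been reduced to a dict, `csv.DictWriter` from the standard library can write the rows. This is a sketch: the column names in `FIELDS`, the helper `write_locations_csv`, and the hand-built example row (copied from the sample `<div class="location">` above) are assumptions for illustration, not something the page or the answer defines.

```python
import csv
import io

# Hypothetical column names; use whatever keys your extraction step produces.
FIELDS = ["title", "name", "street", "city_state_zip", "phone", "fax", "hours"]


def write_locations_csv(locations, out):
    """Write an iterable of location dicts to a file-like object as CSV."""
    # extrasaction="ignore" drops keys not in FIELDS instead of raising;
    # keys missing from a row are written as empty cells (restval="").
    writer = csv.DictWriter(out, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for row in locations:
        writer.writerow(row)


# Example row built by hand from the sample <div class="location">.
rows = [{
    "title": "32nd Street & Thunderbird",
    "name": "Fresh Vitamins",
    "street": "13802 N. 32nd St #11",
    "city_state_zip": "Phoenix, AZ 85032",
    "phone": "",
    "fax": "877.935.6902",
    "hours": "9am - 7pm M-F; 9am - 6pm Sat; 11am - 4pm Sun",
}]

buf = io.StringIO()
write_locations_csv(rows, buf)
print(buf.getvalue())
```

To write a real file instead of the in-memory buffer, open it with `newline=""` as the `csv` documentation recommends, for example `with open("stores.csv", "w", newline="", encoding="utf-8") as f: write_locations_csv(rows, f)` (the file name is hypothetical). Values containing commas, such as the city line, are quoted automatically.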

Votes: 1

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/43996974
