
Nested Sibling For Loops, Output to List

Stack Overflow user
Asked 2017-02-21 23:48:13
1 answer · 184 views · 0 follows · Score 0

I'm having trouble appending data to a list while iterating over the following:

import urllib.request
from bs4 import BeautifulSoup
import pandas

def make_soup(url):
    # Attach the User-Agent to the request before it is sent; assigning
    # addheaders to the response returned by urlopen() has no effect.
    request = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    thepage = urllib.request.urlopen(request)
    soupdata = BeautifulSoup(thepage, 'html.parser')
    return soupdata

soup = make_soup('https://www.wellstar.org/locations/pages/default.aspx')

locationdata = []

for table in soup.find_all('table', class_='s4-wpTopTable'):
    for name in table.find_all('div', 'PurpleBackgroundHeading'):
        name = name.get_text(strip=True)
    for loc_type in table.find_all('h3', class_='WebFont SpotBodyGreen'):
        loc_type = loc_type.get_text()
    for address in table.find_all('div', class_=['WS_Location_Address', 'WS_Location_Adddress']):
        address = address.get_text(strip=True, separator=' ')
        locationdata.append([name, loc_type, address])

df = pandas.DataFrame(columns=['name', 'loc_type', 'address'], data=locationdata)
print(df)
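A side note on make_soup: setting thepage.addheaders after urlopen() has returned does not change anything, because the request has already gone out by then. With urllib.request, per-request headers belong on a Request object that is built before it is opened. The minimal sketch below only constructs the request and never touches the network. The helper name build_request is hypothetical:

```python
import urllib.request


def build_request(url, user_agent='Mozilla/5.0'):
    # Hypothetical helper: headers given here are stored on the Request and
    # sent when it is opened with urllib.request.urlopen(request).
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


req = build_request('https://www.wellstar.org/locations/pages/default.aspx')
# Request normalizes header names with str.capitalize(), hence 'User-agent'.
print(req.get_header('User-agent'))
```

Passing req to urllib.request.urlopen() then sends the header along with the request.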

The resulting DataFrame contains all of the unique addresses, but each one is paired only with the last possible value of name.

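For example, "WellStar Windy Hill Hospital" is the last hospital in the hospital category/type, yet it appears as the name of every hospital. If possible, I'd prefer a list.append solution, because I have several more similar steps to complete for this project.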


1 Answer

Stack Overflow user

Answered 2017-02-22 00:29:20

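This happens because you iterate through all of the names and loc_types before appending anything to locationdata. By the time the address loop runs, name and loc_type are still bound to the last values from their own loops.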
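The same effect can be reproduced without any scraping. In this sketch, plain lists of hypothetical sample values stand in for the tags that find_all() returns, and the loops mirror the structure of the question's code:

```python
# Hypothetical sample values standing in for the tags of one scraped table.
names = ['Hospital A', 'Hospital B', 'Hospital C']
loc_types = ['Hospitals']
addresses = ['1 First St', '2 Second St', '3 Third St']

rows = []
for name in names:
    name = name.strip()  # stands in for name.get_text(strip=True)
for loc_type in loc_types:
    loc_type = loc_type.strip()
# Both loops above have already finished, so name is 'Hospital C' and
# loc_type is 'Hospitals' on every pass of this loop.
for address in addresses:
    rows.append([name, loc_type, address])

for row in rows:
    print(row)
```

Every printed row carries 'Hospital C', which is the pattern described in the question.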

Instead, you can do something like this:

import itertools as it
from pprint import pprint as pp

# soup and locationdata come from the code in the question.
for table in soup.find_all('table', class_='s4-wpTopTable'):
    names = [name.get_text(strip=True)
             for name in table.find_all('div', 'PurpleBackgroundHeading')]
    loc_types = [loc_type.get_text()
                 for loc_type in table.find_all('h3', class_='WebFont SpotBodyGreen')]
    addresses = [address.get_text(strip=True, separator=' ')
                 for address in table.find_all('div', class_=['WS_Location_Address',
                                                               'WS_Location_Adddress'])]

# zip_longest (izip_longest in Python 2) pads the shorter lists with None.
for name, loc_type, address in it.zip_longest(names, loc_types, addresses):
    locationdata.append([name, loc_type, address])
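One detail in this code: the zip_longest loop is not indented under for table, so it runs once, after every table has been processed, and pairs up only the lists built for the last table. That is consistent with the result that follows containing only urgent care entries. Here is a minimal sketch with the pairing moved inside the per-table loop. Hypothetical sample lists stand in for the three list comprehensions:

```python
from itertools import zip_longest

# Hypothetical per-table data: (names, loc_types, addresses), as the three
# list comprehensions would build them for each table.
tables = [
    (['Hospital A', 'Hospital B'], ['Hospitals'], ['1 First St', '2 Second St']),
    (['Urgent Care X', 'Urgent Care Y'], ['Urgent Care Centers'], ['9 Ninth St', '8 Eighth St']),
]

locationdata = []
for names, loc_types, addresses in tables:
    # Pair and append inside the table loop, so every table contributes rows
    # instead of only the last one.
    for name, loc_type, address in zip_longest(names, loc_types, addresses):
        locationdata.append([name, loc_type, address])

for row in locationdata:
    print(row)
```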

Result:

>>> pp(locationdata)
[[u'WellStar Urgent Care in Acworth',
  u'WellStar Urgent Care Centers',
  u'4550 Cobb Parkway NW Suite 101 Acworth, GA 30101 770-917-8140'],
 [u'WellStar Urgent Care in Kennesaw',
  None,
  u'3805 Cherokee Street Kennesaw, GA 30144 770-426-5665'],
 [u'WellStar Urgent Care in Marietta - Delk Road',
  None,
  u'2890 Delk Road Marietta, GA 30067 770-955-8620'],
 [u'WellStar Urgent Care in Marietta - East Cobb',
  None,
  u'3747 Roswell Road Ne Suite 107 Marietta, GA 30062 470-956-0150'],
 [u'WellStar Urgent Care in Marietta - Kennestone',
  None,
  u'818 Church Street Suite 100 Marietta, GA 30060 770-590-4190'],
 [u'WellStar Urgent Care in Marietta - Sandy Plains Road',
  None,
  u'3600 Sandy Plains Road Marietta, GA 30066 770-977-4547'],
 [u'WellStar Urgent Care in Smyrna',
  None,
  u'4480 North Cooper Lake Road SE Suite 100 Smryna, GA 30082 770-333-1300'],
 [u'WellStar Urgent Care in Woodstock',
  None,
  u'120 Stonebridge Parkway Suite 310 Woodstock, GA 30189 678-494-2500']]
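In the result above, only the first row gets a loc_type and the rest show None. Each table appears to have a single green h3 heading but several locations, and zip_longest pads the shorter list with None. If every location should carry its table's type, one option is to read the heading once and repeat it. This assumes the page has at most one such heading per table. The helper name rows_for_table is hypothetical:

```python
def rows_for_table(names, loc_types, addresses):
    """Pair each name with its address and repeat the table's heading.

    Assumes (hypothetically) that a table has at most one loc_type heading
    that applies to every location listed in it.
    """
    loc_type = loc_types[0] if loc_types else None
    rows = []
    # zip() stops at the shorter of names/addresses instead of padding with None.
    for name, address in zip(names, addresses):
        rows.append([name, loc_type, address])
    return rows


rows = rows_for_table(
    ['WellStar Urgent Care in Acworth', 'WellStar Urgent Care in Kennesaw'],
    ['WellStar Urgent Care Centers'],
    ['4550 Cobb Parkway NW Suite 101 Acworth, GA 30101 770-917-8140',
     '3805 Cherokee Street Kennesaw, GA 30144 770-426-5665'],
)
for row in rows:
    print(row)
```

Inside the table loop, locationdata.extend(rows_for_table(names, loc_types, addresses)) keeps the append-style flow that the question prefers. Using zip() rather than zip_longest() is a trade-off: it silently drops any unmatched name or address instead of padding it with None.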
Score: 1
Page content provided by Stack Overflow; translation support by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/42371895
