
How to use append in a second for loop

Stack Overflow user
Asked on 2021-01-11 04:19:06
1 answer · 31 views · 0 followers · 0 votes

So I am using append to extend my list of scraped apartments. I ran into a problem with this code after I added a second for loop to change pages on the website: the first for loop feeds a new page to the inner for loop that does the scraping. But when a page finishes, it just overwrites the previous list. What am I doing wrong?

for page in range(1, 4):  # Gives new page to scrape

  r = requests.get(url + str(page))
  soup = bs(r.content)
  apartments = soup.select(".ListPage__cardContainer__39dKQ")
  base_path = "https://www.etuovi.com"
  x = []
  apartment_list = []

  for index, apartment in enumerate(apartments):

    if index == 2:  # Just to not scrape every item
      break

    relative_path = apartment.a['href']
    full_path = base_path + relative_path
    id_number = apartment.a['id']
    apartment_list.append(get_apartment_data(full_path))   # This works for one page


x.append(apartment_list)     # Tried to make this work.. Think one list should be enough.

And the function:

def get_content_value(info_list_data):

  if info_list_data.find("li"):
    return [li.get_text(" ", strip=True).replace("\xa0", "").replace("€", "").replace("/ kk", "").replace("\n", "") for li in info_list_data.find_all("li")]
  else:
    return info_list_data.get_text(" ", strip=True).replace("\xa0", "").replace("€", "").replace("/ kk", "").replace("\n", "")

And finally:

def get_apartment_data(url):

  r = requests.get(url)
  soup = bs(r.content)
  all_info_list = soup.find_all(class_="CompactInfoRow__infoRow__2hjs_ flexboxgrid__row__wfmuy")

  for info_list in all_info_list:
    info_list.prettify()

  info = {}
  for index, info_list in enumerate(all_info_list):

    content_key = info_list.find(class_="flexboxgrid__col-xs-12__1I1LS flexboxgrid__col-sm-4__3RH7g ItemHeader__itemHeader__32xAv").get_text(" ", strip=True)
    content_value = get_content_value(info_list.find(class_="flexboxgrid__col-xs-12__1I1LS flexboxgrid__col-sm-8__2jfMv CompactInfoRow__content__3jGt4"))
    info[content_key] = content_value

  return info

1 Answer

Stack Overflow user

Answered on 2021-01-11 04:20:36

for page in range(1, 4):  # Gives new page to scrape

  r = requests.get(url + str(page))
  soup = bs(r.content)
  apartments = soup.select(".ListPage__cardContainer__39dKQ")
  base_path = "https://www.etuovi.com"
  x = []
  apartment_list = []

  for index, apartment in enumerate(apartments):

    if index == 2:  # Just to not scrape every item
      break

    relative_path = apartment.a['href']
    full_path = base_path + relative_path
    id_number = apartment.a['id']
    apartment_list.append(get_apartment_data(full_path))   # This works for one page


x.append(apartment_list.copy())

You need to use the copy() method to make an independent copy. Otherwise, every time you modify apartment_list, the version stored in your x list changes with it — like twin lists.
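The "twin lists" behaviour comes from the fact that append stores a reference to the list, not a copy of it. A minimal check with the `is` operator makes this visible:

```python
lst = [1, 2, 3]
x = []
x.append(lst)

# x[0] and lst are the very same object, so mutating one "changes" the other
print(x[0] is lst)      # True: same object

lst[0] = 0
print(x[0])             # [0, 2, 3]

x2 = []
x2.append(lst.copy())   # .copy() creates a new, independent list
print(x2[0] is lst)     # False: distinct objects
```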

More generally:

x = []
lst = [1, 2, 3]

x.append(lst)

print(x)

lst[0] = 0

x.append(lst)

print(x)

Output:

[[1, 2, 3]]

[[0, 2, 3], [0, 2, 3]]

The correct way is:

x = []
lst = [1, 2, 3]

x.append(lst.copy())

print(x)

lst[0] = 0

x.append(lst.copy())

print(x)

Output:

[[1, 2, 3]]

[[1, 2, 3], [0, 2, 3]]
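One caveat worth knowing: list.copy() makes a shallow copy, so any mutable objects nested inside (for example the dicts returned by get_apartment_data) are still shared between the copy and the original. If fully independent copies are ever needed, the standard library's copy.deepcopy handles nesting:

```python
import copy

lst = [[1, 2], [3, 4]]

shallow = lst.copy()          # new outer list, but the inner lists are shared
deep = copy.deepcopy(lst)     # inner lists are duplicated as well

lst[0][0] = 99
print(shallow[0])   # [99, 2] -- inner list is still shared with lst
print(deep[0])      # [1, 2]  -- deepcopy duplicated the inner lists too
```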
Votes: 0
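An alternative that avoids the copy entirely is restructuring the loop: initialise x once before the page loop, create a brand-new apartment_list inside it, and append that list to x in the same loop body. A minimal sketch of that structure, using a hypothetical stub in place of the real get_apartment_data and fake page data instead of live requests:

```python
def get_apartment_data(path):
    # hypothetical stub standing in for the real scraping function
    return {"url": path}

base_path = "https://www.etuovi.com"
# fake relative paths per page, standing in for the soup.select results
pages = {1: ["/a1", "/a2"], 2: ["/b1"], 3: ["/c1", "/c2"]}

x = []  # initialised ONCE, before the page loop
for page in range(1, 4):
    apartment_list = []  # a brand-new list each page: no aliasing, no copy() needed
    for relative_path in pages[page]:
        apartment_list.append(get_apartment_data(base_path + relative_path))
    x.append(apartment_list)  # append inside the page loop, once per page

print(len(x))  # 3 -- one sub-list per page
```

Because a fresh list object is created on every iteration, each element of x refers to a different list and no later page can overwrite an earlier one.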
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/65658070