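I'm a new programmer who has just started learning a bit about numpy. I want to insert information scraped from the web into a DataFrame. This is what I've come up with so far.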
from urllib.request import urlopen as uReq  # Python 3 location of urlopen (urllib2 on Python 2)
from bs4 import BeautifulSoup as soup
import numpy as np
import pandas as pd
from pandas import Series, DataFrame
my_url = 'http://www.bestbuy.com/site/searchpage.jsp?st=laptops&_dyncharset=UTF-8&id=pcat17071&type=page&sc=Global&cp=1&nrp=&sp=&qp=&list=n&af=true&iht=y&usc=All+Categories&ks=960&keys=keys'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.find_all("div",{"class":"row"})
DF_obj = DataFrame(np.arange(len(containers)*2).reshape((len(containers),2)), columns = ['Name', 'Price'])#index=['row 1', 'row 2', 'row 3', 'row 4', 'row 5', 'row 6'], columns=['column 1', 'column 2', 'column 3', 'column 4', 'column 5', 'column 6', ]
#print(DF_obj)
for container in containers:
    try:
        title_container = container.find_all("h4")
        product_name = title_container[0].text
        price_container = container.find_all("div", {"class", "pb-hero-price pb-purchase-price"})
        product_price = price_container[0].text
        #print("Product Name: "+ product_name)
        #print("Price:"+ product_price)
        DF_obj['Name'] = product_name # Here is the problem
        DF_obj['Price'] = product_price# Here is the problem
    except Exception:
        pass
print(DF_obj)
What I have now only shows the last scraped product. That is because I am setting the product name and price equal to the entire column. I want a way to assign them only to the i-th row and then move on to the next row.
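Basically, I want to insert a value at (container, 1) and (container, 2). Since container iterates, the next name would go into the next row. What is the syntax for this?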
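The behaviour described above can be reproduced without any scraping. The following is a minimal sketch, assuming two made-up products in a hypothetical `scraped` list in place of the real page: assigning a scalar to a whole column broadcasts it to every row, while `.loc[row_label, column_name]` writes a single cell.

```python
import pandas as pd

# Hypothetical stand-in data; the real values would come from the scraped page.
scraped = [("Laptop A", "$100.00"), ("Laptop B", "$200.00")]

df = pd.DataFrame(index=range(len(scraped)), columns=["Name", "Price"])

# Assigning a scalar to a whole column broadcasts it to every row,
# so after the loop every row holds the last product.
for name, price in scraped:
    df["Name"] = name
    df["Price"] = price
broadcast_names = df["Name"].tolist()

# .loc[row_label, column_name] addresses one cell, so each product
# lands in its own row.
for i, (name, price) in enumerate(scraped):
    df.loc[i, "Name"] = name
    df.loc[i, "Price"] = price
per_row_names = df["Name"].tolist()

print(broadcast_names)
print(per_row_names)
```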
Here is some output so you can see what is happening:
Name Price
0 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
1 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
2 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
3 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
4 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
5 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
6 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
7 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
8 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
9 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
10 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
11 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
12 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
13 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
14 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
15 Dell - Inspiron 2-in-1 13.3" Touch-Screen Lapt... $849.99
I cut off some of the output, since you get the idea.
Here is what I want, except inside a DataFrame:
Product Name: Dell - XPS 2-in-1 13.3" Touch-Screen Laptop - Intel Core i7 - 16GB Memory - 512GB Solid State Drive - Silver
Price:$241.99
Product Name: HP - 15.6" Laptop - AMD A6-Series - 4GB Memory - 500GB Hard Drive - Black
Price:$241.99
Product Name: Dell - Inspiron 15.6" Touch-Screen Laptop - Intel Core i3 - 6GB Memory - 1TB Hard Drive - Black
Price:$349.99
Product Name: HP - 15.6" Laptop - Intel Core i5 - 8GB Memory - 2TB Hard Drive - Textured linear gradient grooves in black
Price:$449.99
Product Name: Lenovo - Ideapad 110s 11.6" Laptop - Intel Celeron - 2GB Memory - 32GB eMMC Flash Memory - White
Price:$169.99
Product Name: HP - 15.6" Laptop - Intel Core i5 - 8GB Memory - 1TB Hard Drive - HP finish in jet black
Price:$399.99
Product Name: Lenovo - Flex 4 14 2-in-1 14" Touch-Screen Laptop - Intel Pentium - 4GB Memory - 500GB Hard Drive - Black
Price:$349.99
Product Name: Lenovo - 15.6" Laptop - Intel Core i3 - 6GB Memory - 1TB Hard Drive - Ebony black
Price:$312.99
Product Name: Lenovo - Flex 4 1130 2-in-1 11.6" Touch-Screen Laptop - Intel Celeron - 2GB Memory - 64GB eMMC Flash Memory
Price:$229.99
Product Name: HP - Spectre x360 2-in-1 13.3" Touch-Screen Laptop - Intel Core i7 - 8GB Memory - 256GB Solid State Drive - Natural silver
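Price:$1,129.99
Thanks.
Posted on 2017-08-04 00:12:11
Updated code. The list of containers was not quite right: the div tags with class=row sit a little higher in the page than you need. The for loop below now goes through all the list-item tags and adds them to the DataFrame row by row.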
my_url = 'http://www.bestbuy.com/site/searchpage.jsp?st=laptops&_dyncharset=UTF-8&id=pcat17071&type=page&sc=Global&cp=1&nrp=&sp=&qp=&list=n&af=true&iht=y&usc=All+Categories&ks=960&keys=keys'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
#This is the div tag that contains each of the products listed on the page
containers = page_soup.find_all("div",{'class':'list-item'})
#Just a check to make sure all the products are found
print(len(containers))
#Create an empty dataframe, It can be populated later, no need for a fixed
#structure
DF_obj = DataFrame(columns = ['Name', 'Price'])
#we want to loop over the containers, but during each loop, we also want to
#track the loop number we are on, this will help us enter data into the
#dataframe, so we wrap containers with enumerate()
# in this situation index = loop count, container = value from containers list
for index,container in enumerate(containers):
    try:
        #These two you provided
        title_container = container.find_all("h4")
        price_container = container.find_all("div", {"class", "pb-hero-price pb-purchase-price"})
        #Now that we found the info, we insert it into the data frame. This
        #can be done with DF_obj.loc[row#,column Name], and our df is updated
        #for every container
        DF_obj.loc[index,'Name'] = title_container[0].text
        DF_obj.loc[index, 'Price'] = price_container[0].text
    except Exception:
        pass
print(DF_obj)
And the data:
Name Price
0 HP - 15.6" Laptop - AMD A6-Series - 4GB Memory... $241.99
1 HP - 15.6" Laptop - Intel Core i5 - 8GB Memory... $399.99
2 Lenovo - Ideapad 110s 11.6" Laptop - Intel Cel... $169.99
3 Dell - Inspiron 15.6" Touch-Screen Laptop - In... $349.99
4 HP - 15.6" Laptop - Intel Core i5 - 8GB Memory... $449.99
5 HP - 17.3" Laptop - Intel Core i7 - 8GB Memory... $529.99
6 Dell - Inspiron 11.6" Laptop - Intel Celeron -... $174.99
7 Lenovo - 15.6" Laptop - AMD A6-Series - 4GB Me... $229.99
8 Dell - Inspiron 15.6" Touch-Screen Laptop - In... $349.99
9 Lenovo - Flex 4 14 2-in-1 14" Touch-Screen Lap... $349.99
10 Dell - Inspiron 17.3" Laptop - AMD A9-Series -... $506.99
11 Dell - Inspiron 17.3" Laptop - Intel Core i7 -... $938.99
12 Samsung - 11.6" Chromebook - Intel Celeron - 4... NaN
13 Acer - 15.6" Chromebook - Intel Celeron - 4GB ... $219.99
14 Lenovo - Yoga 710 2-in-1 11.6" Touch-Screen La... $399.99
15 Dell - Inspiron 2-in-1 17.3" Touch-Screen Lapt... $849.99
16 HP - Spectre x360 2-in-1 13.3" Touch-Screen La... $1,279.99
17 Asus - 2-in-1 13.3" Touch-Screen Laptop - Inte... $1,129.99
18 HP - 15.6" Laptop - AMD A12-Series - 6GB Memor... $299.99
19 Asus - ROG GL502VM 15.6" Laptop - Intel Core i... $1,149.99
20 Lenovo - 15.6" Laptop - Intel Core i3 - 6GB Me... $312.99
21 Lenovo - Flex 4 1130 2-in-1 11.6" Touch-Screen... $229.99
22 Samsung - 12.3" Chromebook Plus - Touch Screen... $399.00
23 HP - Spectre x360 2-in-1 13.3" Touch-Screen La... $1,129.99
https://stackoverflow.com/questions/45494018
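As an alternative to filling the DataFrame cell by cell, the rows can be collected in a plain list and turned into a DataFrame once at the end. This is only a sketch, not the method used in the answer above. The `SAMPLE_HTML` string and the `parse_products` helper are hypothetical and merely imitate the class names used in the code; the live page may be structured differently.

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical markup that imitates the class names used above;
# the real page's structure may differ.
SAMPLE_HTML = """
<div class="list-item">
  <h4>Laptop A</h4>
  <div class="pb-hero-price pb-purchase-price">$100.00</div>
</div>
<div class="list-item">
  <h4>Laptop B</h4>
</div>
"""


def parse_products(html):
    """Return a DataFrame with one row per list-item div that has a title."""
    page_soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in page_soup.find_all("div", class_="list-item"):
        title = container.find("h4")
        price = container.find("div", class_="pb-hero-price")
        if title is None:
            continue  # skip containers that do not look like products
        rows.append({
            "Name": title.get_text(strip=True),
            # A missing price is stored as None and shows up as a missing value.
            "Price": price.get_text(strip=True) if price is not None else None,
        })
    return pd.DataFrame(rows, columns=["Name", "Price"])


products = parse_products(SAMPLE_HTML)
print(products)
```

Checking for a missing tag explicitly, instead of wrapping the whole body in `try`/`except Exception: pass`, keeps a product without a price (like row 12 in the output above) as a deliberate missing value and avoids hiding unrelated errors.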