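I'm new to Python myself. For practice, I'm trying to scrape some data from a website. Digging into the site's HTML/CSS showed me that this isn't that simple, because most of the divs etc. have no class or ID.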
<table class="trade-list-table max-width">
<thead>
</thead>
<tbody>
<tr class="cursor-pointer" data-on-click-link="/pc/Trade/Detail/313809613" data-on-click-link-action="NewWindow" data-toggle="tooltip" data-original-title="" title="">
<td>
<img class="trade-item-icon item-quality-legendary" alt="Icon" src="./Search Result - Tamriel Trade Centre_files/crafting_outfitter_potion_014.png" data-original-title="" title="">
<div class="item-quality-legendary">
XXSTRING1XX
</div>
<div>
Level:
<img class="small-icon" src="./Search Result - Tamriel Trade Centre_files/nonvet.png">
XXSTRING2XX
</div>
</td>
<td class="hidden-xs">
<div class="text-small-width text-danger">
XXSTRING3XX
</div>
</td>
<td class="hidden-xs">
<div>
XXSTRING4XX
</div>
<div>
XXSTRING5XX
</div>
</td>
<td class="gold-amount bold">
<img class="small-icon" src="./Search Result - Tamriel Trade Centre_files/gold.png">
XXSTRING6XX
<div class="text-danger">
X
</div>
<img class="small-icon" src="./Search Result - Tamriel Trade Centre_files/amount.png">
XXSTRING7XX
<div class="text-danger">
=
</div>
<img class="small-icon" src="./Search Result - Tamriel Trade Centre_files/gold.png">
54,999
</td>
<td class="bold hidden-xs" data-mins-elapsed="2">Now</td>
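</tr>

I've tried a lot of things and have been struggling with this for the past 7 days. When I print the results, I need XXSTRING1XX through XXSTRING7XX so that I can push them into a .csv file or something similar.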
The difficulty I keep running into is that most of the divs have no specific class, so in most cases I can't get the strings back.
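One way around classless divs is to navigate by position instead of by class: take each `tr.cursor-pointer` row, split it into its `<td>` cells, then pick the divs (or the bare text nodes) inside each cell by index. This is a sketch under assumptions, not a tested scraper for the live site. The column-to-field mapping (item, level, trader, location, guild, unit price, quantity, total, listing time) is inferred from the HTML row above and from the order of the field comments in the question's code, so verify it against the real page. `SAMPLE_HTML`, `direct_text` and `parse_row` are hypothetical names.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the row from the question (long attributes shortened).
SAMPLE_HTML = """
<table class="trade-list-table max-width">
<thead></thead>
<tbody>
<tr class="cursor-pointer" data-on-click-link="/pc/Trade/Detail/313809613">
<td>
<img class="trade-item-icon item-quality-legendary" alt="Icon" src="potion.png">
<div class="item-quality-legendary">XXSTRING1XX</div>
<div>Level: <img class="small-icon" src="nonvet.png"> XXSTRING2XX</div>
</td>
<td class="hidden-xs">
<div class="text-small-width text-danger">XXSTRING3XX</div>
</td>
<td class="hidden-xs">
<div>XXSTRING4XX</div>
<div>XXSTRING5XX</div>
</td>
<td class="gold-amount bold">
<img class="small-icon" src="gold.png"> XXSTRING6XX
<div class="text-danger">X</div>
<img class="small-icon" src="amount.png"> XXSTRING7XX
<div class="text-danger">=</div>
<img class="small-icon" src="gold.png"> 54,999
</td>
<td class="bold hidden-xs" data-mins-elapsed="2">Now</td>
</tr>
</tbody>
</table>
"""


def direct_text(tag):
    """Non-empty text nodes sitting directly inside `tag` (not inside child tags)."""
    return [s.strip() for s in tag.children
            if isinstance(s, NavigableString) and s.strip()]


def parse_row(tr):
    """Map one result row to fields by column position (mapping is an assumption)."""
    item_td, trader_td, location_td, price_td, time_td = tr.find_all("td", recursive=False)[:5]
    item_divs = item_td.find_all("div", recursive=False)
    location_divs = location_td.find_all("div", recursive=False)
    # The price cell mixes bare text with "X" / "=" divs; the bare text nodes
    # are unit price, quantity and total, in that order.
    unit_price, quantity, total_price = direct_text(price_td)[:3]
    return {
        "item": item_divs[0].get_text(strip=True),          # XXSTRING1XX
        "level": list(item_divs[1].stripped_strings)[-1],   # XXSTRING2XX, "Level:" label dropped
        "trader": trader_td.div.get_text(strip=True),       # XXSTRING3XX
        "location": location_divs[0].get_text(strip=True),  # XXSTRING4XX
        "guild": location_divs[1].get_text(strip=True),     # XXSTRING5XX
        "unit_price": unit_price,                           # XXSTRING6XX
        "quantity": quantity,                               # XXSTRING7XX
        "total_price": total_price,                         # 54,999
        "listed": time_td.get_text(strip=True),             # "Now"
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("table", class_="trade-list-table")
rows = [parse_row(tr) for tr in table.find_all("tr", class_="cursor-pointer")]
for row in rows:
    print(row)
```

With the real page you would pass `page.content` to `BeautifulSoup` instead of `SAMPLE_HTML`. Index-based access breaks if the site reorders or hides columns, so it may be worth checking the number of `<td>` cells per row before unpacking.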
I've been using Python with requests and BeautifulSoup from bs4.
import requests
from bs4 import BeautifulSoup
page = requests.get('https://eu.tamrieltradecentre.com/pc/Trade/SearchResult?ItemID=211&SearchType=Sell&ItemNamePattern=Dreugh+Wax&ItemCategory1ID=&ItemCategory2ID=&ItemCategory3ID=&ItemTraitID=&ItemQualityID=&IsChampionPoint=false&LevelMin=&LevelMax=&MasterWritVoucherMin=&MasterWritVoucherMax=&AmountMin=&AmountMax=&PriceMin=&PriceMax=')
soup = BeautifulSoup(page.content, 'html.parser')
container = soup.find(class_="trade-list-table max-width")
itembox = container.find_all(class_="cursor-pointer")
item = itembox[0]
# Select all table rows, then all TDs of the first row
tr = container.find_all(class_="cursor-pointer")
tr1 = tr[0].find_all('td')
# Itemname
itemname = item.find('div', class_="item-quality-legendary").get_text(strip=True)
print(itemname)
# Itemlevel + level type
# Tradername
# Location
# Guild name
# Unit price
# Quantity
# Total price
# Timestamp?

Posted on 2019-07-27 08:45:36
Edit: Since you're looking for specific strings from some data source, for example a text file containing the unknown strings, you can do:
file.txt
some
unknown
strings
to
look
for
...

bs.py
import re

import requests
from bs4 import BeautifulSoup

filename = 'file.txt'  # file containing the unknown strings, one per line

with open(filename, 'r') as f:
    data = [line.rstrip('\n') for line in f]  # ['some', 'unknown', 'strings', 'to', 'look', 'for', ...]

src = requests.get(...)
soup = BeautifulSoup(src.content, 'html.parser')

results = []
for target in data:
    # re.escape() makes characters such as '.' or '(' in a target match literally;
    # look at the documentation for other things find_all() can match on!
    result = soup.find_all(string=re.compile(re.escape(target)))
    if result:  # if any results are found
        for string in result:
            results.append(string.strip())  # cleanup: drop surrounding whitespace and newlines
    else:  # no results found
        results.append(result)

print(results)  # do something with the results

This should give you a rough idea of what to do. If you're still unsure, have a look at the BS4 documentation.
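Whichever way the strings are found, the question's end goal is a .csv file. A minimal sketch with the standard library's `csv.DictWriter` follows; the field names are hypothetical and the single row just reuses the XXSTRING placeholders from the question. It writes to an in-memory buffer so it runs without touching disk.

```python
import csv
import io

# Hypothetical column names, one per value the question wants to extract.
FIELDNAMES = ["item", "level", "trader", "location", "guild",
              "unit_price", "quantity", "total_price", "listed"]


def write_rows(rows, out):
    """Write a header plus one CSV line per dict in `rows` to the file-like `out`."""
    writer = csv.DictWriter(out, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)


# Placeholder row using the strings from the question's HTML.
rows = [{
    "item": "XXSTRING1XX", "level": "XXSTRING2XX", "trader": "XXSTRING3XX",
    "location": "XXSTRING4XX", "guild": "XXSTRING5XX", "unit_price": "XXSTRING6XX",
    "quantity": "XXSTRING7XX", "total_price": "54,999", "listed": "Now",
}]

buf = io.StringIO()
write_rows(rows, buf)
print(buf.getvalue())
```

To write a real file, open it with `newline=""` as the `csv` docs recommend, e.g. `with open("trades.csv", "w", newline="", encoding="utf-8") as f: write_rows(rows, f)`. Note that a value like `54,999` contains the delimiter, so `csv` quotes it automatically rather than splitting it into two columns.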
https://stackoverflow.com/questions/57226861