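I need to parse HTML and find the elements whose class matches "product-size _product-size" exactly (without any other words, such as "disabled _disabled "). So I used BeautifulSoup and cut out the part of the HTML I need: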
import requests
from bs4 import BeautifulSoup
import re
URL =.......
headers = {"User-Agent": .......}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
div = soup.find("div", class_="size-list")
print("Find size-list \n" + str(div) +'\n')
This is what I got:
<div class="size-list" tabindex="-1">
<label for="size-10"
class="product-size _product-size disabled _disabled "
data-sku="01122345" data-name="10">
<div>
<input type="radio" value="10" name="size" id="size-10"
disabled="disabled" class="_sizeInput" tabindex="-1">
</div>
<span class="size-name" title="10">10</span>
<span></span>
</label>
<label for="size-11"
class="product-size _product-size disabled _disabled "
data-sku="01122346" data-name="11">
<div>
<input type="radio" value="11" name="size" id="size-11"
disabled="disabled" class="_sizeInput" tabindex="-1">
</div>
<span class="size-name" title="11">11</span>
<span></span>
</label>
<label for="size-12"
class="product-size _product-size "
data-sku="01122347" data-name="12">
<div>
<input type="radio" value="12" name="size" id="size-12"
class="_sizeInput" tabindex="0">
</div>
<span class="size-name" title="12">12</span>
<span></span>
</label>
<label for="size-13"
class="product-size _product-size disabled _disabled "
data-sku="01122348" data-name="13">
<div>
<input type="radio" value="13" name="size" id="size-13"
disabled="disabled" class="_sizeInput" tabindex="-1">
</div>
<span class="size-name" title="13">13</span>
<span></span>
</label>
<label for="size-14"
class="product-size _product-size "
data-sku="01122349" data-name="14">
<div>
<input type="radio" value="14" name="size" id="size-14"
class="_sizeInput" tabindex="0">
</div>
<span class="size-name" title="14">14</span>
<span></span>
</label>
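</div>
Now I need to find the matches in this text that contain the string "product-size _product-size" but not "disabled _disabled", and, if there are any, check which "size-name" they have. I'm stuck (I've been a Python user for half an hour, sorry). I tried the following to find a simple match for the string "product-size _product-size":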
soup.find_all('label', class_="product-size _product-size ")
soup.find(class_="product-size _product-size ")
soup.find_all(text=re.compile(r'product-size _product-size '))
# div.find... or soup.find..., etc., whatever.
But I only get [] or None. What am I doing wrong?
Answered on 2019-07-16 05:05:08
Use a CSS selector with :not(.class):
data='''<div class="size-list" tabindex="-1">
<label for="size-10"
class="product-size _product-size disabled _disabled "
data-sku="01122345" data-name="10">
<div>
<input type="radio" value="10" name="size" id="size-10"
disabled="disabled" class="_sizeInput" tabindex="-1">
</div>
<span class="size-name" title="10">10</span>
<span></span>
</label>
<label for="size-11"
class="product-size _product-size disabled _disabled "
data-sku="01122346" data-name="11">
<div>
<input type="radio" value="11" name="size" id="size-11"
disabled="disabled" class="_sizeInput" tabindex="-1">
</div>
<span class="size-name" title="11">11</span>
<span></span>
</label>
<label for="size-12"
class="product-size _product-size "
data-sku="01122347" data-name="12">
<div>
<input type="radio" value="12" name="size" id="size-12"
class="_sizeInput" tabindex="0">
</div>
<span class="size-name" title="12">12</span>
<span></span>
</label>
<label for="size-13"
class="product-size _product-size disabled _disabled "
data-sku="01122348" data-name="13">
<div>
<input type="radio" value="13" name="size" id="size-13"
disabled="disabled" class="_sizeInput" tabindex="-1">
</div>
<span class="size-name" title="13">13</span>
<span></span>
</label>
<label for="size-14"
class="product-size _product-size "
data-sku="01122349" data-name="14">
<div>
<input type="radio" value="14" name="size" id="size-14"
class="_sizeInput" tabindex="0">
</div>
<span class="size-name" title="14">14</span>
<span></span>
</label>
</div>'''
soup=BeautifulSoup(data,'html.parser')
for item in soup.select('.product-size._product-size:not(.disabled)'):
    print(item.select_one('.size-name').text)
Output:
12
14
https://stackoverflow.com/questions/57047053
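As for why the original attempts returned nothing: as far as I can tell, BeautifulSoup splits the class attribute on whitespace, so a search string with a trailing space never equals any stored value, and text=re.compile(...) looks at text nodes rather than attributes. Below is a minimal sketch of a find_all-based alternative to the CSS selector. The helper name is_enabled_size and the trimmed-down html string are my own, made up for illustration; they are not from the original page.

```python
from bs4 import BeautifulSoup

# Trimmed-down copy of the markup from the question (two labels only).
html = '''
<div class="size-list" tabindex="-1">
<label for="size-11" class="product-size _product-size disabled _disabled " data-name="11">
<span class="size-name" title="11">11</span>
</label>
<label for="size-12" class="product-size _product-size " data-name="12">
<span class="size-name" title="12">12</span>
</label>
</div>
'''

soup = BeautifulSoup(html, 'html.parser')

# Search string with a trailing space, as in the question: the parsed class
# list is ['product-size', '_product-size'], so nothing compares equal.
with_space = soup.find_all('label', class_='product-size _product-size ')


def is_enabled_size(tag):
    """Hypothetical helper: True for <label> tags that carry the
    'product-size' class but not the 'disabled' class."""
    classes = tag.get('class', [])
    return (tag.name == 'label'
            and 'product-size' in classes
            and 'disabled' not in classes)


# Collect the visible size name of every label that is not disabled.
available = [
    label.find('span', class_='size-name').get_text(strip=True)
    for label in soup.find_all(is_enabled_size)
]

print(with_space)
print(available)
```

Passing a function to find_all keeps the "has this class, lacks that class" logic in plain Python, which may be easier to extend than a selector if more conditions are needed later; for this case the :not(.disabled) selector above is shorter.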