I'm trying to print this text: https://i.imgur.com/SLl1URt.png. I used `soup.find_all("p", class_="review")` and then tried `.getText()` and inspecting `.contents`, but none of them worked.

Web link: https://m.wuxiaworld.co/Castle-of-Black-Iron/

Here is some debugging info: https://i.imgur.com/0k6NHeD.png
```python
import urllib2
from bs4 import BeautifulSoup

def info(novelname):
    user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
    url = "https://m.wuxiaworld.co/" + novelname + "/"
    headers = {'User-Agent': user_agent,
               'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
               'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
               'Accept-Encoding': 'none',
               'Accept-Language': 'en-US,en;q=0.8',
               'Connection': 'keep-alive'}
    request = urllib2.Request(url, headers=headers)
    response = urllib2.urlopen(request)
    soup = BeautifulSoup(response, features="html.parser")
    for textp in soup.find_all("p", class_="review"):
        print textp.contents
        print textp
        print textp.getText()
```

Posted on 2019-06-16 23:23:03
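When the markup really contains the element, `find_all` with `class_`, `.contents` and `.get_text()` (of which `getText()` is an alias) all behave as expected. The self-contained sketch below uses a hypothetical HTML snippet shaped like the page's description paragraph, so nothing is fetched over the network:

```python
from bs4 import BeautifulSoup

# Hypothetical markup, modeled on the description paragraph of the page.
html = '<div><p class="review">After the Catastrophe,<br/>every rule was rewritten.</p></div>'
soup = BeautifulSoup(html, "html.parser")

paragraphs = soup.find_all("p", class_="review")  # list of matching Tag objects
first = paragraphs[0]
print(first.contents)                    # child nodes: strings plus tags such as <br/>
print(first.get_text())                  # all text pieces joined with no separator
print(first.get_text(" ", strip=True))   # pieces stripped, then joined with a space
```

If the same `find_all` call returns an empty list on the live page, the selector is not the problem: the element is missing from the HTML that was fetched or parsed, which is what the answers below address.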
```python
import requests
from bs4 import BeautifulSoup
from collections import OrderedDict

def info(novelname):
    response = requests.get(
        'https://m.wuxiaworld.co/{}/'.format(novelname.replace(' ', '-')),
        headers=OrderedDict(
            (
                ("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7"),
                ("Accept", "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"),
                ("Accept-Language", "en-US,en;q=0.5"),
                ("Accept-Encoding", "gzip, deflate"),
                ("Connection", "keep-alive"),
                ("Upgrade-Insecure-Requests", "1")
            )
        )
    )
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, 'html5lib')
        for textp in soup.find_all('p', attrs={'class': 'review'}):
            print textp.text.strip()

info('Castle of Black Iron')
```

The problem is your HTML parser. Switching to `html5lib` (a separate package, installed with `pip install html5lib`) gives:
Description
After the Catastrophe, every rule in the world was rewritten.
In the Age of Black Iron, steel, iron, steam engines and fighting force became the crux in which human beings depended on to survive.
A commoner boy by the name Zhang Tie was selected by the gods of fortune and was gifted a small tree which could constantly produce various marvelous fruits. At the same time, Zhang Tie was thrown into the flames of war, a three-hundred-year war between the humans and monsters on the vacant continent. Using crystals to tap into the potentials of the human body, one must cultivate to become stronger.
The thrilling legends of mysterious clans, secrets of Oriental fantasies, numerous treasures and legacies in the underground world — All in the Castle of Black Iron!
Citadel of Black Iron
黑铁之堡

Posted on 2019-06-16 22:10:51
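To tell a parser problem apart from content that is simply not in the downloaded HTML, one option is to run the same `find_all` through every installed parser and compare. A minimal sketch, assuming only `bs4` is installed (`lxml` and `html5lib` are skipped if missing); `count_reviews` and `diagnose` are hypothetical helper names, and the final text check is a crude heuristic:

```python
from bs4 import BeautifulSoup, FeatureNotFound

def count_reviews(html, parsers=("html.parser", "lxml", "html5lib")):
    """Count <p class="review"> matches for each parser that is installed."""
    counts = {}
    for parser in parsers:
        try:
            soup = BeautifulSoup(html, parser)
        except FeatureNotFound:
            # lxml and html5lib are separate packages; skip any that are missing.
            continue
        counts[parser] = len(soup.find_all("p", class_="review"))
    return counts

def diagnose(html):
    """Rough hint about why find_all comes back empty (heuristic, not definitive)."""
    counts = count_reviews(html)
    if any(counts.values()):
        return "found with: " + ", ".join(p for p, n in counts.items() if n)
    # Crude text search: misses single quotes or multi-class attributes.
    if 'class="review"' in html:
        return "markup is present but no parser matched it"
    return "not in the raw HTML; probably added by JavaScript"
```

For example, `print(diagnose(requests.get(url, headers=headers).text))`. If only some parsers find the element, the parser is to blame, as in this answer; if none do and the text is absent, the page is likely rendered client-side, which is the situation the Selenium answer assumes.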
When you print your soup, the terminal shows only some of the HTML tags, not the whole page source. I think this site hides part of its data, so I suggest using Selenium. If you don't have ChromeDriver yet, you can download it from:

https://chromedriver.storage.googleapis.com/index.html?path=2.35/

Full code:

```python
from selenium import webdriver

driver_path = r'your driver path'
browser = webdriver.Chrome(executable_path=driver_path)
browser.get("https://m.wuxiaworld.co/Castle-of-Black-Iron/")
x = browser.find_elements_by_css_selector("p[class='review']")  ## Declare which class
for text1 in x:
    print text1.text
browser.close()
```

Output:
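After the Catastrophe, every rule in the world was rewritten.
In the Age of Black Iron, steel, iron, steam engines and fighting force became the crux in which human beings depended on to survive.
A commoner boy by the name Zhang Tie was selected by the gods of fortune and was gifted a small tree which could constantly produce various marvelous fruits. At the same time, Zhang Tie was thrown into the flames of war, a three-hundred-year war between the humans and monsters on the vacant continent. Using crystals to tap into the potentials of the human body, one must cultivate to become stronger.
The thrilling legends of mysterious clans, secrets of Oriental fantasies, numerous treasures and legacies in the underground world -- All in the Castle of Black Iron!
Citadel of Black Iron
黑铁之堡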
https://stackoverflow.com/questions/56622856