Extracting text from a specific part of an HTML page with Python

Stack Overflow user
Asked 2020-06-28 04:17:25
Answers: 2, Views: 153, Followers: 0, Votes: 1

I'm trying to write a program that displays the lyrics of a song, but I'm stuck on this error:

AttributeError: 'NoneType' object has no attribute 'text'

Here is the code:

import requests
from bs4 import BeautifulSoup


def get_lyrics(url):
    lyrics_html = requests.get(url)
    soup = BeautifulSoup(lyrics_html.content, "html.parser")
    lyrics = soup.find('div', {"class": "lyrics"})
    return lyrics.text

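This is the site I get the lyrics from. I can't work out what is wrong. For example, I search for the lyrics of this song, and here are its lyrics: click. You can see on the page for yourself the "place" where the lyrics are: a div with the class "lyrics". Every lyrics page on this site is built this way. Can anyone help me? Ty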

2 Answers

Stack Overflow user

Accepted answer

Posted on 2020-06-28 07:24:44

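The site returns two versions of the page (probably to confuse scrapers and bots). In one version the lyrics are in tags whose class starts with "Lyrics__Container...", and in the other they are in a tag with the class lyrics. If no tag with a Lyrics__Container class is found, the lyrics are in the tag with the class lyrics.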
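The AttributeError in the question follows from this: `find()` returns `None` when nothing matches, and `None` has no `.text` attribute. Here is a minimal sketch of that behaviour. `OLD_LAYOUT` and `NEW_LAYOUT` are made-up, simplified snippets assumed for illustration, not the site's real markup.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-ins for the two page versions.
OLD_LAYOUT = '<div class="lyrics"><p>Line one<br/>Line two</p></div>'
NEW_LAYOUT = '<div class="Lyrics__Container-abc123">Line one<br/>Line two</div>'


def find_lyrics_div(html):
    """Look up <div class="lyrics"> the same way the question's code does."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", {"class": "lyrics"})


old_hit = find_lyrics_div(OLD_LAYOUT)  # a Tag: the old layout has the div
new_hit = find_lyrics_div(NEW_LAYOUT)  # None: no class named "lyrics" here

print(old_hit is not None)
print(new_hit is None)
```

Calling `.text` on `new_hit` would raise the same `'NoneType' object has no attribute 'text'` error as in the question.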

This should always print the lyrics:

import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'html.parser')

text = soup.select_one('div[class^="Lyrics__Container"], .lyrics').get_text(strip=True, separator='\n')
print(text)

Prints:

[Intro]
Ah, mhh (ehi)
Ho la bocca piena
Va bene
[Verse]
Ciao, mi chiamo Luis (eh, eh-eh)
Ciao, mi chiamo Luis (eh, eh-eh)
Ciao, Ciao mi chiamo Luis (eh, eh-eh)
Ciao, mi chiamo Luis
Si, si, si Sal
A a a a Si si si si si si
Proprio così mi chiamo io
Ciao mi chiamo Luis Aah

... and so on.
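If neither container is present, `select_one()` returns `None` and the chained `.get_text()` call fails in the same way as the code in the question. One possible guard is sketched below. `extract_lyrics` is a hypothetical helper and the sample pages are made up for illustration.

```python
from bs4 import BeautifulSoup

# A div whose class attribute starts with "Lyrics__Container",
# or any element with the class "lyrics".
LYRICS_SELECTOR = 'div[class^="Lyrics__Container"], .lyrics'


def extract_lyrics(html):
    """Return the lyrics text, or None if no known container is found.

    Hypothetical helper for illustration; not part of the original answer.
    """
    soup = BeautifulSoup(html, "html.parser")
    node = soup.select_one(LYRICS_SELECTOR)
    if node is None:
        return None
    return node.get_text(strip=True, separator="\n")


# Made-up sample pages, one per layout plus one without lyrics.
old_page = '<div class="lyrics"><p>Line one<br/>Line two</p></div>'
new_page = '<div class="Lyrics__Container-abc123 xyz">Line one<br/>Line two</div>'
other_page = "<p>No lyrics here</p>"

print(extract_lyrics(old_page))
print(extract_lyrics(new_page))
print(extract_lyrics(other_page))
```

Returning `None` leaves it to the caller to decide whether a missing container should be retried, reported, or ignored.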

EDIT: updated version:

import requests
from bs4 import BeautifulSoup


url = 'https://genius.com/Avicii-the-nights-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

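def get_text(elements):
    """Join the text of all lyrics containers, one lyric line per text line."""
    text = ''
    for c in elements:
        # Replace inline <a>/<span> wrappers with their contents so that
        # linked phrases stay on the same line as the surrounding text.
        for t in c.select('a, span'):
            t.unwrap()
        if c:
            # Merge the adjacent text nodes left behind by unwrap().
            c.smooth()
            text += c.get_text(strip=True, separator='\n')
    return text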


cs = soup.select('div[class^="Lyrics__Container"]')
if cs:
    text = get_text(cs)
else:
    text = get_text(soup.select('.lyrics'))

print(text)

Prints:

[Verse 1]
(Hey)
Once upon a younger year
When all our shadows disappeared
The animals inside came out to play (Hey)
Hey, went face to face with all our fears
Learned our lessons through the tears
Made memories we knew would never fade
[Pre-Chorus]
One day my father he told me
Son, don't let it slip away

...etc.
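The updated version unwraps `a` and `span` tags and calls `smooth()` because, with `separator='\n'`, every separate text node ends up on its own line, so a link in the middle of a lyric line would split that line. Here is a small sketch of the difference. The `HTML` string is made up for illustration, and `smooth()` requires Beautiful Soup 4.8.0 or later.

```python
from bs4 import BeautifulSoup

# Made-up markup: one lyric line with a linked phrase in the middle.
HTML = (
    '<div class="Lyrics__Container-abc123">'
    'Once upon <a href="#"><span>a younger</span></a> year<br/>'
    "When all our shadows disappeared"
    "</div>"
)


def naive_text(html):
    """Extract text without touching the inline tags."""
    div = BeautifulSoup(html, "html.parser").div
    return div.get_text(strip=True, separator="\n")


def merged_text(html):
    """Unwrap inline tags and merge adjacent text nodes before extracting."""
    div = BeautifulSoup(html, "html.parser").div
    for tag in div.select("a, span"):
        tag.unwrap()   # keep the contents, drop the wrapper tag
    div.smooth()       # join neighbouring text nodes into one
    return div.get_text(strip=True, separator="\n")


print(naive_text(HTML))
print("---")
print(merged_text(HTML))
```

The first call breaks the opening line into three pieces; the second keeps "Once upon a younger year" on a single line.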
Votes: 2

Stack Overflow user

Posted on 2020-06-28 07:22:26

You should use this link, https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics, rather than https://genius.com/, since you have already mentioned the song.

import requests
from bs4 import BeautifulSoup


def get_lyrics(url):
    lyrics_html = requests.get(url)
    soup = BeautifulSoup(lyrics_html.text, "lxml")
    lyrics_text = []
    # Note: this exact class string was copied from the page as served at the
    # time; it will stop matching if the site changes it.
    lyrics = soup.find_all('div', class_="Lyrics__Container-sc-1ynbvzw-2 jgQsqn")
    for i in lyrics:
        lyrics_text.append(i.text.strip())
        # print(i.text.strip())
    return lyrics_text

output = get_lyrics("https://genius.com/Luis-sal-ciao-mi-chiamo-luis-lyrics")

The output will be:

['[Intro]Ah, mhh (ehi)Ho la bocca pienaVa bene[Verse]Ciao, mi chiamo Luis (eh, eh-eh)Ciao, mi chiamo Luis (eh, eh-eh)Ciao, Ciao mi chiamo Luis (eh, eh-eh)Ciao, mi chiamo LuisSi, si, si SalA a a a Si si si si si siProprio così mi chiamo ioCiao mi chiamo Luis AahLuis Sal, Luis, Luis, Luis SalCiao mi chiamo Luis, Luis SalEeemEeeCiao, Ciao BolognaMi chiamo LuisCiao Mamma (Eee) EeeCiao, Ciao anche a voi LuistiMi chiamo Luis, Lo youtuber EeeEeeCiao, Sono uno youtuberMi chiamo LuisSono uno youtuberEeeCiao, Sono uno youtuberMi chiamo LuisSono uno youtuberA e (Diglielo Luis) a e ă a e e a ă a a a-aaaaCiao mi chiamo LuisEee (Ma chi ti caga)Eee Ciao (Ma chi vuoi che ti guardi)Mi chiamo LuisHahahahaEeeVoglio diventare uno youtuberEee', '', '[Outro]Uuu BolognaDuemila EeeEee EeeEe']
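The lines in the list above run together because `.text` joins all text nodes with an empty string, and the list contains an empty entry for a container with no text. A sketch of one possible adjustment follows, using a class-prefix selector instead of the full class string. `lyrics_blocks` is a hypothetical helper and `SAMPLE` is made-up markup.

```python
from bs4 import BeautifulSoup

# Made-up markup with three containers, the middle one empty.
SAMPLE = (
    '<div class="Lyrics__Container-sc-1 aaa">[Intro]<br/>Ah, mhh (ehi)<br/>Va bene</div>'
    '<div class="Lyrics__Container-sc-1 aaa"></div>'
    '<div class="Lyrics__Container-sc-1 aaa">[Outro]<br/>Uuu Bologna</div>'
)


def lyrics_blocks(html):
    """Return one newline-separated string per non-empty lyrics container."""
    soup = BeautifulSoup(html, "html.parser")
    blocks = []
    for div in soup.select('div[class^="Lyrics__Container"]'):
        text = div.get_text(strip=True, separator="\n")
        if text:  # skip containers that hold no text
            blocks.append(text)
    return blocks


first_div = BeautifulSoup(SAMPLE, "html.parser").div
print(repr(first_div.text))   # lines run together
print(lyrics_blocks(SAMPLE))  # lines separated, empty block dropped
```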
Votes: 0

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/62618154
