Q: Using BeautifulSoup4 with Google Translate

Stack Overflow user

Asked 2016-07-19 07:13:37

3 answers, 2.4K views, 0 followers, 4 votes

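I'm currently working through the web scraping section of Automate the Boring Stuff and trying to write a script that uses BeautifulSoup4 to extract the translated word from Google Translate.

I inspected the HTML of a page where "Explanation" is the translated word: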

<span id="result_box" class="short_text" lang="en">  
    <span class>Explanation</span>
</span>

Using BeautifulSoup4, I tried different selectors, but none of them returned the translated word. Here are a few that I tried; they return nothing at all:

soup.select('span[id="result_box"] > span')  
soup.select('span span')

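I even copied the selector directly from the developer tools, which gave me #result_box > span. That returns nothing either.

Can anyone explain how to use BeautifulSoup4 for my purpose? This is my first time using BeautifulSoup4, but I think I'm using it more or less correctly, because the selector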

soup.select('span[id="result_box"]')

gets me the outer span element**

[<span class="short_text" id="result_box"></span>]

**Not sure why the lang="en" part is missing, but I'm fairly sure I've located the correct element.

Here is the complete code:

import bs4, requests

url = 'https://translate.google.ca/#zh-CN/en/%E6%B2%BB%E5%85%B7'
res = requests.get(url)
res.raise_for_status()  # must be called; raises on HTTP error status codes
soup = bs4.BeautifulSoup(res.text, "html.parser")
translation = soup.select('#result_box span')  # comes back empty, as described above
print(translation)

Edit: If I save the Google Translate page as an offline HTML file and then build a soup object from that file, locating the element is no problem.

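import bs4

# Parse the page as saved from the browser (assuming it was saved as UTF-8)
with open("Google Translate.html", encoding="utf-8") as file:
    soup = bs4.BeautifulSoup(file, "html.parser")
translation = soup.select('#result_box span')
print(translation)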
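
One way to picture the difference between the two runs is to feed BeautifulSoup two hand-written fragments: one resembling what requests receives (an empty result box) and one resembling the saved page. Both fragments below are assumptions modelled on the markup and output quoted in the question, not captures of the real page, and translated_words is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the raw page source: the result box exists but is empty.
served = '<span id="result_box" class="short_text"></span>'

# Assumed stand-in for the page saved from the browser after scripts have run.
rendered = (
    '<span id="result_box" class="short_text" lang="en">'
    '<span>Explanation</span>'
    '</span>'
)


def translated_words(html):
    """Return the text of every span nested inside #result_box."""
    soup = BeautifulSoup(html, "html.parser")
    return [span.get_text() for span in soup.select("#result_box span")]


print(translated_words(served))
print(translated_words(rendered))
```

The selector is the same in both calls; only the input differs, which suggests the selector itself is not the problem.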

Tags: python, html, beautifulsoup

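3 Answers

Stack Overflow user

Accepted answer

Posted 2016-07-19 08:40:14

The result_box element is the correct one, but your code only works on the saved page because what you save from the browser includes dynamically generated content. With requests you get only the page source itself, without any dynamically generated content. The translation is generated by an AJAX call to the URL below: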

"https://translate.google.ca/translate_a/single?client=t&sl=zh-CN&tl=en&hl=en&dt=at&dt=bd&dt=ex&dt=ld&dt=md&dt=qca&dt=rw&dt=rm&dt=ss&dt=t&ie=UTF-8&oe=UTF-8&source=bh&ssel=0&tsel=0&kc=1&tk=902911.786207&q=%E6%B2%BB%E5%85%B7"

For your request, it returns:

[[["Fixture","治具",,,0],[,,,"Zhì jù"]],,"zh-CN",,,[["治 具",1,[["Fixture",999,true,false],["Fixtures",0,true,false],["Jig",0,true,false],["Jigs",0,true,false],["Governance",0,true,false]],[[0,2]],"治具",0,1]],1,,[["ja"],,[1],["ja"]]]

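So you will either have to mimic that request, passing all the required parameters, or use something that supports dynamic content, such as Selenium.

Votes: 2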
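
As a rough illustration of working with that kind of response: the payload quoted above is not strict JSON, because empty array slots are left blank. The sketch below fills the blanks with null and then reads the first translation. It assumes the response keeps this shape and that no string value contains sequences such as ",," or "[,". parse_loose_json is a hypothetical helper, and sample is a shortened copy of the response shown above rather than the result of a live request.

```python
import json
import re

# Shortened copy of the response quoted above (not fetched live).
sample = '[[["Fixture","治具",,,0],[,,,"Zhì jù"]],,"zh-CN"]'


def parse_loose_json(raw):
    """Fill empty array slots with null, then parse the result as JSON.

    Naive by design: it does not skip over string contents, so a string
    containing ",," or "[," would be corrupted.
    """
    # An empty slot sits between "[" or "," and a following ",",
    # or between "," and a closing "]".
    fixed = re.sub(r'(?<=[\[,])(?=,)|(?<=,)(?=\])', 'null', raw)
    return json.loads(fixed)


data = parse_loose_json(sample)
translated, original = data[0][0][0], data[0][0][1]
source_language = data[2]
print(translated, original, source_language)
```

The index positions used here are read off the sample response only; the endpoint is undocumented, so the layout may differ or change.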

Stack Overflow user

Posted 2016-07-19 07:26:15

Just try:

translation = soup.select('#result_box span')[0].text
print(translation)
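
Note that indexing with [0] only works when the selector matched something. A minimal sketch of guarding against the empty case follows. The served fragment is an assumption modelled on the output quoted in the question, where the result box comes back empty.

```python
from bs4 import BeautifulSoup

# Assumed stand-in for the page source seen by requests: an empty result box.
served = '<span id="result_box" class="short_text"></span>'
soup = BeautifulSoup(served, "html.parser")

matches = soup.select("#result_box span")
try:
    translation = matches[0].text
except IndexError:
    # Nothing was selected, so there is no first element to read.
    translation = None

# select_one returns None instead of raising when nothing matches.
node = soup.select_one("#result_box span")
print(translation, node)
```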

Votes: 0

Stack Overflow user

Posted 2021-09-08 05:39:16

You can try this different approach:

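import os
from bs4 import BeautifulSoup

# filename, extension_file, files_from_folder and recursively_translate
# are not defined in this fragment; see the full code linked below.
if filename.endswith(extension_file):
    with open(os.path.join(files_from_folder, filename), encoding='utf-8') as html:
        soup = BeautifulSoup('<pre>' + html.read() + '</pre>', 'html.parser')
        for title in soup.find_all('title'):
            recursively_translate(title)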

For the full code, see:

https://neculaifantanaru.com/en/python-code-text-google-translate-website-translation-beautifulsoup-library.html

Or here:

https://neculaifantanaru.com/en/example-google-translate-api-key-python-code-beautifulsoup.html

Votes: 0

Page content originally provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.

Original link:

https://stackoverflow.com/questions/38451783
