I'll post user283120's second answer here, which is more precise than the first:

Pywikibot core does not support any direct (HTML) way of interacting with a wiki, so you should use the API. 
If you need the HTML, you can fetch it easily with urllib2.

This is an example I used to get the HTML of a wiki page on Commons:
<code>
 import urllib2  # Python 2 only; on Python 3, urlopen lives in urllib.request
...
 # Page URL: base URL plus the title, with spaces replaced by underscores
 url = "https://commons.wikimedia.org/wiki/" + page.title().replace(" ","_")
 html = urllib2.urlopen(url).read().decode('utf-8')
</code>
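
On Python 3, urllib2 no longer exists and its functionality lives in urllib.request. The following is a minimal sketch of the same idea under that assumption. It takes a plain title string instead of a pywikibot page object, and the helper names `build_page_url` and `fetch_page_html` as well as the User-Agent value are hypothetical placeholders, not part of any library:

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "https://commons.wikimedia.org/wiki/"


def build_page_url(title, base_url=BASE_URL):
    """Return the page URL for a title, with spaces turned into underscores."""
    # Percent-encode characters that are not URL-safe (e.g. non-ASCII letters),
    # but keep "/" and ":" so namespaced titles such as "File:..." stay readable.
    return base_url + quote(title.replace(" ", "_"), safe="/:")


def fetch_page_html(title, base_url=BASE_URL):
    """Download the rendered page and return it as text (needs network access)."""
    # The User-Agent string is a placeholder; replace it with one describing your tool.
    request = Request(
        build_page_url(title, base_url),
        headers={"User-Agent": "example-html-fetcher/0.1"},
    )
    with urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `fetch_page_html("Main Page")` would request the URL that `build_page_url("Main Page")` returns; with a pywikibot page object you could pass `page.title()` as the title.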

"[saveHTML.py] downloads the HTML-pages of articles and images and saves the interesting parts, i.e. the article-text and the footer to a file"

source: <a href="https://git.wikimedia.org/blob/pywikibot%2Fcompat.git/HEAD/saveHTML.py" rel="nofollow">https://git.wikimedia.org/blob/pywikibot%2Fcompat.git/HEAD/saveHTML.py</a>

In general, you should use pywikibot instead of wikipedia (e.g. write "import pywikibot" instead of "import wikipedia"). If you are looking for the methods and classes that used to be in wikipedia.py, they have been split up and can now be found in the pywikibot folder (mainly in page.py and site.py).

If you want to run scripts that you wrote for compat, you can use a script in pywikibot-core named compat2core.py (in the scripts folder). There is also detailed help on the conversion in README-conversion.txt; read it carefully.

The MediaWiki API has a parse action that returns the HTML snippet produced by the MediaWiki markup parser for a page's wiki markup.
For the <a href="https://pypi.org/project/pywikibot/" rel="nofollow noreferrer">pywikibot library</a> there is already a function implemented, which you can use like this:
<pre class="lang-py prettyprint-override"><code>def getHtml(self, pageTitle):
    '''
    get the HTML code for the given page title

    Args:
        pageTitle(str): the title of the page to retrieve

    Returns:
        str: the rendered HTML code for the page
    '''
    page = self.getPage(pageTitle)
    # the leading underscore marks _get_parsed_page() as a private method
    html = page._get_parsed_page()
    return html
</code></pre>
When using the <a href="https://pypi.org/project/mwclient/" rel="nofollow noreferrer">mwclient Python library</a>, there is a generic api method (see
<a href="https://github.com/mwclient/mwclient/blob/master/mwclient/client.py" rel="nofollow noreferrer">https://github.com/mwclient/mwclient/blob/master/mwclient/client.py</a>),
which can be used to retrieve the HTML code like this:
<pre class="lang-py prettyprint-override"><code>def getHtml(self, pageTitle):
    '''
    get the HTML code for the given page title

    Args:
        pageTitle(str): the title of the page to retrieve
    '''
    # call the generic API method with the "parse" action
    api = self.getSite().api(&quot;parse&quot;, page=pageTitle)
    if &quot;parse&quot; not in api:
        raise Exception(&quot;could not retrieve html for page %s&quot; % pageTitle)
    # the rendered HTML is stored under parse -> text -> *
    html = api[&quot;parse&quot;][&quot;text&quot;][&quot;*&quot;]
    return html
</code></pre>
As shown above, this gives a <a href="https://en.wikipedia.org/wiki/Duck_typing" rel="nofollow noreferrer">duck-typed interface</a>, which is implemented in the <a href="https://github.com/WolfgangFahl/py-3rdparty-mediawiki" rel="nofollow noreferrer">py-3rdparty-mediawiki</a> library, for which I am a committer. This was resolved by closing <a href="https://github.com/WolfgangFahl/py-3rdparty-mediawiki/issues/38" rel="nofollow noreferrer">issue 38 - add html page retrieval</a>.
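
For readers who want to see what the parse action looks like without either wrapper library, here is a rough sketch using only the Python standard library. It assumes the default JSON output format, in which the rendered HTML sits under `parse -> text -> *` as in the mwclient example above. The function names (`build_parse_url`, `extract_html`, `get_html`) and the User-Agent value are hypothetical, and the API endpoint URL must be supplied by the caller:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_parse_url(api_url, page_title):
    """Build the URL for an action=parse request that returns JSON."""
    params = {
        "action": "parse",
        "page": page_title,
        "prop": "text",      # only ask for the rendered HTML
        "format": "json",
    }
    return api_url + "?" + urlencode(params)


def extract_html(payload):
    """Pull the rendered HTML out of a decoded action=parse response."""
    if "parse" not in payload:
        raise ValueError("no 'parse' key in API response: %r" % (payload,))
    return payload["parse"]["text"]["*"]


def get_html(api_url, page_title):
    """Fetch and return the rendered HTML of a page (needs network access)."""
    # The User-Agent string is a placeholder; replace it with one describing your tool.
    request = Request(
        build_parse_url(api_url, page_title),
        headers={"User-Agent": "example-html-fetcher/0.1"},
    )
    with urlopen(request) as response:
        return extract_html(json.loads(response.read().decode("utf-8")))
```

For example, `get_html("https://en.wikipedia.org/w/api.php", "Elvis Presley")` would request the parse action for that page on the English Wikipedia.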

With Pywikibot you can use <code>http.request()</code> to get the HTML content:
<pre><code>import pywikibot
from pywikibot.comms import http

site = pywikibot.Site('wikipedia:en')
page = pywikibot.Page(site, 'Elvis Presley')
# Path of the rendered page below the site's script path
path = '{}/index.php?title={}'.format(site.scriptpath(), page.title(as_url=True))
r = http.request(site, path)
print(r[94:135])
</code></pre>
This should print the part of the HTML content that holds the page title:
<pre><code>'&lt;title&gt;Elvis Presley – Wikipedia&lt;/title&gt;\n'
</code></pre>
With Pywikibot 6.0, <code>http.request()</code> returns a <code>requests.Response</code> object rather than plain text. In this case you must use the <code>text</code> attribute:
<pre><code>print(r.text[94:135])
</code></pre>
to get the same result.
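
If a script has to run against Pywikibot versions on both sides of that change, one option is to normalise the return value before using it. The sketch below is only an illustration: `response_text` and `extract_title` are hypothetical helpers, not Pywikibot functions, and the regular expression replaces the fixed slice `[94:135]`, which depends on the exact page layout:

```python
import re


def response_text(result):
    """Return the body text whether `result` is a str or a Response-like object."""
    if isinstance(result, str):
        return result
    # requests.Response exposes the decoded body as the .text attribute
    return result.text


def extract_title(html):
    """Return the contents of the first <title> element, or None if there is none."""
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

With the variables from the example above, `extract_title(response_text(r))` would work for either return type.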

I'm using pywikibot-core. Before that I used another Python MediaWiki API wrapper, <a href="https://github.com/goldsmith/Wikipedia" rel="noreferrer">Wikipedia.py</a> (which has a .HTML method). I switched to pywikibot-core because I think it has many more features, but I can't find a similar method. 
(Beware: I'm not very skilled.)

How do I get the HTML of a wiki page with Pywikibot?
