For an assignment, I have a file "doc.html" containing the following data:
<span class="descriptor">Title:</span> Automated Scalable Bayesian Inference via Hilbert Coresets
<span class="descriptor">Title:</span> PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
<span class="descriptor">Title:</span> Covariances, Robustness, and Variational Bayes
<span class="descriptor">Title:</span> Edge-exchangeable graphs and sparsity (NIPS 2016)
<span class="descriptor">Title:</span> Fast Measurements of Robustness to Changing Priors in Variational Bayes
<span class="descriptor">Title:</span> Boosting Variational Inference
For each line, I am trying to get everything after </span>, so the expected output should be:
Automated Scalable Bayesian Inference via Hilbert Coresets
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Covariances, Robustness, and Variational Bayes
Edge-exchangeable graphs and sparsity (NIPS 2016)
Fast Measurements of Robustness to Changing Priors in Variational Bayes
Boosting Variational Inference
I tried the code below (it does not work).
from bs4 import BeautifulSoup

with open("doc.html") as fp:
    soup = BeautifulSoup(fp, 'html.parser')
    for line in soup.find_all('span'):
        # Prints the span's own text ("Title:"), not the title that follows it.
        print(line.get_text())
What is the missing piece?
Posted on 2017-10-29 06:13:14
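You need the span element's nextSibling (spelled next_sibling in current BeautifulSoup 4), not the text inside the span!
Note: use strip() to remove the leading space and the trailing newline.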
>>> from bs4 import BeautifulSoup
>>> with open("doc.html") as fp:
...     soup = BeautifulSoup(fp, 'html.parser')
...     for line in soup.find_all('span'):
...         print(line.nextSibling.strip())
...
Automated Scalable Bayesian Inference via Hilbert Coresets
PASS-GLM: polynomial approximate sufficient statistics for scalable Bayesian GLM inference
Covariances, Robustness, and Variational Bayes
Edge-exchangeable graphs and sparsity (NIPS 2016)
Fast Measurements of Robustness to Changing Priors in Variational Bayes
Boosting Variational Inference
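For Python 3, the same idea can be sketched as a small function. This is only an illustration, not the original answer's code: the inline `HTML` string stands in for the contents of doc.html, and the helper name `titles_after_spans` and the `class_="descriptor"` filter are assumptions added here. It also guards against a span that has no following text node, which would otherwise raise an error on `.strip()`.

```python
from bs4 import BeautifulSoup, NavigableString

# Inline stand-in for doc.html (two of the rows from the question).
HTML = """\
<span class="descriptor">Title:</span> Covariances, Robustness, and Variational Bayes
<span class="descriptor">Title:</span> Boosting Variational Inference
"""


def titles_after_spans(html):
    """Return the stripped text node that directly follows each descriptor span."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for span in soup.find_all("span", class_="descriptor"):
        sibling = span.next_sibling  # bs4 spelling of the legacy nextSibling
        # The sibling can be None (span is the last node) or another tag,
        # so only plain text nodes are kept.
        if isinstance(sibling, NavigableString):
            text = sibling.strip()
            if text:
                titles.append(text)
    return titles


titles = titles_after_spans(HTML)
for title in titles:
    print(title)
```

To run it against the real file, read the file and pass its contents in, for example `titles_after_spans(open("doc.html").read())`.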
https://stackoverflow.com/questions/46997325