
Q: Accessing a line inside a tag in Python using xmltodict

Stack Overflow user

Asked on 2019-11-26 05:56:33

Answers: 2, Views: 301, Followers: 0, Votes: 0

I have an XML file that looks like this:

<!-- For the full list of available Crowd HTML Elements and their input/output documentation,
      please refer to https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-reference.html -->

<!-- You must include crowd-form so that your task submits answers to MTurk -->
<crowd-form answer-format="flatten-objects">

    <!-- The crowd-classifier element will create a tool for the Worker to
 select the correct answer to your question.
          Your image file URLs will be substituted for the "image_url" variable below

          when you publish a batch with a CSV input file containing multiple image file URLs.

          To preview the element with an example image, try setting the src attribute to

          "https://s3.amazonaws.com/cv-demo-images/two-birds.jpg" -->
<crowd-image-classifier
    src= "https://someone@example.com/abcd.jpg"
    categories="['Yes', 'No']"
    header="abcd"
    name="image-contains">

    <!-- Use the short-instructions section for quick instructions that the Worker
    will see while working on the task. Including some basic examples of
    good and bad answers here can help get good results. You can include
    any HTML here. -->
    <short-instructions>

</crowd-image-classifier>
</crowd-form>
<!-- YOUR HTML ENDS -->

I want to extract this line:

src = https://someone@example.com/abcd.jpg

and assign it to a variable in Python. I'm a bit new to XML parsing.

I tried:

hit_doc = xmltodict.parse(get_hit['HIT']['Question'])
image_url = hit_doc['HTMLQuestion']['HTMLContent']['crowd-form']['crowd-image-classifier']

Error:

    image_url = hit_doc['HTMLQuestion']['HTMLContent']['crowd-form']['crowd-image-classifier']
TypeError: string indices must be integers

If I leave out the 'crowd-image-classifier' key and limit myself to

hit_doc = xmltodict.parse(get_hit['HIT']['Question'])
image_url = hit_doc['HTMLQuestion']['HTMLContent']

then I get the entire XML file back.
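The error message hints at what is going on: `hit_doc['HTMLQuestion']['HTMLContent']` appears to be a plain `str`, not a nested dict. In MTurk's HTMLQuestion format the HTML payload is typically wrapped in a CDATA section, so xmltodict hands it back as text and does not parse the tags inside it; indexing that string with a key such as `'crowd-form'` then raises `TypeError`. The snippet below reproduces this with a hand-built dict standing in for xmltodict's output (the dict is an illustrative assumption, not captured output).

```python
# Hypothetical stand-in for the result of xmltodict.parse(...): the
# HTMLContent value is assumed to hold the raw HTML as one string.
hit_doc = {
    "HTMLQuestion": {
        "HTMLContent": '<crowd-form answer-format="flatten-objects">...</crowd-form>',
    }
}

content = hit_doc["HTMLQuestion"]["HTMLContent"]
print(type(content).__name__)  # str

try:
    # A str only accepts integer (or slice) indices, so a string key fails.
    content["crowd-form"]
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```

So the HTML has to be parsed a second time with an HTML-aware tool rather than indexed into like a dict.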

How can I access the image src?

python

regex

python-3.x

xml

xmltodict
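One way to do this with only the standard library is to parse twice: once for the outer HTMLQuestion XML, then again for the HTML string inside HTMLContent. This sketch uses `xml.etree.ElementTree` for the outer document and `html.parser.HTMLParser`, which tolerates unclosed tags such as `<short-instructions>`, for the inner one. `QUESTION_XML` is an assumed, trimmed stand-in for `get_hit['HIT']['Question']` (including the assumed namespace URL), and `SrcFinder` / `extract_image_url` are hypothetical helper names. If you stay with xmltodict, `hit_doc['HTMLQuestion']['HTMLContent']` should give you the same string to feed into the second parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumed, trimmed stand-in for get_hit['HIT']['Question']: an HTMLQuestion
# document whose HTML payload sits inside a CDATA section.
QUESTION_XML = """<HTMLQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2011-11-11/HTMLQuestion.xsd">
  <HTMLContent><![CDATA[
<crowd-form answer-format="flatten-objects">
  <crowd-image-classifier
      src= "https://someone@example.com/abcd.jpg"
      categories="['Yes', 'No']"
      header="abcd"
      name="image-contains">
    <short-instructions>
  </crowd-image-classifier>
</crowd-form>
]]></HTMLContent>
  <FrameHeight>0</FrameHeight>
</HTMLQuestion>"""


class SrcFinder(HTMLParser):
    """Hypothetical helper: remembers the src of the first <crowd-image-classifier>."""

    def __init__(self):
        super().__init__()
        self.src = None

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag names and passes attributes as (name, value) pairs.
        if tag == "crowd-image-classifier" and self.src is None:
            self.src = dict(attrs).get("src")


def extract_image_url(question_xml):
    root = ET.fromstring(question_xml)
    # "{*}" matches the tag in any namespace (Python 3.8+). The CDATA payload
    # comes back as plain text, which is why dict-style indexing fails on it.
    html = root.find("{*}HTMLContent").text
    finder = SrcFinder()
    finder.feed(html)
    finder.close()
    return finder.src


print(extract_image_url(QUESTION_XML))
```

An HTML parser is used for the inner step because the template markup is not guaranteed to be well-formed XML.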

2 answers

Stack Overflow user

Answered on 2019-11-26 11:24:04

You can use BeautifulSoup. See the working code below.

from bs4 import BeautifulSoup


html = '''<!-- For the full list of available Crowd HTML Elements and their input/output documentation,
      please refer to https://docs.aws.amazon.com/sagemaker/latest/dg/sms-ui-template-reference.html -->

<!-- You must include crowd-form so that your task submits answers to MTurk -->
<crowd-form answer-format="flatten-objects">

    <!-- The crowd-classifier element will create a tool for the Worker to
 select the correct answer to your question.
          Your image file URLs will be substituted for the "image_url" variable below

          when you publish a batch with a CSV input file containing multiple image file URLs.

          To preview the element with an example image, try setting the src attribute to

          "https://s3.amazonaws.com/cv-demo-images/two-birds.jpg" -->
<crowd-image-classifier\n        
src= "https://someone@example.com/abcd.jpg"\n        
categories="[\'Yes\', \'No\']"\n        
header="abcd"\n        
name="image-contains">\n\n       
<!-- Use the short-instructions section for quick instructions that the Worker\n
will see while working on the task. Including some basic examples of\n              
good and bad answers here can help get good results. You can include\n              
any HTML here. -->\n        
<short-instructions>\n\n        
</crowd-image-classifier>
</crowd-form>
<!-- YOUR HTML ENDS -->'''

soup = BeautifulSoup(html, 'html.parser')
element = soup.find('crowd-image-classifier')
print(element['src'])

Output

https://someone@example.com/abcd.jpg
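Since the question is also tagged regex: if you would rather not add a dependency, a regular expression can pull the attribute out of the HTML string. This is a hedged sketch, not a general HTML parser; it assumes the `src` value is double-quoted and that no attribute before `src` contains a `>` character. `SRC_RE` and `find_image_src` are hypothetical names.

```python
import re

# Matches the opening <crowd-image-classifier ...> tag and captures the
# double-quoted src value. The lookahead requires whitespace right after the
# tag name; [^>]*? also spans newlines, so multi-line tags work; requiring
# whitespace before "src" avoids matching attributes such as "data-src".
SRC_RE = re.compile(r'<crowd-image-classifier(?=\s)[^>]*?\ssrc\s*=\s*"([^"]*)"')


def find_image_src(html):
    match = SRC_RE.search(html)
    return match.group(1) if match else None


html = '''<crowd-form answer-format="flatten-objects">
<crowd-image-classifier
    src= "https://someone@example.com/abcd.jpg"
    categories="['Yes', 'No']"
    header="abcd"
    name="image-contains">
</crowd-image-classifier>
</crowd-form>'''

print(find_image_src(html))
```

For anything beyond a fixed template like this one, a real parser (BeautifulSoup above, or the standard-library `html.parser`) is the safer choice.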

Votes: 1

Stack Overflow user

Answered on 2019-11-29 06:04:43

I switched to using xml.etree.ElementTree instead.

The code I ended up with is roughly:

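import xml.etree.ElementTree as ET

# hit_doc must be the raw XML string here, not the dict returned by xmltodict.parse()
root = ET.fromstring(hit_doc)
image_data = None
for child in root:
    # If a child's first sub-element has this text, keep its second sub-element's text
    if child[0].text == 'crowd-image-classifier':
        image_data = child[1].text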
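If the HTML payload happens to be well-formed XML (every tag closed, which the snippet in the question is not, since `<short-instructions>` is left open), ElementTree can also look the element up by tag name and read its attribute directly. `WELL_FORMED` below is an assumed, corrected version of the question's markup, and `find_src` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: a well-formed version of the question's markup,
# with <short-instructions> explicitly closed.
WELL_FORMED = """<crowd-form answer-format="flatten-objects">
  <crowd-image-classifier
      src= "https://someone@example.com/abcd.jpg"
      categories="['Yes', 'No']"
      header="abcd"
      name="image-contains">
    <short-instructions></short-instructions>
  </crowd-image-classifier>
</crowd-form>"""


def find_src(markup):
    root = ET.fromstring(markup)
    # iter() walks the root and all of its descendants, filtering by tag name.
    element = next(root.iter("crowd-image-classifier"), None)
    return element.get("src") if element is not None else None


print(find_src(WELL_FORMED))
```

On the markup exactly as posted, `ET.fromstring` would raise a `ParseError` because of the unclosed tag, which is why a lenient HTML parser is the more robust option there.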

Votes: 0

Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.

Original link:

https://stackoverflow.com/questions/59044608
