
Q: Looping over XML table blocks in Python with BeautifulSoup

Stack Overflow user

Asked 2018-05-03 23:03:26

1 answer · 563 views · 0 followers · 0 votes

When I parse some XML with BeautifulSoup, I get a data structure like this:

<h2>Fri 4 May</h2><table cellspacing="0" cellpadding="12">
    <tr>
        <td class="time ">6:00am</td>
        <td class="other-details ">
            <a class="prog-link" href="http://www.tvguide.co.uk/m-detail/157702075/137913159/breakfast" id="308829348" >
                <div class="title" style="border-left:4px solid #CE3D32">
                    Breakfast
                </div>
                <div class="detail">
                    A round-up of national and international news, plus current affairs, arts and entertainment, and weather
                    <div class="other">
                        (Subtitles) (Interactive)
                    </div>
                    <br>
                    <div class="rating">Rating:  <span class="rating-num">1.5</span></div>
                </div>
            </a>
        </td>
    </tr>
...
...
...
</table>

There are several of these structures in chronological order, holding TV guide data for a number of consecutive days.

The code I have so far is:

for x in soup.select('h2'):
    for tr in soup.select('table tr'):
        if not tr.script:
            for td in tr.find_all('td'):
                a = ''.join(re.sub(r'\s+', ' ', td.text))
                b = a.strip()

                #print x.text
                #print b

                if b[:1] in '0123456789':
                    date_list.append(b)
                else:
                    if ' Rating' in b:
                        c = b.split(' Rating')
                    else:
                        c = b.split(' Rating')
                        c.append(0.0)

                    desc = c[0]
                    desc_list.append(desc)

                    rating = ''.join(['Rating: ', str(c[1])])
                    rating_list.append(rating)

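However, for every date defined in an <h2> tag, this gives me every block from all of the dates. What I actually want, in logical order, is:

Iterate over each <h2> date tag in order.
Print only the <table> block that belongs to that day.

I'm nearly there, I just can't work out the final change I need to make.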

Tags: xml, beautifulsoup, python

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-05-03 23:18:37

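I think the problem may be that soup.select always starts from the beginning of the XML, so the second soup.select finds every instance of tr in the document, regardless of which <h2> the outer loop is on.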
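To make that concrete, here is a minimal sketch. The two-day document is a hypothetical, cut-down version of the structure in the question, and rows_per_heading is just an illustrative name. It counts how many rows the inner soup.select returns on each pass of the outer loop:

```python
from bs4 import BeautifulSoup

# Hypothetical two-day document, cut down from the structure in the question.
html = """
<h2>Fri 4 May</h2><table><tr><td class="time">6:00am</td></tr></table>
<h2>Sat 5 May</h2><table><tr><td class="time">7:00am</td></tr>
<tr><td class="time">9:00am</td></tr></table>
"""
soup = BeautifulSoup(html, "html.parser")

rows_per_heading = []
for h2 in soup.select("h2"):
    # soup.select searches the whole document, so the current h2
    # has no influence on which rows come back.
    rows_per_heading.append(len(soup.select("table tr")))

print(rows_per_heading)  # [3, 3]
```

Both headings see all three rows, which matches the behaviour described in the question.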

In the snippet below I replaced the second soup.select with x.select, which selects only from within the "x" node rather than from the start of the document.

for x in soup.select('table'):
    for tr in x.select('tr'):
        ...  # handle each row as in the question
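Looping over the tables alone drops the date from the <h2>. One possible way to keep both is sketched below. It assumes each <h2> is followed by its own <table> as a sibling, as in the question's sample. The sample document, the schedule dict and the 0.0 default rating are illustrative choices, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical sample with two days, modelled on the question's markup.
html = """
<h2>Fri 4 May</h2><table>
<tr><td class="time ">6:00am</td><td class="other-details "><a class="prog-link">
<div class="title">Breakfast</div>
<div class="detail">News<div class="rating">Rating: <span class="rating-num">1.5</span></div></div>
</a></td></tr>
</table>
<h2>Sat 5 May</h2><table>
<tr><td class="time ">7:00am</td><td class="other-details "><a class="prog-link">
<div class="title">Cartoons</div>
<div class="detail">Animation</div>
</a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

schedule = {}
for h2 in soup.select("h2"):
    # Assumption: the table for a day is the next <table> sibling of its <h2>.
    table = h2.find_next_sibling("table")
    if table is None:
        continue
    entries = []
    # table.select is scoped to this table, so only this day's rows are seen.
    for tr in table.select("tr"):
        time_td = tr.select_one("td.time")
        title = tr.select_one("div.title")
        if time_td is None or title is None:
            continue  # skip rows that are not programme entries
        rating = tr.select_one("span.rating-num")
        entries.append((
            time_td.get_text(strip=True),
            title.get_text(strip=True),
            float(rating.get_text()) if rating is not None else 0.0,
        ))
    schedule[h2.get_text(strip=True)] = entries

for day, entries in schedule.items():
    print(day, entries)
```

Selecting by class (td.time, div.title, span.rating-num) avoids splitting the cell text on ' Rating'. If a day could appear without a table of its own, find_next_sibling would pick up the next day's table, so that case would need an extra check.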

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/50164915
