首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >连接由pd.read_html制成的多个df

连接由pd.read_html制成的多个df
EN

Stack Overflow用户
提问于 2019-11-11 19:21:18
回答 1查看 30关注 0票数 0

我的标题没有任何意义,所以我将简要介绍一下情况

我从网站上抓取数据,基本上是表格,但在这种情况下,每一行都是一个表格元素,而且每个奇怪的表格元素都没有用,所以我删除了

因此,我想要的是使用read_html()连接由每个偶数表元素组成的每个单独的数据帧。

下面是我的代码

代码语言:javascript
复制
import pandas as pd

all_table = ["""<table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr height="10px">
<td align="right" colspan="9">
<font color="#D5D5D5">.</font>
</td>
</tr>
<tr height="30px" valign="middle" width="100%">
<td class="size-12" colspan="8" width="100%">
<strong>Shipment Status</strong>
</td>
</tr>
<tr valign="bottom" width="100%">
<td align="center" class="size-10" width="10%">
<strong>Station</strong>
</td>
<td align="center" class="size-10" width="10%">
<strong>Flight No.</strong>
</td>
<td align="center" class="size-10" width="25%">
<strong>Status</strong>
</td>
<td align="center" class="size-10" width="15%">
<strong>Date</strong>
</td>
<td align="center" class="size-10" width="9%">
<strong>Time</strong>
</td>
<td align="center" class="size-10" width="8%">
<strong>Pcs</strong>
</td>
<td align="center" class="size-10" width="8%">
<strong>Wgt</strong>
</td>
<td align="center" class="size-10" width="15%">
<strong>ULD - Battery - Temp</strong>
</td>
</tr>
<tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">KIX</td>
<td align="center" class="size-10" width="10%">
<center>-</center>
</td>
<td align="center" class="size-10" width="25%">Shipment Received</td>
<td align="center" class="size-10" width="15%">11 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 22:45</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#FFFFFF" class="result-row">
<td align="center" class="size-10" width="10%">KIX</td>
<td align="center" class="size-10" width="10%">
<center>-</center>
</td>
<td align="center" class="size-10" width="25%">Freight On Hand</td>
<td align="center" class="size-10" width="15%">11 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 22:45</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">KIX</td>
<td align="center" class="size-10" width="10%">SQ0621</td>
<td align="center" class="size-10" width="25%">Flight  Departed</td>
<td align="center" class="size-10" width="15%">13 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 17:18</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#FFFFFF" class="result-row">
<td align="center" class="size-10" width="10%">SIN</td>
<td align="center" class="size-10" width="10%">SQ0621</td>
<td align="center" class="size-10" width="25%">Flight Arrived</td>
<td align="center" class="size-10" width="15%">13 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 23:02</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">SIN</td>
<td align="center" class="size-10" width="10%">SQ0621</td>
<td align="center" class="size-10" width="25%">Flight Arrived</td>
<td align="center" class="size-10" width="15%">13 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 23:02</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#FFFFFF" class="result-row">
<td align="center" class="size-10" width="10%">SIN</td>
<td align="center" class="size-10" width="10%">SQ0621</td>
<td align="center" class="size-10" width="25%">Shipment Checked Into Warehouse</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 02:57</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">SIN</td>
<td align="center" class="size-10" width="10%">SQ0422</td>
<td align="center" class="size-10" width="25%">Flight  Departed</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 07:39</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#FFFFFF" class="result-row">
<td align="center" class="size-10" width="10%">BOM</td>
<td align="center" class="size-10" width="10%">SQ0422</td>
<td align="center" class="size-10" width="25%">Flight Arrived</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 10:12</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">BOM</td>
<td align="center" class="size-10" width="10%">SQ0422</td>
<td align="center" class="size-10" width="25%">Flight Arrived</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 10:30</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#FFFFFF" class="result-row">
<td align="center" class="size-10" width="10%">BOM</td>
<td align="center" class="size-10" width="10%">SQ0422</td>
<td align="center" class="size-10" width="25%">Shipment Checked Into Warehouse</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 14:10</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">BOM</td>
<td align="center" class="size-10" width="10%">
<center>-</center>
</td>
<td align="center" class="size-10" width="25%">Shipment Ready for Pick-up</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 14:21</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#FFFFFF" class="result-row">
<td align="center" class="size-10" width="10%">BOM</td>
<td align="center" class="size-10" width="10%">
<center>-</center>
</td>
<td align="center" class="size-10" width="25%">Document Delivered</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 17:15</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
</table>, <table cellpadding="0" cellspacing="0" cols="8" width="100%">
<tbody><tr bgcolor="#F0F0F0" class="result-row">
<td align="center" class="size-10" width="10%">BOM</td>
<td align="center" class="size-10" width="10%">
<center>-</center>
</td>
<td align="center" class="size-10" width="25%">Shipment Delivered</td>
<td align="center" class="size-10" width="15%">14 Oct 2019</td>
<td align="center" class="size-10" width="9%"> 17:15</td>
<td align="center" class="size-10" width="8%">34</td>
<td align="center" class="size-10" width="8%">411.3</td>
<td align="center" class="size-10" width="15%"></td>
</tr>
</tbody></table>"""]

final_delivary = pd.DataFrame()

a = 0
for i in range(len(all_table)):
    print("-"*150)
    if a % 2 == 0:
        print(a)
        # print(all_table[a])
        tmp_table = all_table[a]
        tmp_df = pd.read_html(str(tmp_table))
        print("tmp_df = \n", tmp_df)
        print("type of tmp_df = ", type(tmp_df))
        print("#"*75)
        tmp_df2 = pd.DataFrame(tmp_df[0])
        print("tmp_df2 = \n", tmp_df2)
        print("type of tmp_df2 = ", type(tmp_df2))
        print("@"*75)
        print("final_delivary = \n", final_delivary)
        print("type of final_delivary = ", type(final_delivary))
        pd.concat([final_delivary, tmp_df2], axis=0)
    else:
        print("nope")
    a+=1

print("final_delivary = ", final_delivary)

因此,我在将单个数据帧连接到主数据帧时遇到了问题,而我得到的结果是空数据帧,所以请帮助我解决这个问题

EN

回答 1

Stack Overflow用户

回答已采纳

发布于 2019-11-11 19:51:56

尝尝这个

代码语言:javascript
复制
from bs4 import BeautifulSoup as bs
import pandas as pd
all_table = '''
                html content
            '''
finalDf = pd.DataFrame()
soup = bs(all_table)
tables = soup.findAll("table")
for i,table in enumerate(tables):
    if i%2==0:  
        df = pd.read_html(str(table))
        finalDf = pd.concat([finalDf,df[0]])
票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/58800357

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档