
Pandas data cleaning

Stack Overflow user
Asked 2018-07-31 15:53:36
1 answer · 102 views · 0 followers · 0 votes

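So I'm reading tables from a PDF into pandas DataFrames, but I'm still quite new to pandas and find the documentation fairly daunting. I'm sure there's a reasonably simple way to do what I need; I just don't know how.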

          0                    1           2        3                4                5       6       7                      8              9        10               11          12   13
0        NaN                 col0        col1     col2             col3             col4    col5    col6                   col7           col8     col9            col10       col11  NaN
1        NaN             Location        Date      NaN              NaN              NaN     NaN     NaN                    NaN            NaN      NaN              NaN         NaN  NaN
2        NaN             measure1         1**     40**             30**             20**      20  0.02**                    3**           10**      5**            100**        15**  NaN
3        NaN             measure2         100      400              300              200     200       2                    300            100       50            1,000         150  NaN
4        NaN            location1   1/15/1994     5900            28000             7600   25000     150                    ---            ---      ---              ---         ---  ---
5        NaN                  NaN   3/16/1994     4900            12000             4400   11000      60                    ---            ---      ---              ---         ---  ---
6        NaN                  NaN    1/4/1995        1                1                1       1       8                    ---            ---      ---              ---         ---  ---
7        NaN                  NaN   4/12/2004     8400            34000             4600   17000   <1000                    ---            ---      ---              ---         ---  ---
8        NaN                  NaN   7/28/2008     3200            15400             4430   17100  172  I                    ---            ---      ---              ---         ---  ---
9        NaN                  NaN   5/19/2011     2000            11000             2500    9200  0.2  1                    ---            ---      ---              ---         ---  ---
10       NaN                  NaN    8/6/2013     2700            20000             5300   20000    2  6                    ---            ---      ---              ---         ---  ---
11       NaN                  NaN  11/13/2013     2600            14000             5400   20000  0.1  3                    ---            ---      ---              ---         ---  ---
12       NaN                  NaN    2/5/2014     3200            19000             6400   25000   18  0                    ---            ---      ---              ---         ---  ---
13       NaN                  NaN    5/7/2014     2000            15000             4100   16000   22  0                    ---            ---      ---              ---         ---  ---
14       NaN                  NaN  12/18/2014     2500            32000             5200   20000    8  8                    ---            ---      ---              ---         ---  ---
15       NaN                  NaN    6/4/2015     1700            15000             5200   21000   44  0                    ---            ---      ---              ---         ---  ---
16       NaN                  NaN   1/20/2017     1400           15,000            6,300  21,000    1  2                    ---            ---      ---              ---         ---  ---
17       NaN            location2   1/15/1994      210              290               39     180      69                    ---            ---      ---              ---         ---  ---
18       NaN                  NaN   3/24/1994     1500            12000             4100   18000  400  0                    ---            ---      ---              ---         ---  ---
19       NaN                  NaN    1/4/1995        1                1                1       1       8                    ---            ---      ---              ---         ---  ---
20       NaN                  NaN    2/1/2000    <1000             8900             5200   58000  <10000                    ---            ---      ---              ---         ---  ---
21       NaN                  NaN   4/12/2004     <5.0               42               78     540     150                    ---            ---      ---              ---         ---  ---
22       NaN                  NaN   7/28/2008     23.3             27.9               28     409    9.34                    ---            ---      ---              ---         ---  ---
23       NaN                  NaN   5/19/2011      1.8               12               22     170  0.2  1                    ---            ---      ---              ---         ---  ---
24       NaN                  NaN    8/6/2013      4.3               23               71     590  0.1  3                    ---            ---      ---              ---         ---  ---
25       NaN                  NaN   1/19/2017   0.21 I           0.26 I              7.7      42  0.2  4                    ---            ---      ---              ---         ---  ---
26       NaN            location3   3/21/1994       <1               <1               <1      <1      <8                    ---            ---      ---              ---         ---  ---
27  2/1/2000                   <1          <1       <1               <2              <10     ---     ---                    ---            ---      ---              ---         NaN  NaN

So there are three main problems I need to solve.

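First: the last row is somehow misaligned with the others. I need to shift all of that row's values two columns to the right so that the dates line up (its date, 2/1/2000, currently sits in column 0 instead of column 2). This also means the first column shouldn't exist at all.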

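Second: because of the silly way these tables are laid out in the PDF, a few other things are messed up too. The Date column should contain only dates. I need to somehow shift every row whose Date-column value is neither 'Date' nor an actual date one column to the right.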
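One way to express this second step, as a minimal sketch: it assumes integer column labels with the Date column at label 2, dates written as m/d/Y, and that the header cell reads exactly 'Date'. `shift_non_date_rows` is a hypothetical helper, not a pandas function. It should run after the last row has been realigned; otherwise that row's '<1' in column 2 would be moved as well. It does not merge the two header rows the way the desired output does.

```python
import numpy as np
import pandas as pd


def shift_non_date_rows(df: pd.DataFrame, date_col: int = 2) -> pd.DataFrame:
    """Hypothetical helper: in every row whose `date_col` cell is filled but is
    neither the label 'Date' nor an m/d/Y date, move the cells from `date_col`
    onward one column to the right."""
    out = df.astype(object)  # object dtype so any cell can hold a string or NaN
    cell = out[date_col]
    # Assumed date format; cells that do not parse become NaT
    is_date = pd.to_datetime(cell, format="%m/%d/%Y", errors="coerce").notna()
    is_label = cell.eq("Date")
    rows = (cell.notna() & ~is_date & ~is_label).to_numpy()

    start = out.columns.get_loc(date_col)
    block = out.iloc[rows, start:].to_numpy(dtype=object)
    shifted = np.full(block.shape, np.nan, dtype=object)
    # Each moved row loses its last cell (NaN in the question's sample)
    shifted[:, 1:] = block[:, :-1]
    out.iloc[rows, start:] = shifted
    return out


# Small hypothetical slice shaped like the top of the question's table
raw = pd.DataFrame([
    [np.nan, "col0", "col1", "col2", np.nan],
    [np.nan, "Location", "Date", np.nan, np.nan],
    [np.nan, "measure1", "1**", "40**", np.nan],
    [np.nan, "location1", "1/15/1994", "5900", "28000"],
])
fixed = shift_non_date_rows(raw)
print(fixed)
```

Parsing with an explicit format and `errors="coerce"` turns anything that is not a date (such as '1**' or 'col1') into NaT, so the mask picks out exactly the rows that need moving.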

Finally: the NaNs under Location. All the NaN values below each location actually belong to that same location, so I need to fill them in somehow.

So my desired output would look more like this:

          0                 1           2        3                4                5       6       7                      8              9        10               11          12      13
0       
1                     Location        Date     col1             col2             col3    col4    col5                   col6           col7     col8             col9       col10    col11
2                     measure1         NaN      1**             40**             30**    20**      20                 0.02**            3**     10**              5**       100**     15**
3                     measure2         NaN      100              400              300     200     200                      2            300      100               50       1,000     150
4                    location1   1/15/1994     5900            28000             7600   25000     150                    ---            ---      ---              ---         ---     ---
5                    location1   3/16/1994     4900            12000             4400   11000      60                    ---            ---      ---              ---         ---     ---
6                    location1    1/4/1995        1                1                1       1       8                    ---            ---      ---              ---         ---     ---
7                    location1   4/12/2004     8400            34000             4600   17000   <1000                    ---            ---      ---              ---         ---     ---
8                    location1   7/28/2008     3200            15400             4430   17100  172  I                    ---            ---      ---              ---         ---     ---
9                    location1   5/19/2011     2000            11000             2500    9200  0.2  1                    ---            ---      ---              ---         ---     ---
10                   location1    8/6/2013     2700            20000             5300   20000    2  6                    ---            ---      ---              ---         ---     ---
11                   location1  11/13/2013     2600            14000             5400   20000  0.1  3                    ---            ---      ---              ---         ---     ---
12                   location1    2/5/2014     3200            19000             6400   25000   18  0                    ---            ---      ---              ---         ---     ---
13                   location1    5/7/2014     2000            15000             4100   16000   22  0                    ---            ---      ---              ---         ---     ---
14                   location1  12/18/2014     2500            32000             5200   20000    8  8                    ---            ---      ---              ---         ---     ---
15                   location1    6/4/2015     1700            15000             5200   21000   44  0                    ---            ---      ---              ---         ---     ---
16                   location1   1/20/2017     1400           15,000            6,300  21,000    1  2                    ---            ---      ---              ---         ---     ---
17                   location2   1/15/1994      210              290               39     180      69                    ---            ---      ---              ---         ---     ---
18                   location2   3/24/1994     1500            12000             4100   18000  400  0                    ---            ---      ---              ---         ---     ---
19                   location2    1/4/1995        1                1                1       1       8                    ---            ---      ---              ---         ---     ---
20                   location2    2/1/2000    <1000             8900             5200   58000  <10000                    ---            ---      ---              ---         ---     ---
21                   location2   4/12/2004     <5.0               42               78     540     150                    ---            ---      ---              ---         ---     ---
22                   location2   7/28/2008     23.3             27.9               28     409    9.34                    ---            ---      ---              ---         ---     ---
23                   location2   5/19/2011      1.8               12               22     170  0.2  1                    ---            ---      ---              ---         ---     ---
24                   location2    8/6/2013      4.3               23               71     590  0.1  3                    ---            ---      ---              ---         ---     ---
25                   location2   1/19/2017   0.21 I           0.26 I              7.7      42  0.2  4                    ---            ---      ---              ---         ---     ---
26                   location3   3/21/1994       <1               <1               <1      <1      <8                    ---            ---      ---              ---         ---     ---
27                   location3    2/1/2000       <1               <1               <1      <2     <10                    ---            ---      ---              ---         ---     ---

1 Answer

Stack Overflow user

Accepted answer

Posted 2018-07-31 16:35:21

For the first point, you can try the following:

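df = df.T  # the misaligned last row becomes the last column
# Shift it by two positions so its date lands in column 2, like the other rows
df.iloc[:, -1] = df.iloc[:, -1].shift(2)
df = df.T
# Column 0 is now empty in every row, so it can be dropped
df = df.drop(df.columns[0], axis=1)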
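The transpose round-trip only touches the last row. If a longer table had several rows misaligned the same way, a sketch that avoids transposing could select every row with a value in column 0 (an assumption drawn from the sample, where only the misaligned row has one) and move it right by two columns. `realign_rows` and its `offset` parameter are hypothetical names.

```python
import numpy as np
import pandas as pd


def realign_rows(df: pd.DataFrame, offset: int = 2) -> pd.DataFrame:
    """Hypothetical helper: shift every row that has a value in the first
    column `offset` columns to the right, then drop the (now empty) first
    column. Requires offset >= 1."""
    out = df.astype(object)  # object dtype so strings can go into any column
    rows = out.iloc[:, 0].notna().to_numpy()  # assumed marker of a bad row
    block = out.iloc[rows].to_numpy(dtype=object)
    shifted = np.full(block.shape, np.nan, dtype=object)
    # The last `offset` cells of each shifted row fall off the end; in the
    # question's table those cells are NaN, so nothing is lost there
    shifted[:, offset:] = block[:, :-offset]
    out.iloc[rows, :] = shifted
    return out.drop(columns=out.columns[0])


# Hypothetical two-row slice shaped like the bottom of the question's table
raw = pd.DataFrame([
    [np.nan, "location3", "3/21/1994", "<1", "<8", "---"],
    ["2/1/2000", "<1", "<2", "<10", np.nan, np.nan],
])
fixed = realign_rows(raw)
print(fixed)
```

Dropping a column does not relabel the others, so the location column keeps label 1 and the date column keeps label 2 afterwards.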

For the last point:

# Location column: label 1 (assumed to be an integer); ffill fills each NaN with the location above
df[1] = df[1].ffill()
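The order of the two snippets matters: `ffill` only fills NaN cells, and before the realignment the misaligned row holds '<1' in column 1, so filling first would leave that row without its location. A small sketch on a hypothetical two-row frame; `shift_last_row` is an illustrative helper, not part of the answer.

```python
import numpy as np
import pandas as pd

raw = pd.DataFrame([
    [np.nan, "location3", "3/21/1994", "<1"],
    ["2/1/2000", "<1", "<1", np.nan],
], dtype=object)


def shift_last_row(df: pd.DataFrame, n: int = 2) -> pd.DataFrame:
    """Hypothetical helper: move the last row's values n columns right."""
    out = df.copy()
    out.iloc[-1] = out.iloc[-1].shift(n).to_numpy()
    return out


# Realign first, then fill: the emptied location cell picks up "location3"
right = shift_last_row(raw)
right[1] = right[1].ffill()

# Fill first, then realign: column 1 had no NaN to fill, and the shift
# then empties it, so the row ends up without a location
wrong = raw.copy()
wrong[1] = wrong[1].ffill()
wrong = shift_last_row(wrong)

print(right)
print(wrong)
```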
Votes: 1
Original page content provided by Stack Overflow. Translation support provided by Tencent Cloud Xiaowei's IT-domain engine.
Original link:

https://stackoverflow.com/questions/51617492
