
Creating a list of tuples for consecutive NaN values across all columns

Stack Overflow user
Asked 2018-10-12 10:59:16
Answers: 1, Views: 326, Followers: 0, Votes: 2

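I am trying to build a list of tuples, each holding the start and end datetime of a run of consecutive rows in which every column is NaN.

For the example below, my result should look like this: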

missing_dates = [('2018-10-10 20:00:00', '2018-10-10 22:00:00'),
                 ('2018-10-11 02:00:00', '2018-10-11 03:00:00')]

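If an all-NaN row is isolated (a run of length one), its datetime should appear twice in the tuple, as both start and end.

Here is an example list of dictionaries, followed by a table for visualization.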

import numpy as np

dicts = [
    {'datetime': '2018-10-10 18:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-10 19:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-10 19:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-10 19:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-10 20:00:00', 'variable1': np.nan, 'variable2': np.nan},
    {'datetime': '2018-10-10 21:00:00', 'variable1': np.nan, 'variable2': np.nan},
    {'datetime': '2018-10-10 22:00:00', 'variable1': np.nan, 'variable2': np.nan},
    {'datetime': '2018-10-10 23:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-10 23:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-11 00:00:00', 'variable1': 20, 'variable2': 30},
    {'datetime': '2018-10-11 01:00:00', 'variable1': np.nan, 'variable2': 30},
    {'datetime': '2018-10-11 02:00:00', 'variable1': np.nan, 'variable2': np.nan},
    {'datetime': '2018-10-11 03:00:00', 'variable1': np.nan, 'variable2': np.nan}]

Table representation:

+---------------------+-----------+-----------+
|          datetime   | variable1 | variable2 |
+---------------------+-----------+-----------+
| 2018-10-10 18:00:00 |      20.0 |     30.0  |
| 2018-10-10 19:00:00 |      20.0 |     30.0  | 
| 2018-10-10 19:00:00 |      20.0 |     30.0  |
| 2018-10-10 19:00:00 |      20.0 |     30.0  |
| 2018-10-10 20:00:00 |       NaN |     NaN   |
| 2018-10-10 21:00:00 |       NaN |     NaN   |
| 2018-10-10 22:00:00 |       NaN |     NaN   |
| 2018-10-10 23:00:00 |      20.0 |     30.0  |
| 2018-10-10 23:00:00 |      20.0 |     30.0  | 
| 2018-10-11 00:00:00 |      20.0 |     30.0  |
| 2018-10-11 01:00:00 |       NaN |     30.0  |
| 2018-10-11 02:00:00 |       NaN |     NaN   |
| 2018-10-11 03:00:00 |       NaN |     NaN   |
+---------------------+-----------+-----------+

What I have tried:

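import pandas as pd

df = pd.DataFrame(dicts)
# True for rows where every value column is NaN
s = df.set_index('datetime').isnull().all(axis=1)
df['new_col'] = s.values
df['datetime'] = pd.to_datetime(df['datetime'])
# Keep only the all-NaN rows; copy so the next assignment does not touch a view
new_df = df.loc[df['new_col']].copy()
# Time elapsed since the previous all-NaN row
new_df['delta'] = new_df['datetime'] - new_df['datetime'].shift(1)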

This gives me a nice column of time deltas, but I am a bit lost about where to go from here.


1 Answer

Stack Overflow user

Accepted answer

Posted 2018-10-12 11:15:05

Use:

# Boolean mask: True for rows that contain at least one non-NaN value
mask = df.drop('datetime', axis=1).notnull().any(axis=1)
# The running count of valid rows stays constant across each run of all-NaN rows,
# so it serves as a group id for that run
df['g'] = mask.cumsum()

# Take the first and last datetime of each all-NaN group and convert the rows to tuples
L = list(map(tuple, df[~mask].groupby('g')['datetime'].agg(['first','last']).values.tolist()))
print(L)
[('2018-10-10 20:00:00', '2018-10-10 22:00:00'),
 ('2018-10-11 02:00:00', '2018-10-11 03:00:00')]
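To see why `cumsum` works as a group id, it can help to print the intermediate values. The snippet below is only an illustrative sketch on a shortened, made-up sample (not the asker's full data); the variable name `groups` is an assumption introduced here.

```python
import numpy as np
import pandas as pd

# Shortened, hypothetical sample: two all-NaN runs separated by valid rows.
df = pd.DataFrame({
    'datetime': ['2018-10-10 19:00:00', '2018-10-10 20:00:00',
                 '2018-10-10 21:00:00', '2018-10-10 23:00:00',
                 '2018-10-11 01:00:00', '2018-10-11 02:00:00'],
    'variable1': [20, np.nan, np.nan, 20, np.nan, np.nan],
    'variable2': [30, np.nan, np.nan, 30, 30, np.nan],
})

# True where at least one value column holds data
mask = df.drop('datetime', axis=1).notnull().any(axis=1)
# The counter only advances on rows with data, so every all-NaN run
# inherits the count of the last valid row before it
groups = mask.cumsum()

print(mask.tolist())    # [True, False, False, True, True, False]
print(groups.tolist())  # [1, 1, 1, 2, 3, 3]
```

The row at `2018-10-11 01:00:00` has a value in `variable2`, so it counts as valid and is not part of any missing run. The all-NaN rows carry group ids 1 and 3, which is what `groupby` then splits on.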

A similar solution, with the mask inverted:

# Boolean mask: True for rows where every value column is NaN
mask = df.drop('datetime', axis=1).isnull().all(axis=1)
df['g'] = (~mask).cumsum()

L = list(map(tuple, df[mask].groupby('g')['datetime'].agg(['first','last']).values.tolist()))
print(L)
[('2018-10-10 20:00:00', '2018-10-10 22:00:00'),
 ('2018-10-11 02:00:00', '2018-10-11 03:00:00')]
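The question also asks that an isolated all-NaN row repeat its datetime in the tuple. The same grouping idea appears to cover that case, because the first and last element of a one-row group are the same. Below is a minimal sketch that wraps the approach in a helper without adding a `g` column to the frame; the function name `nan_runs`, its `time_col` parameter, and the sample data are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

def nan_runs(df, time_col='datetime'):
    """Return (first, last) tuples, one per run of rows whose value columns are all NaN."""
    # True for rows where every column except the time column is NaN
    mask = df.drop(columns=time_col).isnull().all(axis=1)
    # Count of valid rows seen so far; constant within each all-NaN run
    groups = (~mask).cumsum()
    times = df.loc[mask, time_col]
    # One tuple per run: first and last time value of the group
    return [(g.iloc[0], g.iloc[-1]) for _, g in times.groupby(groups[mask])]

# Hypothetical sample with one isolated all-NaN row and one run of two
sample = pd.DataFrame({
    'datetime': ['2018-10-10 18:00:00', '2018-10-10 19:00:00',
                 '2018-10-10 20:00:00', '2018-10-10 21:00:00',
                 '2018-10-10 22:00:00'],
    'variable1': [20, np.nan, 20, np.nan, np.nan],
    'variable2': [30, np.nan, 30, np.nan, np.nan],
})

result = nan_runs(sample)
print(result)
# [('2018-10-10 19:00:00', '2018-10-10 19:00:00'),
#  ('2018-10-10 21:00:00', '2018-10-10 22:00:00')]
```

Note that this groups by row position, not by elapsed time: two all-NaN rows that sit next to each other in the frame are treated as one run even if their timestamps are far apart.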
Votes: 2
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/52778097
