
Generating separate scatter plots for all columns of a Python DataFrame with empty cells

Stack Overflow user
Asked 2020-04-15 22:02:05
Answers: 1 · Views: 330 · Followers: 0 · Votes: 1

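I am trying to automate the plotting of correlation plots for a large DataFrame. The goal is to plot every column against every other column as a scatter plot, with a regression line through it. Each column represents a different variable, and a column may contain empty cells, integers, and string values (my attempted code and a working example are below).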

Example data and code:

Age     Height   Weight  Sex
21      180      54      M
56      171      65      V
23      NaN      84      V
NaN     195      71      M
42      165      67      V
84      167      93      M
12      NaN      88      M
31      152      73      V
NaN     184      NaN     V


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df_subset = pd.DataFrame({
    "Age": [21, 56, 23, np.nan, 42, 84, 12, 31, np.nan],
    "Height": [180, 171, np.nan, 195, 165, 167, np.nan, 152, 184],
    "Weight": [54, 65, 84, 71, 67, 93, 88, 73, np.nan],
    "Sex": ['M', 'V', 'V', 'M', 'V', 'M', 'M', 'V', 'V'],
})

print(df_subset)
col_choice = ["Age", "Height", "Weight"]

for pos1, axis1 in enumerate(col_choice):   # Pick a first col
    for pos2, axis2 in enumerate(col_choice[pos1+1:]):   # Pick a later col
        plt.scatter(df_subset.loc[:,axis1], df_subset.loc[:,axis2]) #scatter plot
        a, b = np.polyfit(df_subset.loc[:,axis1], df_subset.loc[:,axis2], 1) #determining parameters for regression line
        x = df_subset.loc[:,axis1]
        plt.plot(x, a*x + b) #regression line on scatter-plot
        plt.show()
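
One way to see why the attempt above struggles is to count the missing values in the chosen columns: np.polyfit does not skip NaN, so every pair of columns here contains at least one row it cannot use. The sketch below only inspects the example data; the names `nan_counts` and `usable_rows` are illustrative and not part of the original post.

```python
from itertools import combinations

import numpy as np
import pandas as pd

df_subset = pd.DataFrame({
    "Age": [21, 56, 23, np.nan, 42, 84, 12, 31, np.nan],
    "Height": [180, 171, np.nan, 195, 165, 167, np.nan, 152, 184],
    "Weight": [54, 65, 84, 71, 67, 93, 88, 73, np.nan],
    "Sex": ['M', 'V', 'V', 'M', 'V', 'M', 'M', 'V', 'V'],
})
col_choice = ["Age", "Height", "Weight"]

# Number of missing values in each chosen column
nan_counts = df_subset[col_choice].isna().sum()
print(nan_counts)

# Number of rows where BOTH columns of a pair have a value
usable_rows = {
    (c1, c2): len(df_subset[[c1, c2]].dropna())
    for c1, c2 in combinations(col_choice, 2)
}
print(usable_rows)
```

Because the set of complete rows differs from pair to pair, the missing values have to be handled per pair rather than once for the whole DataFrame.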

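1 Answer

Stack Overflow user

Accepted answer

Posted 2020-04-16 18:14:56

Solution: for each pair of columns, drop the rows that contain NaN in either column before plotting and fitting, because np.polyfit does not skip missing values.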

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df_subset = pd.DataFrame({
    "Age": [21, 56, 23, np.nan, 42, 84, 12, 31, np.nan],
    "Height": [180, 171, np.nan, 195, 165, 167, np.nan, 152, 184],
    "Weight": [54, 65, 84, 71, 67, 93, 88, 73, np.nan],
    "Sex": ['M', 'V', 'V', 'M', 'V', 'M', 'M', 'V', 'V'],
})
print(df_subset)

col_choice = ["Age", "Height", "Weight"]

for pos1, axis1 in enumerate(col_choice):   # Pick a first col
    for axis2 in col_choice[pos1+1:]:   # Pick a later col
        # Keep only the rows where both columns have a value
        df = df_subset[[axis1, axis2]].dropna()
        print(df)
        plt.scatter(df.iloc[:, 0], df.iloc[:, 1])   # scatter plot
        a, b = np.polyfit(df.iloc[:, 0], df.iloc[:, 1], 1)   # slope and intercept of the regression line
        x = df.iloc[:, 0]
        plt.plot(x, a*x + b)   # regression line on the scatter plot
        plt.show()
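
As a variation on the same idea, the pairwise loop can be wrapped in a small function. This is only a sketch under a few assumptions: `plot_pair` is a hypothetical helper name, `itertools.combinations` replaces the nested `enumerate` loops, and `pd.to_numeric(..., errors="coerce")` is used to turn stray string values into NaN before dropping incomplete rows, since the question mentions that columns may contain strings.

```python
from itertools import combinations

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_pair(df, x_col, y_col, ax):
    """Scatter y_col against x_col on ax and overlay a least-squares line.

    Hypothetical helper: returns (slope, intercept), or None when fewer
    than two complete rows remain for the pair.
    """
    # Turn non-numeric entries into NaN, then keep only complete rows
    pair = df[[x_col, y_col]].apply(pd.to_numeric, errors="coerce").dropna()
    if len(pair) < 2:
        return None
    x = pair[x_col].to_numpy(dtype=float)
    y = pair[y_col].to_numpy(dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    ax.scatter(x, y)
    # Two end points are enough to draw a straight line
    xs = np.array([x.min(), x.max()])
    ax.plot(xs, slope * xs + intercept)
    ax.set_xlabel(x_col)
    ax.set_ylabel(y_col)
    return slope, intercept


df_subset = pd.DataFrame({
    "Age": [21, 56, 23, np.nan, 42, 84, 12, 31, np.nan],
    "Height": [180, 171, np.nan, 195, 165, 167, np.nan, 152, 184],
    "Weight": [54, 65, 84, 71, 67, 93, 88, 73, np.nan],
    "Sex": ['M', 'V', 'V', 'M', 'V', 'M', 'M', 'V', 'V'],
})
col_choice = ["Age", "Height", "Weight"]

fits = {}
for x_col, y_col in combinations(col_choice, 2):
    fig, ax = plt.subplots()   # one separate figure per pair of columns
    fits[(x_col, y_col)] = plot_pair(df_subset, x_col, y_col, ax)

print(fits)
# Call plt.show() here to display the figures interactively.
```

Returning the fitted slope and intercept makes it possible to inspect or log the regression parameters without re-running the fit; whether that is useful depends on the surrounding workflow.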
Votes: 0
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/61230716
