
Pythonic way to generate seaborn heatmap subplots

Stack Overflow user
Asked 2021-07-16 00:46:09
Answers: 3 | Views: 153 | Followers: 0 | Votes: 2

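I have a dataframe with 7 columns. The Regressor column holds 3 different regressors (DT, DT-2, DT-4).

I want to generate a correlation heatmap for each regressor.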

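import matplotlib.pyplot as plt
import seaborn as sns

# One correlation matrix per regressor. select_dtypes keeps numeric columns only,
# which drops the string columns "Regressor" and "Method" before corr().
df_dt = df[df["Regressor"] == "DT"]
df_dt_corr = df_dt.select_dtypes("number").corr()

df_dt2 = df[df["Regressor"] == "DT-2"]
df_dt2_corr = df_dt2.select_dtypes("number").corr()

df_dt4 = df[df["Regressor"] == "DT-4"]
df_dt4_corr = df_dt4.select_dtypes("number").corr()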

#  SUBPLOTS
fig = plt.figure(figsize=(12,6))

plt.subplot(221)  
plt.title('Regressor: DT')
sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap = 'Reds_r')

plt.subplot(222)  
plt.title('Regressor: DT-2')
sns.heatmap(df_dt2_corr, annot=True, fmt='.2f', square=True, cmap = 'Blues_r')

plt.subplot(223)
plt.title('Regressor: DT-4')
sns.heatmap(df_dt4_corr, annot=True, fmt='.2f', square=True, cmap = 'BuGn_r')

plt.show()

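With this I get the plot.

The problem is that if I have 10 regressors, I have to write the same code 10 times, once per regressor. That is neither Pythonic nor good programming practice.

Is there a Pythonic way to do the same job (for example, with a loop)?

Please note: the demo dataframe has 3 regressors, but my main dataframe can have more. So I need a dynamic way to generate the plots based on the regressors present.

Demo data: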

{'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

3 Answers

Stack Overflow user

Answered 2021-07-19 15:19:16

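The existing answers use loops, but I looked around to see whether a facet grid could handle this. There is a great answer on that, and I adapted it to fit your code. The single dataframe is split into facets by the categorical variable, and col_wrap limits the number of columns per row. The map function then draws a heatmap from each split. However, I could not find a way to set a separate colormap per facet. I think a single colormap across all panels is well suited to this kind of analysis anyway.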

import pandas as pd
import seaborn as sns

data = {'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

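df = pd.DataFrame(data)

# One facet per regressor, wrapping to a new row after 2 columns
g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
# Each facet receives its own rows as `data`; correlate the numeric columns only
g.map_dataframe(lambda data, color: sns.heatmap(data.select_dtypes("number").corr(), annot=True, fmt='.2f', square=True))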
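
On the colormap limitation mentioned above: one possible workaround, offered only as a sketch and not as a built-in seaborn feature, is to pick the colormap inside the mapped function based on the facet's own Regressor value. The `cmaps` dictionary, the `corr_heatmap` helper, the "Greys" fallback, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in data (not the asker's values): 3 regressors x 5 rows.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Regressor": np.repeat(["DT", "DT-2", "DT-4"], 5),
    "CE": rng.random(15),
    "MAE": rng.random(15),
    "R2": rng.random(15),
})

# Hypothetical mapping from regressor name to colormap name.
cmaps = {"DT": "Reds_r", "DT-2": "Blues_r", "DT-4": "BuGn_r"}


def corr_heatmap(data, **kwargs):
    # FacetGrid passes each facet's rows as `data`. Every row in a facet
    # shares one Regressor value, so the first row identifies the facet.
    name = data["Regressor"].iloc[0]
    corr = data.select_dtypes("number").corr()
    sns.heatmap(corr, annot=True, fmt=".2f", square=True,
                cmap=cmaps.get(name, "Greys"))  # fall back if name is unmapped


g = sns.FacetGrid(demo, col="Regressor", col_wrap=2)
g.map_dataframe(corr_heatmap)
```

Accepting `**kwargs` lets the helper ignore the `color` argument that FacetGrid supplies, so only the lookup in `cmaps` decides the colormap.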

Votes: 2

Stack Overflow user

Answered 2021-07-19 01:54:51

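This is just a simple example that puts everything in a loop. First, the program finds which regressors to plot by collecting all the unique values of df['Regressor'].

The grid shape, axes, is decided automatically from the number of regressors. It tries to make a square grid.

The available colormaps are defined in colors; change this list if you want different colors. The program starts with the first colormap, then the second, and so on. If there are fewer colormaps than regressors, it wraps around to the start of the list.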
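
The grid sizing and colormap wrap-around described above can be checked in isolation, without any plotting. This is a minimal standard-library sketch; the helper names `square_grid` and `pick_colormaps` are made up for illustration and do not appear in the answer's code.

```python
import math
from itertools import cycle, islice


def square_grid(n_plots):
    # Side length of the smallest square grid that holds n_plots cells.
    side = math.ceil(math.sqrt(n_plots))
    return side, side


def pick_colormaps(n_plots, palette):
    # Take n_plots names from the palette, wrapping around when it runs out.
    return list(islice(cycle(palette), n_plots))


print(square_grid(3))   # (2, 2)
print(square_grid(10))  # (4, 4)
print(pick_colormaps(5, ["Reds", "Blues"]))
# ['Reds', 'Blues', 'Reds', 'Blues', 'Reds']
```

Note that a square grid can leave whole rows unused (10 plots occupy only 3 of the 4 rows), which is a trade-off of this simple rule.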

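import math

import matplotlib.pyplot as plt
import seaborn as sns

# unique() keeps the order of first appearance; a set has no guaranteed order
regressors = df['Regressor'].unique()
fig = plt.figure(figsize=(12, 6))

# Smallest square grid that fits every regressor, e.g. 3 regressors -> 2 x 2
axes = (math.ceil(math.sqrt(len(regressors))),) * 2

colors = [
            'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
            'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
            'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']

for index, regressor in enumerate(regressors):
    df_dt = df[df['Regressor'] == regressor]
    # Correlate numeric columns only (drops the string columns)
    df_dt_corr = df_dt.select_dtypes('number').corr()

    plt.subplot(*axes, index + 1)
    plt.title('Regressor: ' + regressor)
    # index % len(colors) wraps around when there are more plots than colormaps
    sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap=colors[index % len(colors)])
plt.show()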

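I changed the way plt.subplot is called, because the three-digit form you used (e.g. 221) supports at most 9 plots, and passing the grid size as separate arguments makes it easier to adjust the axes automatically.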
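
As a variation on the loop above, the grid can also be created in one call with plt.subplots, which returns the Axes objects directly so each heatmap is drawn via `ax=` instead of the current-axes state. This is a sketch under assumptions: the function name `plot_corr_grid`, its `ncols` default, the figure sizing, and the synthetic data are illustrative choices, not part of the answer.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_corr_grid(df, group_col="Regressor", ncols=2):
    """Draw one correlation heatmap per unique value of `group_col`."""
    names = df[group_col].unique()
    nrows = math.ceil(len(names) / ncols)
    # squeeze=False always returns a 2-D array of Axes, even for a 1 x 1 grid.
    fig, axes = plt.subplots(nrows, ncols, figsize=(6 * ncols, 5 * nrows),
                             squeeze=False)
    flat = axes.ravel()
    for ax, name in zip(flat, names):
        corr = df.loc[df[group_col] == name].select_dtypes("number").corr()
        sns.heatmap(corr, annot=True, fmt=".2f", square=True, ax=ax)
        ax.set_title(f"{group_col}: {name}")
    # Hide grid cells that did not receive a heatmap.
    for ax in flat[len(names):]:
        ax.set_visible(False)
    fig.tight_layout()
    return fig, axes


# Synthetic stand-in data (not the asker's values): 3 regressors x 5 rows.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Regressor": np.repeat(["DT", "DT-2", "DT-4"], 5),
    "CE": rng.random(15),
    "MAE": rng.random(15),
    "R2": rng.random(15),
})

fig, axes = plot_corr_grid(demo)
# plt.show()  # uncomment when running interactively
```

With 3 regressors and `ncols=2` this gives a 2 x 2 grid with the fourth cell hidden; the same call should work unchanged for 10 regressors.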

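Votes: 0

Stack Overflow user

Answered 2021-07-25 22:18:35

Select the unique values first

I store the unique values of the Regressor column in the vals variable, then loop over each value. See the solution below: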

import matplotlib.pyplot as plt
import seaborn as sns

# get the unique values in the "Regressor" column
vals = df['Regressor'].unique()

ncols = 2                       # number of subplot columns; adjust as needed
nrows = -(-len(vals) // ncols)  # ceiling division: rows needed for all plots

plt.figure(figsize=[10, 10], dpi=200)
plt.suptitle("Correlation Map")  # super title
# loop over the unique values, selecting data and plotting
for idx, value in enumerate(vals):
    # rows for this value; iloc drops the first two columns ("Regressor", "Method")
    data = df[df['Regressor'] == value].iloc[:, 2:]  # 2: selects the third column onwards
    # plot the correlation map
    plt.subplot(nrows, ncols, idx + 1)
    plt.title(f"Regressor={value}")
    sns.heatmap(data.corr(), annot=True, fmt='.2f', square=True)
plt.show()

Here you only need to set the number of columns in the subplot grid and the super title.
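
The per-value filtering in the loop above can also be expressed with groupby, which hands back each regressor's rows without a boolean mask, and select_dtypes avoids relying on column positions the way iloc does. A minimal pandas-only sketch, assuming synthetic data and an illustrative variable name `corr_by_regressor`:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (not the asker's values): 3 regressors x 5 rows.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Regressor": np.repeat(["DT", "DT-2", "DT-4"], 5),
    "Method": "method_1",
    "CE": rng.random(15),
    "MAE": rng.random(15),
})

# sort=False keeps the groups in order of first appearance.
corr_by_regressor = {
    name: group.select_dtypes("number").corr()
    for name, group in demo.groupby("Regressor", sort=False)
}

print(list(corr_by_regressor))        # ['DT', 'DT-2', 'DT-4']
print(corr_by_regressor["DT"].shape)  # (2, 2)
```

Each value in the dictionary is a correlation matrix that can be passed straight to sns.heatmap inside the plotting loop.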


Votes: 0

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/68397783
