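I have a DataFrame with 7 columns. The Regressor column contains 3 different regressors (DT, DT-2 and DT-4).
I want to generate a correlation heatmap for each of them.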
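import matplotlib.pyplot as plt
import seaborn as sns

# Method is a text column; drop it too so corr() only sees numeric columns (pandas 2.x raises otherwise)
df_dt = df[df["Regressor"] == "DT"]
df_dt_corr = df_dt.drop(["Regressor", "Method"], axis=1).corr()
df_dt2 = df[df["Regressor"] == "DT-2"]
df_dt2_corr = df_dt2.drop(["Regressor", "Method"], axis=1).corr()
df_dt4 = df[df["Regressor"] == "DT-4"]
df_dt4_corr = df_dt4.drop(["Regressor", "Method"], axis=1).corr()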
# SUBPLOTS
fig = plt.figure(figsize=(12,6))
plt.subplot(221)
plt.title('Regressor: DT')
sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap = 'Reds_r')
plt.subplot(222)
plt.title('Regressor: DT-2')
sns.heatmap(df_dt2_corr, annot=True, fmt='.2f', square=True, cmap = 'Blues_r')
plt.subplot(223)
plt.title('Regressor: DT-4')
sns.heatmap(df_dt4_corr, annot=True, fmt='.2f', square=True, cmap = 'BuGn_r')
plt.show()
With this I do get the plot.

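The problem is that with 10 regressors I would have to repeat this code 10 times, once per regressor. That is neither Pythonic nor good programming practice.
Is there a Pythonic way to do the same job (for example, with a loop)?
Please note: the demo DataFrame has 3 regressors, but my main DataFrame can have more. So I need a dynamic way to generate the plots based on whichever regressors are present.
Demo data: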
{'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}

Posted on 2021-07-19 15:19:16
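The answer already posted uses a loop, but I looked around to see whether a FacetGrid could handle this. I found a great answer on the topic and adapted it to your code. FacetGrid splits the single DataFrame into one panel per value of a categorical variable, and col_wrap limits the number of columns. map_dataframe then draws a heatmap from each subset. However, I could not find a way to set the colormap per panel. I think sticking to a single colormap across panels is actually well suited to this kind of analysis.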
import pandas as pd
import seaborn as sns
data = {'Regressor': {0: 'DT', 1: 'DT', 2: 'DT', 3: 'DT', 4: 'DT', 19: 'DT-2', 20: 'DT-2', 21: 'DT-2', 22: 'DT-2', 23: 'DT-2', 39: 'DT-4', 40: 'DT-4', 41: 'DT-4', 42: 'DT-4', 43: 'DT-4'}, 'Method': {0: 'method_1', 1: 'method_1', 2: 'method_1', 3: 'method_1', 4: 'method_1', 19: 'method_1', 20: 'method_1', 21: 'method_1', 22: 'method_1', 23: 'method_1', 39: 'method_1', 40: 'method_1', 41: 'method_1', 42: 'method_1', 43: 'method_1'}, 'CE': {0: 0.002874032327519, 1: 0.005745640214479, 2: 0.004661679592489, 3: 0.002846754581854, 4: 0.004576990206546, 19: 0.105364819313149, 20: 0.085976562255755, 21: 0.095881176731004, 22: 0.097398912201617, 23: 0.100491941499165, 39: 0.018162548523961, 40: 0.018954401200213, 41: 0.01788125083107, 42: 0.019784900032633, 43: 0.020438103824639}, 'MAE': {0: 0.737423646017325, 1: 2.00787732271062, 2: 2.86926125864208, 3: 3.32855382663718, 4: 3.77490323897613, 19: 13.345092685398, 20: 12.8063543324171, 21: 13.1292091661974, 22: 13.1451455897874, 23: 13.6537246486947, 39: 3.2667181947348, 40: 4.29467676417246, 41: 5.34081768096088, 42: 5.50421114390641, 43: 7.46988963588581}, 'MSqE': {0: 0.847829904338757, 1: 6.68342912741117, 2: 12.5560681493523, 3: 17.2772893168584, 4: 22.02275890951, 19: 232.978432669064, 20: 237.820275013751, 21: 244.5869111788, 22: 247.73962294989, 23: 266.451945948429, 39: 15.6880657226101, 40: 28.2245308508171, 41: 44.7562607712654, 42: 46.5234139459763, 43: 87.2324237935045}, 'R2': {0: 0.999729801060669, 1: 0.998038240639634, 2: 0.996528815654117, 3: 0.995203737109921, 4: 0.993477444422499, 19: 0.926657847114707, 20: 0.93726355821839, 21: 0.932221279553296, 22: 0.91924882453144, 23: 0.925514811021512, 39: 0.995151906119729, 40: 0.991723226976753, 41: 0.986284593333255, 42: 0.982615342502863, 43: 0.97292435121805}}
df_dt_corr = pd.DataFrame(data)
g = sns.FacetGrid(df_dt_corr, col="Regressor", col_wrap=2)
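# Draw one correlation heatmap per facet; keep only numeric columns so corr() also works on pandas 2.x
g.map_dataframe(lambda data, color: sns.heatmap(data.select_dtypes("number").corr(), annot=True, fmt='.2f', square=True))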
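
On the colormap question: one possible workaround, offered only as a sketch, is to replace the lambda with a named function that looks up a colormap from the facet's own Regressor value. The names `cmaps` and `corr_heatmap` are hypothetical, and the small DataFrame is a rounded subset of the demo data used as a stand-in. It relies on map_dataframe passing each subset as the `data` keyword and making the facet's axes current before the call.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Stand-in data: rounded subset of the demo data (assumption, for illustration only)
df = pd.DataFrame({
    "Regressor": ["DT"] * 4 + ["DT-2"] * 4 + ["DT-4"] * 4,
    "CE": [0.0029, 0.0057, 0.0047, 0.0028, 0.1054, 0.0860,
           0.0959, 0.0974, 0.0182, 0.0190, 0.0179, 0.0198],
    "MAE": [0.74, 2.01, 2.87, 3.33, 13.35, 12.81,
            13.13, 13.15, 3.27, 4.29, 5.34, 5.50],
    "R2": [0.9997, 0.9980, 0.9965, 0.9952, 0.9267, 0.9373,
           0.9322, 0.9192, 0.9952, 0.9917, 0.9863, 0.9826],
})

# Hypothetical mapping from regressor name to colormap name
cmaps = {"DT": "Reds_r", "DT-2": "Blues_r", "DT-4": "BuGn_r"}


def corr_heatmap(data, color=None, **kwargs):
    """Draw the correlation heatmap of one facet on the current axes."""
    # Every row in a facet shares the same Regressor value
    regressor = data["Regressor"].iloc[0]
    corr = data.select_dtypes("number").corr()
    # Fall back to a default colormap for regressors missing from the mapping
    sns.heatmap(corr, cmap=cmaps.get(regressor, "viridis"),
                annot=True, fmt=".2f", square=True, **kwargs)


g = sns.FacetGrid(df, col="Regressor", col_wrap=2)
g.map_dataframe(corr_heatmap)
```

Call plt.show() afterwards when running this as a script. Regressors missing from `cmaps` fall back to "viridis", so the mapping does not have to be complete.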

Posted on 2021-07-19 01:54:51
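This is just a simple example that puts everything in a loop. First, the program finds which regressors to use by taking all the unique values in df['Regressor'].values.
The grid shape, axes, is decided automatically from the number of regressors. It tries to make a square grid.
The available colormaps are defined in colors; change this list if you want different colors. The program starts with the first colormap, then the second, and so on. If there are fewer colormaps than regressors, it wraps around to the start.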
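import math

import matplotlib.pyplot as plt
import seaborn as sns

# Unique regressors, sorted so the panel order is stable between runs
regressors = sorted(set(df['Regressor'].values))
fig = plt.figure(figsize=(12,6))
# Smallest square grid (rows, cols) that fits all regressors
axes = (math.ceil(math.sqrt(len(regressors))),) * 2
colors = [
    'Greys', 'Purples', 'Blues', 'Greens', 'Oranges', 'Reds',
    'YlOrBr', 'YlOrRd', 'OrRd', 'PuRd', 'RdPu', 'BuPu',
    'GnBu', 'PuBu', 'YlGnBu', 'PuBuGn', 'BuGn', 'YlGn']
for index, regressor in enumerate(regressors):
    df_dt = df[df['Regressor'] == regressor]
    # Keep numeric columns only, so text columns such as Method do not break corr()
    df_dt_corr = df_dt.drop(["Regressor"], axis=1).select_dtypes("number").corr()
    plt.subplot(*axes, index + 1)
    plt.title('Regressor: ' + regressor)
    sns.heatmap(df_dt_corr, annot=True, fmt='.2f', square=True, cmap=colors[index % len(colors)])
plt.show()

I changed the way you call plt.subplot, because the three-digit form you used (e.g. 221) supports at most 9 plots, and passing the grid shape as separate arguments makes it easier to adjust automatically.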
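
The grid sizing and colormap cycling can be tried on their own, without any plotting. The sketch below is only an illustration: `grid_shape` and `assign_colormaps` are hypothetical helper names, not part of matplotlib or seaborn. It differs from the square grid above in one respect: it drops rows that would stay empty, so 10 regressors fit in a 3 x 4 grid rather than 4 x 4.

```python
import math
from itertools import cycle


def grid_shape(n_plots):
    """Return (rows, cols) for a near-square grid that holds n_plots panels."""
    if n_plots < 1:
        raise ValueError("n_plots must be at least 1")
    cols = math.ceil(math.sqrt(n_plots))
    # Only as many rows as are needed to hold every panel
    rows = math.ceil(n_plots / cols)
    return rows, cols


def assign_colormaps(names, colormaps):
    """Pair each name with a colormap, wrapping around when colormaps run out."""
    return dict(zip(names, cycle(colormaps)))


print(grid_shape(3))   # (2, 2)
print(grid_shape(10))  # (3, 4)
print(assign_colormaps(["DT", "DT-2", "DT-4"], ["Reds", "Blues"]))
# {'DT': 'Reds', 'DT-2': 'Blues', 'DT-4': 'Reds'}
```

In the loop above this would be used as rows, cols = grid_shape(len(regressors)) followed by plt.subplot(rows, cols, index + 1).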
Posted on 2021-07-25 22:18:35
Select the unique values first
I store the unique values of the Regressor column in the vals variable, then loop over each value. See the solution below:
import matplotlib.pyplot as plt
import seaborn as sns

# get the unique values in "Regressor" column
vals = df['Regressor'].unique()
plt.figure(figsize=[10,10], dpi=200)
plt.suptitle("Correlation Map") # Super Title
# start the loop for selecting data and plotting
for idx, value in enumerate(vals):
    # get the dataframe for the unique value and drop the unwanted columns using "iloc"
    data = df[df['Regressor'] == value].iloc[:, 2:] # 2: selects the third column onwards
    # plot the correlation map
    plt.subplot(len(vals), 2, idx + 1)
    plt.title(f"Regressor={value}")
    sns.heatmap(data.corr(), annot=True, fmt='.2f', square=True)
plt.show()

Here you only need to choose the number of rows and columns of the subplot grid, which is set in the plt.subplot call.
Result
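
As a variation on the boolean filtering used in the loops above, the per-regressor correlation matrices can also be collected with groupby before any plotting. This is a minimal sketch under stated assumptions: the small DataFrame is a rounded subset of the demo data, and `corr_by_regressor` is a hypothetical name. select_dtypes("number") removes the text columns by type instead of by position.

```python
import pandas as pd

# Stand-in data: rounded subset of the demo data (assumption, for illustration only)
df = pd.DataFrame({
    "Regressor": ["DT"] * 4 + ["DT-2"] * 4 + ["DT-4"] * 4,
    "Method": ["method_1"] * 12,
    "CE": [0.0029, 0.0057, 0.0047, 0.0028, 0.1054, 0.0860,
           0.0959, 0.0974, 0.0182, 0.0190, 0.0179, 0.0198],
    "MAE": [0.74, 2.01, 2.87, 3.33, 13.35, 12.81,
            13.13, 13.15, 3.27, 4.29, 5.34, 5.50],
    "R2": [0.9997, 0.9980, 0.9965, 0.9952, 0.9267, 0.9373,
           0.9322, 0.9192, 0.9952, 0.9917, 0.9863, 0.9826],
})

# One correlation matrix per regressor; sort=False keeps order of first appearance
corr_by_regressor = {
    name: group.select_dtypes("number").corr()
    for name, group in df.groupby("Regressor", sort=False)
}

for name, corr in corr_by_regressor.items():
    print(name, corr.shape)
```

Each matrix in the dictionary can then be passed to sns.heatmap inside whichever subplot loop you prefer.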

https://stackoverflow.com/questions/68397783