I have a dataset with 4 samples and 5 features, stored as an array named data.

import numpy as np
data = np.array([[1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 0]])
print(data)
n_samples, n_features = data.shape  # (4, 5)
When I apply StandardScaler to it as shown below, does it standardize the data across features or across samples?
from sklearn.preprocessing import StandardScaler, MinMaxScaler
result = StandardScaler().fit_transform(data)
print(result)
[[ 0.57735027  1.          1.          1.          0.        ]
 [-1.73205081 -1.         -1.         -1.          0.        ]
 [ 0.57735027  1.          1.          1.          0.        ]
 [ 0.57735027 -1.         -1.         -1.          0.        ]]

In machine learning, what is the best practice: standardizing data across samples or across features?
Posted on 2020-08-03 22:51:48
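With StandardScaler and MinMaxScaler, each feature (column) is scaled independently, using statistics computed over the samples (rows). Scaling per feature in this way is the common practice.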
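One way to see the per-feature behaviour, as a minimal sketch that reuses the `data` array from the question: after fitting, StandardScaler exposes `mean_` and `scale_`, and MinMaxScaler exposes `data_min_` and `data_max_`. Each of these holds one value per column, which suggests the statistics are computed over the samples and applied to each feature separately. The variable names `std_scaler` and `minmax_scaler` are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler, MinMaxScaler

data = np.array([[1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 0]])

# Fit both scalers on the same 4 x 5 array (illustrative variable names).
std_scaler = StandardScaler().fit(data)
minmax_scaler = MinMaxScaler().fit(data)

# One statistic per column (feature), computed over the rows (samples).
print(std_scaler.mean_)         # per-feature means
print(std_scaler.scale_)        # per-feature standard deviations
print(minmax_scaler.data_min_)  # per-feature minima
print(minmax_scaler.data_max_)  # per-feature maxima

# Each attribute has length n_features (5), not n_samples (4).
print(std_scaler.mean_.shape, minmax_scaler.data_min_.shape)
```

If the scalers worked per sample instead, these attributes would have 4 entries (one per row) rather than 5.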
import numpy as np
from sklearn.preprocessing import StandardScaler
data = np.array([[1, 1, 1, 1, 0],
                 [0, 0, 0, 0, 0],
                 [1, 1, 1, 1, 0],
                 [1, 0, 0, 0, 0]])
result = StandardScaler().fit_transform(data)
result
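array([[ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [-1.73205081, -1.        , -1.        , -1.        ,  0.        ],
       [ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [ 0.57735027, -1.        , -1.        , -1.        ,  0.        ]])

You can verify this yourself by subtracting each column's mean and dividing by each column's standard deviation (axis 0 runs over the samples):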
(data - data.mean(0))/data.std(0).clip(1e-5)
array([[ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [-1.73205081, -1.        , -1.        , -1.        ,  0.        ],
       [ 0.57735027,  1.        ,  1.        ,  1.        ,  0.        ],
       [ 0.57735027, -1.        , -1.        , -1.        ,  0.        ]])

https://stackoverflow.com/questions/63231343