我试图找出葡萄酒数据集的化学特性在多大程度上影响数据集的质量属性。
错误:
ValueError: y数据不在logit链接函数的域中。预期域: 0.0,1.0,但找到3.0,9.0
守则:
import pandas as pd
from pygam import LogisticGAM
white_data = pd.read_csv("winequality-white.csv",sep=';');
X = white_data[[
"fixed acidity","volatile acidity","citric acid","residual sugar","chlorides","free sulfur dioxide",
"total sulfur dioxide","density","pH","sulphates","alcohol"
]]
print(X.describe)
y = pd.Series(white_data["quality"]);
print(white_quality.describe)
white_gam = LogisticGAM().fit(X, y)所述代码的输出:
<bound method NDFrame.describe of fixed acidity volatile acidity citric acid residual sugar chlorides \
0 7.0 0.27 0.36 20.7 0.045
1 6.3 0.30 0.34 1.6 0.049
2 8.1 0.28 0.40 6.9 0.050
3 7.2 0.23 0.32 8.5 0.058
4 7.2 0.23 0.32 8.5 0.058
... ... ... ... ... ...
4893 6.2 0.21 0.29 1.6 0.039
4894 6.6 0.32 0.36 8.0 0.047
4895 6.5 0.24 0.19 1.2 0.041
4896 5.5 0.29 0.30 1.1 0.022
4897 6.0 0.21 0.38 0.8 0.020
free sulfur dioxide total sulfur dioxide density pH sulphates \
0 45.0 170.0 1.00100 3.00 0.45
1 14.0 132.0 0.99400 3.30 0.49
2 30.0 97.0 0.99510 3.26 0.44
3 47.0 186.0 0.99560 3.19 0.40
4 47.0 186.0 0.99560 3.19 0.40
... ... ... ... ... ...
4893 24.0 92.0 0.99114 3.27 0.50
4894 57.0 168.0 0.99490 3.15 0.46
4895 30.0 111.0 0.99254 2.99 0.46
4896 20.0 110.0 0.98869 3.34 0.38
4897 22.0 98.0 0.98941 3.26 0.32
alcohol
0 8.8
1 9.5
2 10.1
3 9.9
4 9.9
... ...
4893 11.2
4894 9.6
4895 9.4
4896 12.8
4897 11.8
[4898 rows x 11 columns]>
<bound method NDFrame.describe of 0 6
1 6
2 6
3 6
4 6
..
4893 6
4894 5
4895 6
4896 7
4897 6
Name: quality, Length: 4898, dtype: int64>
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-71-e1c5720823a6> in <module>
16 print(white_quality.describe)
17
---> 18 white_gam = LogisticGAM().fit(X, y)
~/miniconda3/lib/python3.7/site-packages/pygam/pygam.py in fit(self, X, y, weights)
893
894 # validate data
--> 895 y = check_y(y, self.link, self.distribution, verbose=self.verbose)
896 X = check_X(X, verbose=self.verbose)
897 check_X_y(X, y)
~/miniconda3/lib/python3.7/site-packages/pygam/utils.py in check_y(y, link, dist, min_samples, verbose)
227 .format(link, get_link_domain(link, dist),
228 [float('%.2f'%np.min(y)),
--> 229 float('%.2f'%np.max(y))]))
230 return y
231
ValueError: y data is not in domain of logit link function. Expected domain: [0.0, 1.0], but found [3.0, 9.0]文件:(我在使用朱庇特笔记本,但我不认为你需要):https://drive.google.com/drive/folders/1RAj2Gh6WfdzpwtgbMaFVuvBVIWwoTUW5?usp=sharing
发布于 2020-01-15 13:48:01
您可能希望将LinearGAM - LogisticGAM用于分类任务。
https://stackoverflow.com/questions/58796551
复制相似问题