I want to read the winequality-white.csv data using the pandas.read_html() function.
Here is my code:
import pandas as pd
wine = pd.DataFrame(
    pd.read_html(
        "https://github.com/shrikant-temburwar/Wine-Quality-Dataset/blob/master/winequality-white.csv",
        thousands=";",
        header=0,
    )[0]
)
But the result is:
Unnamed: 0 "fixed acidity";"volatile acidity";"citric acid";"residual sugar";"chlorides";"free sulfur dioxide";"total sulfur dioxide";"density";"pH";"sulphates";"alcohol";"quality"
0 NaN 7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6
1 NaN 6.3;0.3;0.34;1.6;0.049;14;132;0.994;3.3;0.49;9...
2 NaN 8.1;0.28;0.4;6.9;0.05;30;97;0.9951;3.26;0.44;1...
3 NaN 7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4...
4 NaN 7.2;0.23;0.32;8.5;0.058;47;186;0.9956;3.19;0.4...
Of course, I could switch to the raw file and use read_csv, but how can I fix this when reading the HTML page?
Answered 2022-12-01 13:58:26
OK, here is an option using pd.read_html:
import pandas as pd
wine = pd.read_html(
    "https://github.com/shrikant-temburwar/Wine-Quality-Dataset/blob/master/winequality-white.csv",
    header=0
)[0]
# Drop the empty first column that read_html picks up
wine.drop('Unnamed: 0', axis=1, inplace=True)
# The remaining column name holds the whole semicolon-separated header
headers = wine.columns[0].replace('"', '').split(';')
wine.columns = ['data']
# Split each raw line into one column per header
wine[headers] = wine.data.str.split(';', expand=True)
wine.drop('data', axis=1, inplace=True)
wine.head()
The code above results in:
>>> wine.head()
fixed acidity volatile acidity citric acid residual sugar chlorides free sulfur dioxide total sulfur dioxide density pH sulphates alcohol quality
0 7 0.27 0.36 20.7 0.045 45 170 1.001 3 0.45 8.8 6
1 6.3 0.3 0.34 1.6 0.049 14 132 0.994 3.3 0.49 9.5 6
2 8.1 0.28 0.4 6.9 0.05 30 97 0.9951 3.26 0.44 10.1 6
3 7.2 0.23 0.32 8.5 0.058 47 186 0.9956 3.19 0.4 9.9 6
4 7.2 0.23 0.32 8.5 0.058 47 186 0.9956 3.19 0.4 9.9 6
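One caveat with the split-based approach: `str.split(..., expand=True)` yields string columns, so the values shown above are text rather than numbers. Below is a minimal sketch of converting them with `pd.to_numeric`. The `raw` frame, its shortened three-column header and its two sample rows are assumptions standing in for the `read_html` result, so the example runs without network access.

```python
import pandas as pd

# Hypothetical stand-in for the read_html result: one column whose
# name is the semicolon-joined header and whose cells are raw lines.
header = '"fixed acidity";"volatile acidity";"quality"'
raw = pd.DataFrame({header: ["7;0.27;6", "6.3;0.3;6"]})

# Same splitting idea as in the answer: recover the header names,
# then split every raw line on the semicolon.
headers = raw.columns[0].replace('"', "").split(";")
raw.columns = ["data"]
wine = raw["data"].str.split(";", expand=True)
wine.columns = headers

# str.split leaves every column as text, so convert column by column.
wine = wine.apply(pd.to_numeric)
print(wine.dtypes)
```

After the conversion, numeric operations such as `wine["quality"].mean()` work as expected.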
However, I would never trade the simplicity of the following snippet for the read_html workaround above:
import pandas as pd
wine = pd.read_csv(
    'https://raw.githubusercontent.com/shrikant-temburwar/Wine-Quality-Dataset/master/winequality-white.csv',
    header=0,
    sep=';'
)
Answered 2022-12-01 13:28:04
You would probably be better off using GitHub's raw content address, which avoids the problems caused by the HTML interface.
Here is what you can do:
import pandas as pd
import requests
import io
url = "https://raw.githubusercontent.com/shrikant-temburwar/Wine-Quality-Dataset/master/winequality-white.csv"
r = requests.get(url)
obj = io.BytesIO(r.content)
wine = pd.read_csv(obj, delimiter=";")
wine.head()
https://stackoverflow.com/questions/74640187
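In both `read_csv` variants above, the semicolon separator (`sep=";"` or `delimiter=";"`) is what makes the parse work. Here is a small offline sketch of that behaviour. The `sample` text is an illustrative assumption in the same layout as the file, with a shortened header, and is not fetched from the repository.

```python
import io

import pandas as pd

# Illustrative sample in the same semicolon-separated layout as the
# file; it is not downloaded from the repository.
sample = (
    '"fixed acidity";"volatile acidity";"quality"\n'
    "7;0.27;6\n"
    "6.3;0.3;6\n"
)

# read_csv strips the quotes from the header and infers numeric dtypes.
wine = pd.read_csv(io.StringIO(sample), sep=";")
print(wine.dtypes)
print(wine.shape)
```

Unlike the split-based workaround, no manual header cleanup or type conversion is needed here.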