I am currently using a Jupyter notebook to analyze company data. My first step is to clean and format the data. This is my code so far:
%matplotlib inline
# First, we'll import pandas, a data processing and CSV file I/O library
import pandas as pd
# We'll also import seaborn, a Python graphing library
import warnings # current version of seaborn generates a bunch of warnings that we'll ignore
warnings.filterwarnings("ignore")
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
sns.set(style="dark", color_codes=True)
Users = pd.read_csv("Users.csv", delimiter = ';', engine = 'python') # create one pandas dataframe per file
Users['ContractHours'].fillna(0, inplace = True)
Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
After that, I try to replace the NaN values in the ContractHours column with zero and convert the column to float. Replacing NaN with 0 works, but the conversion raises this error:
ValueError Traceback (most recent call last)
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56156)()
ValueError: Unable to parse string "32,5"
During handling of the above exception, another exception occurred:
ValueError Traceback (most recent call last)
<ipython-input-22-bcb66b8c06fb> in <module>()
20 #Users = Users['ContractHours'].replace(',', '.')
21 Users['ContractHours'].fillna(0, inplace = True)
---> 22 Users['ContractHours'] = Users['ContractHours'].apply(pd.to_numeric)
23
24 #print(Customers.head(10))
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwds)
2353 else:
2354 values = self.asobject
-> 2355 mapped = lib.map_infer(values, f, convert=convert_dtype)
2356
2357 if len(mapped) and isinstance(mapped[0], Series):
pandas\_libs\src\inference.pyx in pandas._libs.lib.map_infer (pandas\_libs\lib.c:66645)()
C:\Users\masc\AppData\Local\Continuum\Anaconda3\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
124 coerce_numeric = False if errors in ('ignore', 'raise') else True
125 values = lib.maybe_convert_numeric(values, set(),
--> 126 coerce_numeric=coerce_numeric)
127
128 except Exception:
pandas\_libs\src\inference.pyx in pandas._libs.lib.maybe_convert_numeric (pandas\_libs\lib.c:56638)()
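ValueError: Unable to parse string "32,5" at position 0
How can I parse the string "32,5" in the ContractHours column as a float?
I also tried replacing ',' with '.', but that makes all the other columns disappear, and the comma is still a comma.
Users = Users['ContractHours'].replace(',', '.')
The result is: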
0 34
1 24
2 40
3 35
4 40
5 24
6 32
7 32
8 32
9 24
10 24
11 24
12 24
13 0
14 32
15 28
16 32
17 32
18 28
19 24
20 40
21 40
22 36
23 24
24 32,5
25 36
26 36
27 24
28 40
29 40
30 28
31 32
32 32
33 40
34 32
35 24
36 24
37 40
38 25
39 24
Name: ContractHours, dtype: object
All the other columns are gone, and 32,5 should have become 32.5.
Answered on 2019-03-05 23:06:06
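Use the decimal parameter of read_csv so that the floats are parsed correctly:
Users = pd.read_csv("Users.csv", sep = ';', decimal=',')
Your own solution needs regex=True so that replace matches substrings rather than whole values. Assign the result back to the column instead of to Users, otherwise the other columns are lost:
Users['ContractHours'] = Users['ContractHours'].replace(',', '.', regex=True).astype(float)
https://stackoverflow.com/questions/55005686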
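For anyone who wants to try both fixes without the original file, here is a minimal sketch. It assumes a tiny in-memory CSV in place of Users.csv. The Name column and every value in it are made up for illustration, and only ContractHours comes from the question. It also shows why the plain replace(',', '.') left "32,5" untouched: without regex=True, Series.replace only matches whole cell values.

```python
import io
import pandas as pd

# Hypothetical stand-in for Users.csv: the Name column and all values are invented.
csv_text = "Name;ContractHours\nAnna;34\nBram;32,5\nCarla;\n"

# Approach 1: let read_csv treat "," as the decimal separator.
users = pd.read_csv(io.StringIO(csv_text), sep=";", decimal=",")
users["ContractHours"] = users["ContractHours"].fillna(0)

# Approach 2: read the column as plain objects (strings), then convert afterwards.
raw = pd.read_csv(io.StringIO(csv_text), sep=";", dtype={"ContractHours": object})

# Without regex=True, replace() only matches whole cell values, so "32,5" is untouched.
whole_value_only = raw["ContractHours"].replace(",", ".")

# Assign back to the column (not to the whole frame) so the other columns survive.
raw["ContractHours"] = (
    raw["ContractHours"].replace(",", ".", regex=True).astype(float).fillna(0)
)

print(users)
print(raw.dtypes)
```

If the file is under your control, the first approach is usually the simpler one, since the column arrives as a float right away and only the missing values remain to be filled.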