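Update
Tomasz's answer solves the question as asked. However, trying to use it on my real problem led to trouble with quoted column names, with missing data, and so on. To get around this, I followed CJR's suggestion in the comments and simply pickled my DataFrames.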
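For reference, a minimal sketch of the pickle round trip mentioned in the update. The file name tmp.pkl and the temporary directory are assumptions made for illustration; only DataFrame.to_pickle and pandas.read_pickle are used.

```python
import os
import tempfile

import pandas as pd

df1 = pd.DataFrame([['A', '100', 100], ['B', '200', 200]])

# Write the DataFrame to a pickle file and read it back.
# The path is hypothetical; a temporary directory keeps the sketch self-contained.
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'tmp.pkl')
    df1.to_pickle(path)
    df2 = pd.read_pickle(path)

# Pickle stores the dtypes along with the data, so nothing has to be inferred.
print(df1.dtypes)
print(df2.dtypes)
```

Unlike CSV, the pickle file is not human-readable and should only be loaded from trusted sources.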
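Original question below
I have a pandas DataFrame in memory. I want to be able to write it to a file (using to_csv) and then read the result into a new DataFrame using read_csv. I want the original DataFrame and the new DataFrame read back from the file to have the same dtypes.
I tried to achieve this with the quoting and quotechar parameters of to_csv and read_csv. However, this does not seem to work.
I understand that the dtype parameter of read_csv can be used to force the dtypes, but that is not practical for my use case (lots of automatically generated files used for regression testing).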
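A minimal sketch of the dtype approach mentioned above, assuming the column labels are known in advance (which is what makes it impractical for many auto-generated files). It uses an in-memory buffer instead of tmp.csv; note that labels read back from a CSV header are strings, hence the key '1'.

```python
import io

import pandas as pd

df1 = pd.DataFrame([['A', '100', 100], ['B', '200', 200]])

# Round-trip through an in-memory CSV buffer with default quoting.
buf = io.StringIO()
df1.to_csv(buf, index=False)
buf.seek(0)

# Force column '1' to be read as strings; the other columns are inferred.
df2 = pd.read_csv(buf, dtype={'1': str})

print(df2.dtypes)
```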
A complete example follows.
tmp.py
import pandas as pd
from csv import QUOTE_NONNUMERIC
import sys
print('Python version information:')
print(sys.version)
print('Pandas version information:')
print(pd.__version__)
df1 = pd.DataFrame([['A', '100', 100], ['B', '200', 200]])
print('df1:')
print(df1.info())
df1.to_csv('tmp.csv', index=False, quoting=QUOTE_NONNUMERIC,
           quotechar='"')
df2 = pd.read_csv('tmp.csv', quoting=QUOTE_NONNUMERIC, quotechar='"')
print('df2:')
print(df2.info())
Running tmp.py outputs:
Python version information:
3.7.3 (default, Jun 11 2019, 01:11:15)
[GCC 6.3.0 20170516]
Pandas version information:
0.24.2
df1:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0 2 non-null object
1 2 non-null object
2 2 non-null int64
dtypes: int64(1), object(2)
memory usage: 128.0+ bytes
None
df2:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0 2 non-null object
1 2 non-null float64
2 2 non-null float64
dtypes: float64(2), object(1)
memory usage: 128.0+ bytes
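None
Column 0 has dtype object in both DataFrames. Column 1 has dtype object in df1, whereas in df2 its dtype is float64. Column 2 has dtype int64 in df1, whereas in df2 it has dtype float64. As the csv module documentation puts it, csv.QUOTE_NONNUMERIC "instructs the reader to convert all non-quoted fields to type float."
The contents of tmp.csv are shown below. Note that the second column is quoted, so I would expect read_csv to give me an object column.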
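For comparison, the csv-module behaviour quoted above can be checked with the standard library alone. This is a small sketch with the two data rows of tmp.csv inlined as a string (the header is left out for brevity); it illustrates csv.reader only, not the parser pandas uses.

```python
import csv
import io

# The two data rows of tmp.csv, inlined for a self-contained example.
data = '"A","100",100\n"B","200",200\n'

rows = list(csv.reader(io.StringIO(data), quoting=csv.QUOTE_NONNUMERIC))

# Quoted fields stay strings; unquoted fields are converted to float.
print(rows)  # [['A', '100', 100.0], ['B', '200', 200.0]]
```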
tmp.csv
0,1,2
"A","100",100
"B","200",200
Answered 2019-07-10 17:11:15
Try using QUOTE_NONE when reading; this preserves the dtypes between writing and reading.
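To see why this can work, here is a small sketch of the intermediate state. It assumes an in-memory buffer with the same contents as tmp.csv so that it runs without touching the disk. With QUOTE_NONE the quote characters are kept as part of the field, so a quoted field cannot be parsed as a number; the quotes are then stripped from the values afterwards.

```python
import io
from csv import QUOTE_NONE

import pandas as pd

# Same contents as tmp.csv, inlined for a self-contained example.
csv_text = '0,1,2\n"A","100",100\n"B","200",200\n'

# Step 1: keep the quote characters, so '"100"' is read as a string.
raw = pd.read_csv(io.StringIO(csv_text), quoting=QUOTE_NONE)
print(raw['1'].tolist())

# Step 2: strip the literal quote characters from the string values.
clean = raw.replace('"', '', regex=True)
print(clean['1'].tolist())
print(clean.dtypes)
```

DataFrame.replace works on values, not on column labels, so a header that is itself quoted would keep its quote characters and would need separate handling.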
With the original dataset, which has int64 values:
import pandas as pd
from csv import QUOTE_NONNUMERIC, QUOTE_NONE
import sys
print('Python version information:')
print(sys.version)
print('Pandas version information:')
print(pd.__version__)
df1 = pd.DataFrame([['A', '100', 100], ['B', '200', 200]])
print('df1:')
print(df1.info())
df1.to_csv('tmp.csv', index=False, quoting=QUOTE_NONNUMERIC, quotechar='"')
# Read with QUOTE_NONE so quoted fields stay strings, then strip the quote characters
df2 = pd.read_csv('tmp.csv', quoting=QUOTE_NONE).replace('"','', regex=True)
print('df2:')
print(df2.info())
Result:
Python version information:
3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
Pandas version information:
0.24.2
df1:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0 2 non-null object
1 2 non-null object
2 2 non-null int64
dtypes: int64(1), object(2)
memory usage: 128.0+ bytes
None
df2:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0 2 non-null object
1 2 non-null object
2 2 non-null int64
dtypes: int64(1), object(2)
memory usage: 128.0+ bytes
None
With float64 values in the input:
import pandas as pd
from csv import QUOTE_NONNUMERIC, QUOTE_NONE
import sys
print('Python version information:')
print(sys.version)
print('Pandas version information:')
print(pd.__version__)
df1 = pd.DataFrame([['A', '100', 100.1], ['B', '200', 200.2]])
print('df1:')
print(df1.info())
df1.to_csv('tmp.csv', index=False, quoting=QUOTE_NONNUMERIC, quotechar='"')
# Same read-and-strip step, now with a float64 column in the input
df2 = pd.read_csv('tmp.csv', quoting=QUOTE_NONE).replace('"','', regex=True)
print('df2:')
print(df2.info())
Result:
Python version information:
3.6.6 |Anaconda, Inc.| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
Pandas version information:
0.24.2
df1:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0 2 non-null object
1 2 non-null object
2 2 non-null float64
dtypes: float64(1), object(2)
memory usage: 128.0+ bytes
None
df2:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2 entries, 0 to 1
Data columns (total 3 columns):
0 2 non-null object
1 2 non-null object
2 2 non-null float64
dtypes: float64(1), object(2)
memory usage: 128.0+ bytes
None
https://stackoverflow.com/questions/56975273