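I have a MacBook (OS X 10.9) with 16 GB of RAM. Two Pythons are installed via Anaconda: 2.7.8 and 3.4.1. Both come with the latest scikit-learn, 0.15.1. I tried to run the same simple code on both (just to test whether a large matrix can be serialized):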
import numpy as np
# 10000 x 60000 array of random float64 values
test_data = np.random.rand(10000, 60000)
print(test_data.nbytes / 2**30)  # array size in GiB
from sklearn.externals import joblib
joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')

Python 2.7.8 handles this fine, but Python 3.4.1 fails with the following error:
Failed to save <class 'numpy.ndarray'> to .npy file:
Traceback (most recent call last):
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 240, in save
    obj, filename = self._write_array(obj, filename)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 203, in _write_array
    self.np.save(filename, array)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/npyio.py", line 453, in save
    format.write_array(fid, arr)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/numpy/lib/format.py", line 410, in write_array
    fp.write(array.tostring('C'))
OSError: [Errno 22] Invalid argument
Traceback (most recent call last):
  File "<ipython-input-3-90ed09e5c6d4>", line 1, in <module>
    joblib.dump(test_data, '/Users/va/Desktop/test_data.joblib')
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/site-packages/sklearn/externals/joblib/numpy_pickle.py", line 368, in dump
    pickler.dump(value)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 412, in dump
    self.framer.end_framing()
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 196, in end_framing
    self.commit_frame(force=True)
  File "/Users/va/anaconda/python.app/Contents/lib/python3.4/pickle.py", line 208, in commit_frame
    write(data)
OSError: [Errno 22] Invalid argument

The problem seems to lie in the amount of data being stored. For example, Python 3 handles np.random.rand(10000, 20000), which is about 1.5 GB, without any trouble.
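To put numbers on that observation, here is a small sketch that computes the size of both arrays without allocating them. The 2**31 - 1 byte threshold is an assumption, not something established in this post: it is a commonly reported per-call write limit on OS X, and it happens to fall between the array that works and the one that fails. The names `nbytes`, `ITEMSIZE` and `INT_MAX` are made up for the illustration.

```python
# Sizes of the two float64 arrays from the post, computed without allocating them.
ITEMSIZE = 8          # bytes per float64, the dtype np.random.rand returns
INT_MAX = 2**31 - 1   # ASSUMED limit for a single write call; not confirmed by the post


def nbytes(rows, cols, itemsize=ITEMSIZE):
    """Return the number of bytes a rows x cols array of the given item size occupies."""
    return rows * cols * itemsize


for shape in [(10000, 20000), (10000, 60000)]:
    size = nbytes(*shape)
    status = "over" if size > INT_MAX else "under"
    print(shape, round(size / 2.0**30, 2), "GiB,", status, "the assumed limit")
```

If the assumption holds, any object whose serialized form is written in one call and exceeds that threshold would be expected to fail the same way.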
Just in case: plain pickle does not fare any better.
import pickle
with open('/Users/va/Desktop/test_data.pkl', 'wb') as f:
    pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)

This results in:
Traceback (most recent call last):
  File "<ipython-input-6-3f73f3011539>", line 3, in <module>
    pickle.dump(test_data, f, protocol=pickle.HIGHEST_PROTOCOL)
OSError: [Errno 22] Invalid argument

On Windows 7, Python 3.4 works fine with both joblib and pickle.
Any suggestions on how to solve this on a Mac with Python 3?
Answered on 2015-05-10 04:13:13
This also happens when using pickle on OS X 10.10 with Python 3.4.3.
Instead, I started using https://github.com/zopefoundation/zodbpickle. It is about 2-3 times slower, but it definitely works for sklearn classifiers.
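Another workaround that is often suggested for this error is to keep the standard pickle module but never hand the file more than a bounded number of bytes per write call. The sketch below rests on an assumption that this thread does not confirm: that the failure is caused by a single write of 2 GiB or more on OS X. The helper names `dump_in_chunks` and `load_in_chunks` and the 1 GiB default chunk size are hypothetical choices made for the example.

```python
import io
import pickle

# ASSUMED safe chunk size: 1 GiB, well below 2**31 bytes per write call.
MAX_CHUNK = 2**30


def dump_in_chunks(obj, fileobj, chunk_size=MAX_CHUNK):
    """Pickle obj in memory, then write the bytes in pieces of at most chunk_size."""
    data = pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL)
    view = memoryview(data)  # slicing a memoryview does not copy the data
    for start in range(0, len(view), chunk_size):
        fileobj.write(view[start:start + chunk_size])


def load_in_chunks(fileobj, chunk_size=MAX_CHUNK):
    """Read the whole file in pieces of at most chunk_size, then unpickle it."""
    buf = bytearray()
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        buf += chunk
    return pickle.loads(buf)


# Small demonstration with an in-memory file and a tiny chunk size.
demo = {"weights": list(range(100)), "name": "demo"}
stream = io.BytesIO()
dump_in_chunks(demo, stream, chunk_size=7)
stream.seek(0)
restored = load_in_chunks(stream, chunk_size=7)
print(restored == demo)
```

With a real file, the same functions would be called on a handle opened with open(path, 'wb') or open(path, 'rb'). Note that pickle.dumps keeps a full serialized copy in memory next to the original object, so for a 4.5 GB array this trades memory for compatibility.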
https://stackoverflow.com/questions/25301958