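My application hangs when it uses pickle to serialize a list of dictionaries (CSV data). The same code runs without any problems under the regular Python interpreter. I am using Python 2.7 with PyPy 2.6.0 on Win32.
Here is the output I get when I press Ctrl+C in the application: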
Traceback (most recent call last):
File "<builtin>/app_main.py", line 75, in run_toplevel
File ".\Da-Lite\dalite_build_script.py", line 167, in <module>
pickle.dump(data_sheets, fo)
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 1413, in dump
Pickler(file, protocol).dump(obj)
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 224, in dump
self.save(obj)
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 653, in save_dict
self._batch_setitems(obj.iteritems())
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 667, in _batch_setitems
save(v)
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 653, in save_dict
self._batch_setitems(obj.iteritems())
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 667, in _batch_setitems
save(v)
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 600, in save_list
self._batch_appends(iter(obj))
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 615, in _batch_appends
save(x)
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 653, in save_dict
self._batch_setitems(obj.iteritems())
File "H:\Developer\Python\pypy-2.6.0-win32\lib-python\2.7\pickle.py", line 665, in _batch_setitems
for k, v in items:
KeyboardInterrupt
Using pickle is not essential to the program, but a relatively simple fix for this problem would make my life easier.
Answered 2015-08-05 16:14:21
The answer here is simple: pickle is slower on PyPy because it is implemented in pure Python there, rather than in C as it is in CPython.
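To check whether the delay really is pickling time rather than a true hang, one option is to time the dump on a smaller sample under both interpreters. The following is only a minimal sketch: `rows` is a made-up stand-in for the CSV data and `time_dump` is a hypothetical helper, neither of which comes from the question. It also compares protocol 0 (the default in Python 2.7) with the highest binary protocol, which is usually more compact, though whether that helps on PyPy is something you would have to measure.

```python
import io
import pickle
import time

# Hypothetical stand-in for the CSV rows: a list of small dicts.
rows = [{"id": i, "name": "row%d" % i, "value": i * 0.5} for i in range(20000)]

def time_dump(obj, protocol):
    """Pickle obj into memory; return (elapsed seconds, size in bytes)."""
    buf = io.BytesIO()
    start = time.time()
    pickle.dump(obj, buf, protocol)
    return time.time() - start, len(buf.getvalue())

# Compare the text protocol with the highest binary protocol available.
for protocol in (0, pickle.HIGHEST_PROTOCOL):
    seconds, size = time_dump(rows, protocol)
    print("protocol %d: %.3f s, %d bytes" % (protocol, seconds, size))
```

Running the same script with CPython and with PyPy gives a rough idea of how much slower the pure-Python pickler is for data shaped like yours.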
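Answered 2015-08-05 19:21:54
If you have a list of dictionaries, one very simple thing to do is to split the list into several parts and dump each part to a different file, as shown below: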
>>> d1 = dict(zip(range(10),range(10)))
>>> d2 = dict(zip(range(10,20),range(10,20)))
>>> d3 = dict(zip(range(20,30),range(20,30)))
>>> d4 = dict(zip(range(30,40),range(30,40)))
>>> x = [d1,d2,d3,d4]
>>> fnames = ['a.pik', 'b.pik', 'c.pik', 'd.pik']
>>>
>>> import pathos
>>> p = pathos.pools.ProcessPool()
>>>
>>> def dump(data, fname):
...     import dill
...     # open in binary mode: the pickled data is bytes
...     with open(fname, 'wb') as f:
...         dill.dump(data, f)
...     return
...
>>> r = p.uimap(dump, x, fnames)
>>> # no need to do this, but just FYI, it returns nothing
>>> list(r)
[None, None, None, None]
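>>>
One thing to note is that I am using a fork of multiprocessing called multiprocess, which is used by pathos… It provides a multiprocessing map that can take multiple arguments, reduces some of the overhead of launching the map, and has better serialization than pickle.
I am using uimap because I don't care about preserving the order of the return values (which are all None). Depending on the size of the data, however, you might try a thread pool instead, or even use imap from itertools.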
>>> pathos.pools.ProcessPool
<class 'pathos.multiprocessing.ProcessPool'>
>>> pathos.pools.ThreadPool
<class 'pathos.threading.ThreadPool'>
>>> pathos.pools.SerialPool
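<class 'pathos.serial.SerialPool'>
Note: I am the pathos author. I know it works in standard Python. However, I can't confirm at the moment that it works in PyPy. I have had people try it and make patches to enable PyPy, but I don't test in PyPy… so you will have to give it a shot and find out. If pathos doesn't work on PyPy… then you would have to modify the dump function to take only one argument, or make sure you are using itertools.imap. Regardless, my main point is to break your list into chunks and then serialize the chunks in different processes/threads/whatever.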
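If pathos is not an option, the chunking idea can be sketched with the standard library alone. The example below is an illustration under stated assumptions, not the answer's own code: it uses the stdlib `pickle` and `multiprocessing.pool.ThreadPool` in place of dill and pathos, the helper names (`split_into_chunks`, `dump_chunk`, `load_chunks`) and the sample data are hypothetical, and it has not been tested on PyPy. The dump function takes a single `(chunk, filename)` argument so that it works with any plain map-style call.

```python
import os
import pickle
import tempfile
from multiprocessing.pool import ThreadPool

def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous, roughly equal parts."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]

def dump_chunk(job):
    """Pickle one (chunk, filename) pair; a single argument keeps it map-friendly."""
    chunk, fname = job
    with open(fname, 'wb') as f:  # binary mode, which matters on Windows
        pickle.dump(chunk, f, pickle.HIGHEST_PROTOCOL)
    return fname

def load_chunks(fnames):
    """Read the chunk files back in order and rebuild the original list."""
    items = []
    for fname in fnames:
        with open(fname, 'rb') as f:
            items.extend(pickle.load(f))
    return items

# Hypothetical data: 40 small dicts standing in for the CSV rows.
data = [{"row": i, "value": i * 2} for i in range(40)]
chunks = split_into_chunks(data, 4)
out_dir = tempfile.mkdtemp()
fnames = [os.path.join(out_dir, "chunk%d.pik" % i) for i in range(len(chunks))]

pool = ThreadPool(4)
try:
    # pool.map returns results in the same order as its inputs
    written = pool.map(dump_chunk, list(zip(chunks, fnames)))
finally:
    pool.close()
    pool.join()

restored = load_chunks(written)
```

Threads share one interpreter, so they may not speed up CPU-bound pickling by themselves. Replacing `ThreadPool` with `multiprocessing.Pool` is the process-based variant, provided `dump_chunk` is defined at module level and the pool is created under an `if __name__ == '__main__':` guard on Windows.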
https://stackoverflow.com/questions/31797226