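I use the code below, with nested generators, to iterate over a text document and return training examples via get_train_minibatch(). I would like to persist (pickle) the generators so that I can get back to the same position in the text document. However, generators cannot be pickled.
Perhaps I could make get_train_example() a singleton, so that I don't have several generators lying around. Then I could create a global variable in this module to keep track of how far along get_train_example() is. [Edit: two more ideas:
]
Here is the code: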
def get_train_example():
    for l in open(HYPERPARAMETERS["TRAIN_SENTENCES"]):
        prevwords = []
        # l.split() already strips surrounding whitespace from each word.
        for w in l.split():
            prevwords.append(wordmap.id(w))
            if len(prevwords) >= HYPERPARAMETERS["WINDOW_SIZE"]:
                yield prevwords[-HYPERPARAMETERS["WINDOW_SIZE"]:]

def get_train_minibatch():
    minibatch = []
    for e in get_train_example():
        minibatch.append(e)
        if len(minibatch) >= HYPERPARAMETERS["MINIBATCH SIZE"]:
            assert len(minibatch) == HYPERPARAMETERS["MINIBATCH SIZE"]
            yield minibatch
            minibatch = []
Posted on 2009-12-21 18:11:33
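The following code should do more or less what you want. The first class defines something that behaves like a file but can be pickled. (When you unpickle it, it re-opens the file and seeks to the position it was at when you pickled it.) The second class is an iterator that generates word windows.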
class PickleableFile(object):
    def __init__(self, filename, mode='rb'):
        self.filename = filename
        self.mode = mode
        self.file = open(filename, mode)

    def __getstate__(self):
        # Save only plain data: the file name, mode, and current position.
        state = dict(filename=self.filename, mode=self.mode,
                     closed=self.file.closed)
        if not self.file.closed:
            state['filepos'] = self.file.tell()
        return state

    def __setstate__(self, state):
        # Re-open the file and restore the saved position.
        self.filename = state['filename']
        self.mode = state['mode']
        self.file = open(self.filename, self.mode)
        if state['closed']:
            self.file.close()
        else:
            self.file.seek(state['filepos'])

    def __getattr__(self, attr):
        # Delegate everything else (readline, seek, tell, ...) to the file.
        return getattr(self.file, attr)

class WordWindowReader:
    def __init__(self, filenames, window_size):
        self.filenames = filenames
        self.window_size = window_size
        self.filenum = 0
        self.stream = None
        self.filepos = 0
        self.prevwords = []
        self.current_line = []

    def __iter__(self):
        return self

    def next(self):
        # Read through files until we have a non-empty current line.
        while not self.current_line:
            if self.stream is None:
                if self.filenum >= len(self.filenames):
                    raise StopIteration
                else:
                    self.stream = PickleableFile(self.filenames[self.filenum])
                    self.stream.seek(self.filepos)
                    self.prevwords = []
            line = self.stream.readline()
            self.filepos = self.stream.tell()
            if line == '':
                # End of file.
                self.stream = None
                self.filenum += 1
                self.filepos = 0
            else:
                # Reverse line so we can pop off words.
                self.current_line = line.split()[::-1]
        # Get the first word of the current line, and add it to
        # prevwords. Truncate prevwords when necessary.
        word = self.current_line.pop()
        self.prevwords.append(word)
        if len(self.prevwords) > self.window_size:
            self.prevwords = self.prevwords[-self.window_size:]
        # If we have enough words, then return a word window;
        # otherwise, go on to the next word.
        if len(self.prevwords) == self.window_size:
            # Return a copy so that the next append does not mutate
            # a window the caller is still holding.
            return list(self.prevwords)
        else:
            return self.next()
Posted on 2009-12-21 11:09:34
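You can create a standard iterator object instead, although it won't be as convenient as a generator; you need to store the iterator's state on the instance (so that it can be pickled) and define a next() function that returns the next object:
class TrainExampleIterator(object):
    def __init__(self):
        # set up internal state here
        pass

    def __iter__(self):
        # for loops call iter() on the object first
        return self

    def next(self):
        # return next item here
        pass
The iterator protocol is that simple: define the .next() method on the object, together with an __iter__() that returns self, and you can pass it to for loops and the like.
In Python 3, the iterator protocol uses the __next__ method instead (which is slightly more consistent).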
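As a rough illustration of the idea above, here is a minimal Python 3 sketch of an iterator whose whole state is plain data, so the default pickling of the instance is enough. The class name WindowIterator, the helper demo(), and the choice to re-open the file on every line are assumptions made for this example, not part of the original answer; like the code in the question, it does not let windows span lines.

```python
import os
import pickle
import tempfile


class WindowIterator:
    """Hypothetical picklable iterator over word windows in one text file."""

    def __init__(self, path, window_size):
        self.path = path
        self.window_size = window_size
        self.offset = 0      # byte offset of the next unread line
        self.pending = []    # words of the current line not yet consumed
        self.prevwords = []  # most recent words seen on the current line

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            if not self.pending:
                # Re-open and seek for every line: simple, not fast.
                with open(self.path, "rb") as f:
                    f.seek(self.offset)
                    line = f.readline()
                    self.offset = f.tell()
                if not line:
                    raise StopIteration
                self.pending = line.decode("utf-8").split()
                self.prevwords = []  # windows restart on each line
                continue
            self.prevwords.append(self.pending.pop(0))
            self.prevwords = self.prevwords[-self.window_size:]
            if len(self.prevwords) == self.window_size:
                # Hand out a copy so later appends cannot change it.
                return list(self.prevwords)


def demo():
    """Consume two windows, pickle the iterator, and resume from the copy."""
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("a b c d\n\ne f g\n")
    try:
        it = WindowIterator(tmp.name, window_size=2)
        first = [next(it), next(it)]
        restored = pickle.loads(pickle.dumps(it))
        rest = list(restored)
        return first, rest
    finally:
        os.remove(tmp.name)
```

Because no open file handle is kept between calls, nothing special is needed in __getstate__ or __setstate__; the trade-off is one open() per line, which a real implementation would probably want to avoid.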
Posted on 2009-12-21 17:34:21
This may not be an option for you, but Stackless Python (http://stackless.com) does allow you to pickle things like functions and generators, under certain conditions. This will work:
In foo.py:
def foo():
    with open('foo.txt') as fi:
        buffer = fi.read()
    del fi
    for line in buffer.split('\n'):
        yield line
In foo.txt:
line1
line2
line3
line4
line5
In the interpreter:
Python 2.6 Stackless 3.1b3 060516 (python-2.6:66737:66749M, Oct 2 2008, 18:31:31)
IPython 0.9.1 -- An enhanced Interactive Python.
In [1]: import foo
In [2]: g = foo.foo()
In [3]: g.next()
Out[3]: 'line1'
In [4]: import pickle
In [5]: p = pickle.dumps(g)
In [6]: g2 = pickle.loads(p)
In [7]: g2.next()
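Out[7]: 'line2'
Some things to note: you must buffer the contents of the file and delete the file object. This means that the contents of the file will be duplicated in the pickle.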
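For contrast, the same round trip in standard CPython fails at the pickle.dumps step. The short Python 3 sketch below (the names count_up and try_pickle_generator are made up for this example) catches the error instead of letting it propagate:

```python
import pickle


def count_up(n):
    # A trivial generator standing in for foo() above.
    for i in range(n):
        yield i


def try_pickle_generator():
    g = count_up(3)
    next(g)  # advance once, as in the session above
    try:
        pickle.dumps(g)
    except TypeError as exc:
        # Standard CPython refuses to pickle generator objects.
        return type(exc).__name__
    return "pickled"
```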
https://stackoverflow.com/questions/1939015