from bz2 import BZ2Decompressor
from luigi import DateIntervalParameter, LocalTarget, Task
from wget import download  # assumption: the original does not show where download() is defined

class Download(Task):
    date_interval = DateIntervalParameter()

    def output(self):
        return LocalTarget("data/user_{0}.tar.bz2".format(self.date_interval))

    def run(self):
        SENTENCE_URL = 'http://downloads.org/exports/user_lists.tar.bz2'
        sentence_file = download(SENTENCE_URL, out=self.output().path)

class Uncompress(Task):
    date_interval = DateIntervalParameter()

    def output(self):
        return LocalTarget("data/user_{0}.tar".format(self.date_interval))

    def requires(self):
        return Download(self.date_interval)

    def run(self):
        with open(self.output().path, 'wb') as tar_file, open(self.input().path, 'rb') as file:
            decompressor = BZ2Decompressor()
            # read the bzip2 file in 100 KiB chunks and write the decompressed bytes to the tar file
            for data in iter(lambda: file.read(100 * 1024), b''):
                tar_file.write(decompressor.decompress(data))

My first task downloads a file from the internet, and the next task uncompresses it. The next task I am going to write will read the CSV files in the tar file and parse them into multiple files, i.e. data/file_{var}, data/file_{var2}, and so on. But I think task 3 needs to have a date interval so that it can pass it on to the other tasks.
Is there a way around this, or is there a better way to organize my tasks?
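For concreteness, the core of the planned third task might look roughly like the sketch below. It uses only the standard library and is independent of Luigi. The function name `split_tar_csvs` is hypothetical, and the mapping of one output file `file_<name>` per CSV member is an assumption, since the question does not say what `{var}` stands for.

```python
import csv
import io
import tarfile
from pathlib import Path


def split_tar_csvs(tar_path, out_dir):
    """Write each CSV member of a tar archive to out_dir/file_<member stem>.

    Hypothetical helper: the naming scheme is an assumption, not something
    the question specifies. Returns the list of paths that were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with tarfile.open(tar_path, "r") as archive:
        for member in archive:
            # Skip directories and anything that is not a CSV file.
            if not (member.isfile() and member.name.endswith(".csv")):
                continue
            raw = archive.extractfile(member)
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            target = out_dir / "file_{0}".format(Path(member.name).stem)
            with open(target, "w", newline="", encoding="utf-8") as handle:
                writer = csv.writer(handle)
                # Rows are copied unchanged; real parsing would go here.
                for row in csv.reader(text):
                    writer.writerow(row)
            written.append(target)
    return written
```

A task's `run()` could call such a helper with `self.input().path`. The open question remains how that task receives `date_interval`.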
Posted on 2017-04-06 20:56:09
There are a few things you can do. From the documentation: http://luigi.readthedocs.io/en/stable/parameters.html
Parameters are resolved in the following order of decreasing priority:
1. Any value passed to the constructor, or task level value set on the command line (applies on an instance level)
2. Any value set on the command line (applies on a class level)
3. Any configuration option (applies on a class level)
4. Any default value provided to the parameter (applies on a class level)

On the command line, you can do the following:

    luigi Uncompress --Download-date-interval 2017-02-03

This passes the parameter to other tasks in the hierarchy.
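The priority order quoted from the documentation can be illustrated with a small toy model. This is only a sketch of the rule as stated in the quote, not Luigi's actual implementation; the function `resolve_parameter` and its argument names are hypothetical.

```python
_UNSET = object()  # sentinel, so that None can still be a legitimate value


def resolve_parameter(constructor=_UNSET, class_cli=_UNSET,
                      config=_UNSET, default=_UNSET):
    """Return (source, value) for the highest-priority source that is set.

    Toy model of the four-step order quoted above; not Luigi code.
    """
    candidates = [
        ("constructor or task-level flag", constructor),
        ("class-level flag", class_cli),
        ("configuration", config),
        ("default", default),
    ]
    for source, value in candidates:
        if value is not _UNSET:
            return source, value
    raise ValueError("no value supplied for parameter")


# A task built with an explicit argument, as Uncompress.requires() does:
print(resolve_parameter(constructor="2017-02-01", class_cli="2017-02-03"))
# A task built without an argument, so the class-level flag applies:
print(resolve_parameter(class_cli="2017-02-03", default="2017-01-01"))
```

Read this way, a class-level flag such as the one for `Download` only takes effect for tasks that are instantiated without an explicit value. Since `Uncompress.requires()` in the question calls `Download(self.date_interval)`, the constructor value would presumably win there.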
https://stackoverflow.com/questions/42760051