I am trying to use the apache_beam.io.fileio module to read the file lines.txt and bring its contents into my pipeline.
lines.txt contains the following:
line1
line2
line3
When I run the following pipeline code:

import apache_beam as beam

# pipeline_options is assumed to be defined earlier
with beam.Pipeline(options=pipeline_options) as p:
    lines = (
        p
        | beam.io.fileio.MatchFiles(file_pattern="lines.txt")
        | beam.io.fileio.ReadMatches()
    )
    # print file contents to screen
    lines | 'print to screen' >> beam.Map(print)

I get the following output:
<apache_beam.io.fileio.ReadableFile object at 0x000001A8C6C55F08>
What I expected is:
line1
line2
line3
How can I get the result I expect?
Posted on 2020-07-30 05:11:34
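The PCollection produced by

    p
    | beam.io.fileio.MatchFiles(file_pattern="lines.txt")
    | beam.io.fileio.ReadMatches()

contains ReadableFile objects, not the file contents. To get at the contents, we can call one of the methods that the Apache Beam pydoc documents for ReadableFile.
Below we use read_utf8(), which returns the whole file as a single string: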
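
import apache_beam as beam

# pipeline_options is assumed to be defined earlier
with beam.Pipeline(options=pipeline_options) as p:
    lines = (
        p
        | beam.io.fileio.MatchFiles(file_pattern="lines.txt")
        | beam.io.fileio.ReadMatches()
        # read each matched file's contents as UTF-8 text
        | beam.Map(lambda file: file.read_utf8())
    )
    # print file contents to screen
    lines | 'print to screen' >> beam.Map(print)

We get the expected result: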
line1
line2
line3
https://stackoverflow.com/questions/63162487
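
One thing worth noting about the answer above: read_utf8() returns the full contents of a file as one string, so the PCollection holds one element per matched file, and the three lines show up only because that string contains newlines. If one element per line is wanted, a possible approach is to split the text with beam.FlatMap. The following is a minimal sketch and not part of the original answer; split_lines and build_lines are hypothetical helper names introduced here for illustration.

```python
def split_lines(contents):
    """Split one file's text into a list of lines, dropping the newlines."""
    return contents.splitlines()


def build_lines(p, file_pattern):
    """Return a PCollection with one element per line of each matched file.

    `p` is an existing beam.Pipeline (assumed to be created by the caller).
    """
    # Imported inside the function so split_lines can be used without Beam.
    import apache_beam as beam
    from apache_beam.io import fileio

    return (
        p
        | fileio.MatchFiles(file_pattern)
        | fileio.ReadMatches()
        # FlatMap emits each item of the returned list as its own element
        | beam.FlatMap(lambda f: split_lines(f.read_utf8()))
    )
```

With the lines.txt from the question, build_lines(p, "lines.txt") | beam.Map(print) should print each line as a separate element. If per-line text is all that is needed, beam.io.ReadFromText is a more direct alternative to fileio.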