
Flume Avro Sink/Source with the Cloudera QuickStart VM

Stack Overflow user
Asked 2014-08-12 21:39:12
1 answer, 3.4K views, 0 followers, 0 votes

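Is it possible to set up a Flume client-collector structure with an Avro sink and Avro source inside the Cloudera-Quickstart-CDH-VM? I know there is no practical use for it, but I want to understand how Flume works with Avro files and how I can process them later with Pig and similar tools.

I tried several configurations, but none of them worked. It seems to me that I need several agents, but the VM can only have one.

My last attempt: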

    agent.sources = reader avro-collection-source
    agent.channels = memoryChannel memoryChannel2
    agent.sinks = avro-forward-sink hdfs-sink

  #Client
    agent.sources.reader.type = exec
    agent.sources.reader.command = tail -f /home/flume/avro/source.txt

    agent.sources.reader.logStdErr = true
    agent.sources.reader.restart = true


    agent.sources.reader.channels = memoryChannel


    agent.sinks.avro-forward-sink.type = avro
    agent.sinks.avro-forward-sink.hostname = 127.0.0.1
    agent.sinks.avro-forward-sink.port = 80


    agent.sinks.avro-forward-sink.channel = memoryChannel


    agent.channels.memoryChannel.type = memory

    agent.channels.memoryChannel.capacity = 10000
    agent.channels.memoryChannel.transactionCapacity = 100

 # Collector

    agent.sources.avro-collection-source.type = avro
    agent.sources.avro-collection-source.bind = 127.0.0.1
    agent.sources.avro-collection-source.port = 80

    agent.sources.avro-collection-source.channels = memoryChannel2

    agent.sinks.hdfs-sink.type = hdfs
    agent.sinks.hdfs-sink.hdfs.path = /var/flume/avro

    agent.sinks.hdfs-sink.channel = memoryChannel2

    agent.channels.memoryChannel2.type = memory

    agent.channels.memoryChannel2.capacity = 20000
    agent.channels.memoryChannel2.transactionCapacity = 2000

Thanks for your suggestions!


1 Answer

Stack Overflow user

Answered 2014-09-25 20:50:18

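I think this can be done. In the example below, one source (source1) reads data from a spooling directory and hands it to an Avro sink. A second source (source2) is an Avro source that listens on the address and port the Avro sink sends to. That gives you the flow you are looking for. Please adapt this conf file to your needs: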

# Sources, channels, and sinks are defined per
# agent name, in this case 'dataplatform'.
dataplatform.sources  = source1 source2
dataplatform.channels = channel1 channel3
dataplatform.sinks    = sink2 sink3


# For each source, channel, and sink, set standard properties.
dataplatform.sources.source1.type         = spooldir
dataplatform.sources.source1.spoolDir     = /home/flume/flume-sink-clean/
dataplatform.sources.source1.deserializer.maxLineLength = 1000000
dataplatform.sources.source1.deletePolicy = immediate
dataplatform.sources.source1.batchSize    = 10000
dataplatform.sources.source1.decodeErrorPolicy = IGNORE

# Channel Type
dataplatform.channels.channel1.type = FILE
dataplatform.channels.channel1.checkpointDir = /home/flume/flume_file_channel/dataplatform/file-channel/checkpoint
dataplatform.channels.channel1.dataDirs = /home/flume/flume_file_channel/dataplatform/file-channel/data
dataplatform.channels.channel1.write-timeout = 60
dataplatform.channels.channel1.use-fast-replay = true
dataplatform.channels.channel1.transactionCapacity = 1000000
dataplatform.channels.channel1.maxFileSize = 2146435071
dataplatform.channels.channel1.capacity = 100000000


# Describe Sink2 (Avro sink: connects to the address and port source2 listens on)
dataplatform.sinks.sink2.type = avro
dataplatform.sinks.sink2.hostname = 127.0.0.1
dataplatform.sinks.sink2.port = 20002
dataplatform.sinks.sink2.batch-size = 10000

# Describe source2
dataplatform.sources.source2.type = avro
dataplatform.sources.source2.bind = 0.0.0.0
dataplatform.sources.source2.port = 20002


# Channel3: Source 2 to Channel3 to Local
dataplatform.channels.channel3.type = FILE
dataplatform.channels.channel3.checkpointDir = /home/flume/flume_file_channel/local/file-channel/checkpoint
dataplatform.channels.channel3.dataDirs = /home/flume/flume_file_channel/local/file-channel/data
dataplatform.channels.channel3.transactionCapacity = 1000000
dataplatform.channels.channel3.checkpointInterval = 30000
dataplatform.channels.channel3.maxFileSize = 2146435071
dataplatform.channels.channel3.capacity = 10000000

# Describe Sink3 (Local File System)
dataplatform.sinks.sink3.type = file_roll
dataplatform.sinks.sink3.sink.directory = /home/flume/flume-sink/
dataplatform.sinks.sink3.sink.rollInterval = 60
dataplatform.sinks.sink3.batchSize = 1000

# Bind the sources and sinks to the channels.
# Flow: source1 -> channel1 -> sink2 (Avro) -> source2 -> channel3 -> sink3
dataplatform.sources.source1.channels = channel1
dataplatform.sources.source2.channels = channel3
dataplatform.sinks.sink2.channel = channel1
dataplatform.sinks.sink3.channel = channel3
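
On the worry that the VM allows only one agent: a single properties file can describe more than one agent, and each agent is started as its own `flume-ng` process selected by name. The sketch below is one possible layout under stated assumptions, not a tested setup for the QuickStart VM. The agent names `client` and `collector`, the file name `two-agents.conf`, the channel and component names, and port 41414 are all made up for illustration. It also assumes that the goal is Avro files on HDFS: the Avro sink and source only carry events between agents over RPC, while the file format is decided by the HDFS sink's serializer.

```properties
# Hypothetical file: two-agents.conf (all names and the port are assumptions).
# Start each agent in its own process, collector first, for example:
#   flume-ng agent --conf conf --conf-file two-agents.conf --name collector
#   flume-ng agent --conf conf --conf-file two-agents.conf --name client

# --- client: tail a file and forward events over Avro RPC ---
client.sources  = reader
client.channels = mem
client.sinks    = forward

client.sources.reader.type     = exec
client.sources.reader.command  = tail -F /home/flume/avro/source.txt
client.sources.reader.channels = mem

client.channels.mem.type                = memory
client.channels.mem.capacity            = 10000
client.channels.mem.transactionCapacity = 100

client.sinks.forward.type     = avro
client.sinks.forward.hostname = 127.0.0.1
# Assumed unprivileged port; ports below 1024 normally require root.
client.sinks.forward.port     = 41414
client.sinks.forward.channel  = mem

# --- collector: receive Avro RPC events and write them to HDFS ---
collector.sources  = avro-in
collector.channels = mem
collector.sinks    = hdfs-out

collector.sources.avro-in.type     = avro
collector.sources.avro-in.bind     = 127.0.0.1
collector.sources.avro-in.port     = 41414
collector.sources.avro-in.channels = mem

collector.channels.mem.type                = memory
collector.channels.mem.capacity            = 20000
collector.channels.mem.transactionCapacity = 2000

collector.sinks.hdfs-out.type            = hdfs
collector.sinks.hdfs-out.hdfs.path       = /var/flume/avro
# DataStream plus the avro_event serializer writes Avro container files
# instead of the default SequenceFile output.
collector.sinks.hdfs-out.hdfs.fileType   = DataStream
collector.sinks.hdfs-out.serializer      = avro_event
collector.sinks.hdfs-out.hdfs.fileSuffix = .avro
collector.sinks.hdfs-out.channel         = mem
```

Keeping the two agents in separate processes mirrors a real client-collector deployment, so the same file should carry over to two machines by changing only the sink's `hostname` and the source's `bind` values.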
Votes: 1
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/25265804
