首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >无法使用Spark流将数据从TCP端口写入HDFS

无法使用Spark流将数据从TCP端口写入HDFS
EN

Stack Overflow用户
提问于 2014-06-25 21:00:52
回答 3查看 1.4K关注 0票数 1

我正在尝试从TCP端口流式传输数据,并使用Spark-Streaming将数据加载到HDFS中。

这些文件是在HDFS中创建的,但它们都是空的。但是Spark Streaming控制台显示从TCP端口读取数据。

我在CDH-5中使用Scala-Shell在Spark 0.9.0、0.9.1和1.0中尝试了这一点。我在另一个终端上做了一个'nc -lk 9993‘来传输数据。

下面是代码,请告诉我们这个问题是如何解决的。谢谢。

代码语言:javascript
复制
import org.apache.spark.streaming._
import org.apache.spark.streaming.StreamingContext._
import org.apache.spark.api.java.function._
import org.apache.spark.streaming._
import org.apache.spark.streaming.api._
import org.apache.spark.streaming.StreamingContext._
import StreamingContext._

val ssc8 = new StreamingContext("local", "NetworkWordCount", Seconds(1))
val lines8 = ssc8.socketTextStream("localhost", 9993)

val words8 = lines8.flatMap(_.split(" "))


val pairs8 = words8.map(word => (word, 1))
val wordCounts8 = pairs8.reduceByKey(_ + _)

wordCounts8.saveAsTextFiles("hdfs://Node1:8020/user/root/Spark8")

wordCounts8.print()

ssc8.start() 

附加

我已经提供了下面的日志和HDFS文件-

代码语言:javascript
复制
HDFS  Output Files
--------------------

-rw-r--r--   3 user1 user1          0 2014-06-26 09:19 /user/user1/SparkV/_SUCCESS
-rw-r--r--   3 user1 user1         0 2014-06-26 09:19 /user/user1/SparkV/part-00000
-rw-r--r--   3 user1 user1          0 2014-06-26 09:19 /user/user1/SparkV/part-00001




Spark-Shell Console Log
---------------------


-------------------------------------------
Time: 1403789836000 ms
-------------------------------------------
(f,3)
(fsd,2)
(sdf,2)
(fds,1)
(sd,3)

14/06/26 09:37:16 INFO scheduler.JobScheduler: Finished job streaming job 1403789836000 ms.1 from job set of time 1403789836000 ms
14/06/26 09:37:16 INFO storage.MemoryStore: ensureFreeSpace(8) called with curMem=327, maxMem=286339891
14/06/26 09:37:16 INFO storage.MemoryStore: Block input-0-1403789836000 stored as bytes to memory (size 8.0 B, free 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerInfo: Added input-0-1403789836000 in memory on localhost:49784 (size: 8.0 B, free: 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerMaster: Updated info of block input-0-1403789836000
14/06/26 09:37:16 WARN storage.BlockManager: Block input-0-1403789836000 already exists on this machine; not re-adding it
14/06/26 09:37:16 INFO receiver.BlockGenerator: Pushed block input-0-1403789836000
14/06/26 09:37:16 INFO storage.MemoryStore: ensureFreeSpace(15) called with curMem=335, maxMem=286339891
14/06/26 09:37:16 INFO storage.MemoryStore: Block input-0-1403789836200 stored as bytes to memory (size 15.0 B, free 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerInfo: Added input-0-1403789836200 in memory on localhost:49784 (size: 15.0 B, free: 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerMaster: Updated info of block input-0-1403789836200
14/06/26 09:37:16 WARN storage.BlockManager: Block input-0-1403789836200 already exists on this machine; not re-adding it
14/06/26 09:37:16 INFO receiver.BlockGenerator: Pushed block input-0-1403789836200
14/06/26 09:37:16 INFO storage.MemoryStore: ensureFreeSpace(8) called with curMem=350, maxMem=286339891
14/06/26 09:37:16 INFO storage.MemoryStore: Block input-0-1403789836400 stored as bytes to memory (size 8.0 B, free 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerInfo: Added input-0-1403789836400 in memory on localhost:49784 (size: 8.0 B, free: 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerMaster: Updated info of block input-0-1403789836400
14/06/26 09:37:16 WARN storage.BlockManager: Block input-0-1403789836400 already exists on this machine; not re-adding it
14/06/26 09:37:16 INFO receiver.BlockGenerator: Pushed block input-0-1403789836400
14/06/26 09:37:16 INFO storage.MemoryStore: ensureFreeSpace(9) called with curMem=358, maxMem=286339891
14/06/26 09:37:16 INFO storage.MemoryStore: Block input-0-1403789836600 stored as bytes to memory (size 9.0 B, free 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerInfo: Added input-0-1403789836600 in memory on localhost:49784 (size: 9.0 B, free: 273.1 MB)
14/06/26 09:37:16 INFO storage.BlockManagerMaster: Updated info of block input-0-1403789836600
14/06/26 09:37:16 WARN storage.BlockManager: Block input-0-1403789836600 already exists on this machine; not re-adding it
14/06/26 09:37:16 INFO receiver.BlockGenerator: Pushed block input-0-1403789836600
14/06/26 09:37:17 INFO storage.MemoryStore: ensureFreeSpace(14) called with curMem=367, maxMem=286339891
14/06/26 09:37:17 INFO storage.MemoryStore: Block input-0-1403789836800 stored as bytes to memory (size 14.0 B, free 273.1 MB)
14/06/26 09:37:17 INFO storage.BlockManagerInfo: Added input-0-1403789836800 in memory on localhost:49784 (size: 14.0 B, free: 273.1 MB)
14/06/26 09:37:17 INFO storage.BlockManagerMaster: Updated info of block input-0-1403789836800
14/06/26 09:37:17 WARN storage.BlockManager: Block input-0-1403789836800 already exists on this machine; not re-adding it
14/06/26 09:37:17 INFO receiver.BlockGenerator: Pushed block input-0-1403789836800
14/06/26 09:37:18 INFO scheduler.ReceiverTracker: Stream 0 received 6 blocks
14/06/26 09:37:18 INFO scheduler.JobScheduler: Added jobs for time 1403789838000 ms
EN

回答 3

Stack Overflow用户

发布于 2014-06-26 05:49:23

乍一看,我猜你应该尝试local4,而不仅仅是本地,这样Spark就可以调度更多的任务。

票数 2
EN

Stack Overflow用户

发布于 2015-01-26 23:14:23

尝试使用

wordCounts8.saveAsTextFiles("hdfs://Node1:8020/user/root/Spark8",“日志”)

==========或

在"Spark8“后面加上一些时间戳

wordCounts8.saveAsTextFiles("hdfs://Node1:8020/user/root/Spark8“+System.currentTimeMillis().toString()

==========使用Spark1.3对我来说很有效,看看它对你是否有效

票数 0
EN

Stack Overflow用户

发布于 2015-08-14 03:18:45

我也有同样的问题。

尝试运行

代码语言:javascript
复制
hadoop fs -cat hdfs://Node1:8020/user/root/Spark8

( hadoop命令对于您来说可能有所不同。对我来说,我必须使用/a/bin/hadoop访问它,但这是特定于您的设置的)

查看此命令是否返回:

代码语言:javascript
复制
cat: `hdfs://Node1:8020/user/root/Spark8': Is a directory

如果是这样,那么正如您在评论中所说的,您应该能够在该目录中看到一个_SUCCESS文件以及一些part-*文件。

在这一点上,我的问题解决了。但在写入HDFS时,您似乎还有其他问题。

至于为什么你的文件仍然是空的,我建议切换到Spark1.4.0,因为这可能在CDH5.4上工作得更好。此外,如果您在使用HDFS权限时遇到问题,则必须执行

代码语言:javascript
复制
hadoop dfs -chmod -R 0777 /your_hdfs_folder

以便具有写访问权限。

票数 0
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/24409389

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档