下面的JSON文件包含6行:
[
{"events":[[{"v":"INPUT","n":"type"},{"v":"2016-08-24 14:23:12 EST","n":"est"}]],
"apps":[],
"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},
"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"12","n":"cpu"},{"v":"154665","n":"seq"},{"v":"2016-08-24 14:23:17 EST","n":"est"}]
},
{"events":[[{"v":"INPUT","n":"type"},{"v":"2016-08-24 14:23:14 EST","n":"est"}]],"apps":[],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"5","n":"cpu"},{"v":"154666","n":"seq"},{"v":"2016-08-24 14:23:23 EST","n":"est"}]},
{"events":[[{"v":"LOGOFF","n":"type"},{"v":"2016-08-24 14:24:04 EST","n":"est"}]],"apps":[],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"0","n":"cpu"},{"v":"154667","n":"seq"},{"v":"2016-08-24 14:24:05 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"O","n":"state"},{"v":"5376","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"29","n":"cpu"},{"v":"154668","n":"seq"},{"v":"2016-09-25 16:57:24 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"F","n":"state"},{"v":"5588","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"16","n":"cpu"},{"v":"154669","n":"seq"},{"v":"2016-09-25 16:57:30 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"F","n":"state"},{"v":"5588","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"17","n":"cpu"},{"v":"154670","n":"seq"},{"v":"2016-09-25 16:57:36 EST","n":"est"}]}
]JSON如下所示:
JSON
0
1
2
3
4
5所需输出:
Count
6发布于 2018-10-01 12:26:38
好的,你在Spark中,你需要把你的Json转换成dataset,并对它进行适当的操作。因此,在这里,我编写了从Json到dataset的工作流程,并举例说明了所需的步骤。我认为这种回答方式更有好处,因为您可以看到步骤,然后您可以决定如何处理信息。
病例分类部门(名称:字符串,地址:字符串)病例分类医生(名称:字符串,部门:部门)
因此,正如您从上面的代码中看到的,我自下而上地创建了我想要处理的数据。在you Json中,有很多字段(例如,v),我无法理解其背后的含义。所以要小心,不要把它们混在一起。
spark.read.json("doctorsData.json).asDoctor
有几点。spark是一个spark会话,您需要创建它。这里它的实例是spark,它可以是任何东西。您还需要import spark.implicits._。
count()来计算数据集的问题。THe下面的方法展示了如何计算它:myDataset.count() recordsCount(myDataset: DatasetDoctor):Long = def
发布于 2018-10-01 18:20:18
我有一个包含三条记录的文件,格式正确,Spark 2.x,正在读取数据帧/数据集:
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._
val df = spark.read
.option("multiLine", true)
.option("mode", "PERMISSIVE")
.option("inferSchema", true)
.json("/FileStore/tables/json_01.txt")
df.select("*").show(false)
df.printSchema()
df.count()如果只是总计数,那么这就足够了,最后一行。
res15: Long = 3https://stackoverflow.com/questions/52581658
复制相似问题