How do I count the number of rows in a JSON file?

Stack Overflow user
Asked 2018-10-01 04:03:33
2 answers, 1.9K views, 0 followers, score 1

The following JSON file contains 6 rows:

[
    {"events":[[{"v":"INPUT","n":"type"},{"v":"2016-08-24 14:23:12 EST","n":"est"}]],
     "apps":[],
     "agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},
     "header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"12","n":"cpu"},{"v":"154665","n":"seq"},{"v":"2016-08-24 14:23:17 EST","n":"est"}]
    },
{"events":[[{"v":"INPUT","n":"type"},{"v":"2016-08-24 14:23:14 EST","n":"est"}]],"apps":[],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"5","n":"cpu"},{"v":"154666","n":"seq"},{"v":"2016-08-24 14:23:23 EST","n":"est"}]},
{"events":[[{"v":"LOGOFF","n":"type"},{"v":"2016-08-24 14:24:04 EST","n":"est"}]],"apps":[],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.1.18","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"0","n":"cpu"},{"v":"154667","n":"seq"},{"v":"2016-08-24 14:24:05 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"O","n":"state"},{"v":"5376","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"29","n":"cpu"},{"v":"154668","n":"seq"},{"v":"2016-09-25 16:57:24 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"F","n":"state"},{"v":"5588","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"16","n":"cpu"},{"v":"154669","n":"seq"},{"v":"2016-09-25 16:57:30 EST","n":"est"}]},
{"events":[],"apps":[[{"v":"ccSvcHst","n":"pname"},{"v":"7704","n":"pid"},{"v":"Old Virus Definition File","n":"title"},{"v":"F","n":"state"},{"v":"5588","n":"mem"},{"v":"0","n":"cpu"}]],"agent":{"calls":[],"info":[{"v":"7990994","n":"agentid"},{"v":"7999994","n":"stationid"}]},"header":[{"v":"TUSTX002LKVT1JN","n":"host"},{"v":"192.168.0.5","n":"ip"},{"v":"V740723","n":"vzid"},{"v":"16.3.16.0","n":"version"},{"v":"17","n":"cpu"},{"v":"154670","n":"seq"},{"v":"2016-09-25 16:57:36 EST","n":"est"}]}
]

The JSON looks like this:

JSON
0
1
2
3
4
5

Desired output:

Count
6

2 Answers

Stack Overflow user

Answered 2018-10-01 12:26:38

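Since you are working in Spark, you need to convert your JSON into a Dataset and then run the appropriate operation on it. Below I walk through the workflow from JSON to Dataset, with an example for each step. I think this is more useful than a one-line answer, because you can see the steps and then decide what to do with the data.

  1. Input data: You have the JSON, and that is the data you start from. Next, decide which fields matter. In most cases only a small part is relevant, and you do not want to load fields you may never need.
  2. Create a case class: A case class lets you map the input data onto typed objects. To keep things simple, suppose I have doctors who belong to a department, and I receive that data as JSON. I could define the following case classes: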

case class Department(name: String, address: String)
case class Doctor(name: String, department: Department)

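As the code above shows, I built the data model I want to process from the bottom up. Your JSON has many fields (for example, v) whose meaning I cannot work out, so be careful not to mix them up.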
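One way to load only the fields that matter, as step 1 suggests, is to pass an explicit schema to the reader. The sketch below is only an illustration under assumptions: it treats the header field of the question's records as the single field of interest, models each entry as a pair of strings named v and n, and uses hypothetical names (SelectedFields, pair, headerOnly) and a one-record in-memory sample.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{ArrayType, StringType, StructField, StructType}

object SelectedFields {
  // Assumed shape of one {"v": ..., "n": ...} entry in the question's data.
  val pair: StructType = StructType(Seq(
    StructField("v", StringType),
    StructField("n", StringType)
  ))

  // Hypothetical schema that keeps only the "header" field of each record.
  val headerOnly: StructType = StructType(Seq(
    StructField("header", ArrayType(pair))
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("selected-fields") // hypothetical app name
      .master("local[*]")         // assumption: local run
      .getOrCreate()
    import spark.implicits._

    // One trimmed-down record standing in for the real file.
    val sample = Seq(
      """{"events":[],"apps":[],"header":[{"v":"154665","n":"seq"}]}"""
    ).toDS()

    // Fields not named in the schema (events, apps) are not loaded.
    val df = spark.read.schema(headerOnly).json(sample)
    df.printSchema()

    spark.stop()
  }
}
```

Supplying the schema also skips schema inference, which otherwise needs an extra pass over the input.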

  • Have a Dataset: The following code reads the JSON into the case class defined above:

spark.read.json("doctorsData.json").as[Doctor]

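A few notes. spark is a SparkSession, which you need to create yourself; the instance is called spark here, but it can have any name. You also need import spark.implicits._ so that an encoder for the case class is in scope.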

  • In business!: Now you are in business, in the Spark world. Counting the Dataset is just a matter of calling count(). The method below shows how:

def recordsCount(myDataset: Dataset[Doctor]): Long = myDataset.count()
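Putting the steps of this answer together, a minimal sketch might look like the following. It assumes Spark 2.2 or later (for reading JSON from a Dataset[String]), a local master, and a hypothetical app name and object name (DoctorCount). The two in-memory records are made up and stand in for doctorsData.json, whose contents the answer does not show.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Case classes from the answer; defined at top level so Spark can derive encoders.
case class Department(name: String, address: String)
case class Doctor(name: String, department: Department)

object DoctorCount {
  // Counts the records of a typed Dataset.
  def recordsCount(myDataset: Dataset[Doctor]): Long = myDataset.count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("doctor-count") // hypothetical app name
      .master("local[*]")      // assumption: local run
      .getOrCreate()
    import spark.implicits._   // brings the encoders for the case classes into scope

    // Made-up sample records, one JSON object per string.
    val sample = Seq(
      """{"name":"Alice","department":{"name":"Cardiology","address":"1 Main St"}}""",
      """{"name":"Bob","department":{"name":"Neurology","address":"2 Side St"}}"""
    ).toDS()

    // Read the JSON and map each row onto the Doctor case class.
    val doctors: Dataset[Doctor] = spark.read.json(sample).as[Doctor]

    println(recordsCount(doctors)) // prints 2 for the two sample records

    spark.stop()
  }
}
```

With a real file, spark.read.json(sample) would be replaced by spark.read.json("doctorsData.json"), as in the answer.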

Score 1

Stack Overflow user

Answered 2018-10-01 18:20:18

I have a correctly formatted file containing three records, and I read it into a DataFrame/Dataset with Spark 2.x:

import org.apache.spark.sql.Column
import org.apache.spark.sql.functions._

val df = spark.read
        .option("multiLine", true)
        .option("mode", "PERMISSIVE")
        .option("inferSchema", true)
        .json("/FileStore/tables/json_01.txt")

df.select("*").show(false)
df.printSchema()
df.count()

If you only need the total count, the last line is enough.

res15: Long = 3
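The multiLine option matters for a file laid out like the one in the question. By default Spark expects one complete JSON document per line, so a top-level array spread over several lines would not parse as intended; with multiLine enabled, each element of the top-level array becomes one row. The sketch below illustrates this under assumptions: it writes a small stand-in file to a temporary path (two trimmed-down records rather than the question's six), and the object and method names (CountJsonArray, countRecords) are hypothetical.

```scala
import java.nio.file.Files
import org.apache.spark.sql.SparkSession

object CountJsonArray {
  // Counts the elements of a top-level JSON array stored in a multi-line file.
  def countRecords(spark: SparkSession, path: String): Long =
    spark.read
      .option("multiLine", true) // parse the whole file as one JSON document
      .json(path)
      .count()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("count-json-array") // hypothetical app name
      .master("local[*]")          // assumption: local run
      .getOrCreate()

    // Stand-in for the question's file: an array whose first record spans two lines.
    val json =
      """[
        |  {"events": [], "apps": [],
        |   "header": [{"v": "154665", "n": "seq"}]},
        |  {"events": [], "apps": [], "header": [{"v": "154666", "n": "seq"}]}
        |]""".stripMargin
    val path = Files.createTempFile("records", ".json")
    Files.write(path, json.getBytes("UTF-8"))

    println(countRecords(spark, path.toString))

    spark.stop()
  }
}
```

The count is the number of array elements, not the number of physical lines in the file, which is what the question's "6 rows" refers to.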
Score 0
Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/52581658
