I am new to Spark. Following the example below from a book, I found that the command gives an error. What is the best way to run Spark SQL commands when writing code in Spark?

scala> // Use SQL to create another DataFrame containing the account
scala> val acSummary = spark.sql("SELECT accNo, sum(tranAmount
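A common pattern for running Spark SQL from compiled code (a minimal sketch, not the book's exact program; the view name `trans`, the sample rows, and the completed query text are assumptions, since the original snippet is truncated) is to build a `SparkSession`, register a DataFrame as a temporary view, and pass the SQL string to `spark.sql`:

```scala
import org.apache.spark.sql.SparkSession

object SqlExample {
  def main(args: Array[String]): Unit = {
    // Build (or reuse) a SparkSession -- the entry point for Spark SQL.
    val spark = SparkSession.builder()
      .appName("SqlExample")
      .master("local[*]") // assumption: local mode, for experimentation only
      .getOrCreate()
    import spark.implicits._

    // Hypothetical transaction rows standing in for the book's dataset.
    val trans = Seq(("SB10001", 1000.0), ("SB10001", 500.0), ("SB10002", 250.0))
      .toDF("accNo", "tranAmount")

    // Register the DataFrame so SQL statements can refer to it by name.
    trans.createOrReplaceTempView("trans")

    // An aggregate over a grouped column; non-aggregated columns must appear
    // in the GROUP BY clause.
    val acSummary = spark.sql(
      "SELECT accNo, sum(tranAmount) AS total FROM trans GROUP BY accNo")
    acSummary.show()

    spark.stop()
  }
}
```

In the interactive `spark-shell`, the `spark` session object already exists, so only the `createOrReplaceTempView` and `spark.sql` steps are needed.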
I have a question about Spark SQL. I read some data from a CSV file, then perform groupBy and join operations; the final task is to write the joined data to a file. The log shows entries such as:

18/08/07 23:39:40 INFO spark.ContextCleaner: Cleaned accumulator 1069
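A minimal sketch of that pipeline (the file paths, column names, and join key are assumptions, since the question does not show the actual code):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.count

object CsvJoinJob {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("CsvJoinJob").getOrCreate()

    // Read the CSV with a header row and inferred column types.
    val flights = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/data/flights.csv") // hypothetical input path

    // Aggregate: count rows per origin.
    val counts = flights.groupBy("origin").agg(count("*").as("cnt"))

    // Join the aggregate back onto the original rows by the grouping key.
    val joined = flights.join(counts, Seq("origin"))

    // Write the joined result out; overwrite output from any previous run.
    joined.write.mode("overwrite").csv("/data/flights_joined") // hypothetical output path

    spark.stop()
  }
}
```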
BY (origin, transdate) ORDER BY cnt DESC LIMIT 1");

Exception in thread "main" org.apache.spark.sql.AnalysisException
    at org.apache.spark.sql.catalyst.CatalystTypeConverters$StructConverter.toCatalystImpl(CatalystTypeConverters.scala:256)
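One thing worth checking (an assumption suggested by the `StructConverter` frames in the trace, not a confirmed diagnosis) is whether the parenthesized column list is being treated as a single struct expression; listing the grouping columns without wrapping parentheses sidesteps that ambiguity. A self-contained sketch with hypothetical stand-in data:

```scala
import org.apache.spark.sql.SparkSession

object TopRoute {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TopRoute")
      .master("local[*]") // assumption: local mode for experimentation
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows with the two columns the query refers to.
    val flights = Seq(("JFK", "2018-08-01"), ("JFK", "2018-08-01"), ("SFO", "2018-08-02"))
      .toDF("origin", "transdate")
    flights.createOrReplaceTempView("flights")

    // Grouping columns listed individually, without wrapping parentheses.
    val top = spark.sql(
      "SELECT origin, transdate, count(*) AS cnt FROM flights " +
      "GROUP BY origin, transdate ORDER BY cnt DESC LIMIT 1")
    top.show()

    spark.stop()
  }
}
```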
I want to add a where condition on DataFrame columns that should match multiple values. For a single value this works:

df.where($"type" === "type1" && $"status" === "completed")

but what I actually want is something like:

df.where($"type" IN ("type1", "type2") && $"status" IN ("completed", "inprogress"))
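In the Scala DataFrame API the `Column` method for this is `isin` (SQL-style `IN` is not valid `Column` syntax). A minimal sketch with hypothetical data matching the columns in the question:

```scala
import org.apache.spark.sql.SparkSession

object IsinFilter {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("IsinFilter")
      .master("local[*]") // assumption: local mode for experimentation
      .getOrCreate()
    import spark.implicits._

    // Hypothetical rows with the two columns from the question.
    val df = Seq(
      ("type1", "completed"),
      ("type2", "inprogress"),
      ("type3", "failed")
    ).toDF("type", "status")

    // Column.isin takes a varargs list of allowed values;
    // && conjoins the two predicates.
    val filtered = df.where(
      $"type".isin("type1", "type2") && $"status".isin("completed", "inprogress"))
    filtered.show() // keeps the first two rows, drops the third
  }
}
```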