I'm calling the following:
propertiesDF.select(
col("timestamp"), col("coordinates")(0) as "lon",
col("coordinates")(1) as "lat",
col("properties.tide (above mllw)") as "tideAboveMllw",
col("properties.wind speed") as "windSpeed")

This gives me the following error:
org.apache.spark.sql.AnalysisException: No such struct field tide (above mllw) in air temperature, atmospheric pressure, dew point, dominant wave period, mean wave direction, name, program name, significant wave height, tide (above mllw), visibility, water temperature, wind direction, wind speed;
Now clearly there is such a struct field. (The error message itself says so.)
Here is the schema:
root
|-- timestamp: long (nullable = true)
|-- coordinates: array (nullable = true)
| |-- element: double (containsNull = true)
|-- properties: struct (nullable = true)
| |-- air temperature: double (nullable = true)
| |-- atmospheric pressure: double (nullable = true)
| |-- dew point: double (nullable = true)
.
.
.
| |-- tide (above mllw): string (nullable = true)
.
.
.

The input is read in as JSON like this:
val df = sqlContext.read.json(dirName)

How do I deal with the parentheses in the column name?
Posted on 2016-08-09 16:59:41
First of all, you should avoid names like this, but you can split the access path:
val df = spark.range(1).select(struct(
lit(123).as("tide (above mllw)"),
lit(1).as("wind speed")
).as("properties"))
df.select(col("properties").getItem("tide (above mllw)"))
// or
df.select(col("properties")("tide (above mllw)"))

or wrap the problematic field in backticks:
df.select(col("properties.`tide (above mllw)`"))

Both solutions assume that the data contains the following structure (based on the access path used in the query):
df.printSchema
// root
// |-- properties: struct (nullable = false)
// | |-- tide (above mllw): integer (nullable = false)
// | |-- wind speed: integer (nullable = false)

Posted on 2016-08-09 16:51:55
Based on the documentation, you could try using single quotes, like this:
propertiesDF.select(
col("timestamp"), col("coordinates")(0) as "lon",
col("coordinates")(1) as "lat",
col("'properties.tide (above mllw)'") as "tideAboveMllw",
col("properties.wind speed") as "windSpeed")

https://stackoverflow.com/questions/38856240
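Putting the backtick fix from the first answer into the original query, a sketch might look like this (assuming `propertiesDF` has the schema shown in the question; quoting `wind speed` as well is optional but keeps the style consistent):

```scala
import org.apache.spark.sql.functions.col

// Backticks quote a field name containing spaces and parentheses,
// so the dotted access path "properties.<field>" parses correctly.
val result = propertiesDF.select(
  col("timestamp"),
  col("coordinates")(0) as "lon",
  col("coordinates")(1) as "lat",
  col("properties.`tide (above mllw)`") as "tideAboveMllw",
  col("properties.`wind speed`") as "windSpeed")
```

Alternatively, `col("properties").getItem("tide (above mllw)")` sidesteps path parsing entirely, since `getItem` takes the field name as a literal string rather than an access path.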