I tried to create a new DataFrame by selecting hour + minute/60 together with the other columns, like this:
val logon11 = logon1.select("User","PC","Year","Month","Day","Hour","Minute",$"Hour"+$"Minute"/60)
The error I get is:
<console>:38: error: overloaded method value select with alternatives:
(col: String,cols: String*)org.apache.spark.sql.DataFrame <and>
(cols: org.apache.spark.sql.Column*)org.apache.spark.sql.DataFrame
cannot be applied to (String, String, String, String, String, String, String, org.apache.spark.sql.Column)
...
Perhaps the reason is that I cannot mix these types in a single select call. How, then, can I get a DataFrame like that?
Answered 2017-02-12 14:30:39
You can use withColumn to create a new column derived from existing columns (optionally based on a condition):
val logon1 = Seq(("User1","PC1",2017,2,12,12,10)).toDF("User","PC","Year","Month","Day","Hour","Minute")
val logon11 = logon1.withColumn("new_col", $"Hour"+$"Minute"/60)
logon11.printSchema()
logon11.show
Output:
root
|-- User: string (nullable = true)
|-- PC: string (nullable = true)
|-- Year: integer (nullable = false)
|-- Month: integer (nullable = false)
|-- Day: integer (nullable = false)
|-- Hour: integer (nullable = false)
|-- Minute: integer (nullable = false)
|-- new_col: double (nullable = true)
+-----+---+----+-----+---+----+------+------------------+
| User| PC|Year|Month|Day|Hour|Minute| new_col|
+-----+---+----+-----+---+----+------+------------------+
|User1|PC1|2017| 2| 12| 12| 10|12.166666666666666|
+-----+---+----+-----+---+----+------+------------------+
Answered 2017-02-12 12:56:47
A DataFrame's select method accepts either all String arguments or all org.apache.spark.sql.Column arguments, but not a mix of the two.
Here you are passing both String and Column arguments to select. Wrap every column reference in $ so they are all Columns:
val logon11 = logon1.select($"User",$"PC",$"Year",$"Month",$"Day",$"Hour",$"Minute",$"Hour"+$"Minute"/60 as "total_hours")
Hope it helps!
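As a side note (not from the original answers): another way to sidestep the String-vs-Column overload entirely is selectExpr, which takes SQL expression strings, so every argument is a plain String. A minimal sketch, assuming the same logon1 DataFrame and Spark session as above; the alias total_hours is illustrative:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("demo").getOrCreate()
import spark.implicits._

val logon1 = Seq(("User1","PC1",2017,2,12,12,10))
  .toDF("User","PC","Year","Month","Day","Hour","Minute")

// selectExpr parses each string as a SQL expression, so no Column
// objects are mixed in and the overload ambiguity never arises.
// Spark SQL's `/` on integers already returns a double.
val logon12 = logon1.selectExpr(
  "User", "PC", "Year", "Month", "Day", "Hour", "Minute",
  "Hour + Minute / 60 AS total_hours"
)
logon12.show()
```

This produces the same total_hours column as the $-based select, just expressed in SQL syntax.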
https://stackoverflow.com/questions/42184191