In Scala/Spark, I am trying to do the following:
val portCalls_Ports =
  portCalls.join(ports, portCalls("port_id") === ports("id"), "inner")
However, I get the following error:
Exception in thread "main" org.apache.spark.sql.AnalysisException:
binary type expression port_id cannot be used in join conditions;
Indeed, the column is a binary type:
root
|-- id: binary (nullable = false)
|-- port_id: binary (nullable = false)
.
.
.
+--------------------+--------------------+
|                  id|             port_id|
+--------------------+--------------------+
|[FB 89 A0 FF AA 0...|[B2 B2 84 B9 52 2...|
The same is true of ports("id").
I am using the following libraries:
scalaVersion := "2.11.11"
libraryDependencies ++= Seq(
// Spark dependencies
"org.apache.spark" %% "spark-hive" % "1.6.2",
"org.apache.spark" %% "spark-mllib" % "1.6.2",
// Third-party libraries
"postgresql" % "postgresql" % "9.1-901-1.jdbc4",
"net.sf.jopt-simple" % "jopt-simple" % "5.0.3"
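)
Note that I am reading the database tables over JDBC.
What is the best way to solve this?
Answered 2017-06-09 14:47:19
Before Spark 2.1.0, the best workaround I know of is to use the base64 function to convert the binary columns to strings and compare those instead: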
import org.apache.spark.sql.functions._
val portCalls_Ports =
  portCalls.join(ports, base64(portCalls("port_id")) === base64(ports("id")), "inner")
https://stackoverflow.com/questions/44460275
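To illustrate why comparing base64 strings works where comparing raw bytes does not, here is a minimal sketch in plain Scala, without Spark. It assumes ordinary collections as stand-ins for the two DataFrames. The names Port, PortCall, key and innerJoin, as well as the sample data, are hypothetical and do not come from the original post. It uses java.util.Base64 from the JDK (Java 8 or later) rather than Spark's base64 column function, so it only mirrors the idea of the workaround.

```scala
import java.util.Base64

object Base64JoinSketch {
  // Hypothetical row types standing in for the two tables (not from the original post).
  final case class Port(id: Array[Byte], name: String)
  final case class PortCall(portId: Array[Byte], vessel: String)

  // Encode a binary key as a base64 string so that it compares by value.
  def key(bytes: Array[Byte]): String =
    Base64.getEncoder.encodeToString(bytes)

  // Inner join on the encoded keys: keep only port calls whose key matches a port.
  def innerJoin(calls: Seq[PortCall], ports: Seq[Port]): Seq[(String, String)] = {
    val portsByKey: Map[String, Port] = ports.map(p => key(p.id) -> p).toMap
    for {
      c <- calls
      p <- portsByKey.get(key(c.portId)).toSeq
    } yield (c.vessel, p.name)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data for illustration only.
    val ports = Seq(
      Port(Array[Byte](1, 2, 3), "Rotterdam"),
      Port(Array[Byte](4, 5, 6), "Singapore")
    )
    val calls = Seq(
      PortCall(Array[Byte](1, 2, 3), "Vessel A"),
      PortCall(Array[Byte](9, 9, 9), "Vessel B")
    )
    innerJoin(calls, ports).foreach(println)
  }
}
```

Only the first port call has a key that matches a port, so the join keeps that single pair. The trade-off is the same as in the Spark workaround: every key is encoded on both sides before comparison, which costs some extra CPU and memory but gives a string key that is safe to compare for equality.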