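All,
I am trying to create a UDF for Spark that will be used to generate a unique ID for each row. To ensure uniqueness, the ID generator combines: the epoch value of the timestamp (bigint) + the unique source ID passed as an argument + a 5-digit random number.
I have two problems:
1. In the ID generation function "idGenerator", appending monotonically_increasing_id does not work (see the trailing comment on idGenerator in the code below).
2. The call to the UDF fails with:
error: type mismatch;
 found   : String("SRC_")
 required: org.apache.spark.sql.Column
df.withColumn("rowkey", SequenceGeneratorUtil.GenID("SRC_"))
Any pointers would be appreciated.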
import org.apache.spark.sql.functions.udf

object SequenceGeneratorUtil extends Serializable {
  val random = new scala.util.Random
  val start = 10000
  val end = 99999
  // CustomEpochGenerator is a custom function that generates the epoch value of the current timestamp in milliseconds
  // The ID generator takes the epoch value of the timestamp (bigint) + the unique source ID passed as argument + a 5-digit random number
  def idGenerator(SrcIdentifier: String): String =
    SrcIdentifier + CustomEpochGenerator.nextID.toString + (start + random.nextInt((end - start) + 1)).toString // + monotonically_increasing_id (not working)
  val GenID = udf[String, String](idGenerator _)
}
val df2 = df.withColumn("rowkey", SequenceGeneratorUtil.GenID("SRC_"))

Posted on 2020-11-12 14:48:42
Change the function from
def idGenerator(SrcIdentifier: String): String = SrcIdentifier + CustomEpochGenerator.nextID.toString + (start + random.nextInt((end - start) + 1)).toString // + monotonically_increasing_id (not working)
to the version below, which adds an extra parameter mId to idGenerator to hold the monotonically_increasing_id value:
def idGenerator(SrcIdentifier: String, mId: Long): String = SrcIdentifier + CustomEpochGenerator.nextID.toString + (start + random.nextInt((end - start) + 1)).toString + mId
Change the UDF definition from
val GenID = udf[String, String](idGenerator _)
to
val GenID = udf(idGenerator _)
The original call fails with:
error: type mismatch;
 found   : String("SRC_")
 required: org.apache.spark.sql.Column
df.withColumn("rowkey", SequenceGeneratorUtil.GenID("SRC_"))
This is because SequenceGeneratorUtil.GenID expects arguments of type org.apache.spark.sql.Column, but the value passed, "SRC_", is a String.
To fix this, wrap the string in the lit function:
df.withColumn("rowkey", SequenceGeneratorUtil.GenID(lit("SRC_")))
Change the withColumn call from
val df2 = df.withColumn("rowkey", SequenceGeneratorUtil.GenID("SRC_"))
to
val df2 = df
  .withColumn(
    "rowkey",
    SequenceGeneratorUtil.GenID(
      lit("SRC_"), // use the lit function to pass a static string
      monotonically_increasing_id()
    )
  )
https://stackoverflow.com/questions/64804129
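For anyone who wants to try the pieces above together, here is a minimal self-contained sketch, not the asker's actual code. It assumes Spark 2.3 or later on the classpath and a local session. The names SequenceGeneratorSketch, epochMillis and genId are hypothetical, and epochMillis is only a stand-in for the question's CustomEpochGenerator.nextID, which was never shown. The asNondeterministic() call is my own addition rather than part of the answer above; it tells the optimizer that the UDF can return a different value on each call.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{lit, monotonically_increasing_id, udf}

object SequenceGeneratorSketch extends Serializable {
  private val start = 10000
  private val end = 99999

  // Hypothetical stand-in for CustomEpochGenerator.nextID from the question:
  // simply returns the current epoch time in milliseconds.
  def epochMillis(): Long = System.currentTimeMillis()

  // Concatenates: source ID + epoch millis + 5-digit random number + row ID.
  def idGenerator(srcIdentifier: String, mId: Long): String = {
    val randomPart = start + scala.util.Random.nextInt((end - start) + 1)
    s"$srcIdentifier${epochMillis()}$randomPart$mId"
  }

  // A two-argument UDF; both arguments must be passed as Column values.
  // Marked nondeterministic because the result depends on time and randomness.
  val genId = udf(idGenerator _).asNondeterministic()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("rowkey-sketch")
      .getOrCreate()
    import spark.implicits._

    val df = Seq("a", "b", "c").toDF("value")

    // lit wraps the static string in a Column; monotonically_increasing_id()
    // already returns a Column, so it can be passed directly.
    val df2 = df.withColumn("rowkey", genId(lit("SRC_"), monotonically_increasing_id()))
    df2.show(false)

    spark.stop()
  }
}
```

One thing to keep in mind: monotonically_increasing_id() is documented as unique within a single DataFrame but not consecutive, so the epoch and random parts mainly help when IDs from separate runs or sources end up in the same table.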