
Error when running a query involving the ROUND function in Spark SQL

Stack Overflow user
Asked on 2018-10-08 15:57:37
1 answer · 1.7K views · 0 followers · Score 1

I am trying to obtain a new column by rounding one column of a table to the precision specified by another column of the same table. For example, from the following table:

+--------+--------+
|    Data|Rounding|
+--------+--------+
|3.141592|       3|
|0.577215|       1|
+--------+--------+

I should be able to get the following result:

+--------+--------+--------------+
|    Data|Rounding|Rounded_Column|
+--------+--------+--------------+
|3.141592|       3|         3.142|
|0.577215|       1|           0.6|
+--------+--------+--------------+

In particular, I tried the following code:

import pandas as pd
from pyspark.sql import SparkSession
from pyspark.sql.types import (
  StructType, StructField, FloatType, LongType, 
  IntegerType
)

pdDF = pd.DataFrame(columns=["Data", "Rounding"], data=[[3.141592, 3], 
   [0.577215, 1]])

mySchema = StructType([
    StructField("Data", FloatType(), True),
    StructField("Rounding", IntegerType(), True)])

spark = (SparkSession.builder
    .master("local")
    .appName("column rounding")
    .getOrCreate())

df = spark.createDataFrame(pdDF,schema=mySchema)

df.show()

df.createOrReplaceTempView("df_table")


df_rounded = spark.sql("SELECT Data, Rounding, ROUND(Data, Rounding) AS Rounded_Column FROM df_table")

df_rounded.show()

But I got the following error:

raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: u"cannot resolve 'round(df_table.`Data`, df_table.`Rounding`)' due to data type mismatch: Only foldable Expression is allowed for scale arguments; line 1 pos 23;\n'Project [Data#0, Rounding#1, round(Data#0, Rounding#1) AS Rounded_Column#12]\n+- SubqueryAlias df_table\n   +- LogicalRDD [Data#0, Rounding#1], false\n"

Any help would be greatly appreciated :)


1 Answer

Stack Overflow user

Accepted answer

Posted on 2018-10-08 18:00:02

With Spark SQL, Catalyst throws the following error at runtime: Only foldable Expression is allowed for scale arguments. The Spark source documents the scale parameter as:

@param scale new scale to be round to, this should be a constant int at runtime

round only accepts a literal for the scale argument. Instead of the Spark SQL built-in, you can try writing custom code, e.g. a UDF.
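The per-row rounding the answer implements can be sketched in plain Python, with no Spark required; `round_to` is a hypothetical helper name, and `decimal.ROUND_HALF_UP` mirrors the `BigDecimal.RoundingMode.HALF_UP` used below:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_to(value, scale):
    # Quantize to `scale` decimal places with half-up rounding,
    # mirroring BigDecimal.setScale(scale, HALF_UP).
    quantum = Decimal(10) ** -scale
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))

print(round_to(3.141592, 3))  # 3.142
print(round_to(0.577215, 1))  # 0.6
```

Going through `Decimal(str(value))` avoids the binary-float representation error that `Decimal(value)` would carry over.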

Edit:

With a UDF:

val df = Seq(
  (3.141592, 3),
  (0.577215, 1)).toDF("Data", "Rounding")

df.show()
df.createOrReplaceTempView("df_table")

import org.apache.spark.sql.functions._

// Round to a per-row scale with HALF_UP semantics via BigDecimal.
def RoundUDF(customvalue: Double, customscale: Int): Double =
  BigDecimal(customvalue).setScale(customscale, BigDecimal.RoundingMode.HALF_UP).toDouble

spark.udf.register("RoundUDF", RoundUDF(_: Double, _: Int): Double)

val df_rounded = spark.sql("select Data, Rounding, RoundUDF(Data, Rounding) as Rounded_Column from df_table")
df_rounded.show()

Input:

+--------+--------+
|    Data|Rounding|
+--------+--------+
|3.141592|       3|
|0.577215|       1|
+--------+--------+

Output:

+--------+--------+--------------+
|    Data|Rounding|Rounded_Column|
+--------+--------+--------------+
|3.141592|       3|         3.142|
|0.577215|       1|           0.6|
+--------+--------+--------------+
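Since the question itself uses PySpark, the same fix can be sketched in Python. Note that Python's built-in `round` uses half-even (banker's) rounding, so `decimal.ROUND_HALF_UP` is needed to match the Scala `BigDecimal` behavior; the name `round_udf` and the commented registration lines are assumptions, not tested against a live SparkSession:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_udf(value, scale):
    # HALF_UP, matching BigDecimal.setScale(scale, HALF_UP) in the Scala answer;
    # the built-in round() would give half-even results instead.
    return float(Decimal(str(value)).quantize(Decimal(10) ** -scale,
                                              rounding=ROUND_HALF_UP))

# Hypothetical registration against the question's SparkSession and temp view:
# from pyspark.sql.types import DoubleType
# spark.udf.register("RoundUDF", round_udf, DoubleType())
# spark.sql("SELECT Data, Rounding, RoundUDF(Data, Rounding) AS Rounded_Column "
#           "FROM df_table").show()
```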
Score 2
Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/52706055
