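I am trying to extract elevation data in bulk from a TIFF image, using points listed in a CSV file. Each CSV row has latitude, longitude, and other attributes. I loop over the CSV file, read the latitude and longitude, and call an elevation lookup for each point; the code is below. Reference: "RasterFrames extracting location information problem".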
package main.scala.sample

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql._
import org.apache.spark.sql.functions._
import org.locationtech.rasterframes._
import org.locationtech.rasterframes.datasource.raster._
import org.locationtech.rasterframes.encoders.CatalystSerializer._
import geotrellis.raster._
import geotrellis.vector.Extent
import org.locationtech.jts.geom.Point
import org.apache.spark.sql.functions.col

object SparkSQLExample {
  def main(args: Array[String]) {
    implicit val spark = SparkSession.builder()
      .master("local[*]").appName("RasterFrames")
      .withKryoSerialization.getOrCreate().withRasterFrames
    spark.sparkContext.setLogLevel("ERROR")
    import spark.implicits._

    val example = "https://raw.githubusercontent.com/locationtech/rasterframes/develop/core/src/test/resources/LC08_B7_Memphis_COG.tiff"
    val rf = spark.read.raster.from(example).load()

    val rf_value_at_point = udf((extentEnc: Row, tile: Tile, point: Point) => {
      val extent = extentEnc.to[Extent]
      Raster(tile, extent).getDoubleValueAtPoint(point)
    })

    val spark_file:SparkSession = SparkSession.builder()
      .master("local[1]")
      .appName("SparkByExamples")
      .getOrCreate()
    spark_file.sparkContext.setLogLevel("ERROR")

    println("spark read csv files from a directory into RDD")
    val rddFromFile = spark_file.sparkContext.textFile("point.csv")
    println(rddFromFile.getClass)

    def customF(str: String): String = {
      val lat = str.split('|')(2).toDouble;
      val long = str.split('|')(3).toDouble;
      val point = st_makePoint(long, lat)
      val test = rf.where(st_intersects(rf_geometry(col("proj_raster")), point))
        .select(rf_value_at_point(rf_extent(col("proj_raster")), rf_tile(col("proj_raster")), point) as "value")
      return test.toString()
    }

    val rdd2=rddFromFile.map(f=> customF(f))
    rdd2.foreach(t=>println(t))
    spark.stop()
  }
}

When I run this I get a NullPointerException. Any help is appreciated.

java.lang.NullPointerException
at org.apache.spark.sql.Dataset.<init>(Dataset.scala:182)
at org.apache.spark.sql.Dataset$.apply(Dataset.scala:64)
at org.apache.spark.sql.Dataset.withTypedPlan(Dataset.scala:3416)
at org.apache.spark.sql.Dataset.filter(Dataset.scala:1490)
at org.apache.spark.sql.Dataset.where(Dataset.scala:1518)
at main.scala.sample.SparkSQLExample$.main$scala$sample$SparkSQLExample$$customF$1(SparkSQLExample.scala:49)

Posted on 2021-07-15 01:57:49

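The function being mapped over the RDD (customF) is not null safe. Try calling customF(null) and see what happens. If it throws an exception, you will have to make sure that rddFromFile contains no null or missing values.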
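It is a little hard to tell whether that is where the problem lies. I think the exception's stack trace is less helpful than usual, because the function runs inside a Spark task on a worker.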
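
One way to check the null-safety claim without a cluster is to isolate the string-parsing step of customF and call it with null. The sketch below is only a stand-in for that step: `NullCheck` and `parseLatLong` are hypothetical names, and the sample line `1|a|35.1|-90.0` is invented to match the pipe-delimited layout that the question's code implies.

```scala
import scala.util.Try

object NullCheck {
  // Hypothetical stand-in for the parsing part of customF (no Spark involved).
  // Like the original, it assumes latitude is field 2 and longitude is field 3.
  def parseLatLong(str: String): (Double, Double) = {
    val fields = str.split('|')
    (fields(2).toDouble, fields(3).toDouble)
  }

  // Calling split on a null reference throws, so this Try is a Failure.
  val onNull: Try[(Double, Double)] = Try(parseLatLong(null))

  // A well-formed line (made-up sample data) parses normally.
  val onGood: Try[(Double, Double)] = Try(parseLatLong("1|a|35.1|-90.0"))
}
```

If `NullCheck.onNull` turns out to be a failure wrapping a NullPointerException, a single null or missing line in the input would be enough to make the mapped function throw.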
If that is the issue, you could rewrite customF to handle the case where str is null, or change the parameter type to Option[String] and adjust the logic accordingly.
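
A minimal sketch of the Option-based approach, again limited to the parsing step and free of Spark dependencies. `SafeParse` and `parseLatLong` are hypothetical names, and the field positions (2 for latitude, 3 for longitude) are taken from the question's code; how the resulting coordinates feed into the raster lookup is left out here.

```scala
import scala.util.Try

object SafeParse {
  // Hypothetical null-safe variant of the parsing step in customF.
  // Returns None instead of throwing for null, short, or non-numeric lines.
  def parseLatLong(str: String): Option[(Double, Double)] =
    for {
      s <- Option(str)                          // None when str is null
      fields = s.split('|')
      if fields.length > 3                      // need indices 2 and 3
      lat <- Try(fields(2).toDouble).toOption   // None if not a number
      lon <- Try(fields(3).toDouble).toOption
    } yield (lat, lon)
}
```

With a function of this shape, bad rows can be dropped rather than crashing the task. On the RDD from the question, that would presumably look like `rddFromFile.flatMap(line => SafeParse.parseLatLong(line))`, which keeps only the rows that parsed.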
By the way, the same applies to UDFs. They need to either accept Option types as input or handle null themselves.

Source: https://stackoverflow.com/questions/68382546