How can I see the partition filters and pushed filters in Spark 3 (3.0.0-preview2)?
In Spark 2, the explain method printed details like this:
== Physical Plan ==
Project [first_name#12, last_name#13, country#14]
+- Filter (((isnotnull(country#14) && isnotnull(first_name#12)) && (country#14 = Russia)) && StartsWith(first_name#12, M))
+- FileScan csv [first_name#12,last_name#13,country#14]
Batched: false,
Format: CSV,
Location: InMemoryFileIndex[file:/Users/powers/Documents/tmp/blog_data/people.csv],
PartitionFilters: [],
PushedFilters: [IsNotNull(country), IsNotNull(first_name), EqualTo(country,Russia), StringStartsWith(first_name,M)],
ReadSchema: struct<first_name:string,last_name:string,country:string>

This made it easy to identify the PartitionFilters and PushedFilters.
In Spark 3, explain prints far less detail, even when the extended mode is requested:
val path = new java.io.File("./src/test/resources/person_data.csv").getCanonicalPath
val df = spark.read.option("header", "true").csv(path)
df
.filter(col("person_country") === "Cuba")
  .explain("extended")

Here is the output:
== Parsed Logical Plan ==
'Filter ('person_country = Cuba)
+- RelationV2[person_name#115, person_country#116] csv file:/Users/matthewpowers/Documents/code/my_apps/mungingdata/spark3/src/test/resources/person_data.csv
== Analyzed Logical Plan ==
person_name: string, person_country: string
Filter (person_country#116 = Cuba)
+- RelationV2[person_name#115, person_country#116] csv file:/Users/matthewpowers/Documents/code/my_apps/mungingdata/spark3/src/test/resources/person_data.csv
== Optimized Logical Plan ==
Filter (isnotnull(person_country#116) AND (person_country#116 = Cuba))
+- RelationV2[person_name#115, person_country#116] csv file:/Users/matthewpowers/Documents/code/my_apps/mungingdata/spark3/src/test/resources/person_data.csv
== Physical Plan ==
*(1) Project [person_name#115, person_country#116]
+- *(1) Filter (isnotnull(person_country#116) AND (person_country#116 = Cuba))
+- BatchScan[person_name#115, person_country#116] CSVScan Location: InMemoryFileIndex[file:/Users/matthewpowers/Documents/code/my_apps/mungingdata/spark3/src/test/re..., ReadSchema: struct<person_name:string,person_country:string>

Is there a way to see the partition filters and pushed filters in Spark 3?
Posted on 2020-04-19 17:55:37
This looks like a bug that was fixed at the end of April. The JIRA ticket for predicate pushdown is SPARK-30475, and the one for partition pruning is SPARK-30428.
Can you check whether your Spark version includes these patches?
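In the meantime, a possible workaround is to fall back to the DataSource V1 CSV reader, whose FileScan node still prints PartitionFilters and PushedFilters. This is a sketch, not a verified fix: it assumes the `spark.sql.sources.useV1SourceList` configuration from Spark 3.0's SQLConf is honored in your build, and the file path is the one from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

// Assumption: forcing the V1 CSV source makes the physical plan use a
// FileScan node (which prints PartitionFilters / PushedFilters) instead
// of the V2 BatchScan node shown above.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.sources.useV1SourceList", "csv")
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .csv("./src/test/resources/person_data.csv")

df.filter(col("person_country") === "Cuba").explain(true)
// If the fallback works, the physical plan should again contain a
// FileScan csv line with PartitionFilters: [...] and PushedFilters: [...]
```

If your build already includes the SPARK-30475 / SPARK-30428 patches, the V2 BatchScan node itself should report the pushed filters and no fallback is needed.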
https://stackoverflow.com/questions/61296310