Spark Physical Plan false/true for Exchange partitioning

Stack Overflow user
Asked on 2021-01-04 03:14:44
Answers: 1, Views: 200, Followers: 0, Votes: 0
repartitionedDF.explain

shows the following physical plan:

== Physical Plan ==
Exchange hashpartitioning(purchase_month#25, 10), false, [id=#6]
+- LocalTableScan [item#23, price#24, purchase_month#25]

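I noticed that in some cases the false after the partitioning can be true instead.

What does this flag mean? I knew once, but have forgotten.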


1 Answer

Stack Overflow user

Accepted answer

Posted on 2021-01-04 04:40:48

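After some digging, I believe it refers to the noUserSpecifiedNumPartition variable. If you repartition, this boolean is false because you specified the number of partitions; otherwise it is true. Try a simple orderBy, and I think you should get true.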
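
To compare the two cases side by side, here is a minimal sketch that is not part of the original post. It assumes a local Spark session and made-up sample rows; the column names are borrowed from the question. The first plan comes from a repartition with an explicit partition count, the second from an orderBy where the planner inserts the shuffle itself. How the flag is printed may differ between Spark versions, so no expected output is shown.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

object ExchangeFlagDemo {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is enough to inspect plans.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("exchange-flag-demo")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows; only the column names come from the question.
    val df = Seq(("shirt", 10.0, "Jan"), ("hat", 5.0, "Feb"))
      .toDF("item", "price", "purchase_month")

    // Shuffle requested by the user, with an explicit partition count.
    df.repartition(10, col("purchase_month")).explain()

    // Shuffle added by the planner to satisfy the global sort.
    df.orderBy(col("purchase_month")).explain()

    spark.stop()
  }
}
```

Compare the argument that follows the partitioning on each Exchange line of the two plans.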

I found the variable name by printing the executed plan as JSON:

println(df.repartition('series).orderBy('series).queryExecution.executedPlan.prettyJson)

The approach was inspired by another answer. It gives the following output (truncated to the relevant part):

{
  "class" : "org.apache.spark.sql.execution.exchange.ShuffleExchangeExec",
  "num-children" : 1,
  "outputPartitioning" : [ {
    "class" : "org.apache.spark.sql.catalyst.plans.physical.RangePartitioning",
    "num-children" : 1,
    "ordering" : [ 0 ],
    "numPartitions" : 200
  }, {
    "class" : "org.apache.spark.sql.catalyst.expressions.SortOrder",
    "num-children" : 1,
    "child" : 0,
    "direction" : {
      "object" : "org.apache.spark.sql.catalyst.expressions.Ascending$"
    },
    "nullOrdering" : {
      "object" : "org.apache.spark.sql.catalyst.expressions.NullsFirst$"
    },
    "sameOrderExpressions" : {
      "object" : "scala.collection.immutable.Set$EmptySet$"
    }
  }, {
    "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
    "num-children" : 0,
    "name" : "series",
    "dataType" : "string",
    "nullable" : true,
    "metadata" : { },
    "exprId" : {
      "product-class" : "org.apache.spark.sql.catalyst.expressions.ExprId",
      "id" : 16,
      "jvmId" : "35ee1aa5-f899-4fca-a8a6-a06c3eaabe5c"
    },
    "qualifier" : [ ]
  } ],
  "child" : 0,
  "noUserSpecifiedNumPartition" : true
}, {
  "class" : "org.apache.spark.sql.execution.exchange.ShuffleExchangeExec",
  "num-children" : 1,
  "outputPartitioning" : [ {
    "class" : "org.apache.spark.sql.catalyst.plans.physical.HashPartitioning",
    "num-children" : 1,
    "expressions" : [ 0 ],
    "numPartitions" : 200
  }, {
    "class" : "org.apache.spark.sql.catalyst.expressions.AttributeReference",
    "num-children" : 0,
    "name" : "series",
    "dataType" : "string",
    "nullable" : true,
    "metadata" : { },
    "exprId" : {
      "product-class" : "org.apache.spark.sql.catalyst.expressions.ExprId",
      "id" : 16,
      "jvmId" : "35ee1aa5-f899-4fca-a8a6-a06c3eaabe5c"
    },
    "qualifier" : [ ]
  } ],
  "child" : 0,
  "noUserSpecifiedNumPartition" : false
}

Here the true and false values correspond nicely to the physical plan:

df.repartition('series).orderBy('series).explain
== Physical Plan ==
*(1) Sort [series#16 ASC NULLS FIRST], true, 0
+- Exchange rangepartitioning(series#16 ASC NULLS FIRST, 200), true, [id=#192]
   +- Exchange hashpartitioning(series#16, 200), false, [id=#190]
      +- FileScan csv [series#16,timestamp#17,value#18] Batched: false, DataFilters: [], Format: CSV, Location: InMemoryFileIndex[file:/tmp/df.csv], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<series:string,timestamp:string,value:string>
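
To line up the flags with their Exchange nodes without reading the plan by eye, the plan text can be parsed. The following is a small sketch in plain Scala; the ExchangeFlags object and its parse method are hypothetical helpers, and the regular expression assumes the Exchange line layout shown in the plan above, which may not hold for other Spark versions.

```scala
object ExchangeFlags {
  // Matches lines such as:
  //   Exchange hashpartitioning(series#16, 200), false, [id=#190]
  // Group 1 is the partitioning name, group 2 the flag that follows it.
  private val ExchangeLine =
    """Exchange (\w+)\(.*\), (\w+), \[id=#\d+\]""".r.unanchored

  // Returns one (partitioning, flag) pair per Exchange line, top to bottom.
  def parse(plan: String): List[(String, String)] =
    plan.linesIterator.collect {
      case ExchangeLine(partitioning, flag) => (partitioning, flag)
    }.toList
}
```

Applied to the plan above, parse returns the pair (rangepartitioning, true) followed by (hashpartitioning, false), matching the two noUserSpecifiedNumPartition values in the JSON.
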
Votes: 1

Original content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/65553868
