我想得到一些关于以下不相交集的实现的反馈,基于不相交集林(Cormen,et.al.,Introduction to Algorithms,第二版,p.505 of )。它应该具有以下属性:
T的不相交集。add、union、find任何一般性的意见都是受欢迎的,但我特别怀疑以下几点:
add和size)MatchError:处理这类事情的最佳实践是什么?import scala.annotation.tailrec
/**
* This class implements a disjoint-sets algorithm with
* union-by-rank and path compression. The forest/sets/etc. are
* internal data structures not exposed to the outside. Instead,
* it is possible to add new elements (which end up in a newly
* created set), union two elements, and hence, their sets, and
* find the representative of a disjoint-set by a given element of
* the set.
*/
class DisjointSets[T](initialElements : Seq[T] = Nil) {
/**
* Add a new single-node forest to the disjoint-set forests. It will
* be placed into its own set.
*/
def add(elem : T) : Unit = synchronized {
nodes += (elem -> DisjointSets.Node(elem, 0, None))
}
/**
* Union the disjoint-sets of which <code>elem1</code>
* and <code>elem2</code> are members of.
* @return the representative of the unioned set
* @precondition elem1 and elem2 must have been added before
*/
def union(elem1 : T, elem2 : T) : T = synchronized {
// lookup elements
require(nodes.contains(elem1) && nodes.contains(elem2),
"Only elements previously added to the disjoint-sets can be unioned")
// retrieve representative nodes
(nodes.get(elem1).map(_.getRepresentative),
nodes.get(elem2).map(_.getRepresentative)) match {
// Distinguish the different union cases and return the new set representative
// Case #1: both elements already in same set
case (Some(n1), Some(n2)) if n1 == n2 =>
n1.elem
// Case #2: rank1 > rank2 -> make n1 parent of n2
case (Some(n1 @ DisjointSets.Node(_, rank1, _)),
Some(n2 @ DisjointSets.Node(_, rank2, _))) if rank1 > rank2 =>
n2.parent = Some(n1)
n1.elem
// Case #3: rank1 < rank2 -> make n2 parent of n1
case (Some(n1 @ DisjointSets.Node(_, rank1, _)),
Some(n2 @ DisjointSets.Node(_, rank2, _))) if rank1 < rank2 =>
n1.parent = Some(n2)
n2.elem
// Case #4: rank1 == rank2 -> keep n1 as representative and increment rank
case (Some(n1 @ DisjointSets.Node(_, rank1, _)),
Some(n2 @ DisjointSets.Node(_, rank2, _))) /*if rank1 == rank2*/ =>
n1.rank = rank1 + 1
n2.parent = Some(n1)
n1.elem
// we are guaranteed to find the two nodes in the map,
// and the above cases cover all possibilities
case _ => throw new MatchError("This should never have happened")
}
}
/**
* Finds the representative for a disjoint-set, of which
* <code>elem</code> is a member of.
*/
def find(elem : T) : Option[T] = synchronized {
nodes.get(elem) match {
case Some(node) =>
val rootNode = node.getRepresentative
// path compression
if (node != rootNode) node.parent = Some(rootNode)
Some(rootNode.elem)
case None => None
}
}
/**
* Returns the number of disjoint-sets managed in this data structure.
* Keep in mind: this is a non-vital/non-standard operation, so we do
* not keep track of the number of sets, and instead this method recomputes
* them each time.
*/
def size : Int = synchronized {
nodes.values.count(_.parent == None)
}
////
// Internal parts
private val nodes : scala.collection.mutable.Map[T, DisjointSets.Node[T]] =
scala.collection.mutable.Map.empty
// Initialization
synchronized { initialElements map (add _) }
}
object DisjointSets {
def apply[T](initialElements : Seq[T] = Nil) = new DisjointSets[T](initialElements)
////
// Internal parts
private case class Node[T](val elem : T, var rank : Int, var parent : Option[Node[T]]) {
/**
* Compute representative of this set.
* @return root element of the set
*/
@tailrec
final def getRepresentative: Node[T] = parent match {
case None => this
case Some(p) => p.getRepresentative
}
}
}发布于 2013-07-01 17:56:15
我有一种感觉,你的find是次优。我认为您应该将整个路径压缩到根,而不仅仅是单个节点的父节点。因此,您应该首先找到根节点,然后再遍历路径,并将所有节点的parent设置为Some(rootNode)。
关于MatchErrors,我也许会用
def union(elem1 : T, elem2 : T) : T =
// retrieve representative nodes
(nodes.get(elem1).map(_.getRepresentative),
nodes.get(elem2).map(_.getRepresentative)) match {
case (Some(n1), Some(n2)) if n1 == n2 => // ...
// ...
case _ => // one of the values is None
require(false,
"Only elements previously added to the disjoint-sets can be unioned")
}关于同步,我会让用户选择(S)他是否想要同步实现。我定义了所有操作的一个特性(分离合同总是一个好主意):
trait DisjointSets[T] {
def add(elem : T) : Unit;
def union(elem1 : T, elem2 : T) : T;
def find(elem : T) : Option[T];
def size : Int;
}然后就像
class DisjointSetsImpl[T] private (initialElements : Seq[T] = Nil)
extends DisjointSets[T]
{
// ...
}然后是一个包装器(要么显式的,要么就在伙伴对象的方法中)。
class DisjointSetSync[T](val underlying: DisjointSet[T])
extends DisjointSet[T]
{
override def add(elem: T) : Unit = synchronized {
underlying.add(elem);
}
// etc.
}您需要同步所有公共操作,包括add和size (考虑一个线程正在读取size,而另一个线程正在使用add向集合中添加一个新元素)。
这样,用户就可以使用更快的、默认的、非同步的实现,并在需要时将其包装到同步的实现中。在大多数情况下,用户无论如何都需要自己的同步,因为他们通常使用其他同步原语(例如锁或信号量),或者需要一次同步多个操作。
另外,您不需要同步初始化代码,例如
// Initialization
synchronized { initialElements map (add _) }因为在对象初始化时,还没有人引用它,所以其他线程不可能并发访问它。
我可能会为size使用一个不同的名称,类似于setCount,因为它很容易与元素的数量而不是不相交的集合数目混淆。您还可以通过统计不同集合的数量,并在每个成功的size之后将其减少,从而使union更快。(我隐约记得这个操作对于某些算法是有用的,但我不记得是哪个算法。)
您还可以考虑扩展Scala的标准特性(在重命名size之后),以使类更加有用:
Growable (这将用+=代替add )。Traversable,它将遍历所有集合中的所有元素。PartialFunction[T,T]来查找节点的代表(如果它在集合中--这将是find的同义词)。这不会让你付出任何代价,有时也是有用的。
已经有了像您这样的实现的请求,也许您可以考虑将它作为一个小型库发布到某个地方。
发布于 2013-07-01 20:32:31
除了Petr的精彩评论外:
我认为,通过将父对象作为NodeT的字段,您可以获得一些存储效率,并减少临时对象的数量。您只需使用以下方法初始化它(这是正确的初始值)。
object DisjointSets {
def apply[T](initialElements : Seq[T] = Nil) = new DisjointSets[T](initialElements)
////
// Internal parts
private case class Node[T](val elem : T, var rank : Int) {
var parent: Node[T] = this
/**
* Compute representative of this set.
* @return root element of the set
*/
@tailrec
final def getRepresentative: Node[T] =
if(parent eq this) this else parent.getRepresentative
}
}
}关于同步:我不会费心。将其完全排除在代码之外,并在文档中提出一个警告,即这件事并不是线程安全的。当人们想要在多线程上下文中使用它时,他们可能必须设置一个锁,因为他们有需要保持同步的多个数据结构。在scala中,可变数据结构主要来自安全上下文,例如参与者的内部状态。对于这个用例,默认情况下同步只会增加额外的开销。
https://codereview.stackexchange.com/questions/17621
复制相似问题