
Implementing deadlock detection with Apache ZooKeeper

Stack Overflow user
Asked on 2012-04-24 19:44:42
1 answer · 1.9K views · 0 followers · 7 votes

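I work at a small software company, and I have been tasked with researching a distributed lock manager for us to use. It must interface with both Java and C++.

I have been working with ZooKeeper for a few weeks and have implemented shared locks (read and write locks) according to the documentation. I now need to implement deadlock detection. If each client could maintain a graph of the locks, it would be quick and easy. However, you cannot reliably see every change that happens to a node in ZooKeeper, so maintaining an accurate graph is impossible. This means I would need to download many locks every time I check for a deadlock, which seems impractical.

Another solution is to implement deadlock detection inside the ZooKeeper server, which is what I am looking into now. Each client would create a node under "/waiting" named after its session ID, and its data would be the lock it is waiting for. Since every lock has an ephemeral owner, I would have enough information to detect a deadlock.

The problem I have is that the ZooKeeper server does not have the synchronization guarantees that the ZooKeeper client has. Also, the ZooKeeper server is not as well documented as the client, because you generally aren't supposed to touch it.

So my question is this: how is one supposed to implement deadlock detection with Apache ZooKeeper? I see many people here recommending ZooKeeper as a distributed lock manager, but if it cannot support deadlock detection, then no one should use it for that purpose.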

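Edit:

I have a working solution. I can't guarantee its correctness, but it has passed all of my tests.

I am sharing my checkForDeadlock method, which is the heart of the deadlock detection algorithm. Here is the additional information you need to know: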

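  • Only one client may run deadlock detection at a time.
  • First, a client tries to acquire a lock on a resource. If the resource is already locked and the client wants to wait until it becomes available, the client next checks for a deadlock. If waiting for the resource would not result in a deadlock, it then creates a znode in a special directory that identifies this client as waiting for that resource. That line looks like this: waitNode = zooKeeper.create(waitingPath + "/" + sessionID, resource.getBytes(), ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
  • No other client should start checking for a deadlock until this client has created its wait node.
  • If two clients try to acquire locks at almost the same time, and granting both would result in a deadlock, then the first client may be the one that is rejected while the second gets its lock, rather than the first getting its lock and the second being rejected. This shouldn't be a problem.
  • If checkForDeadlock finds a deadlock, it throws a DeadlockException. Otherwise, it returns normally.
  • Locks are granted strictly in order. If a resource has a granted read lock and a waiting write lock, and another client wants a read lock, that client must wait until the write lock has been granted and then released.
  • bySequenceNumber is a comparator that sorts znodes by the sequence number that ZooKeeper appends to the end of sequential znodes.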
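
The post does not show bySequenceNumber itself, so here is a minimal sketch of what such a comparator could look like, together with a helper that applies the "strictly in order" rule to work out which locks in a queue are currently granted. It assumes that lock znode names end in the zero-padded 10-digit counter ZooKeeper appends to sequential znodes, and that read locks use a "read-" prefix and write locks a "write-" prefix. The class name, the helper grantedLocks, and both prefixes are illustrative and not taken from the original code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SequenceOrderSketch {
    // ZooKeeper appends a zero-padded, 10-digit counter to sequential znode names.
    static final int SEQUENCE_DIGITS = 10;

    // Extracts the numeric suffix, e.g. "read-0000000003" -> 3.
    static long sequenceOf(String znodeName) {
        return Long.parseLong(znodeName.substring(znodeName.length() - SEQUENCE_DIGITS));
    }

    // Hypothetical stand-in for the post's bySequenceNumber comparator.
    static final Comparator<String> BY_SEQUENCE_NUMBER =
            Comparator.comparingLong(SequenceOrderSketch::sequenceOf);

    // Returns the locks that are currently granted under strict ordering:
    // either the single write lock at the head of the queue, or the run of
    // read locks that precedes the first waiting write lock.
    static List<String> grantedLocks(List<String> lockQueue, String readPrefix) {
        List<String> sorted = new ArrayList<>(lockQueue);
        sorted.sort(BY_SEQUENCE_NUMBER);
        List<String> granted = new ArrayList<>();
        for (String lock : sorted) {
            if (!lock.startsWith(readPrefix)) {
                if (granted.isEmpty()) {
                    granted.add(lock); // a write lock at the head is the only holder
                }
                break; // everything behind a write lock is waiting
            }
            granted.add(lock);
        }
        return granted;
    }

    public static void main(String[] args) {
        List<String> queue = Arrays.asList(
                "read-0000000003", "write-0000000002", "read-0000000000", "read-0000000001");
        // Prints [read-0000000000, read-0000000001]: the write lock and the
        // read lock queued behind it are both still waiting.
        System.out.println(grantedLocks(queue, "read-"));
    }
}
```

Copying the list before sorting keeps the caller's view of the children untouched, which is a design choice of this sketch rather than something the original method does.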

Code:

private void checkForDeadlock(String pathToResource) throws DeadlockException {
    // Algorithm:
    //   For each client who holds a lock on this resource:
    //     If this client is me, announce deadlock.
    //     Otherwise, if this client is waiting for a reserved resource, recursively check for deadlock on that resource.
    try {
        List<String> lockQueue = zooKeeper.getChildren(pathToResource, false); // Last I checked, children is implemented as an ArrayList.
        // lockQueue is the list of locks on this resource.
        // FIXME There is a slight chance that lockQueue could be empty.
        Collections.sort(lockQueue, bySequenceNumber);
        ListIterator<String> lockQueueIterator = lockQueue.listIterator();
        String grantedLock = lockQueueIterator.next(); // grantedLock is one lock on this resource.
        do {
            // lockQueue must contain a write lock, because there is a lock waiting.
            // exists() returns null when the znode is gone, so check for that instead of catching a NullPointerException.
            Stat lockStat = zooKeeper.exists(pathToResource + "/" + grantedLock, false);
            if (lockStat == null) {
                // Locks may be released while I'm running deadlock detection. The lock I was currently looking at
                // was deleted. Since the lock was deleted, its owner was obviously not part of a deadlock.
                // Therefore I can ignore this lock and move on to the next one.
                // (Note that a lock can be deleted if and only if its owner is not part of a deadlock.)
                continue;
            }
            // lockOwner is one client who holds a lock on this resource.
            String lockOwner = Long.toString(lockStat.getEphemeralOwner());
            if (lockOwner.equals(sessionID)) { // If this client is me.
                throw new DeadlockException("Waiting for this resource would result in a deadlock.");
            }
            try {
                // XXX: Is it possible that reservedResource could be null?
                String reservedResource = new String(zooKeeper.getData(waitingPath + "/" + lockOwner, false, new Stat()));
                // reservedResource is the resource that this client is waiting for. If this client is not waiting for a resource, see exception.
                // I only recursively check the next reservedResource if I haven't checked it before.
                // I need to do this because, while I'm running my deadlock detection, another client may attempt to acquire
                // a lock that would cause a deadlock. Without this check, I would loop in that deadlock cycle indefinitely.
                if (checkedResources.add(reservedResource)) {
                    checkForDeadlock(reservedResource); // Depth-first-search
                }
            }
            catch (KeeperException.NoNodeException e) {
                // lockOwner is not waiting for a resource.
            }
            catch (KeeperException e) {
                e.printStackTrace(syncOut);
            }
            // This loop needs to run for each lock that is currently being held on the resource. There are two possibilities:
            // A. There is exactly one write lock on this resource. (Any other locks would be waiting locks.)
            //      In this case, the do-while loop ensures that the write lock has been checked.
            //      The condition that requires that the current lock is a read lock ensures that no locks after the write lock will be checked.
            // B. There are one or more read locks on this resource.
            //      In this case, I just check that the next lock is a read lock before moving on.
        } while (grantedLock.startsWith(readPrefix) && (grantedLock = lockQueueIterator.next()).startsWith(readPrefix));
    }
    catch (NoSuchElementException e) {
        // The condition for the do-while loop assumes that there is a lock waiting on the resource.
        // This assumption was made because a client just reported that it was waiting on the resource.
        // However, there is a small chance that the client has since gotten the lock, or even released it before
        // we check the locks on the resource.
        // FIXME (This may be a problem.)
        // In such a case, the lockQueueIterator.next() call could throw a NoSuchElementException.
        // We can safely assume that we are finished searching this branch, and therefore return.
    }
    catch (KeeperException e) {
        e.printStackTrace(syncOut);
    }
    catch (InterruptedException e) {
        e.printStackTrace(syncOut);
    }

}

1 Answer

Stack Overflow user

Accepted answer

Posted on 2012-04-27 05:25:45

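You need two things to do deadlock detection: a list of lock owners and a list of lock waiters. The standard ZooKeeper lock recipe gives you both, as long as you write some kind of node ID into the znodes you create.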

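You do not need to see every change in ZooKeeper to detect deadlocks. A deadlock is not something that appears and then quickly disappears. By definition, a deadlock sticks around until you do something about it. So if you write your code so that clients watch every lock node they are interested in, each client will eventually see the owners and waiters for each lock, and the client will see the deadlock.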
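
As a rough illustration of detecting a deadlock from owner and waiter lists, the sketch below runs a depth-first search over an in-memory snapshot. It assumes the client has already gathered two maps from the znodes it watches: lock to set of owning client IDs, and client ID to the lock that client is waiting for. The class, method, and map names are hypothetical, and no ZooKeeper calls are shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class WaitForGraphSketch {
    // Returns true if `client` starting to wait for `lock` would close a cycle
    // in the wait-for graph described by the two snapshot maps.
    static boolean wouldDeadlock(String client, String lock,
                                 Map<String, Set<String>> ownersByLock,
                                 Map<String, String> waitingFor) {
        return reaches(client, lock, ownersByLock, waitingFor, new HashSet<>());
    }

    private static boolean reaches(String client, String lock,
                                   Map<String, Set<String>> ownersByLock,
                                   Map<String, String> waitingFor,
                                   Set<String> visitedLocks) {
        if (!visitedLocks.add(lock)) {
            return false; // already explored; avoids looping in someone else's cycle
        }
        for (String owner : ownersByLock.getOrDefault(lock, Collections.emptySet())) {
            if (owner.equals(client)) {
                return true; // the chain of waits leads back to the requesting client
            }
            String next = waitingFor.get(owner);
            if (next != null && reaches(client, next, ownersByLock, waitingFor, visitedLocks)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> owners = new HashMap<>();
        owners.put("L1", Collections.singleton("A"));
        owners.put("L2", Collections.singleton("B"));
        Map<String, String> waiting = new HashMap<>();
        waiting.put("B", "L1"); // B holds L2 and waits for L1

        System.out.println(wouldDeadlock("A", "L2", owners, waiting)); // true
        System.out.println(wouldDeadlock("C", "L2", owners, waiting)); // false
    }
}
```

Because the snapshot may be stale, a positive result from a search like this is best treated as a candidate deadlock rather than a confirmed one.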

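You do have to be careful, though. The client might not see updates in order, because updates can occur while the client is re-registering its watch. So if a client does detect a deadlock, it should double-check that the deadlock is real by re-reading the owners and waiters of the locks involved.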
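
One way to picture this double-check is sketched below: given the clients that form a suspected cycle, re-read each client's wait entry and the owners of the lock it waits for, and confirm that every edge of the cycle still exists. The two Function parameters stand in for fresh reads from ZooKeeper (for example via getData and getChildren); they, the class name, and the representation of a cycle as an ordered list of client IDs are assumptions made for this example.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

class DeadlockDoubleCheckSketch {
    // cycle.get(i) is suspected to be waiting for a lock owned by
    // cycle.get((i + 1) % cycle.size()). Each reader must return fresh data
    // (or null if the entry no longer exists).
    static boolean cycleStillExists(List<String> cycle,
                                    Function<String, String> readWaitingFor,
                                    Function<String, Set<String>> readOwners) {
        if (cycle.isEmpty()) {
            return false;
        }
        for (int i = 0; i < cycle.size(); i++) {
            String waiter = cycle.get(i);
            String expectedOwner = cycle.get((i + 1) % cycle.size());
            String lock = readWaitingFor.apply(waiter);
            if (lock == null) {
                return false; // the waiter got its lock or gave up
            }
            Set<String> owners = readOwners.apply(lock);
            if (owners == null || !owners.contains(expectedOwner)) {
                return false; // the lock was released or changed hands
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> waiting = new HashMap<>();
        waiting.put("A", "L2");
        waiting.put("B", "L1");
        Map<String, Set<String>> owners = new HashMap<>();
        owners.put("L1", Collections.singleton("A"));
        owners.put("L2", Collections.singleton("B"));

        List<String> suspected = Arrays.asList("A", "B");
        System.out.println(cycleStillExists(suspected, waiting::get, owners::get)); // true

        owners.remove("L2"); // B released L2 in the meantime
        System.out.println(cycleStillExists(suspected, waiting::get, owners::get)); // false
    }
}
```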

Votes: 2
Original page content provided by Stack Overflow. Translation support provided by the Tencent Cloud Xiaowei IT-domain engine.
Original link:

https://stackoverflow.com/questions/10304948
