首页
学习
活动
专区
圈层
工具
发布
社区首页 >问答首页 >DBSCAN聚类算法不能正常工作。我做错了什么?

DBSCAN聚类算法不能正常工作。我做错了什么?
EN

Stack Overflow用户
提问于 2013-04-04 04:09:48
回答 1查看 3.4K关注 0票数 0

我试图编写DBSCAN算法来对一组点进行聚类,但是我得到的结果确实很糟糕。这可能是因为这些数据,但并不仅仅是这样。我得到了大小< minPoints的集群,这是不应该发生的。

我做错了什么?我已经看过代码很多次了,我不知道问题出在哪里。

我引用了在DBSCAN维基百科页面上给出的算法。

代码语言:javascript
复制
private static int[] dbScan(String[] points, int epsilon, int minPts) {
    int cluster = 0;
    // visited stores if point has been visited
    boolean[] visited = new boolean[points.length];
    // pointsCluster stores which cluster a point has been assigned to
    int[] pointsCluster = new int[points.length];
    for(int iii = 0; iii < points.length; iii++) {
        // if point iii is already visited, do nothing  
        if(visited[iii]) continue;                      
        visited[iii] = true;    // mark point iii as visited
        // get points in neighborhood of point iii
        HashSet<Integer> neighbors = epsilonNeighbors(points, iii, epsilon);    
        if(neighbors.size() < minPts) {
            // if number of neighbors < minPts, mark point iii as noise
            pointsCluster[iii] = -1;
        } else {
            ++cluster;                      // else, start new cluster
            expandCluster(points, iii, neighbors, pointsCluster, visited, cluster, epsilon, minPts);
        }
    }
    return pointsCluster;
}

/*
 * Expands a cluster if a point is not a noise point
 * and has > minPts in its epsilon neighborhood
 */
private static void expandCluster(String[] points, int seedPoint, HashSet<Integer> neighbors,
        int[] pointsCluster, boolean[] visited, int cluster, int epsilon, int minPts) {

    pointsCluster[seedPoint] = cluster;     //assign cluster to seed point
    // create queue to process neighbors
    Queue<Integer> seeds = new LinkedList<Integer>();
    seeds.addAll(neighbors);
    while(!seeds.isEmpty()) {
        int currentPoint = (Integer) seeds.poll();
        if(!visited[currentPoint]) {
            visited[currentPoint] = true;       // mark neighbor as visited
            // get neighbors of this currentPoint
            HashSet<Integer> currentNeighbors = epsilonNeighbors(points, currentPoint, epsilon);
            // if currentPoint has >= minPts in neighborhood, add those points to the queue
            if(currentNeighbors.size() >= minPts) {
                seeds.addAll(currentNeighbors);
            }
        }
        // if currentPoint has not been assigned a cluster, assign it to the current cluster
        if(pointsCluster[currentPoint] == 0) pointsCluster[currentPoint] = cluster;
    }
}

/*
 * Returns a HashSet containing the indexes of points which are
 * in the epsilon neighborhood of the point at index == currentPoint
 */
private static HashSet<Integer> epsilonNeighbors(String[] points, int currentPoint, int epsilon) {
    HashSet<Integer> neighbors = new HashSet<Integer>();
    String protein = points[currentPoint];
    for(int iii = 0; iii < points.length; iii++) {
        int score = similarity(points[iii], points[jjj]);
        if(score >= epsilon) neighbors.add(iii);
    }
    return neighbors;
}
EN

回答 1

Stack Overflow用户

发布于 2013-04-04 13:34:59

当您的结果不好时,可能是因为您的数据不好(用于基于密度的群集),或者是因为您的参数不好。

事实上,DBSCAN可以产生比minPts更小的集群,如果它们彼此接触的话。然后,他们就可以“窃取”彼此的边境点。

使用例如埃尔基来验证算法输出如何?

票数 1
EN
页面原文内容由Stack Overflow提供。腾讯云小微IT领域专用引擎提供翻译支持
原文链接:

https://stackoverflow.com/questions/15802365

复制
相关文章

相似问题

领券
问题归档专栏文章快讯文章归档关键词归档开发者手册归档开发者手册 Section 归档