
Execution fails when using multiple machines

Stack Overflow user
Asked 2012-03-01 13:44:11
Answers: 1, Views: 461, Followers: 0, Votes: 1

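As part of my work (listed here), I am now testing a simple program that does the following:

  • Initializes the boost-mpi environment.
  • Loads a graph from a file into a distributed adjacency_list.
  • Finally, performs two simple operations on it on each machine: counting the number of edges and computing the clustering coefficient.

Here is the code: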

#include "Common.h"
#include "GraphFileReader.h"
#include "GraphNeighbors.h"
#include <boost/graph/metis.hpp>
#include <boost/mpi/environment.hpp>
#include <boost/mpi/communicator.hpp>
#include <time.h>

int main(int argc, char *argv []){

    // Start the MPI environment
    boost::mpi::environment env(argc, argv);
    boost::mpi::communicator world;

    // Create the graph
    undirectedAdjacencyList graph;

    if(process_id(graph.process_group()) == 0){
            // Only process 0 reads the file given in argv[1]; a stack object
            // avoids leaking the reader that was previously allocated with new
            GraphFileReader graphFileReader(argv[1]);
            // Read the graph file and add the vertices and edges
            graphFileReader.loadGraph(graph);
    }

    // Wait until the process 0 has finished loading the graph
    world.barrier();
    synchronize(graph.process_group());

    GraphNeighbors graphNeighbors;

    // Now each machine should process its own piece of the graph
    graphNeighbors.countEdges(graph);
    graphNeighbors.clusteringCoefficient(graph);

    // Wait for the other processes before finishing
    world.barrier();
    synchronize(graph.process_group());
    std::cout << "\n process: " << world.rank() << " finishing\n" << std::endl;

    return 0;
}

The result is:

graphs: /usr/include/boost/graph/distributed/adjacency_list.hpp:2679: 
std::pair<typename boost::adjacency_list<OutEdgeListS, boost::distributedS<ProcessGroup, 
InVertexListS, InDistribution>, DirectedS, VertexProperty, EdgeProperty, GraphProperty,
 EdgeListS>::out_edge_iterator, typename boost::adjacency_list<OutEdgeListS, 
boost::distributedS<ProcessGroup, InVertexListS, InDistribution>, DirectedS, 
VertexProperty, EdgeProperty, GraphProperty, EdgeListS>::out_edge_iterator> 
boost::out_edges(typename boost::adjacency_list<OutEdgeListS, 
boost::distributedS<ProcessGroup, InVertexListS, InDistribution>, DirectedS, 
VertexProperty, EdgeProperty, GraphProperty, EdgeListS>::vertex_descriptor, const 
boost::adjacency_list<OutEdgeListS, boost::distributedS<ProcessGroup, InVertexListS, 
InDistribution>, DirectedS, VertexProperty, EdgeProperty, GraphProperty, EdgeListS>&) [with
 OutEdgeListS = boost::vecS, ProcessGroup = boost::graph::distributed::mpi_process_group,
 InVertexListS = boost::vecS, InDistribution = boost::defaultS, DirectedS = 
boost::undirectedS, VertexProperty = Node, EdgeProperty = boost::no_property, GraphProperty
= boost::no_property, EdgeListS = boost::listS]: Assertion `v.owner == g.processor()' failed.

_________________________________________________________________

I'm process: 0    
I'm process: 1

Number of edges: 4
        0.37694 milliseconds
Number of edges: 2
        0.16284 milliseconds
rank 1 in job 1  compute-1-4_49342   caused collective abort of all ranks
  exit status of rank 1: killed by signal 6
_________________________________________________________________
Epilogue Args:
Job  ID:        138573.tucan
User ID:        ***
Group ID:       ***
Job Name:       mpiGraphs.job
Resource List:  5746
Queue Name:     ncpus=1,neednodes=2:ppn=2,nodes=2:ppn=2
Account String: cput=00:00:00,mem=420kb,vmem=13444kb,walltime=00:00:02
Date:           Thu Mar  1 14:28:19 CET 2012
_________________________________________________________________

On the other hand, execution on a single machine works fine:

I'm process: 0

Number of edges: 6
        8.46696 milliseconds
The network average clustering coefficient is: 0.53333
        0.12708 milliseconds


 process: 0 finishing

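My advisor and I thought this might happen because one machine finishes while the other is still running its operations, so we added the synchronize and barrier calls (I don't actually know the difference between the two, so I tested several combinations, all with the same result).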

If you need the rest of the code (Common.h, GraphFileReader or GraphNeighbors), I can upload it and post a link here, to avoid making this post too long.


1 Answer

Stack Overflow user

Accepted answer

Posted 2012-03-01 14:52:40

Since you suspect a synchronization error, let me simplify the error message you got:

graphs: (boost) adjacency_list.hpp:2679: boost::out_edges(vertex_descriptor v, adjacency_list& g): Assertion `v.owner == g.processor()' failed.
exit status of rank 1: killed by signal 6

Signal 6 is raised by abort(), which in turn is triggered by the assertion failure above.
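To make that chain concrete, here is a minimal POSIX-only sketch (it uses fork and waitpid, so it assumes Linux or a similar system). The function name signal_from_failed_assert and the two integer stand-ins are made up for illustration and are not part of Boost; the point is only that a failed assert ends the process with SIGABRT, which is signal 6 on Linux.

```cpp
#undef NDEBUG            // make sure assert() is active in this sketch
#include <cassert>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Runs a failing assert in a child process and returns the number of the
// signal that terminated the child, or -1 if it did not die from a signal.
int signal_from_failed_assert() {
    pid_t pid = fork();
    if (pid < 0) return -1;             // fork failed
    if (pid == 0) {
        // Hypothetical stand-ins for v.owner and g.processor()
        int owner = 0, processor = 1;
        assert(owner == processor);     // fails -> abort() -> SIGABRT
        _exit(0);                       // not reached
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

Calling signal_from_failed_assert() from a small main and comparing the result with SIGABRT shows the same "killed by signal 6" situation that the MPI launcher reported for rank 1.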

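I don't know anything about this graph library, but according to adjacency_list.hpp, it looks like your processor 1 is calling out_edges and passing a vertex v that belongs to processor 0.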
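As a rough illustration of that diagnosis, here is a toy model in plain C++ with no Boost involved. ToyVertex, ToyLocalGraph and two_hop_count_local are hypothetical names invented for this sketch; they only mimic the idea that a vertex carries an owner rank and that a process stores out-edges only for the vertices it owns. The real fix would have to go into GraphNeighbors, which is not shown in the question, so treat this as a sketch of the kind of guard that may be needed, not as the library's API.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a distributed vertex descriptor:
// the rank that owns the vertex and its index within that rank.
struct ToyVertex {
    int owner;
    std::size_t local;
};

// Hypothetical stand-in for one rank's share of a distributed graph.
struct ToyLocalGraph {
    int rank;                                       // rank of this process
    std::vector<std::vector<ToyVertex>> adjacency;  // out-neighbours of local vertices

    bool owns(const ToyVertex& v) const { return v.owner == rank; }

    // Mirrors the failed assertion, but throws instead of aborting.
    const std::vector<ToyVertex>& out_neighbors(const ToyVertex& v) const {
        if (!owns(v)) throw std::logic_error("vertex is owned by another rank");
        return adjacency.at(v.local);
    }
};

// Sums the out-degrees of v's neighbours, skipping any neighbour that
// lives on another rank because its edges are not stored locally.
std::size_t two_hop_count_local(const ToyLocalGraph& g, const ToyVertex& v) {
    std::size_t total = 0;
    for (const ToyVertex& u : g.out_neighbors(v)) {
        if (!g.owns(u)) continue;   // remote neighbour: do not ask for its edges here
        total += g.out_neighbors(u).size();
    }
    return total;
}
```

A clustering coefficient looks at the neighbours of a vertex's neighbours, so this is exactly where a remote vertex can end up being passed to out_edges. In this toy version the remote neighbours are simply skipped, which gives an incomplete count; a real distributed algorithm would have to obtain that information from the owning process instead.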

Votes: 1

Original page content provided by Stack Overflow.
Original link:

https://stackoverflow.com/questions/9517306
