Suppose I have a partially connected graph that represents the members of many unrelated communities. I want to predict likely friendships between members of the same community: on a sliding scale from 0 to 10, how much would they like each other? I have some features for each person, e.g. whether they are Christian or whether they like sports, as well as a geographic feature: the distance between them.
A link between two people might indicate that they are friends on a social media platform. Within a graph, nodes are not necessarily connected by an edge.
I am using pytorch_geometric to build a graph for each community, adding an edge for each connection on the social media platform. There is one edge in each direction, so the graph is bidirectional. I then create a Data() instance:
Data(x=x, edge_index=edge_index), where x is an array of node features and edge_index lists the edges.
x = array([[ 0, 4, 6, 0, 0, 1],
[ 1, 4, 6, 0, 0, 1],
[ 2, 4, 6, 0, 0, 1],
[ 3, 4, 6, 0, 1, 0],
[ 4, 4, 6, 0, 1, 0],
...])
edge_index = [[0, 1],
[0, 9],
[0, 10],
[0, 11],
[1, 2],
[1, 7],
[1, 12],
[2, 3],
[2, 6],
[2, 13],
[3, 4],
...]
I don't know the best way to get from here to training a model and predicting relationships. What is generally used in this situation? The documentation mentions several options: EdgeConv, DynamicEdgeConv, GCNConv. I don't know which to try first. Is there something ready-made for this kind of problem, or do I have to set up my own MessagePassing class?
Data() accepts a parameter y for training on the nodes. Can I actually use pytorch_geometric for this kind of problem, or do I have to fall back to plain pytorch?
Posted on 2019-11-04 16:58:06
The simplest approach seems to be an autoencoder model. There is an autoencoder.py in the examples folder that demonstrates its usage. The gist is that it takes a single graph and tries to predict links between nodes from the latent space it learns to encode (see recon_loss). That example uses one large graph; for my purposes I have multiple graphs, which means the edges of each graph are split and trained on separately.
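To make the idea behind recon_loss concrete, here is a simplified, self-contained sketch of the inner-product decoder and reconstruction loss that GAE uses (this mirrors what torch_geometric.nn.GAE does, but omits its automatic negative sampling; the edge lists below are made up):

```python
import torch

def decode(z, edge_index):
    # probability that an edge exists = sigmoid of the dot product
    # of the two endpoint embeddings
    src, dst = edge_index
    return torch.sigmoid((z[src] * z[dst]).sum(dim=-1))

torch.manual_seed(0)
z = torch.randn(5, 2)                           # 5 nodes embedded in 2-D
pos_edges = torch.tensor([[0, 1], [2, 3]]).t()  # edges that exist
neg_edges = torch.tensor([[0, 4], [1, 3]]).t()  # sampled non-edges

# binary cross-entropy: push positive edges toward 1, negatives toward 0
eps = 1e-15
pos_loss = -torch.log(decode(z, pos_edges) + eps).mean()
neg_loss = -torch.log(1 - decode(z, neg_edges) + eps).mean()
loss = pos_loss + neg_loss
print(float(loss))
```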
Posted on 2022-03-25 14:32:03
Here is a rough implementation of a solution (feedback welcome). To build the graph encoder, I followed the tutorial here: https://antoniolonga.github.io/Pytorch_geometric_tutorial/posts/post6.html
First, I generated a graph on 100 nodes with a community structure, i.e. two tightly connected communities.

To generate this graph, I used the following code:
import numpy as np
import torch
import networkx as nx
from matplotlib import pylab as plt
import torch.nn.functional as F
from sklearn.metrics import roc_auc_score
import torch_geometric.transforms as T
from torch_geometric.nn import GCNConv
from torch_geometric.utils import negative_sampling
from torch_geometric.utils import train_test_split_edges
from torch_geometric.nn import GAE
import torch_geometric.data as data
from torch_geometric.utils.convert import to_networkx
import torch_geometric
# set seed for reproducibility
torch.manual_seed(1234)
np.random.seed(1234)
n_nodes = 100
tup_c1 = (0,50)
tup_c2 = (50,100)
n_edges_inter = 100
n_edges_intra = 1000
# have first 50 nodes of one type and other 50 nodes of other type
node_attr = (torch.hstack([torch.zeros(50), torch.ones(50)]))
node_attr= torch.reshape(node_attr, (n_nodes, 1))
# edges within cluster 1
rows_11 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_intra)
cols_11 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_intra)
edges_11 = torch.tensor([rows_11, cols_11])
# edges within cluster 2
rows_22 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_intra)
cols_22 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_intra)
edges_22 = torch.tensor([rows_22, cols_22])
# edges from 2-1
rows_21 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_inter)
cols_21 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_inter)
edges_21 = torch.tensor([rows_21, cols_21])
# edges from 1-2
rows_12 = np.random.choice([i for i in range(tup_c1[0], tup_c1[1])], n_edges_inter)
cols_12 = np.random.choice([i for i in range(tup_c2[0], tup_c2[1])], n_edges_inter)
edges_12 = torch.tensor([rows_12, cols_12])
# concatenate all edges
edges = torch.hstack([edges_11, edges_22, edges_21, edges_12])
# edge weights; inter-cluster edges can be down-weighted by a factor
factor = 1.0
edges_attr = torch.tensor(np.hstack([np.random.rand(2*n_edges_intra), factor*np.random.rand(2*n_edges_inter)]))
Then I defined a dataset. The node features are simply the identity matrix.
graph = data.Data(x=torch.eye(100), edge_index=edges, edge_attr=edges_attr)
data = train_test_split_edges(graph)  # note: this shadows the torch_geometric.data module imported above
Then we define a GAE and train it.
class GCNEncoder(torch.nn.Module):
    def __init__(self, in_channels, out_channels):
        super(GCNEncoder, self).__init__()
        # cached=True is only valid for transductive learning on a single
        # graph; it is less useful when you have many graphs
        self.conv1 = GCNConv(in_channels, 2 * out_channels, cached=True)
        self.conv2 = GCNConv(2 * out_channels, out_channels, cached=True)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)
# parameters
out_channels = 2 # dimension of embedding space
num_features = 100 # identity matrix
epochs = 1000
# model
model = GAE(GCNEncoder(num_features, out_channels))
# move to GPU (if available)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)
x = data.x.to(device)
train_pos_edge_index = data.train_pos_edge_index.to(device)
# initialize the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
def train():
    model.train()
    optimizer.zero_grad()
    z = model.encode(x, train_pos_edge_index)
    loss = model.recon_loss(z, train_pos_edge_index)
    # for a variational GAE, add the KL term:
    # loss = loss + (1 / data.num_nodes) * model.kl_loss()
    loss.backward()
    optimizer.step()
    return float(loss)
def test(pos_edge_index, neg_edge_index):
    model.eval()
    with torch.no_grad():
        z = model.encode(x, train_pos_edge_index)
        return model.test(z, pos_edge_index, neg_edge_index)
for epoch in range(1, epochs + 1):
    loss = train()
    auc, ap = test(data.test_pos_edge_index, data.test_neg_edge_index)
    if epoch % 100 == 0:
        print('Epoch: {:03d}, AUC: {:.4f}, AP: {:.4f}'.format(epoch, auc, ap))
# z is local to train()/test(), so recompute the embeddings after training
with torch.no_grad():
    z = model.encode(x, train_pos_edge_index)

plt.imshow((z @ z.t()).detach().cpu())
plt.colorbar()
plt.title("edge probability: z @ z.T")
plt.savefig("example_out.png")
plt.show()
By decoding the embedding space, we recover a similar community structure.
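The original question asked for a 0-10 score rather than a probability. One option (my assumption, not part of the GAE API) is to rescale the inner-product decoder's sigmoid output, which is bounded in (0, 1), by a factor of 10:

```python
import torch

# Hypothetical post-processing: map the decoder's logit for a node pair
# (i, j) onto the 0-10 scale from the original question
def score_0_to_10(z, i, j):
    logit = (z[i] * z[j]).sum()
    return 10.0 * torch.sigmoid(logit)

torch.manual_seed(0)
z = torch.randn(100, 2)  # stand-in for the learned embeddings
s = score_0_to_10(z, 3, 42)
print(float(s))  # always within (0, 10)
```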

https://datascience.stackexchange.com/questions/56694