AptaTrans is a deep learning framework for predicting aptamer-protein interactions (API) using transformer-based encoders
It aims to address the limitations of the conventional SELEX method for aptamer discovery, which is time-consuming and has limited success rates
AptaTrans uses pretrained encoders to handle aptamer and protein sequences at the monomer level and capture the physicochemical interactions between them
The model outperformed existing API prediction methods when evaluated on a benchmark dataset
The AptaTrans pipeline integrates with a generative algorithm, Apta-MCTS, to recommend aptamer candidates
The authors expect AptaTrans to enhance the cost-effectiveness and efficiency of SELEX in drug discovery
Key components of the AptaTrans model
The key components of the AptaTrans model are:
Transformer-based encoders: AptaTrans uses two transformer-based encoders, Encoder_apta(·) and Encoder_prot(·), to transform the tokenized aptamer and protein sequences into contextual vector representations.
Tokenization algorithms: AptaTrans employs tokenization algorithms to convert the aptamer and protein sequences into numerical representations. Specifically, it uses 3-mers for aptamer sequences and frequent contiguous substrings (FCS) for protein sequences.
Interaction matrix: AptaTrans builds an interaction matrix by computing the dot product of every pair of aptamer and protein token embedding vectors. This interaction matrix serves as a feature map for the downstream layers.
Convolutional layers: The model uses convolutional layers to extract information from the interaction matrix.
Fully connected layer: AptaTrans uses a fully connected layer to predict the binding scores between the aptamer and protein.
Pretraining: To obtain good sequence embeddings, the transformer-based encoders are pretrained with self-supervised learning strategies based on predicting masked tokens and the secondary structures of the molecules.
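The tokenization and interaction-matrix steps listed above can be illustrated with a minimal sketch. This is not the authors' implementation. The function names are hypothetical, the overlapping (stride 1) 3-mer split is an assumption, the protein tokens are made-up stand-ins for FCS output, and seeded random vectors take the place of the pretrained encoder outputs.

```python
import random


def kmer_tokens(seq, k=3):
    """Split a sequence into overlapping k-mers (stride 1 is an assumption)."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def toy_embed(tokens, dim=4, seed=0):
    """Placeholder for an encoder: one seeded pseudo-random vector per token position.

    The real model would use contextual embeddings from the pretrained
    transformer encoders instead of random numbers.
    """
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in tokens]


def interaction_matrix(apta_vecs, prot_vecs):
    """Entry (i, j) is the dot product of aptamer token i and protein token j."""
    return [
        [sum(a * p for a, p in zip(av, pv)) for pv in prot_vecs]
        for av in apta_vecs
    ]


apta_tokens = kmer_tokens("ACGUACG")        # ACG, CGU, GUA, UAC, ACG
prot_tokens = ["MK", "T", "AYI", "AK"]      # hypothetical FCS-style protein tokens
apta_vecs = toy_embed(apta_tokens, seed=1)
prot_vecs = toy_embed(prot_tokens, seed=2)
matrix = interaction_matrix(apta_vecs, prot_vecs)

# One row per aptamer token, one column per protein token.
print(len(matrix), len(matrix[0]))  # 5 4
```

The resulting matrix has one row per aptamer token and one column per protein token, which is the shape a convolutional layer would consume as a single-channel feature map before the fully connected layer produces a binding score.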