
The core of the Transformer architecture consists of the input/output embeddings, multi-head attention, and the feed-forward neural network. Earlier posts covered the embedding layers and the attention mechanism; this post uses the feed-forward network to string them together into a complete GPT model. A feed-forward neural network (FNN) is the most basic neural-network structure: data flows from the input layer through hidden layers to the output layer, with no feedback loops. It reconciles mismatched dimensions between layers and fits the mapping from input to output. Below we stub out the attention mechanism and other complex components to build a content-free GPT model. The configuration of such a GPT model looks like this:
GPT_CONFIG_124M = {
    "vocab_size": 50257,     # Vocabulary size
    "context_length": 1024,  # Context length
    "emb_dim": 768,          # Embedding dimension
    "n_heads": 12,           # Number of attention heads
    "n_layers": 12,          # Number of layers
    "drop_rate": 0.1,        # Dropout rate
    "qkv_bias": False        # Query-Key-Value bias
}

The model builds its network from these parameters. First comes the embedding layer, which sums the token embedding with a positional embedding, followed by dropout to guard against overfitting. Next come the Transformer layers, typically 6 to 12 blocks (GPT-2 has 12); each block contains self-attention, a feed-forward network, and residual connections. Then layer normalization standardizes the final features (adjusting them to mean 0, variance 1), which speeds up convergence and stabilizes training. Finally, a linear output head maps the Transformer features (dimension emb_dim) to the vocabulary size (vocab_size), producing a score (logit) for every token, from which a probability distribution can be derived. With those probabilities we can generate text.
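As a quick aside, the "124M" scale implied by the config name can be partly sanity-checked with back-of-the-envelope arithmetic: the token-embedding table, positional-embedding table, and output head alone can be counted directly from the config (the split between these tables and the Transformer blocks is an estimate based on the GPT-2 architecture, not something the code in this article computes):

```python
cfg = {"vocab_size": 50257, "context_length": 1024, "emb_dim": 768}

tok_emb = cfg["vocab_size"] * cfg["emb_dim"]      # token embedding table
pos_emb = cfg["context_length"] * cfg["emb_dim"]  # positional embedding table
out_head = cfg["emb_dim"] * cfg["vocab_size"]     # output head (bias=False)

print(tok_emb, pos_emb, out_head)    # 38597376 786432 38597376
print(tok_emb + pos_emb + out_head)  # 77981184, roughly 78M parameters
```

The remaining parameters live inside the 12 Transformer blocks (still placeholders below). GPT-2 reaches the 124M figure by tying the token-embedding and output-head weights; without tying, the full architecture described here comes closer to 163M parameters.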
import torch
import torch.nn as nn

class GPTModel(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        self.tok_emb = nn.Embedding(cfg["vocab_size"], cfg["emb_dim"])
        self.pos_emb = nn.Embedding(cfg["context_length"], cfg["emb_dim"])
        self.drop_emb = nn.Dropout(cfg["drop_rate"])

        # Use a placeholder for TransformerBlock
        self.trf_blocks = nn.Sequential(
            *[TransformerBlock(cfg) for _ in range(cfg["n_layers"])])

        # Use a placeholder for LayerNorm
        self.final_norm = LayerNorm(cfg["emb_dim"])
        self.out_head = nn.Linear(
            cfg["emb_dim"], cfg["vocab_size"], bias=False
        )

    def forward(self, in_idx):
        batch_size, seq_len = in_idx.shape
        tok_embeds = self.tok_emb(in_idx)
        pos_embeds = self.pos_emb(torch.arange(seq_len, device=in_idx.device))
        x = tok_embeds + pos_embeds
        x = self.drop_emb(x)
        x = self.trf_blocks(x)
        x = self.final_norm(x)
        logits = self.out_head(x)
        return logits
class TransformerBlock(nn.Module):
    def __init__(self, cfg):
        super().__init__()
        # A simple placeholder

    def forward(self, x):
        # This block does nothing and just returns its input.
        return x
class LayerNorm(nn.Module):
    def __init__(self, normalized_shape, eps=1e-5):
        super().__init__()
        # The parameters here are just to mimic the LayerNorm interface.

    def forward(self, x):
        # This layer does nothing and just returns its input.
        return x

With the logits the model outputs, we can generate text: take the logits for the last token in the sequence, normalize them into probabilities with softmax, find the index with the highest probability with argmax, and look that index up in the vocabulary to obtain the output token.
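Before wiring up generation, it is worth noting what the TransformerBlock placeholder leaves out: in a real block, self-attention is followed by the feed-forward network this article is named after, typically two linear layers with a 4x expansion of emb_dim and a GELU activation in between. A minimal sketch (the class name FeedForward and the standalone usage are illustrative, not yet part of the model above):

```python
import torch
import torch.nn as nn

class FeedForward(nn.Module):
    # Feed-forward sub-layer: expand emb_dim 4x, apply GELU, project back.
    def __init__(self, emb_dim):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(emb_dim, 4 * emb_dim),
            nn.GELU(),
            nn.Linear(4 * emb_dim, emb_dim),
        )

    def forward(self, x):
        return self.layers(x)

ff = FeedForward(768)
x = torch.randn(2, 3, 768)  # (batch, n_tokens, emb_dim)
print(ff(x).shape)          # torch.Size([2, 3, 768]), shape is preserved
```

Because the input and output dimensions match, such a block can be dropped into a residual connection without any extra projection.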
def generate_text_simple(model, idx, max_new_tokens, context_size):
    # idx is (batch, n_tokens) array of indices in the current context
    for _ in range(max_new_tokens):
        # Crop current context if it exceeds the supported context size
        # E.g., if LLM supports only 5 tokens, and the context size is 10
        # then only the last 5 tokens are used as context
        idx_cond = idx[:, -context_size:]

        # Get the predictions
        with torch.no_grad():
            logits = model(idx_cond)

        # Focus only on the last time step
        # (batch, n_tokens, vocab_size) becomes (batch, vocab_size)
        logits = logits[:, -1, :]

        # Apply softmax to get probabilities
        probas = torch.softmax(logits, dim=-1)  # (batch, vocab_size)

        # Get the idx of the vocab entry with the highest probability value
        idx_next = torch.argmax(probas, dim=-1, keepdim=True)  # (batch, 1)

        # Append sampled index to the running sequence
        idx = torch.cat((idx, idx_next), dim=1)  # (batch, n_tokens+1)
    return idx

With this function in place, we can use the model to generate tokens.
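The softmax-then-argmax core of this loop can be checked in isolation with toy logits over a hypothetical 5-token vocabulary:

```python
import torch

# Toy logits for one sequence position over a hypothetical 5-token vocabulary
logits = torch.tensor([[2.0, 0.5, 1.0, 3.0, -1.0]])

probas = torch.softmax(logits, dim=-1)                # normalize to probabilities
idx_next = torch.argmax(probas, dim=-1, keepdim=True)

print(round(probas.sum().item(), 6))  # 1.0, the probabilities sum to one
print(idx_next)                       # tensor([[3]]), index of the largest logit
```

Since softmax is monotonic, argmax over the probabilities selects the same index as argmax over the raw logits; the explicit softmax is kept for clarity, and it becomes essential once sampling replaces greedy selection.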
import tiktoken

tokenizer = tiktoken.get_encoding("gpt2")
start_context = "Hello, I am"
encoded = tokenizer.encode(start_context)
print("encoded:", encoded)
encoded_tensor = torch.tensor(encoded).unsqueeze(0)
print("encoded_tensor.shape:", encoded_tensor.shape)

torch.manual_seed(123)  # fix the seed so the random weights are reproducible
model = GPTModel(GPT_CONFIG_124M)
model.eval()  # disable dropout

out = generate_text_simple(
    model=model,
    idx=encoded_tensor,
    max_new_tokens=6,
    context_size=GPT_CONFIG_124M["context_length"]
)
print("Output:", out)
print("Output length:", len(out[0]))

decoded_text = tokenizer.decode(out.squeeze(0).tolist())
print(decoded_text)

The output is as follows:
encoded: [15496, 11, 314, 716]
encoded_tensor.shape: torch.Size([1, 4])
Output: tensor([[15496, 11, 314, 716, 27018, 24086, 47843, 30961, 42348, 7267]])
Output length: 10
Hello, I am Featureiman Byeswickattribute argue

With that, we have completed the path from model construction to text prediction. Because the model's weights are random and untrained, the continuation is gibberish. One problem remains unsolved: how to train the model and obtain its parameters. We will break that down in the next chapter.
This article is shared from the WeChat official account golang算法架构leetcode技术php.