NanoGPT Chinese Lyrics
A GPT-2 Small-style language model trained on Chinese song lyrics for text generation.
Model Description
- Architecture: GPT-2 Small (Decoder-only Transformer)
- Parameters: ~85M
- Embedding Dimension: 768
- Attention Heads: 12
- Layers: 12
- Context Length: 1024 tokens
- Tokenizer: SentencePiece BPE (vocab_size=10000)
- Position Encoding: RoPE (Rotary Position Embedding)
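As a reference for the position-encoding choice above: RoPE rotates each query/key channel pair by a position-dependent angle inside attention instead of adding a learned position vector to the token embedding. The snippet below is a minimal, generic sketch for illustration only; it is not the code in this repo's model.py, and the names rope_frequencies and apply_rope are hypothetical.

import torch

def rope_frequencies(head_dim: int, seq_len: int, base: float = 10000.0):
    # One rotation frequency per pair of channels, as in the RoPE formulation.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # x: (batch, n_heads, seq_len, head_dim); rotate interleaved channel pairs by position.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

With 12 heads and an embedding dimension of 768, the per-head dimension this would operate on is 64.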
Training Data
Trained on approximately 100,000 Chinese song lyrics (~20M tokens).
Usage
Requirements
pip install torch sentencepiece huggingface_hub
Download Model
from huggingface_hub import snapshot_download
# Download all files to a local directory
snapshot_download(repo_id="yiwenX/nanogpt-chinese-lyrics", local_dir="./nanogpt-lyrics")
Or use the command line:
huggingface-cli download yiwenX/nanogpt-chinese-lyrics --local-dir ./nanogpt-lyrics
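If you only need individual files rather than the full snapshot, hf_hub_download from the same library can fetch them one at a time (shown here as an optional alternative):

from huggingface_hub import hf_hub_download

# Download just the checkpoint and the tokenizer model
ckpt_path = hf_hub_download(repo_id="yiwenX/nanogpt-chinese-lyrics", filename="best.pt")
spm_path = hf_hub_download(repo_id="yiwenX/nanogpt-chinese-lyrics", filename="chinese_lyrics_bpe.model")

Note that the inference code below also imports model.py from the repo, so downloading the full snapshot is the simpler option.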
Inference
import os
import torch
import sentencepiece as spm
# Change into the model directory so that model.py can be imported
os.chdir("./nanogpt-lyrics")
from model import GPTLanguageModel
# Load model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('best.pt', map_location=device, weights_only=False)
model = GPTLanguageModel(checkpoint['config']['vocab_size']).to(device)
model.load_state_dict(checkpoint['model'])
model.eval()
# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load('chinese_lyrics_bpe.model')
# Generate text
prompt = "今夜我"
input_ids = torch.tensor([sp.encode(prompt)], dtype=torch.long, device=device)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)
generated_text = sp.decode(output_ids[0].tolist())
print(generated_text)
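The generate call above controls randomness with temperature and nucleus (top-p) sampling. Independently of how model.generate implements them internally, a single sampling step of that kind typically looks like the following sketch (sample_next_token is a hypothetical helper, not part of this repo):

import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    # logits: (vocab_size,) scores for the next token.
    probs = F.softmax(logits / temperature, dim=-1)  # temperature < 1 sharpens the distribution
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    keep = cumulative - sorted_probs < top_p
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum()
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[next_sorted]

Lower temperature and lower top_p make the output more conservative; raising either increases diversity.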
Command Line
cd nanogpt-lyrics
python inference.py --prompt "今夜我" --max_tokens 200
Files
| File | Description |
|---|---|
| best.pt | Model weights (PyTorch checkpoint) |
| chinese_lyrics_bpe.model | SentencePiece tokenizer model |
| chinese_lyrics_bpe.vocab | Tokenizer vocabulary |
| model.py | Model architecture definition |
| tokenizer.py | Tokenizer utilities |
| config.py | Model configuration |
| inference.py | Inference script |
Model Architecture
Input -> Token Embedding (RoPE in Attention)
        |
+-------------------+
|  Transformer      |  x 12 layers
|  Block            |
|  +-------------+  |
|  | LayerNorm   |  |
|  | Multi-Head  |  |
|  | Attention   |  |
|  | + Residual  |  |
|  +-------------+  |
|  | LayerNorm   |  |
|  | FeedForward |  |
|  | + Residual  |  |
|  +-------------+  |
+-------------------+
        |
    LayerNorm
        |
   Linear Head
        |
Output (vocab_size)
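The diagram corresponds to a standard pre-LayerNorm decoder block: each sub-layer normalizes its input, applies attention or the feed-forward network, and adds the result back to the residual stream. A minimal sketch of one such block is shown below; it illustrates the structure only and is not the exact code in model.py (in particular, the real model applies RoPE inside its attention, whereas this sketch uses the stock nn.MultiheadAttention, and the GELU activation is an assumption).

import torch.nn as nn

class Block(nn.Module):
    # One pre-LN transformer block: x + Attn(LN(x)), then x + FFN(LN(x)).
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffn = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)[0]
        x = x + self.ffn(self.ln2(x))
        return x

Twelve of these blocks stacked, followed by the final LayerNorm and linear head over the 10,000-token vocabulary, give the overall structure shown in the diagram.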