NanoGPT Chinese Lyrics
A GPT-2 Small-style language model trained on Chinese song lyrics for text generation.
Model Description
- Architecture: GPT-2 Small (Decoder-only Transformer)
- Parameters: ~85M
- Embedding Dimension: 768
- Attention Heads: 12
- Layers: 12
- Context Length: 1024 tokens
- Tokenizer: SentencePiece BPE (vocab_size=10000)
- Position Encoding: RoPE (Rotary Position Embedding)
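As a reference for the position-encoding choice above: RoPE rotates each query/key channel pair by a position-dependent angle inside attention instead of adding a learned position vector to the token embedding. The snippet below is a minimal, generic sketch for illustration only; it is not the code in this repo's model.py, and the names rope_frequencies and apply_rope are hypothetical.

import torch

def rope_frequencies(head_dim: int, seq_len: int, base: float = 10000.0):
    # One rotation frequency per pair of channels, as in the RoPE formulation.
    inv_freq = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
    angles = torch.outer(torch.arange(seq_len).float(), inv_freq)  # (seq_len, head_dim/2)
    return angles.cos(), angles.sin()

def apply_rope(x, cos, sin):
    # x: (batch, n_heads, seq_len, head_dim); rotate interleaved channel pairs by position.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out

With 12 heads and an embedding dimension of 768, the per-head dimension this would operate on is 64.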
Training Data
Trained on approximately 100,000 Chinese song lyrics (~20M tokens).
Usage
Requirements
pip install torch sentencepiece huggingface_hub
Download Model
from huggingface_hub import snapshot_download
# Download all files to a local directory
snapshot_download(repo_id="yiwenX/nanogpt-chinese-lyrics", local_dir="./nanogpt-lyrics")
Or use the command line:
huggingface-cli download yiwenX/nanogpt-chinese-lyrics --local-dir ./nanogpt-lyrics
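If you only need individual files rather than the full snapshot, hf_hub_download from the same library can fetch them one at a time (shown here as an optional alternative):

from huggingface_hub import hf_hub_download

# Download just the checkpoint and the tokenizer model
ckpt_path = hf_hub_download(repo_id="yiwenX/nanogpt-chinese-lyrics", filename="best.pt")
spm_path = hf_hub_download(repo_id="yiwenX/nanogpt-chinese-lyrics", filename="chinese_lyrics_bpe.model")

Note that the inference code below also imports model.py from the repo, so downloading the full snapshot is the simpler option.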
Inference
import os
import torch
import sentencepiece as spm
# Change into the model directory so that model.py can be imported
os.chdir("./nanogpt-lyrics")
from model import GPTLanguageModel
# Load model
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('best.pt', map_location=device, weights_only=False)
model = GPTLanguageModel(checkpoint['config']['vocab_size']).to(device)
model.load_state_dict(checkpoint['model'])
model.eval()
# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load('chinese_lyrics_bpe.model')
# Generate text
prompt = "今夜我"
input_ids = torch.tensor([sp.encode(prompt)], dtype=torch.long, device=device)
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)
generated_text = sp.decode(output_ids[0].tolist())
print(generated_text)
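The generate call above controls randomness with temperature and nucleus (top-p) sampling. Independently of how model.generate implements them internally, a single sampling step of that kind typically looks like the following sketch (sample_next_token is a hypothetical helper, not part of this repo):

import torch
import torch.nn.functional as F

def sample_next_token(logits, temperature=0.8, top_p=0.9):
    # logits: (vocab_size,) scores for the next token.
    probs = F.softmax(logits / temperature, dim=-1)  # temperature < 1 sharpens the distribution
    sorted_probs, sorted_idx = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    keep = cumulative - sorted_probs < top_p
    sorted_probs = sorted_probs * keep
    sorted_probs = sorted_probs / sorted_probs.sum()
    next_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return sorted_idx[next_sorted]

Lower temperature and lower top_p make the output more conservative; raising either increases diversity.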
Command Line
cd nanogpt-lyrics
python inference.py --prompt "今夜我" --max_tokens 200
Files
| File | Description |
|---|---|
| best.pt | Model weights (PyTorch checkpoint) |
| chinese_lyrics_bpe.model | SentencePiece tokenizer model |
| chinese_lyrics_bpe.vocab | Tokenizer vocabulary |
| model.py | Model architecture definition |
| tokenizer.py | Tokenizer utilities |
| config.py | Model configuration |
| inference.py | Inference script |
Model Architecture
Input -> Token Embedding (RoPE in Attention)
        |
+-------------------+
|  Transformer      |  x 12 layers
|  Block            |
|  +-------------+  |
|  | LayerNorm   |  |
|  | Multi-Head  |  |
|  | Attention   |  |
|  | + Residual  |  |
|  +-------------+  |
|  | LayerNorm   |  |
|  | FeedForward |  |
|  | + Residual  |  |
|  +-------------+  |
+-------------------+
        |
    LayerNorm
        |
   Linear Head
        |
Output (vocab_size)
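The diagram corresponds to a standard pre-LayerNorm decoder block: each sub-layer normalizes its input, applies attention or the feed-forward network, and adds the result back to the residual stream. A minimal sketch of one such block is shown below; it illustrates the structure only and is not the exact code in model.py (in particular, the real model applies RoPE inside its attention, whereas this sketch uses the stock nn.MultiheadAttention, and the GELU activation is an assumption).

import torch.nn as nn

class Block(nn.Module):
    # One pre-LN transformer block: x + Attn(LN(x)), then x + FFN(LN(x)).
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.ffn = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=attn_mask, need_weights=False)[0]
        x = x + self.ffn(self.ln2(x))
        return x

Twelve of these blocks stacked, followed by the final LayerNorm and linear head over the 10,000-token vocabulary, give the overall structure shown in the diagram.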