NanoGPT Chinese Lyrics

A GPT-2 Small-style language model trained on Chinese song lyrics for text generation.

Model Description

  • Architecture: GPT-2 Small (Decoder-only Transformer)
  • Parameters: ~85M
  • Embedding Dimension: 768
  • Attention Heads: 12
  • Layers: 12
  • Context Length: 1024 tokens
  • Tokenizer: SentencePiece BPE (vocab_size=10000)
  • Position Encoding: RoPE (Rotary Position Embedding)
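
The hyperparameters above map onto a small configuration object. The sketch below is illustrative only; the field names are assumptions and may differ from those used in config.py.

from dataclasses import dataclass

@dataclass
class GPTConfig:
    vocab_size: int = 10000   # SentencePiece BPE vocabulary
    n_embd: int = 768         # embedding dimension
    n_head: int = 12          # attention heads
    n_layer: int = 12         # transformer blocks
    block_size: int = 1024    # context length in tokens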

Training Data

Trained on approximately 100,000 Chinese song lyrics (~20M tokens).

Usage

Requirements

pip install torch sentencepiece huggingface_hub

Download Model

from huggingface_hub import snapshot_download

# Download all files into a local directory
snapshot_download(repo_id="yiwenX/nanogpt-chinese-lyrics", local_dir="./nanogpt-lyrics")

Or use the command line:

huggingface-cli download yiwenX/nanogpt-chinese-lyrics --local-dir ./nanogpt-lyrics
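
Individual files can also be fetched with hf_hub_download; the file names below are taken from the Files table further down, while the download pattern itself is just a sketch.

from huggingface_hub import hf_hub_download

# Fetch only the checkpoint and the tokenizer model
ckpt_path = hf_hub_download(repo_id="yiwenX/nanogpt-chinese-lyrics", filename="best.pt")
spm_path = hf_hub_download(repo_id="yiwenX/nanogpt-chinese-lyrics", filename="chinese_lyrics_bpe.model")

Note that model.py, config.py, and tokenizer.py are still required for inference, so snapshot_download is usually the simpler option.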

Inference

import os
import torch
import sentencepiece as spm

# Change into the model directory so model.py can be imported and relative paths resolve
os.chdir("./nanogpt-lyrics")

from model import GPTLanguageModel

# Load the checkpoint (weights_only=False so the pickled config dict is restored alongside the weights)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
checkpoint = torch.load('best.pt', map_location=device, weights_only=False)

model = GPTLanguageModel(checkpoint['config']['vocab_size']).to(device)
model.load_state_dict(checkpoint['model'])
model.eval()

# Load tokenizer
sp = spm.SentencePieceProcessor()
sp.load('chinese_lyrics_bpe.model')

# Generate text
prompt = "今夜我"
input_ids = torch.tensor([sp.encode(prompt)], dtype=torch.long, device=device)

with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=100, temperature=0.8, top_p=0.9)

generated_text = sp.decode(output_ids[0].tolist())
print(generated_text)
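
Sampling behaviour is controlled by temperature and top_p (the generate signature is the one used above); lower temperatures tend to give more conservative lyrics. A small sketch, using illustrative values:

# Compare outputs at different temperatures
for temperature in (0.6, 0.8, 1.0):
    with torch.no_grad():
        out = model.generate(input_ids, max_new_tokens=100, temperature=temperature, top_p=0.9)
    print(f"--- temperature={temperature} ---")
    print(sp.decode(out[0].tolist()))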

Command Line

cd nanogpt-lyrics
python inference.py --prompt "今夜我" --max_tokens 200

Files

File                        Description
best.pt                     Model weights (PyTorch checkpoint)
chinese_lyrics_bpe.model    SentencePiece tokenizer model
chinese_lyrics_bpe.vocab    Tokenizer vocabulary
model.py                    Model architecture definition
tokenizer.py                Tokenizer utilities
config.py                   Model configuration
inference.py                Inference script
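
The checkpoint bundles the weights and the training configuration. A quick way to inspect it (the 'model' and 'config' keys are the ones used in the inference snippet above):

import torch

ckpt = torch.load('best.pt', map_location='cpu', weights_only=False)
print(ckpt.keys())       # expected: 'model' (state dict) and 'config'
print(ckpt['config'])    # training hyperparameters, e.g. vocab_size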

Model Architecture

Input -> Token Embedding (RoPE in Attention)
              |
    +-------------------+
    |   Transformer     | x 12 layers
    |      Block        |
    |  +-------------+  |
    |  | LayerNorm   |  |
    |  | Multi-Head  |  |
    |  | Attention   |  |
    |  | + Residual  |  |
    |  +-------------+  |
    |  | LayerNorm   |  |
    |  | FeedForward |  |
    |  | + Residual  |  |
    |  +-------------+  |
    +-------------------+
              |
         LayerNorm
              |
         Linear Head
              |
    Output (vocab_size)
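
The blocks follow the pre-LayerNorm pattern shown in the diagram. The sketch below illustrates that structure only; it omits RoPE and is not the exact code in model.py.

import torch
import torch.nn as nn

class Block(nn.Module):
    """Pre-LN transformer block: LayerNorm -> self-attention -> residual, then LayerNorm -> feed-forward -> residual."""
    def __init__(self, n_embd=768, n_head=12):
        super().__init__()
        self.ln1 = nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.ln2 = nn.LayerNorm(n_embd)
        self.mlp = nn.Sequential(
            nn.Linear(n_embd, 4 * n_embd),   # 4x expansion, as in GPT-2
            nn.GELU(),
            nn.Linear(4 * n_embd, n_embd),
        )

    def forward(self, x):
        # Causal mask: True marks future positions that must not be attended to
        T = x.size(1)
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=causal, need_weights=False)
        x = x + attn_out                      # attention residual
        x = x + self.mlp(self.ln2(x))         # feed-forward residual
        return x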
