Improve Vevo2 Model Card: Add pipeline tag, links, and description
#1 · opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,5 +1,4 @@
 ---
-license: cc-by-nc-nd-4.0
 datasets:
 - amphion/Emilia-Dataset
 language:
@@ -9,20 +8,28 @@ language:
 - ko
 - de
 - fr
+license: cc-by-nc-nd-4.0
 tags:
 - tts
 - vc
 - svs
 - svc
 - music
+pipeline_tag: text-to-speech
 ---
 
 # Vevo2
 
-[
+[📚 Paper](https://huggingface.co/papers/2508.16332) • [🏠 Project Page](https://versasinger.github.io/) • [💻 Code](https://github.com/microsoft/Vevo2)
+
+Vevo2 is a unified framework for controllable speech and singing voice generation. It addresses challenges like the scarcity of annotated singing data and enables flexible controllability through two innovative audio tokenizers: a music-notation-free prosody tokenizer and a unified content-style tokenizer. The framework consists of an auto-regressive (AR) content-style modeling stage for control over text, prosody, and style, and a flow-matching acoustic modeling stage for timbre control. Joint speech-singing training and a multi-objective post-training task further enhance its ability to follow text and prosody. Experimental results demonstrate Vevo2's mutual benefits for both speech and singing voice generation, showcasing strong generalization and versatility across various synthesis, conversion, and editing tasks.
 
 ## Usage
 ```python
+# To run the code, first clone the Vevo2 GitHub repository:
+# git clone https://github.com/microsoft/Vevo2
+# Then install its dependencies to access models.svc.vevo2.vevo2_utils
+
 import os
 import torch
 from huggingface_hub import snapshot_download
@@ -263,7 +270,6 @@ if __name__ == "__main__":
         style_ref_text=taiyizhenren_text,
         timbre_ref_wav_path=taiyizhenren_path,
     )
-
 ```
 
 ## Citations
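For reference, here is a minimal sketch of the checkpoint-download step that the usage snippet's `snapshot_download` import points to. The repository ID `amphion/Vevo2` and the local directory are assumptions for illustration only; they are not confirmed by the diff above.

```python
# Minimal sketch (not from the model card): fetch the Vevo2 checkpoints with
# huggingface_hub before running the cloned repo's inference code.
import os

from huggingface_hub import snapshot_download

local_dir = "./ckpts/Vevo2"
os.makedirs(local_dir, exist_ok=True)

# Download the model snapshot (configs + weights) into a local folder.
snapshot_download(
    repo_id="amphion/Vevo2",  # hypothetical repo id, substitute the actual model repo
    local_dir=local_dir,
)
print(f"Checkpoints downloaded to {local_dir}")
```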