Improve Vevo2 Model Card: Add pipeline tag, links, and description

#1
by nielsr HF Staff - opened
Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -1,5 +1,4 @@
  ---
- license: cc-by-nc-nd-4.0
  datasets:
  - amphion/Emilia-Dataset
  language:
@@ -9,20 +8,28 @@ language:
  - ko
  - de
  - fr
+ license: cc-by-nc-nd-4.0
  tags:
  - tts
  - vc
  - svs
  - svc
  - music
+ pipeline_tag: text-to-speech
  ---

  # Vevo2

- [![arXiv](https://img.shields.io/badge/Vevo-Paper-COLOR.svg)](https://arxiv.org/abs/2508.16332)
+ [📚 Paper](https://huggingface.co/papers/2508.16332) • [🏠 Project Page](https://versasinger.github.io/) • [💻 Code](https://github.com/microsoft/Vevo2)
+
+ Vevo2 is a unified framework for controllable speech and singing voice generation. It addresses challenges like the scarcity of annotated singing data and enables flexible controllability through two innovative audio tokenizers: a music-notation-free prosody tokenizer and a unified content-style tokenizer. The framework consists of an auto-regressive (AR) content-style modeling stage for control over text, prosody, and style, and a flow-matching acoustic modeling stage for timbre control. Joint speech-singing training and a multi-objective post-training task further enhance its ability to follow text and prosody. Experimental results demonstrate Vevo2's mutual benefits for both speech and singing voice generation, showcasing strong generalization and versatility across various synthesis, conversion, and editing tasks.

  ## Usage
  ```python
+ # To run the code, first clone the Vevo2 GitHub repository:
+ # git clone https://github.com/microsoft/Vevo2
+ # Then install its dependencies to access models.svc.vevo2.vevo2_utils
+
  import os
  import torch
  from huggingface_hub import snapshot_download
@@ -263,7 +270,6 @@ if __name__ == "__main__":
  style_ref_text=taiyizhenren_text,
  timbre_ref_wav_path=taiyizhenren_path,
  )
-
  ```

  ## Citations
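
The metadata part of this change (the relocated `license` key and the new `pipeline_tag`) can be sanity-checked locally. The following is a minimal sketch, not part of the PR: `read_front_matter` is a hypothetical helper that handles only the flat `key: value` pairs and `- item` lists seen in this card, and the sample text includes only the language entries visible in the diff (the full list is elided there).

```python
def read_front_matter(text: str) -> dict:
    """Parse flat `key: value` pairs and `- item` lists from YAML front matter.

    Hypothetical helper for illustration; it is not a general YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta = {}
    current_key = None  # key whose list items are currently being collected
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter of the front matter
            break
        if stripped.startswith("- ") and current_key is not None:
            meta[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            key, value = key.strip(), value.strip()
            if value:
                meta[key] = value
                current_key = None
            else:
                meta[key] = []
                current_key = key
    return meta


# Sample card text assembled from the lines visible in the diff;
# the language list is abbreviated because the diff elides part of it.
CARD = """---
datasets:
- amphion/Emilia-Dataset
language:
- ko
- de
- fr
license: cc-by-nc-nd-4.0
tags:
- tts
- vc
- svs
- svc
- music
pipeline_tag: text-to-speech
---

# Vevo2
"""

meta = read_front_matter(CARD)
missing = [key for key in ("license", "pipeline_tag") if key not in meta]
print("missing keys:", missing)
```

For real model cards, a full YAML parser is a safer choice than this hand-rolled reader, since front matter can contain nested or quoted values.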