Committed by nielsr (HF Staff)
Commit 467cc35 · verified · 1 parent: fa99f66

Improve Vevo2 Model Card: Add pipeline tag, links, and description


This PR improves the Vevo2 model card by:
- Adding `pipeline_tag: text-to-speech` to the metadata, which makes the model easier to find on the Hugging Face Hub.
- Adding links to the paper, project page, and GitHub repository at the top of the card.
- Adding a concise description of the model, derived from the paper's abstract.
- Keeping the existing sample usage section and citations, and adding a comment to the usage section that tells users to clone the repository first.

Please review these updates.
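
For reviewers who want to check the metadata change locally, the following is a minimal sketch of reading the card's front matter with only the Python standard library. The helper `parse_front_matter` and the `SAMPLE_README` string are hypothetical and not part of the repository; the sample copies only the front-matter lines visible in this diff (the `language` entries hidden by the diff are omitted), and the parser handles only the flat `key: value` and `- item` shapes used here.

```python
# Hypothetical helper, not part of the Vevo2 repository: a deliberately
# small front-matter reader. Use a full YAML parser for real model cards.

def parse_front_matter(readme_text):
    """Return the leading '---' delimited metadata block as a dict."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # no front matter at the top of the file
    metadata = {}
    current_key = None
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":  # closing delimiter ends the block
            break
        if not stripped:
            continue
        is_list_item = (
            stripped.startswith("- ")
            and current_key is not None
            and isinstance(metadata[current_key], list)
        )
        if is_list_item:
            # A list item belongs to the most recent key.
            metadata[current_key].append(stripped[2:].strip())
        elif ":" in stripped:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            value = value.strip()
            # An empty value means list items follow on later lines.
            metadata[current_key] = value if value else []
    return metadata


# Assumption: abbreviated copy of the front matter after this PR.
SAMPLE_README = """---
datasets:
- amphion/Emilia-Dataset
language:
- ko
- de
- fr
license: cc-by-nc-nd-4.0
tags:
- tts
- vc
- svs
- svc
- music
pipeline_tag: text-to-speech
---

# Vevo2
"""

metadata = parse_front_matter(SAMPLE_README)
print(metadata["pipeline_tag"])
print(metadata["license"])
print(metadata["tags"])
```

A stdlib-only parser keeps the check dependency-free; it is not a substitute for the Hub's own metadata validation.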

Files changed (1)
  1. README.md +9 -3
README.md CHANGED
@@ -1,5 +1,4 @@
 ---
-license: cc-by-nc-nd-4.0
 datasets:
 - amphion/Emilia-Dataset
 language:
@@ -9,20 +8,28 @@ language:
 - ko
 - de
 - fr
+license: cc-by-nc-nd-4.0
 tags:
 - tts
 - vc
 - svs
 - svc
 - music
+pipeline_tag: text-to-speech
 ---

 # Vevo2

-[![arXiv](https://img.shields.io/badge/Vevo-Paper-COLOR.svg)](https://arxiv.org/abs/2508.16332)
+[📚 Paper](https://huggingface.co/papers/2508.16332) • [🏠 Project Page](https://versasinger.github.io/) • [💻 Code](https://github.com/microsoft/Vevo2)
+
+Vevo2 is a unified framework for controllable speech and singing voice generation. It addresses challenges like the scarcity of annotated singing data and enables flexible controllability through two innovative audio tokenizers: a music-notation-free prosody tokenizer and a unified content-style tokenizer. The framework consists of an auto-regressive (AR) content-style modeling stage for control over text, prosody, and style, and a flow-matching acoustic modeling stage for timbre control. Joint speech-singing training and a multi-objective post-training task further enhance its ability to follow text and prosody. Experimental results demonstrate Vevo2's mutual benefits for both speech and singing voice generation, showcasing strong generalization and versatility across various synthesis, conversion, and editing tasks.

 ## Usage
 ```python
+# To run the code, first clone the Vevo2 GitHub repository:
+# git clone https://github.com/microsoft/Vevo2
+# Then install its dependencies to access models.svc.vevo2.vevo2_utils
+
 import os
 import torch
 from huggingface_hub import snapshot_download
@@ -263,7 +270,6 @@ if __name__ == "__main__":
 style_ref_text=taiyizhenren_text,
 timbre_ref_wav_path=taiyizhenren_path,
 )
-
 ```

 ## Citations