Improve Vevo2 Model Card: Add pipeline tag, links, and description
#1 · opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,5 +1,4 @@
 ---
-license: cc-by-nc-nd-4.0
 datasets:
 - amphion/Emilia-Dataset
 language:
@@ -9,20 +8,28 @@ language:
 - ko
 - de
 - fr
+license: cc-by-nc-nd-4.0
 tags:
 - tts
 - vc
 - svs
 - svc
 - music
+pipeline_tag: text-to-speech
 ---
 
 # Vevo2
 
-[
+[📚 Paper](https://huggingface.co/papers/2508.16332) • [🏠 Project Page](https://versasinger.github.io/) • [💻 Code](https://github.com/microsoft/Vevo2)
+
+Vevo2 is a unified framework for controllable speech and singing voice generation. It addresses challenges like the scarcity of annotated singing data and enables flexible controllability through two innovative audio tokenizers: a music-notation-free prosody tokenizer and a unified content-style tokenizer. The framework consists of an auto-regressive (AR) content-style modeling stage for control over text, prosody, and style, and a flow-matching acoustic modeling stage for timbre control. Joint speech-singing training and a multi-objective post-training task further enhance its ability to follow text and prosody. Experimental results demonstrate Vevo2's mutual benefits for both speech and singing voice generation, showcasing strong generalization and versatility across various synthesis, conversion, and editing tasks.
 
 ## Usage
 ```python
+# To run the code, first clone the Vevo2 GitHub repository:
+# git clone https://github.com/microsoft/Vevo2
+# Then install its dependencies to access models.svc.vevo2.vevo2_utils
+
 import os
 import torch
 from huggingface_hub import snapshot_download
@@ -263,7 +270,6 @@ if __name__ == "__main__":
         style_ref_text=taiyizhenren_text,
         timbre_ref_wav_path=taiyizhenren_path,
     )
-
 ```
 
 ## Citations
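For reference, here is a minimal sketch of the checkpoint-download step that the usage snippet's `snapshot_download` import points to. The repository ID `amphion/Vevo2` and the local directory are assumptions for illustration only; they are not confirmed by the diff above.

```python
# Minimal sketch (not from the model card): fetch the Vevo2 checkpoints with
# huggingface_hub before running the cloned repo's inference code.
import os

from huggingface_hub import snapshot_download

local_dir = "./ckpts/Vevo2"
os.makedirs(local_dir, exist_ok=True)

# Download the model snapshot (configs + weights) into a local folder.
snapshot_download(
    repo_id="amphion/Vevo2",  # hypothetical repo id, substitute the actual model repo
    local_dir=local_dir,
)
print(f"Checkpoints downloaded to {local_dir}")
```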