Introduction:

  • Singing voice conversion aims to convert the voice of one singer into that of another singer while preserving the singing content and prosody.
  • In our experiments, the model is fine-tuned using 15 minutes of singing data for each singer to be converted. In the first part, we provide audio of the two target singers (Female and Male) as timbre references.
  • In the second part, Source Audio and Source Score represent the singing content and prosody we want to keep. To Male and To Female are the results of converting to the corresponding Target Singer.
  • Pitch shift (PS): we replace the source score S_source with the shifted score S_shifted as input. S_shifted is obtained by the formula S_shifted = S_source + P_target - P_source, where P_target is the average pitch of the Target Singer and P_source is the average pitch of the Source Audio.
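  The pitch shift formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the score is a list of MIDI note numbers, that pitch is averaged on the semitone (MIDI) scale over voiced frames of an F0 track given in Hz, and that unvoiced frames are marked with 0. The function names and the example values are hypothetical.

```python
import math


def hz_to_midi(f0_hz):
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def mean_pitch(f0_track_hz):
    """Average pitch in MIDI units over voiced frames (assumed: f0 > 0)."""
    voiced = [hz_to_midi(f) for f in f0_track_hz if f > 0]
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    return sum(voiced) / len(voiced)


def shift_score(source_score, p_target, p_source):
    """Apply S_shifted = S_source + P_target - P_source to every note."""
    offset = p_target - p_source
    return [note + offset for note in source_score]


# Hypothetical F0 tracks in Hz; 0.0 marks an unvoiced frame.
source_f0 = [220.0, 0.0, 220.0]   # averages to MIDI 57 (A3)
target_f0 = [440.0, 440.0, 0.0]   # averages to MIDI 69 (A4)

p_source = mean_pitch(source_f0)
p_target = mean_pitch(target_f0)

# Source score as MIDI note numbers (assumed representation).
shifted = shift_score([57, 59, 60], p_target, p_source)
print(shifted)  # [69.0, 71.0, 72.0]
```

  In this toy case the target singer sits one octave above the source, so every note in the score moves up by 12 semitones. Whether the offset should be rounded to a whole semitone before use is not stated in the text, so the sketch leaves it fractional.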


Audio Samples:

    Samples of Target Singer

      Female
      Male

    Singing Voice Conversion

    1. xiang xin jiou huei cuen zai

      Source Audio (voc.) | Source Score
      wav
      To Female (w/o PS) | To Female (w/ PS)
      wav
      To Male (w/o PS) | To Male (w/ PS)
      wav

    2. uo hai bu ken xiang xin

      Source Audio (voc.) | Source Score
      wav
      To Female (w/o PS) | To Female (w/ PS)
      wav
      To Male (w/o PS) | To Male (w/ PS)
      wav

    More Samples

    1. Alto to Soprano

      Target Singer
      wav
      Source Audio (voc.) | To Soprano
      wav
      Source Audio (voc.) | To Soprano
      wav

    2. Soprano to Alto

      Target Singer
      wav
      Source Audio (voc.) | To Alto
      wav
      Source Audio (voc.) | To Alto
      wav

    3. Bass to Tenor

      Target Singer
      wav
      Source Audio (voc.) | To Tenor
      wav
      Source Audio (voc.) | To Tenor
      wav

    4. Tenor to Bass

      Target Singer
      wav
      Source Audio (voc.) | To Bass
      wav
      Source Audio (voc.) | To Bass
      wav