Singing voice conversion aims to convert the voice of one singer into that of another singer while preserving the singing content and prosody.
In our experiments, the model is fine-tuned on 15 minutes of singing data from each target singer. In the first part, we provide audio of the two target singers (Female and Male) as a timbre reference.
In the second part, Source Audio and Source Score represent the singing content and prosody to be preserved. To Male and To Female are the results of converting the source to the corresponding target singer.
Pitch shift (PS): we replace the source score $S_{\text{source}}$ with the shifted score $S_{\text{shifted}}$ as input, where $S_{\text{shifted}} = S_{\text{source}} + P_{\text{target}} - P_{\text{source}}$. Here $P_{\text{target}}$ is the average pitch of the target singer and $P_{\text{source}}$ is the average pitch of the source audio.
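A minimal sketch of the pitch-shift formula, under assumptions the text does not state: score pitches are MIDI note numbers, and each average pitch is taken over the voiced frames (F0 > 0) of a frame-level F0 track in Hz, converted to the MIDI scale so that the offset is in semitones. The function names are hypothetical, and this is not the authors' implementation. The code applies the offset exactly as written in the formula, without rounding.

```python
import math
from typing import List, Sequence


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69.0 + 12.0 * math.log2(f0_hz / 440.0)


def mean_pitch_midi(f0_track: Sequence[float]) -> float:
    """Average pitch over voiced frames (F0 > 0), in MIDI semitones (assumed convention)."""
    voiced = [hz_to_midi(f) for f in f0_track if f > 0]
    if not voiced:
        raise ValueError("F0 track contains no voiced frames")
    return sum(voiced) / len(voiced)


def shift_score(source_notes: Sequence[float], p_target: float, p_source: float) -> List[float]:
    """S_shifted = S_source + P_target - P_source, applied to every note pitch."""
    delta = p_target - p_source
    return [note + delta for note in source_notes]


# Example: a source singer centred on A3 and a target singer centred on A4.
source_f0 = [220.0, 0.0, 220.0]   # Hz, 0.0 marks an unvoiced frame
target_f0 = [440.0, 440.0]
p_source = mean_pitch_midi(source_f0)
p_target = mean_pitch_midi(target_f0)
shifted = shift_score([60, 62], p_target, p_source)
```

In this example, the source averages 57 (A3) and the target averages 69 (A4), so every score note is raised by 12 semitones. A real system might round the offset to whole semitones so that the shifted score stays on the note grid, but the formula above does not say so.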