Introduction:

Singing voice conversion aims to convert the voice of one singer into that of another while preserving the singing content and prosody.

In our experiments, the model is fine-tuned on 15 minutes of singing data for each singer to be converted. In the first part, we provide audio of the two target singers (Female and Male) as a timbre reference.

In the second part, Source Audio and Source Score represent the singing content and prosody we want to keep. To Male and To Female are the results of converting the source to the corresponding Target Singer.

Pitch shift (PS): we replace the source score S_source with the shifted score S_shifted as the model input. S_shifted is obtained by the formula S_shifted = S_source + P_target - P_source, where P_target is the average pitch of the Target Singer and P_source is the average pitch of the Source Audio.
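
As a rough illustration of the pitch-shift formula, the sketch below applies the same constant offset to every note of a score. It assumes that the score and both pitch tracks are expressed in the same unit (here MIDI-style semitone numbers) and that the pitch tracks contain voiced frames only; the function name shift_score and the example values are hypothetical and are not taken from our implementation.

```python
from statistics import mean


def shift_score(source_score, source_pitch, target_pitch):
    """Apply S_shifted = S_source + P_target - P_source to every note.

    All inputs are assumed to share one pitch unit (e.g. semitones).
    """
    p_source = mean(source_pitch)  # P_source: average pitch of the Source Audio
    p_target = mean(target_pitch)  # P_target: average pitch of the Target Singer
    offset = p_target - p_source   # constant shift applied to the whole score
    return [note + offset for note in source_score]


# Hypothetical example values, in MIDI-style semitone numbers.
source_score = [60, 62, 64, 65]
source_pitch = [60.0, 62.0, 64.0, 66.0]  # average is 63.0
target_pitch = [50.0, 52.0, 51.0, 51.0]  # average is 51.0

shifted = shift_score(source_score, source_pitch, target_pitch)
print(shifted)  # [48.0, 50.0, 52.0, 53.0]
```

In this example the target singer's average pitch is 12 semitones below the source, so every note moves down by one octave while the intervals between notes stay the same.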