Singing Transcription Using Pitch Detection: Analysis Of Results

Average Error - Best Two Results

Trained Female - 244.5
Untrained Female - 369
Untrained Male - 1108

From this experiment I was able to determine that pitch is a reasonable measure of singing quality. The major problem I came across is that using pitch alone does not account for octave changes, which human listeners may not easily perceive. On several occasions a sample jumped from, for example, a D5 to a D6. For this system that was a significant error; to someone listening, however, it might not be noticeable.
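
One possible way to make the comparison tolerant of octave jumps, offered as a minimal sketch and not as what this system did, is to measure the pitch difference in cents and fold it to the nearest octave. The function names and the choice of cents as the unit are my own assumptions for illustration.

```python
import math

def cents_between(f_sung, f_ref):
    """Signed distance from f_ref to f_sung in cents (1200 cents = 1 octave)."""
    return 1200.0 * math.log2(f_sung / f_ref)

def octave_folded_cents(f_sung, f_ref):
    """Absolute distance in cents after ignoring whole-octave offsets (0 to 600)."""
    c = cents_between(f_sung, f_ref) % 1200.0  # now 0 <= c < 1200
    return min(c, 1200.0 - c)

# D5 and D6 in equal temperament with A4 = 440 Hz (assumed tuning)
d5 = 440.0 * 2 ** (5 / 12)  # about 587.33 Hz
d6 = 2 * d5                 # one octave higher

raw_error = cents_between(d6, d5)           # a full octave apart
folded_error = octave_folded_cents(d6, d5)  # same pitch class, so near zero
```

With a measure like this, a D5 sung as a D6 would score as nearly correct, while a note that is a semitone off would still register as roughly 100 cents of error.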

As a result, the system is most useful as a tool for singers to train their vocal mimicry. It forces the singer to stay on pitch and in time, which, if practiced, would lead to an overall improvement in that singer's vocal control.

Another limitation of this system is that it is not song independent. For it to work, you need to test against a specific song. Moreover, the correct frequencies and timing need to be known a priori. This makes the system less robust than an implementation that could be used on an arbitrary song.
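
To make the dependence on a reference concrete, here is a hypothetical sketch of scoring a performance against a known melody. The frame-by-frame layout, the use of Hz, and the mean absolute error are all assumptions for illustration; this is not necessarily how the error figures above were computed.

```python
def average_pitch_error(sung_hz, reference_hz):
    """Mean absolute difference in Hz over frames where both tracks are voiced.

    Both inputs are equal-length lists with one pitch estimate per time frame.
    None marks an unvoiced or silent frame.
    """
    if len(sung_hz) != len(reference_hz):
        raise ValueError("tracks must cover the same frames")
    diffs = [abs(s - r) for s, r in zip(sung_hz, reference_hz)
             if s is not None and r is not None]
    if not diffs:
        raise ValueError("no frames where both tracks are voiced")
    return sum(diffs) / len(diffs)

# Made-up example data: the reference must be known ahead of time.
reference = [440.0, 440.0, None, 494.0]
sung = [445.0, 430.0, None, 494.0]
print(average_pitch_error(sung, reference))  # prints 5.0
```

Because the reference list supplies both the expected frequency and the frame it belongs to, a scorer of this shape cannot be applied to a song without that information.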

Compared to other work in the field, this implementation was the only one I ran across that used pitch as the only metric for singing quality. A few other implementations used more subtle variations in vocal quality, HMMs with training data, and other classification methods to attempt to differentiate between a good and a bad singer.

Given the limitations of the system, I am extremely happy with the results.

Wednesday, April 23, 2008
