Automatic music transcription plays an important role in music signal processing. Generally speaking, transcription consists of two steps: note segmentation and multi-pitch estimation. Most systems perform multi-pitch estimation with iterative methods, which approach a solution through successive approximations starting from an initial guess and are therefore computationally expensive. To address this, ZHOU Ruohua and YAN Yonghong of the ThinkIT Lab, Institute of Acoustics, Chinese Academy of Sciences, carried out a series of studies and proposed a computationally efficient approach to automatic polyphonic music transcription.
The approach employs the Resonator Time-Frequency Image (RTFI) as its basic time-frequency analysis tool and consists of two main stages: energy-based onset detection and multiple pitch estimation. The proposed method makes preliminary pitch estimates by simple peak-picking in a pitch energy spectrum, which is derived from the original energy spectrum according to a harmonic grouping principle; incorrect estimates are then cancelled based on spectral irregularity or assumptions about the harmonic structure of music notes.
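The harmonic grouping and peak-picking steps can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the fixed number of harmonics, and the threshold rule are all assumptions, and a real system would work on an RTFI-derived spectrum rather than a toy array.

```python
def pitch_energy_spectrum(energy, n_harmonics=4):
    """Harmonic grouping (illustrative): for each candidate fundamental
    bin f0, sum the energy at f0 and its integer-multiple harmonics."""
    n = len(energy)
    pes = [0.0] * n
    for f0 in range(1, n):
        pes[f0] = sum(energy[h * f0] for h in range(1, n_harmonics + 1)
                      if h * f0 < n)
    return pes

def pick_pitches(pes, threshold_ratio=0.6):
    """Simple peak-picking (illustrative): keep candidate fundamentals
    whose grouped energy is a local maximum and exceeds a fraction of
    the strongest peak; weaker subharmonic peaks fall below the bar."""
    peak = max(pes)
    picks = []
    for f0 in range(1, len(pes) - 1):
        if (pes[f0] > pes[f0 - 1] and pes[f0] >= pes[f0 + 1]
                and pes[f0] >= threshold_ratio * peak):
            picks.append(f0)
    return picks

# Toy spectrum: two notes with fundamentals at bins 10 and 13,
# each contributing four unit-energy harmonics.
energy = [0.0] * 64
for h in range(1, 5):
    energy[10 * h] += 1.0
    energy[13 * h] += 1.0

pes = pitch_energy_spectrum(energy)
print(pick_pitches(pes))  # the two fundamentals, bins 10 and 13
```

In this sketch the relative threshold already suppresses some spurious candidates (e.g. subharmonics that collect only part of a note's energy); the paper's further cancellation by spectral irregularity would remove remaining false estimates.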