Faculty Members (Enseignants-Chercheurs) of the Metz Campus

CentraleSupélec Campus de Metz
2 rue Edouard Belin
57070 Metz

Phone: +33(0)387 76 47 73
Fax: +33(0)387 76 47 00



Faculty member (Enseignant-Chercheur)


Research topics:

Over the last few years, Stéphane Rossignol has been designing and programming sound processing applications, with a focus on semi-automatic and automatic indexing of sound signals.

This work deals mostly with the temporal segmentation and indexing of musical sound signals. Three interdependent segmentation schemes are defined, each corresponding to a different level of signal attributes.

  • 1) The first scheme, the so-called "source" scheme, distinguishes between speech, music, different kinds of noise, etc. These sounds come, for instance, from movie soundtracks and radio broadcasts.

    "Features" are examined. They are intended to measure and highlight distinct properties of speech signals, musical signals, etc. They are combined within the multidimensional classification frameworks described in the literature. The performance obtained for each combination of features and each classification system is discussed.

  • 2) The second segmentation scheme, the so-called "characteristics" scheme, refers to labels such as: silence/sound, voiced/unvoiced, harmonic/inharmonic, monophonic/polyphonic, with vibrato/without vibrato, with tremolo/without tremolo, violin/piano/etc. Most of these characteristics are themselves used as features when the third segmentation scheme, described in detail below, is performed.

    Vibrato detection, estimation of the vibrato parameters (frequency and magnitude), and vibrato extraction from the fundamental frequency trajectory are studied in particular. Several techniques are developed. The performance of the system is discussed on real-world data.

    The vibrato is extracted from the fundamental frequency trajectory in order to obtain a "flat" melodic evolution. This "flat" fundamental frequency trajectory can then be used to segment musical excerpts into notes (third segmentation scheme), and also for the modification and/or further processing of these sounds.

    Vibrato detection is performed only if the source "music" is identified by the first segmentation scheme.

  • 3) The third scheme leads to the segmentation into notes, into phones or, more generally, into "steady state parts", according to the nature of the sound: instrumental part, singing voice excerpt, speech, percussive part, etc.

    The analysis can be divided into four steps. This view is oversimplified, but nonetheless informative. The first step is to extract a large set of features. A feature is all the more appropriate as its time evolution presents strong and short peaks when transitions occur, and as its variance and its mean remain at very low levels during a steady state part. Three kinds of transitions exist: fundamental frequency transients, energy transients and frequency content transients; each of them corresponds to one of the criteria used to differentiate sounds from each other psychoacoustically. Secondly, each of these features is automatically thresholded. Thirdly, a final decision function, based on the set of thresholded features, is derived; it provides the segmentation marks. Within this framework, data fusion techniques are studied. Lastly, for monophonic and harmonic sounds, automatic transcription is performed. The performance of the system on real-world samples is discussed.
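
The first three steps of the third scheme can be illustrated with a minimal Python sketch. It is not the software described on this page: it uses a single energy-based feature, and the thresholding rule (mean plus k standard deviations) and the voting decision function are assumptions chosen for brevity. All function names are hypothetical. The transcription step is not covered.

```python
from statistics import mean, pstdev


def frame_energy(signal, frame_len):
    """Mean energy of consecutive, non-overlapping frames."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def transition_feature(values):
    """Step 1: absolute frame-to-frame change.

    It stays near zero in steady state parts and peaks at transitions.
    """
    return [abs(b - a) for a, b in zip(values, values[1:])]


def auto_threshold(feature, k=2.0):
    """Step 2: binarise a feature with a data-driven threshold.

    The rule mean + k * std is an assumption made for this sketch.
    """
    limit = mean(feature) + k * pstdev(feature)
    return [1 if v > limit else 0 for v in feature]


def decide(thresholded_features, min_votes=1):
    """Step 3: a simple voting decision function over thresholded features.

    Index i of a feature describes the change between frames i and i + 1,
    so a mark is reported at frame i + 1.
    """
    votes = [sum(column) for column in zip(*thresholded_features)]
    return [i + 1 for i, v in enumerate(votes) if v >= min_votes]


# Toy signal: a quiet part followed by a loud part (energy transient).
signal = [0.1, -0.1] * 20 + [1.0, -1.0] * 20
energy = frame_energy(signal, frame_len=10)
marks = decide([auto_threshold(transition_feature(energy))])
print(marks)  # frame indices at which a new segment starts
```

With several features (fundamental frequency, energy, frequency content), each would be thresholded separately and passed to `decide` as one list per feature; `min_votes` then acts as a very crude form of data fusion.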

The data obtained in a given scheme are propagated from lower-numbered to higher-numbered schemes in order to improve the performance of the latter.

The segments provided by the "sources" segmentation scheme can last on the order of a few minutes. Those provided by the "characteristics" segmentation scheme are commonly shorter, on the order of a few tens of seconds. Those provided by the "steady state parts" segmentation scheme are most often shorter than a second.
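
One example of such propagation is the "flat" fundamental frequency trajectory that the "characteristics" scheme hands to the note segmentation. The Python sketch below shows one simple way of obtaining it, by averaging the trajectory over one vibrato period. This is an assumed technique for illustration, not necessarily one of those developed here. The vibrato rate is taken as known (in practice it would have to be estimated first), and the function name and all numeric values are hypothetical.

```python
import math


def flatten_vibrato(f0, period):
    """Average f0 over a sliding window of one vibrato period (in frames).

    A sinusoid averages to zero over exactly one period, so the vibrato
    cancels wherever the full window fits inside the trajectory.
    """
    half = period // 2
    flat = []
    for i in range(len(f0)):
        lo = max(0, i - half)
        hi = min(len(f0), i - half + period)
        window = f0[lo:hi]  # truncated near the two ends of the trajectory
        flat.append(sum(window) / len(window))
    return flat


# Toy trajectory: a 440 Hz note with a 5 Hz vibrato of +/- 8 Hz,
# sampled at 100 frames per second (all values are illustrative).
frame_rate, vib_rate, vib_depth = 100, 5, 8.0
f0 = [440.0 + vib_depth * math.sin(2 * math.pi * vib_rate * n / frame_rate)
      for n in range(200)]

period = frame_rate // vib_rate              # 20 frames per vibrato cycle
flat = flatten_vibrato(f0, period)
vibrato = [a - b for a, b in zip(f0, flat)]  # extracted vibrato component

half = period // 2
interior = slice(half, len(f0) - half)       # frames where the window is complete
print(round(min(flat[interior]), 3), round(max(flat[interior]), 3))
print(round(max(abs(v) for v in vibrato[interior]), 3))
```

The residual `vibrato` keeps the modulation itself, from which a magnitude can be read off. Near the ends of the trajectory the window is truncated, so the flattening is only approximate there.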

The developed software is unified and maintained as new techniques are introduced. It is organized into five sets:

  • 1) "Segmentation", with the main goal of performing the segmentation into notes (music) or into phones (music and speech).

    Notably, new "features" are continuously developed and evaluated. For instance, in 2007, features based on kernel methods, such as SVMs, were studied.

  • 2) "Sources", with the goal of performing the source segmentation.

  • 3) Various "pitch-trackers/partial-trackers".

  • 4) The "characteristics" processing, particularly vibrato processing.

  • 5) A "user interface", ideally multimodal so as to be as ergonomic as possible, thereby making the manual indexing process as flexible and fast as possible. The main goal of this user interface is to allow a semi-automatic indexing process, because fully automatic segmentation/indexing is not yet possible. The aim is to help build large databases of indexed musical sound signals. Such databases exist for speech, but they are far less numerous for musical sound signals.
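
To make the notion of a "feature" handled by the "Segmentation" and "Sources" sets more concrete, here is a minimal Python sketch of one classic frame-level feature, the zero-crossing rate. This page does not say which features are actually used, so the choice of feature, the function names and the toy signals are all assumptions.

```python
import math
import random


def zero_crossing_rate(frame):
    """Fraction of consecutive sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def frame_feature(signal, frame_len, feature):
    """Apply a feature function to consecutive, non-overlapping frames."""
    return [
        feature(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


# Two toy signals: a slow sine tone and uniform white noise.
rng = random.Random(0)  # fixed seed so the example is reproducible
tone = [math.sin(2 * math.pi * 0.01 * n) for n in range(400)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(400)]

zcr_tone = frame_feature(tone, 100, zero_crossing_rate)
zcr_noise = frame_feature(noise, 100, zero_crossing_rate)
print(zcr_tone)   # low values: few sign changes per frame
print(zcr_noise)  # much higher values, around one half
```

A single feature of this kind rarely separates real sources on its own, which is why several features are combined within a classifier, as described for the "source" scheme.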