Automatic annotation of large collections of images for image retrieval
I am working on the automatic annotation of images for use in a content-based image retrieval system. Because query-by-example (QBE) is an awkward and lengthy process, the use of keywords to describe aspects of images is preferred. However, this requires an accurately annotated image collection. Manual annotation is tedious, so machine learning algorithms can be applied to a partially labeled database. Beginning with a set of labeled images, the labels can be propagated throughout the collection using similarity measures and clustering.
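As a rough illustration of how labels might spread from a few annotated images to the rest of a collection, here is a minimal sketch of graph-based label propagation. It is not the method used in this work: the function name `propagate_labels`, the Gaussian similarity, and the `sigma` and `n_iters` parameters are all assumptions chosen for the example, and the feature vectors are hypothetical.

```python
import numpy as np

def propagate_labels(features, labels, n_iters=20, sigma=1.0):
    """Spread known labels to unlabeled items over a similarity graph.

    features: (n, d) array of feature vectors (hypothetical image descriptors).
    labels:   length-n integer array; a class index for labeled items, -1 otherwise.
    Returns a length-n array of predicted class indices.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = len(labels)
    n_classes = int(labels.max()) + 1

    # Gaussian similarity between every pair of items (an assumed similarity measure).
    sq_dists = ((features[:, None, :] - features[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)
    # Row-normalise so each item takes a weighted average over its neighbours.
    transition = weights / weights.sum(axis=1, keepdims=True)

    # One-hot scores for labeled items, zeros for unlabeled ones.
    known_idx = np.flatnonzero(labels >= 0)
    clamp = np.zeros((n, n_classes))
    clamp[known_idx, labels[known_idx]] = 1.0
    scores = clamp.copy()

    for _ in range(n_iters):
        scores = transition @ scores
        scores[known_idx] = clamp[known_idx]  # keep the original annotations fixed
    return scores.argmax(axis=1)
```

With two well-separated groups of points and one labeled item in each, the unlabeled items take the label of the group they sit in.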
An important part of my work is exploiting user interaction from retrieval systems (relevance feedback, for example) to learn more about modeling the semantic relationships between documents.
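One simple way to picture what relevance feedback can reveal is to count how often two images are marked relevant in the same query session. The sketch below does only that; the function name `co_relevance_matrix` and the session format (a list of image indices per session) are assumptions made for illustration, not a description of the actual system.

```python
import numpy as np

def co_relevance_matrix(sessions, n_images):
    """Count how often each pair of images was marked relevant together.

    sessions: iterable of lists of image indices marked relevant in one session.
    Returns an (n_images, n_images) symmetric integer matrix with a zero diagonal.
    """
    counts = np.zeros((n_images, n_images), dtype=int)
    for relevant in sessions:
        ids = sorted(set(relevant))  # ignore duplicate marks within a session
        for i in ids:
            for j in ids:
                if i != j:
                    counts[i, j] += 1
    return counts
```

Pairs with high counts are candidates for sharing a semantic concept, so a matrix like this could serve as input to a clustering step.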
My research is funded by The Swiss National Center of Competence in Research (NCCR) on Interactive Multimodal Information Management (IM2).
Voice Profiling
The topic of my master's thesis was voice profiling, specifically emotion detection from the human speech signal. I developed a real-time system for use over telephone networks that detects speaker emotion in environments such as call-centres. Customers calling call-centres often express strong emotion (mainly frustration and anger). Apart from the ethical implications, modeling the emotion of a caller can allow call-centre staff to handle calls more appropriately and deal with disputes. Automatic call-centre support systems, where callers interact with a speech recognition system, could detect confusion or frustration and forward the customer to human support staff for further assistance.
The system is speech activated (using energy and zero-crossing thresholds), so when a person begins talking, their speech is recorded until they finish. Fundamental frequency (F0), formant frequencies (F1-F3), energy, and rhythm features are extracted from each segmented utterance. These features are then used as input to a classification system which has been trained on specific emotions (anger, surprise, sadness, joy, fear, disgust). The output of the classifier is the predicted emotion of the speaker.
I experimented with several classification models, from artificial neural networks to support vector machines to hybrid and ensemble methods that combined the predictions of several classifiers. Although ensemble classifiers took longer to train, they were found to perform better than the traditional models.
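The speech-activation step can be sketched as a frame-based detector. The version below is an assumption-laden illustration rather than the thesis implementation: the function names, frame sizes, threshold values, and the rule for combining the two thresholds (a frame counts as speech if its energy is high, or if its zero-crossing rate is high, which is meant to catch low-energy unvoiced sounds) are all hypothetical.

```python
import numpy as np

def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames of frame_len samples."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // hop)
    return np.stack([signal[i * hop : i * hop + frame_len] for i in range(n_frames)])

def detect_utterance(signal, frame_len=200, hop=100,
                     energy_thresh=0.01, zcr_thresh=0.3):
    """Return (start_sample, end_sample) of the detected utterance, or None."""
    signal = np.asarray(signal, dtype=float)
    frames = frame_signal(signal, frame_len, hop)

    # Short-time energy: mean squared amplitude per frame.
    energy = (frames ** 2).mean(axis=1)
    # Zero-crossing rate: fraction of adjacent sample pairs that change sign.
    positive = (frames >= 0).astype(np.int8)
    zcr = (np.diff(positive, axis=1) != 0).mean(axis=1)

    # Assumed combination rule: either threshold being exceeded marks speech.
    speech = (energy > energy_thresh) | (zcr > zcr_thresh)
    active = np.flatnonzero(speech)
    if active.size == 0:
        return None
    start = active[0] * hop
    end = active[-1] * hop + frame_len
    return int(start), int(end)
```

A real-time system would apply the same test frame by frame as audio arrives and would usually require several consecutive speech or silence frames before switching state; that smoothing is left out here to keep the sketch short.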
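The simplest way to combine several classifiers is a majority vote over their predicted labels. The sketch below shows only that idea; the function name `majority_vote` and the input format are assumptions, and it is not meant to reproduce the specific hybrid or ensemble schemes evaluated in the thesis.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-classifier predictions by majority vote.

    predictions: list of equal-length lists, one list of labels per classifier.
    Returns one combined label per utterance. When votes tie, the label that
    appears first among the classifiers (in list order) is chosen.
    """
    combined = []
    for votes in zip(*predictions):  # votes = one label from each classifier
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined
```

For example, if two of three classifiers label an utterance as anger and the third as joy, the combined prediction for that utterance is anger.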
Publications
D. Morrison, S. Marchand-Maillet, E. Bruno. Capturing the semantics of user interaction: A review and case study. Richard Chbeir, Aboul-Ella Hassanien, Ajith Abraham, Youakim Badr (Eds). Accepted in Emergent Web Intelligence, Springer, 2008.
D. Morrison, S. Marchand-Maillet, E. Bruno. Semantic clustering of images using patterns of relevance feedback. In Proceedings of 6th International Workshop on Content-based Multimedia Indexing, London, UK, June 18-20 2008.
D. Morrison, S. Marchand-Maillet, E. Bruno. Automatic image annotation with relevance feedback and latent semantic analysis. In Proceedings of 5th International Workshop on Adaptive Multimedia Retrieval (AMR), Paris, France, July 5-6 2007.
D. Morrison, S. Marchand-Maillet, E. Bruno. Hierarchical Long-Term Learning for Automatic Image Annotation. In Proceedings of 2nd International Conference on Semantic and Digital Media Technologies (SAMT), Genova, Italy, December 5-7 2007.
D. Morrison, R. Wang, L.C. De Silva. Ensemble methods for spoken emotion recognition in call-centres. In Speech Communication. Vol. 49, pp. 98-112, 2007.
D. Morrison, R. Wang, W.L. Xu, L.C. De Silva. Incremental Learning for Spoken Affect Classification and its Application in Call-Centres. In the International Journal of Intelligent Systems Technologies and Applications (IJISTA). Vol. 2, Nos. 2/3, 2007.
D. Morrison, L.C. De Silva. Voting ensembles for spoken affect classification. In the Journal of Networks and Computer Applications (JNCA). Vol. 30, pp. 1356-1365, 2007.
D. Morrison, R. Wang, L.C. De Silva, W.L. Xu. Real-time Spoken Affect Classification and its Application in Call-Centres. Proceedings of the International Conference on Information Technology and Applications (ICITA), Sydney, Australia, July 3-7, 2005. pp. 483-486. IEEE Computer Society. 2005.
D. Morrison, R. Wang, L.C. De Silva. Spoken Affect Classification using Neural Networks. Proceedings of IEEE Conference on Granular Computing, Beijing, China, July 25-27, 2005. pp. 583-586. IEEE Computer Society. 2005.