Paper accepted at the CHR 2025 conference
Our new paper, "More Sound, More Soundness? Improving Authorship Attribution with Phonemes", will soon be presented at the Computational Humanities Research conference. I am kind of proud of it because it is my first paper in digital humanities, and I hope it will be the first of a long series.
Here is the abstract:
This paper assesses whether turning written French poetry into a speech-oriented representation can improve the performance of authorship attribution methods. To this end, we develop a phonetic transcription system to automatically convert poems from six authors – including the disputed Illuminations of Rimbaud – into phonetic transcriptions, and adapt existing tools to ingest and process phonetic data. The output of this grapheme-to-phoneme task is then enriched with minimal prosodic cues, namely the creation of synthetic tokens based on punctuation and the addition of basic French liaisons. Using the same trigram features and classifier across all representations, we observe that moving from orthographic to phonetic transcriptions with a modest prosodic enrichment raises the F-score from 0.89 to 0.95, while reducing inter-author confusion. These results suggest that even lightweight speech-based features, produced with reproducible rules and open tools, can meaningfully enhance stylometric analysis of French verse and warrant further study for contested texts.
Slides will be posted after the presentation.

Portrait of Arthur Rimbaud, by Jean-Louis Forain, 1872, private collection (public domain).