About Forensic Voice Identification

Forensic voice identification is the scientific process of comparing voice recordings to assess the likelihood that they were produced by the same speaker. It is used in legal contexts where voice evidence is relevant, such as criminal investigations, counter-terrorism, and civil disputes.

Unlike informal voice recognition, forensic speaker comparison follows strict protocols to ensure analytical reliability and transparency. It can involve both human expert judgment and the use of automatic systems based on biometric voice modeling.

Key Principles

  • ✅ Based on acoustic and linguistic features of speech
  • ✅ Uses structured comparison and statistical reasoning
  • ✅ Designed for forensic applicability and expert reporting
  • ✅ Requires methodological transparency and validation

The Mission of BiometriaForense.it

BiometriaForense.it was created to support professionals working in forensic speaker comparison. Our goal is to provide high-quality tools, clear scientific explanations, and practical resources for those involved in the evaluation of voice evidence. We believe in transparent, evidence-based forensic practice.

Who is it for?

  • 🧑‍⚖️ Forensic experts and technical consultants
  • 🕵️‍♂️ Law enforcement and intelligence agencies
  • 👩‍⚖️ Legal professionals handling voice evidence
  • 🎓 Researchers and students in phonetics, linguistics, and forensic science

Forensic Voice Comparison in Detail

Forensic speaker recognition (also referred to as forensic speaker identification or forensic speaker verification, despite some conceptual differences) is an application of voice biometrics aimed at determining whether, and to what extent, an anonymous voice recording can be attributed to a specific individual of known identity. This is achieved through technical-scientific methods suitable for establishing facts or evidence in legal proceedings.

Usually, forensic speaker recognition involves comparing one or more recordings of a known individual’s voice with a recording of an unknown (anonymous) individual’s voice. For this reason, the term forensic voice comparison is technically preferable to "speaker recognition" or "speaker identification."

The problem involves many variables. Most of them stem from four sources: the intrinsic variability of a single individual’s speech (intra-speaker variability); the variability between different individuals (inter-speaker variability); environmental noise, in the broad sense of any sound that overlaps the voice and is captured with it; and the devices and channels used to capture, transmit, and record the voice signal.
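The distinction between intra-speaker and inter-speaker variability can be made concrete with a small numerical sketch. The example below assumes a single feature (mean fundamental frequency per recording session) and uses invented values for three hypothetical speakers; the names `sessions_by_speaker`, `within_speaker_variance`, and `between_speaker_variance` are illustrative and do not come from any specific tool.

```python
from statistics import mean, pvariance

# Hypothetical mean-F0 values in Hz, one value per recording session.
# These numbers are invented purely for illustration.
sessions_by_speaker = {
    "speaker_A": [112.0, 118.0, 109.0, 121.0],
    "speaker_B": [131.0, 125.0, 136.0, 128.0],
    "speaker_C": [98.0, 104.0, 101.0, 97.0],
}


def within_speaker_variance(data):
    """Average, over speakers, of each speaker's own session-to-session variance."""
    return mean([pvariance(values) for values in data.values()])


def between_speaker_variance(data):
    """Variance of the per-speaker mean values."""
    return pvariance([mean(values) for values in data.values()])


within = within_speaker_variance(sessions_by_speaker)
between = between_speaker_variance(sessions_by_speaker)
print(f"within-speaker variance:  {within:.1f} Hz^2")
print(f"between-speaker variance: {between:.1f} Hz^2")
```

A feature is generally more useful for comparison when it varies little within a speaker and a lot between speakers. Real casework involves many features, as well as noise and channel effects that this sketch ignores.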

The purpose of this article is to provide an introduction to the problem, emphasizing why experts need to use scientifically validated forensic voice comparison methodologies and why doing so is to their advantage. Further in-depth topics will be addressed in upcoming articles. For inquiries, please contact info@biometriaforense.it.

The Question and the Expert’s Role

The outcome of forensic voice comparison should consist of information—preferably quantitative—regarding the strength of the biometric evidence (the voice recordings). It should not directly provide a categorical decision (e.g., whether the anonymous voice belongs or does not belong to the known individual).

In forensic contexts, the decision about whether the anonymous voice belongs to the known individual, or more generally about the guilt or innocence of the accused, is the exclusive domain of the judicial authority. The results of the forensic voice comparison carried out by the expert (technical consultant or forensic examiner) are only one of the elements on which the judge bases that decision.

This crucial distinction—between the role of the expert (who does not decide) and that of the judge (who is responsible for all decisions)—is often overlooked when formulating the expert’s task:

"...the expert is asked to analyze the voice recorded on the CD of the guarantee interrogation to verify whether it is the same as the voice recorded on CD XXX..."

"...to compare the voice of XXX in the intercepted phone calls YYY with a voice sample obtained from the direct recording of the defendant’s voice, in order to establish whether the voices are compatible..."

However, the expert should always remember that their role is to quantitatively evaluate the strength of the biometric voice evidence in support of one hypothesis or another (typically the prosecution and defense hypotheses), refraining from making categorical judgments on the hypotheses under consideration by the judge. Instead, the expert provides the judge with useful elements so that the judge, taking into account all other evidence in the trial, can reach a decision that will affect the defendant.

Methodologies Proposed in the Literature

Traditionally, methodologies (approaches) for extracting information from voice signals for forensic voice comparison purposes fall into four types, further classified as “subjective” and “objective”:

Subjective Approaches

These are primarily based on the expert’s opinion, with little or no quantitative measurement of the signals:

  • Auditory: (Nolan, 1997; Rose, 2006; Jessen, 2008) Mainly practiced by phoneticians, it relies heavily on their experience to identify, document, and compare any relevant voice features under investigation, possibly including some basic acoustic measurements.
  • Spectrographic (Voiceprinting): (cf. Kersta, 1962; Tosi et al., 1972; Rose, 2002; Morrison, 2010, 2014) Based largely on visual comparison of spectrograms of important parts of the audio recordings being compared.

Objective Approaches

These rely mainly on quantitative measurements of the signals:

  • Acoustic-Phonetic (Semi-Automatic): This approach involves quantitative measurements of the acoustic properties of comparable units (e.g., phonemes) of recorded voices, typically performed using signal processing software under expert supervision, along with statistical modeling of features (cf. Nolan, 1997; Rose, 2002, 2006; Jessen, 2008).
  • Automatic: (Jessen, 2008) Based on quantitative measurements of voice signals (commonly using MFCCs), but requires much less human intervention than the acoustic-phonetic approach.

ENFSI Recommendations on Evaluating and Presenting the Strength of Scientific Evidence: The Likelihood-Ratio Framework

The ENFSI (European Network of Forensic Science Institutes) has adopted the recommendations of many scientists who support the so-called Bayesian framework, or likelihood-ratio approach, for evaluating the strength of scientific evidence, including forensic voice comparison (see the relevant ENFSI guideline, especially page 22).

For example, in forensic voice comparison, suppose we must compare a recording of an anonymous voice with a voice sample provided by the defendant. The underlying hypotheses are: the anonymous voice originated from the defendant (prosecution hypothesis) versus the anonymous voice does not belong to the defendant (defense hypothesis), but rather to some other person from a relevant reference population.

Due to the non-negligible variability of voice recordings, there is generally a non-zero probability that the defendant could produce a voice sample similar to the anonymous recording, and a non-zero probability that any other person could produce a similar sample.

Given these two hypotheses and the evidence E, the expert should calculate a quantity represented by the ratio of the probability of the evidence E under the prosecution hypothesis (the anonymous recording and the defendant’s sample have the same origin), to the probability of the same evidence E under the defense hypothesis (the anonymous recording and the defendant’s sample have different origins). In formulas:

Likelihood Ratio (LR) = P(E | prosecution hypothesis) / P(E | defense hypothesis)

In other words, the probability of “observing” the characteristics of the anonymous voice sample, assuming it belongs to the defendant, is compared to the probability of observing the same characteristics, assuming the sample belongs to another person.
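The ratio above can be illustrated with a deliberately simplified sketch. It assumes that the evidence E is a single numeric feature measured on the anonymous recording, and that both hypotheses are modeled by Gaussian distributions: one fitted to the defendant's samples, one fitted to a relevant reference population. All parameter values and names (`defendant_model`, `population_model`, `observed`) are hypothetical. For a continuous feature the two "probabilities" are probability densities.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def likelihood_ratio(evidence, same_model, diff_model):
    """LR = p(E | prosecution hypothesis) / p(E | defense hypothesis)."""
    numerator = gaussian_pdf(evidence, *same_model)
    denominator = gaussian_pdf(evidence, *diff_model)
    return numerator / denominator


# Hypothetical (mean, standard deviation) pairs for one feature, in Hz.
defendant_model = (115.0, 4.0)    # assumed model of the defendant's voice
population_model = (120.0, 15.0)  # assumed model of the reference population

observed = 113.0  # hypothetical value measured on the anonymous recording

lr = likelihood_ratio(observed, defendant_model, population_model)
print(f"LR = {lr:.2f}")
```

An LR greater than 1 means the evidence is more probable under the prosecution hypothesis than under the defense hypothesis, and an LR smaller than 1 means the opposite. Operational systems use multivariate features and validated, calibrated models rather than a single Gaussian per hypothesis.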

The expert can only evaluate these two probabilities of observing the evidence E under the respective hypotheses, but cannot calculate the probability of each hypothesis (i.e., the probability of guilt or innocence) given the evidence E. This latter assessment is the sole responsibility of the judge.
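The reason for this division of roles follows from Bayes' theorem in odds form: posterior odds = prior odds × LR. The sketch below shows how one and the same LR leads to very different posterior probabilities depending on the prior, which reflects all the other evidence in the trial and is not available to the expert. The function name `posterior_probability` and the numeric values are assumptions chosen only for illustration.

```python
def posterior_probability(prior_probability, likelihood_ratio):
    """Update a prior probability with an LR using Bayes' theorem in odds form."""
    prior_odds = prior_probability / (1.0 - prior_probability)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Hypothetical LR reported by the expert.
lr = 100.0

# Hypothetical priors; in a real case these depend on the other evidence
# and are the judge's responsibility, not the expert's.
for prior in (0.001, 0.1, 0.5):
    posterior = posterior_probability(prior, lr)
    print(f"prior = {prior:<5}  posterior = {posterior:.3f}")
```

With the same LR of 100, a low prior still yields a modest posterior, while an even prior yields a posterior close to 1. The expert's figure therefore cannot be read as the probability that the defendant is the speaker.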


A study from some years ago (Romito and Galatà, 2007) revealed a discouraging picture regarding the diffusion of scientifically sound methodologies for forensic voice comparison, as well as a lack of specific training.

The findings of that research, together with more recent experience, suggest that the problem persists in Italy. Further reflection is therefore needed on the role of the professionals called to conduct forensic voice comparisons, starting from their own perspective.

References

  • Grimaldi M., D’Apolito S., Gili Fivela B., Sigona F., Illusione e Scienza nella Fonetica Forense: Una Sintesi, Mondo Digitale, AICA, September 2014, ISSN: 1720-898X.
  • Jessen, Michael. “Forensic phonetics”. Language and Linguistics Compass, 2 (2008): 671–711. DOI:10.1111/j.1749-818x.2008.00066.x.
  • Kersta, Lawrence G. “Voiceprint identification”. Nature, 196 (1962): 1253–1257. DOI: 10.1038/1961253a0.
  • Morrison, Geoffrey S. “Forensic voice comparison”. In Expert Evidence, edited by Ian Freckelton and Hugh Selby, Chapter 99. Sydney, Australia: Thomson Reuters, 2010.
  • Morrison, Geoffrey S. “Distinguishing between forensic science and forensic pseudoscience: testing of validity and reliability, and approaches to forensic voice comparison”. Science & Justice, 54 (2014): 245–256.
  • Nolan, Francis. “Speaker recognition and forensic phonetics”. In The handbook of phonetic sciences, edited by William J. Hardcastle and John Laver, 744–767. Oxford: Blackwell, 1997.
  • Paoloni, Andrea, Mauro Falcone and Antonio Federico. “The Parametric Approach in Forensic Speaker Recognition”, Proceedings of the COST 250 Workshop Speaker Recognition by Man and by Machine: Directions for Forensic Applications, 45–51. Ankara, Turkey, 1998.
  • Robertson, Bertrand and Tony G.A. Vignaux. Interpreting scientific evidence. Evaluating Forensic Science in the Courtroom. In Expert Evidence, edited by Ian Freckelton and Hugh Selby, Chapter 28. Sydney, Australia: Thomson Reuters, 2000.
  • Romito, Luciano and Vincenzo Galatà. “Speaker Recognition: Stato dell’arte in Italia. Valutazione dei corpora, dei metodi e delle professionalità coinvolte”. In Scienze Vocali e del linguaggio – Metodologie di valutazione e risorse linguistiche, edited by Veronica Giordani, Valentina Bruseghini and Piero Cosi, pp. 223–242. Torriana: EDK Editore, 2007.
  • Rose, Phil. Forensic speaker identification. London: Taylor and Francis, 2002.
  • Rose, Phil. “Technical forensic speaker recognition”. Computer Speech and Language, 20 (2006): 159–191. DOI:10.1016/j.csl.2005.07.003.
  • Sigona F., Grimaldi M., “Tools for Forensic Speaker Recognition”, in Forensic Communication: Theory and Practice: A Study of Discourse Analysis and Transcription, Edited by Franca Orletti and Laura Mariottini, Cambridge Scholars Publishing, August 2017.
  • Sigona F., Grimaldi M., «Il riconoscimento del parlante in ambito forense: uno studio indipendente sul software IDEM/SPREAD in uso ai Carabinieri», in Sicurezza e Giustizia, N. IV, December 2015, ISSN: 2039-9669.
  • Tosi, Oscar, et al. “Experiment on voice identification”. Journal of the Acoustical Society of America, 51 (1972): 2030–2043. DOI:10.1121/1.1913064.