A Badly Mismatched Battle of the Experts
We covered yesterday’s testimony of the state witnesses at length here, so no point in going over that again. Today we’re going to take a deep dive into the expert witness testimony of two more of the defense expert witnesses–Dr. J.P. (Peter) French and Dr. George Doddington. Their testimony today, in combination with Dr. Hirotaka Nakasone’s expert witness testimony last Thursday, seems devastating to the testimony and credibility of the state’s expert witnesses, Mr. Tom Owens and Dr. Alan R. Reich.
Even based merely on their relative training and experience the defense expert witnesses appear to vastly outclass those brought forward by the State. For example:
State Expert #1, Tom Owens: A one-man audio forensics operation whose highest academic degree is a BA in History, who has no formal academic training in speech recognition or speaker identification, whose standards for analysis are self-written and in any case subject to self-violation in order to ensure a “finding”, who uses as his primary speech analysis tool a computerized “biometric” system in which he has a substantial financial stake, and which in any case he has used for less than two years and testified about as an expert witness only once before.
State Expert #2, Dr. Alan Reich, PhD: A long-retired college professor with an avocational interest in speech recognition and identification who manages to hear words and phrases in audio recordings that cannot be heard by world-leading experts in the field, much less by laymen.
Compare and contrast with the defense expert witnesses:
Defense Expert #1, Dr. Hirotaka Nakasone, PhD: A senior scientist with a 17-year career with the Federal Bureau of Investigation, currently the Head of the Bureau’s Voice Recognition Program, and the Head of the international working group establishing the first formal scientific standards for speech recognition and speaker identification.
Defense Expert #2, Dr. J.P. (Peter) French, PhD: Dr. French earned his PhD in the analysis of recorded conversations. He is the Director of J.P. French Associates, the United Kingdom’s longest established independent forensic laboratory specializing in the analysis of speech, audio, and language, with 6 full-time scientific staff, and a Professor in the Department of Language and Linguistic Science, University of York.
Defense Expert #3, Dr. George Doddington, PhD: Dr. Doddington conducted his doctoral thesis on speaker recognition, and since 1970 he has been leading the development and evaluation of speech recognition and speaker identification technologies and methodologies at a wide variety of top-level institutions, including Texas Instruments, the Defense Advanced Research Projects Agency (DARPA), the National Security Agency (NSA), and currently at the National Institute of Standards and Technology (NIST). Interestingly, he was deeply involved in the development of the voice recognition technology used in the F-16 fighter aircraft.
The mismatch in training and experience is obvious.
But training and experience don’t tell the whole story–the actual testimony of the defense expert witnesses is particularly damaging to the State’s expert witness testimony, as I describe in my usual detail below. This post is long, but if you make it to the end you once again get my bonus CRAZY IDEA OF THE DAY: The worse the state experts performed, the MORE likely the Judge permits their testimony.
Now let’s get to it.
Dr. J.P. (Peter) French, PhD: “It’s axiomatic in the community that you can’t compare screaming with speech.”
After being established as an expert witness, Dr. French was asked about the procedures used in his offices when a sample is delivered for analysis. He discussed a lengthy and detailed process involving careful record-keeping, preservation of original samples, and so forth, all signs consistent with a professionally-run organization with established operating procedures.
In assessing a sample for further forensic analysis he said they look at three general parameters. (1) Will the sample be adequate in terms of quality, considering factors such as background noise, contamination, bandwidth signal; (2) The duration of the recording to see if it provides an adequate representation of speech sound to allow reliable comparison; and (3) looking at how distinctive or unusual the voice patterns appear to be on initial examination.
Interestingly, Dr. French said he would not use any minimal recording length or minimal number of words to decide whether an audio recording was adequate for analysis. Hypothetically, he said, a very short recording of someone with, say, a heavy foreign accent could be used to exclude a suspect without such an accent. It really depends on how distinctive or unusual the voice patterns were to be compared.
“It’s pretty much axiomatic in the community that you can’t compare screaming with speech.”
When asked whether a sample recording of someone shouting or screaming would create particular challenges to forensic audio analysis, Dr. French was adamant that it would. In fact, he said it would be impossible to compare a normal speaking voice to a scream made in distress, or vice versa.
French: “When people shout, we have to realize that shouting is not simply speaking made louder, people have got completely different vocal settings when we shout than when we speak. It is very difficult, if someone is genuinely shouting rather than just speaking with a raised voice, to do a comparison.”
West: “Is it generally accepted by you, your lab, and more broadly in the scientific community that shouting or screaming samples are not suitable for voice comparison.”
French: “Yes. I have never come across a case in the 30 years of my career where anybody has attempted to compare screaming, for instance, with normal voice. . . . It’s pretty much just axiomatic in the community that you can’t compare screaming with speech.”
West: “In your lab, if someone gave you an unknown voice where someone was shouting or screaming or shouting or crying out in some way, would you find that extremely difficult if not impossible to analyze?”
A person’s normal voice tells you nothing about their scream, and vice versa
West: “Is there some research that you are aware of that has specifically been done on people who are shouting or crying out in obvious distress, as opposed to shouting for attention, the life-threatening situation or the death cry that has sometimes been described.”
French: “Yes. I have a PhD student who has just completed her PhD specifically on that topic. She took distress cries from people who are in extreme distress, or who are in fact in extremis, about to die, or have been fatally wounded, or are in fear of their lives, from real forensic recordings that have been submitted to my lab over the years, and she analyzed the properties of those and compared them where it was possible if there was a sample of speech from the same person, at normal voice levels, she looked at the differences between the death cries and the normal speech of the person.”
West: “Was the research at all encouraging that this could be done or could even move towards being done.”
French: “While she didn’t look specifically at whether you could compare the two for speaker identification purposes, but the results of that study, if one was to extrapolate from them, would be clearly towards the fact that you can’t, because the way that people react to those situations is very unpredictable. You can’t say from someone’s normal voice what they are going to sound like under severe attack. And similarly if you have a recording of someone under severe attack, you can’t move back from that to their normal speaking voice.”
A short time later Dr. French explained in great detail exactly why such a comparison is impossible. If you’re not interested, feel free to skip to the money-quote at the end of the next block of text. (It’s in bold.)
French: “If I could loop back a little bit and tell you a bit about how we routinely carry out the analysis in the laboratory, the specific parameters that we analyze them on, I could then tell you which ones you can still examine if someone is yelling and which ones you can’t.”
West: “Yes, thank you.”
French: “The method that we carry out and which is very prevalent, in fact I had a PhD student carry out a survey of voice comparison practices internationally, she surveyed 40 analysts from different countries, 75% use the combined auditory-phonetic and the acoustic-phonetic methods.
“If I can explain what that means, the auditory-phonetic involves analytic listening. In other words, you are listening to the speech sample and analyzing them by ear in terms of a variety of different parameters. This is not like ordinary listening, like you’re doing to me at the moment and I’m doing to you where we’re listening to the content of the speech, what is being said, instead we listen for how it is being said. Within the auditory-phonetic test we look at how the individual consonant sounds and the vowel sounds are pronounced and we’d be comparing that across the samples. So, for instance, we might be looking at how the “t”s consonants are pronounced, the “l”s, the “m”s, or whatever are pronounced. In order to do this we have a much sharper tool than the Roman alphabet that we all use to read and write with, it wouldn’t really do the job, so we use the International Phonetic Alphabet which is an extended system of symbols and notations that help you capture the fine grained nuances, the details, of pronunciation and we compare those across the samples. We would also be listening to the prosody of speech, in other words the speech rhythms, where the major stresses fall, we would be looking at the rate of speaking, the rate of articulation in syllables per second, we would be looking at the intonation of the speech, the rise and fall, which is the rise and fall of the pitch of the voice across utterances, the melody of the speech. We would be attending to the voice quality, the timbre, in other words what sort of voice is it, a creaky voice, breathy voice, a harsh voice, a nasal voice, a de-nasal voice, is there evidence of tension in the larynx when the person speaks [demonstrates], is the larynx being raised or lowered, we would also be looking at the tongue/body orientation, if it’s forward or back.
“To carry all this out we would use a scheme, a University of Edinburgh-developed scheme, called the VPA, the Vocal Profile Analysis, which has 38 different settings. So we would score these values [on all the parameters just described]. So collectively [these all] make up the auditory-phonetic tests, the analytic listening tests that we do.
“In addition to those there is a second group of tests known as acoustic tests or instrumental tests, nowadays these are computer-based tests using specialized software to measure things like the average voice pitch, which is measured as a fundamental frequency, the rate at which the vocal cords are vibrating, that’s estimated by the computer program and averages over a series of utterances and compared across the samples. We also look at the acoustic resonances within vowel sounds, looking at the frequency with which they occur, the so-called phonemes, and we would be creating overlay graphs to overlay the values obtained from the question recording, the criminal recording, to the recording of the suspect, we would be looking at where the main energy loci were, in other words where the main focus was with consonant sounds like p, t, k, in addition to these sorts of things would be the acoustic tests, measuring physical parameters of the speech signal by the computer program. In addition to that we would also be taking into account individual habits, such as lip smacking, that sort of thing, and patterns of disordered breathing between utterances, between looking at things such as hesitation markers, the um’s and uh’s of speech, those sort of sounds that people make when they hesitate, we’d be analyzing those by ear and by computer. We’d be looking at how people simplify speech by missing out or eliding sounds when they are speaking and whether the patterns of elimination were the same in the two samples. We’d also take into account broader linguistic factors, patterns of terms taken from conversation, whether people used filler sounds such as “ain’t it,” at the end of sentences, like “like”, or “sort of” or that sort of thing. And really we’d be looking at a whole range of different parameters of the speech, and the speech signal, and also taking into account this broader linguistic information.”
Having described that incredibly lengthy, detailed list of factors taken into account in matching speaking samples, Dr. French delivered the death blow to the Prosecution’s claims about the screaming on the Witness #11 911 tape:
French: “With screaming virtually none of that [the comparative elements] is available to compare with the normal voice. You can derive it from the normal voice of the suspect, but it’s the alleged criminal sample of screaming, the sorts of features that we focus on are just not there for comparison.” (emphasis added)
Even if the screaming had lasted for half an hour you could not come up with a result
The blows to the prosecution’s expert witnesses didn’t stop there, even if they were delivered with laughter. The defense began to focus on the specific screaming in this case, as found on the Witness #11 911 tape, exploring Dr. French’s earlier comment that he did not use any particular minimum duration of a recording.
West: “When you talked earlier about there being no specific length of speech that’s necessary, does there have to be enough speech that carries some of these markers that allows you to find these individual, specific identifiers.”
West: “So while it may not take two or three minutes, there has to be enough speech of whatever length for you to perform this analysis.”
French: “Yes, and critically it has to be speech and not screaming.”
West: “Well how much screaming would it take in order for you to do this?”
French: “Well . . . [laughs out loud] . . . it depends on how extreme the screaming was. Can I move on to the specifics of this case? If the screaming in this case had been for several minutes, net screaming, maybe even half an hour, I don’t think you could come up with a result. [Laughs out loud.] It’s not something you can specify in terms of length, because if it went on and on and on in exactly the same way as the few seconds we have in the recording in this case it wouldn’t be in the least helpful, it wouldn’t allow us to move to a conclusion, no matter how much of it there was.” (emphasis added)
Biometric systems are not designed to match a normal voice to a scream
West: “If you were to attempt to do a biometric analysis of a voice sample such as this, what would the methodology or approach be to this, this type of analysis.”
French: “Firstly let me say you just wouldn’t do it. [Laughs out loud.] Biometric systems just aren’t designed to do this. They are designed to compare reasonably good quality samples of people speaking in modal voice, I mean normal voice, they are not designed to do this at all, and [laughs out loud] . . .”
Owens’ duplication/raising pitch of sample would not be accepted within the scientific community
French then went on to talk specifically about the “Owen method” of audio sample preparation:
French: “The methodology is something separate from [the technology], the methodology is a set of methods somebody follows in a particular instance, and once you say the technology employed in this case by Mr. Owens is accepted, the general sort of technology, I can’t speak about a particular system, leaving aside the issue of EasyVoice, but the methodology utilized would not be accepted within the wider scientific community because of lack of testing of the system, and also various other things that had happened. For instance the duplication of the material. Initially the sample was rejected on the basis of it being too short, so what the analyst had done was actually to repeat it, to duplicate it, to loop it, by repeating it, in other words to fool the system into believing that the sample being put in was longer than it was. I mean, that would not be an accepted methodology in the scientific community. And the artificial raising of pitch in the reconstruction exemplar recording until it was like the pitch found in the actual screams from the 911 call, I mean that would be a totally novel methodology to my knowledge, as well, that wouldn’t be accepted within the scientific community.” (emphasis added)
Without knowing beforehand, you couldn’t tell from scream if even speaking English, nor even if male or female
The defense then turned Dr. French’s attention to the analysis and testimony of Dr. Reich:
West: “We were speaking about Dr. Reich’s work, you seem to have some reservation or even some confusion about what it was he actually did. Are you able to speak any more as a speech scientist and someone in both the academic and forensic scientific community about his analysis.”
French: “Yes, there are a number of things that I find disturbing. [Lengthy detailed explanation of his concerns, with more laughing out loud.]”
West: “Are you saying that there is no accepted literature, no accepted method in the scientific community to identify a speaker’s age by the frequency that Dr. Reich has described, under these circumstances?”
French: “There are no accepted methodologies to identify a speaker’s age from the constituent elements, from the phonemes of their screams. No.”
West: “Do you think that a 29-year-old in a life-threatening situation would vocally be able to make that sound.”
French: “I think a 50-year-old would.”
West: “So that’s not age related at all, in your opinion.”
. . .
French: “If you were simply presented with the screams in this case, with no background information, if it were simply edited out of the recording end-to-end and given to an analyst, I don’t think you could even be sure that the person was speaking in English. I don’t think you could even be sure that the person was male or female.” (emphasis added)
. . .
West: “Dr. Reich is claiming to be able to understand speech attributed to both Trayvon Martin and Mr. Zimmerman in the 911 call, and also to Mr. Zimmerman in his non-emergency call to the police. As I understand your evidence you don’t find any of that speech in those recordings.”
French: “No, I don’t. No . . . from the Zimmerman 911 call . . . claimed “do you think I’m crazy here,” then later on the words, “these assholes,” “dear God” is attributed to the speaker, and then after those words, “get off of me.” I can’t hear any of those words.”
West: “When you say you can’t hear it, did you try through you and your colleagues and your laboratory?”
French: “Indeed, yeah. The problem is you can amplify them, but they don’t become more clearly speech, they just become louder, that’s what amplification is. It doesn’t help you decide what is speech and what is not speech, generally speaking. And in my opinion those are not speech.”
A rather awkward discussion was initiated by the prosecution when they chose to focus in cross-examination on Dr. French’s opinion that speech analysis was not a valid means to determine a speaker’s age.
Mantei: “Did I understand you to say that you really can’t draw any correlation between the age of a person and the overall pitch of their voice?”
French: “When they are screaming.”
Mantei: “OK. In general terms, would you say that a younger person who had not either completed puberty or finished developing will have a higher pitched voice than they will after they finished.”
West felt obliged to address the puberty issue on re-direct. He didn’t get far, however, for obvious reasons:
French: “I’m just becoming a little concerned, Mr. West, that I might be straying outside of my area of expertise, and giving information that the average judge or juror, trier of fact, would themselves be able to answer just as well as me.”
It appears, however, that Mantei was preparing foundation for an argument that perhaps the 17-year-old Trayvon Martin had not yet completed puberty at the time he was killed.
Which seems . . . odd.
Dr. George Doddington, PhD: “It’s ridiculous.” “Similarly ridiculous.” “God, this is absurd.”
Dr. Doddington’s testimony was, if anything, even more derisive of the State’s experts than was that of Dr. French. Although he refrained from saying so explicitly, a critical reading between the lines of his testimony, and the consideration of some very long pauses taken at judicious moments, suggests that in truth he believes that the forensic sciences in general, and forensic speech analysis in particular, barely qualify as “science” at all.
Speaker identification on 1 second of even good speech is absurd
The interesting parts of this testimony arose when the defense, in the person of Counselor O’Mara, got around to asking Dr. Doddington about his opinion of the work of the State’s expert witnesses. (This stuff is so good in its pure form I don’t see how I can add much value, but I’ve gone ahead and bolded the sweet spots.)
O’Mara: “You’ve had an opportunity to review Mr. Owens report?”
O’Mara: “And the other reports that were generated as well in this case, Dr. Reich as well?”
O’Mara: “Primarily [with regard] to Mr. Owen, or his system, presently, based upon your work for NIST and some of your experiments you helped create, are there any other variables we have in this fact scenario we have in this case that cause you concern about the accuracy or reliability of [lost signal, presumably “of Mr. Owen’s findings.”]”
Doddington: “Well, I reviewed this, Mr. Owens said he could do speaker recognition with just one second of speech, and then he revised that and said he could do it on 1/8th of a second. Doing speaker recognition to any level of reliability whatsoever on 1 second of good speech is absurd.” (emphasis added)
. . .
Doddington: “I think using one second is ridiculous.”
O’Mara: “How about using 2.54 seconds.”
Doddington: “Similarly ridiculous.” (emphasis added)
Screaming “destroys” the value of an audio sample for purposes of speech comparison
O’Mara: “Are there additional variables that would affect the quality of the sampling. You are familiar with the particulars of this case, correct?”
O’Mara: “In this case there was a certain type of speech, screaming or shouting speech. You are aware of that.”
Doddington: “Yes. [Laughs out loud.] I’m sorry it’s all ridiculous.” (emphasis added)
O’Mara: “We’ll get to that. When you had screaming or the type of speech that we have here, what does that do to the ability to assess it or evaluate it.” (emphasis added)
Doddington: “It destroys it.” (emphasis added)
Comparing normal speech to screams is a “fool’s mission”
O’Mara: “Has NIST done any evaluations or any studies in an attempt to see the efficiency or eligibility of speaker recognition with screams.”
Doddington: “No, that’s a fool’s mission.” (emphasis added)
Replicating the same audio sample to meet biometric minimal requirements is “ridiculous”
O’Mara: “You were here for Mr. Owens’ testimony, correct?”
Doddington: “Yes, I was.”
O’Mara: “And you heard that one of the things he accomplished to make his machine work was to double the length of the sample. Were you here for that testimony?” (emphasis added)
Doddington: “Yes, unfortunately I was.” (emphasis added)
O’Mara: “Do you have a concern with doing that from an evaluative perspective as to what that might do to the underlying testing.”
Doddington: “It’s pretty obvious.”
O’Mara: “If you would . . .”
Doddington: [Firmly.] “Doing that is ridiculous.” (emphasis added)
O’Mara: “Why is that?”
Doddington: “You’re not adding anything [useful]. If you take the same thing and repeat it over, on what basis can you say that would improve the performance? It’s a violation of common sense. Can I take one second of speech data and play it over and over for 10 seconds and get the same results as I would from a 10 second sample? No.”
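For the technically inclined, Dr. Doddington's common-sense point can be illustrated with a toy calculation. What follows is only a rough sketch in Python: the per-frame numbers are made up, and nothing here reflects how Mr. Owens' software actually works internally. It simply shows that repeating the same short clip ten times makes the sample longer without giving any summary statistic new information to work with.

```python
from statistics import mean, pvariance


def summary(features):
    """Return the (mean, population variance) of a list of per-frame values."""
    return mean(features), pvariance(features)


# Hypothetical per-frame measurements from a very short clip (made-up numbers).
clip = [212.0, 230.0, 221.0, 245.0, 238.0]

# "Looping" the clip: the very same frames repeated ten times over.
looped = clip * 10

short_stats = summary(clip)
looped_stats = summary(looped)

print("frames in clip:", len(clip), "frames in loop:", len(looped))
print("distinct values in loop:", len(set(looped)))
print("clip stats:  ", short_stats)
print("looped stats:", looped_stats)
```

The looped sample has ten times as many frames, but the same five distinct values and the same mean and variance as the original clip. A genuinely longer recording would add new observations; a loop cannot.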
Changing pitch or other variable in the exemplar all but guarantees a non-match
O’Mara: “When you take speech and you take the pitch of speech and raise it up, three or four or five multiples, to make it similar to the comparison speech, is that problematic in addressing the speech comparison.”
Doddington: “I think you asked that question of Dr. French also, and I was surprised at how mild he was in his response. It’s ridiculous.”
O’Mara: “Why so?”
Doddington: “Because the pitch frequency is a completely separate mechanism to a first or second of magnitude from the vocal tract frequencies, which are defined by the resonances in the mouth, not by how fast the vocal cords are flapping. So, for example, if you basically scale up the frequency you are purposely distorting the phoneme frequencies, so they are not going to match at all . . . there’s just no proper basis for comparing the phoneme frequencies when you distort the frequency scale.”
O’Mara: “So if you were to do that in Mr. Owens’ box, or in any evaluative process, if you were to increase pitch or change a variable, and then look for a match, what would you expect to get.” (emphasis added)
Doddington: “Not expect to get a match.” (emphasis added)
O’Mara: “Which is of course what Mr. Owens’ machine came up with. A non-match [between the scream and Mr. Zimmerman’s exemplar.]” (emphasis added)
Doddington: “Yes.” (emphasis added)
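Dr. Doddington's explanation of why scaling up the frequency distorts everything else can also be sketched in a few lines of Python. The testimony does not say exactly how the pitch of the exemplar was raised, so this toy example assumes the simplest possible approach: speeding the recording up by keeping every second sample. The "voice" here is just two sine waves, a 100 Hz component standing in for the pitch and a 700 Hz component standing in for a vocal tract resonance (what the transcript above calls a "phoneme frequency"). Both numbers are invented for illustration.

```python
import math


def dominant_frequencies(samples, sample_rate, count):
    """Return the `count` strongest frequencies (in Hz) found by a naive DFT."""
    n = len(samples)
    magnitudes = []
    for k in range(1, n // 2):
        re = sum(s * math.cos(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        im = sum(s * math.sin(2 * math.pi * k * i / n) for i, s in enumerate(samples))
        magnitudes.append((math.hypot(re, im), k * sample_rate / n))
    strongest = sorted(magnitudes, reverse=True)[:count]
    return sorted(freq for _, freq in strongest)


SAMPLE_RATE = 8000  # samples per second (assumed for this toy example)
N = 800             # one tenth of a second of audio

# Toy "voice": a 100 Hz pitch component plus a 700 Hz resonance component.
original = [
    math.sin(2 * math.pi * 100 * i / SAMPLE_RATE)
    + 0.8 * math.sin(2 * math.pi * 700 * i / SAMPLE_RATE)
    for i in range(N)
]

# Naive pitch raising: keep every second sample and play at the same rate.
raised = original[::2]

original_peaks = dominant_frequencies(original, SAMPLE_RATE, 2)
raised_peaks = dominant_frequencies(raised, SAMPLE_RATE, 2)

print("original peaks (Hz):", original_peaks)
print("raised peaks (Hz):  ", raised_peaks)
```

Under this assumption the pitch doubles, but so does the resonance. A real speaker who raises his pitch does not shift his vocal tract resonances in lockstep, which is why a comparison against an exemplar altered this way would not be expected to match.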
Words that only Dr. Reich was able to hear on tape is the “imaginary stuff”
O’Mara: “Were you here for Mr. Reich’s testimony as well.”
Doddington: “Yes, I was.”
O’Mara: “Have you had any opportunity to listen to the tapes that are subject to this proceeding?”
Doddington: “Yes I listened to the call to the dispatcher, the non-emergency call, and the 911 call.”
O’Mara: “When you accomplished similar analysis by amplifying it, doing what you did to filter it, were you able to hear what Dr. Reich heard?” (emphasis added)
Doddington: [Laughs out loud.] “No!” (emphasis added)
O’Mara: “Anywhere, we’re talking now about the first call, the non-emergency call.”
O’Mara: “Do you know about what Dr. French testified, I think Dr. Nakasone also, this listener bias concept?”
Doddington: “Yes, I’m familiar with it.”
O’Mara: “What is that?”
Doddington: “That is the ability of a listener to hear what he wants to hear or what he’s pre-conditioned to hear.”
O’Mara: “And how significant an effect is that on an individual’s ability to perceive an event.” (emphasis added)
Doddington: “Apparently for Mr. Reich it is very effective.” (emphasis added)
O’Mara: “Similar question with your review of the 911 call. Did you have an opportunity to listen to that?”
O’Mara: “Did you hear anything on there that Dr. Reich thinks that he heard. Not the yells or screams . . .” (emphasis added)
Doddington: “Just the imaginary stuff, right?” (emphasis added)
O’Mara: “We’ll leave that up to the court to determine. Let’s call it what Dr. Reich heard and I’ll ask you if you heard any of the additional language that he says he heard.”
Doddington: [Laughs out loud.] “God, this is absurd.” (emphasis added)
O’Mara: “Is that a no?”
Doddington: “That’s a no.”
Although Reich says 16-bit audio essential to finding it was Trayvon screaming–he actually had only 8-bit audio
Then a final blockbuster:
Doddington: “I have to say, one of the things that Mr. Reich put in his report is that it is critically important for this data to be 16-bit data. . . . [Note: Dr. Reich had essentially written and testified that having the 16-bit data was essential to the integrity of his findings.] It looked a bit odd to me. . . . I thought this doesn’t look like 16 bits. . . . I discovered that in fact the data in both of those recordings is actually 8-bit data. The top 8 bits of the recordings are all zero. And the only data in the recording is the lower 8 bits. It is in fact 8-bit data.” (emphasis added)
O’Mara: “So it is not a 16-bit piece of sound.”
Doddington: “It’s a 16-bit format, but it’s actually 8-bit data.” (emphasis added)
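The distinction between a 16-bit format and 16-bit data is easy to check for yourself. Here is a minimal Python sketch of the kind of test Dr. Doddington describes. I do not know what tools he actually used, and the sample values below are made up. The sketch also assumes the samples are read as unsigned little-endian 16-bit words, which is a simplification (real WAV audio is normally signed), chosen only because it matches the "top 8 bits are all zero" description.

```python
import struct


def effective_bits(raw_bytes):
    """Return how many low-order bits are actually used across all 16-bit words."""
    count = len(raw_bytes) // 2
    words = struct.unpack("<%dH" % count, raw_bytes[: count * 2])
    used = 0
    for word in words:
        used |= word  # accumulate every bit that is set in any sample
    return used.bit_length()


# Made-up samples: 8-bit values stored inside 16-bit words (upper byte always zero).
eight_bit_values = [0, 17, 128, 200, 255, 64]
padded = struct.pack("<%dH" % len(eight_bit_values), *eight_bit_values)

# Made-up samples that genuinely use the full 16-bit range.
genuine = struct.pack("<4H", 0, 1023, 40000, 65535)

padded_bits = effective_bits(padded)
genuine_bits = effective_bits(genuine)

print("padded recording uses", padded_bits, "bits")
print("genuine recording uses", genuine_bits, "bits")
```

Both byte strings are stored in a 16-bit format, but only the second one carries 16 bits of information per sample. The first is 8-bit data in a 16-bit wrapper, which is the situation Dr. Doddington says he found.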
Judge Nelson could decide to allow Owen and Reich to testify regardless of the weakness of their expertise and findings. Given the testimony developed during this Frye hearing, however, I imagine the defense is salivating at that prospect.
CRAZY IDEA OF THE DAY: THE WORSE THE STATE EXPERTS PERFORMED, THE MORE LIKELY JUDGE PERMITS THEIR TESTIMONY
A cynical observer might suspect that the relatively poor performance of the State’s witnesses, Mr. Owens and Dr. Reich, actually makes it more, rather than less, likely that Judge Nelson would permit their testimony at trial.
Surely the State dare not present either Mr. Owens or Dr. Reich to the jury, and subject them at trial to what would unquestionably be devastating cross-examination.
If that is the case, Judge Nelson can permit their testimony in the confident knowledge that they would never actually appear, and thereby provide no basis for the defense to argue that they have been harmed by her decision.
At the same time the State obviously has no complaint if it has gotten what it wanted–permission for their witnesses to appear. The fact that the permission may be of no help, from a practical perspective, is not Judge Nelson’s problem.
Finally, the outside forces that have been driving much of the public narrative of the Zimmerman case might make much noise out of Judge Nelson denying the State’s experts permission to testify, but they can make little hay out of her doing exactly what both the State and they claim they want: granting permission for the State’s witnesses to testify at trial.
In short, granting permission for the State’s witnesses to testify, despite the train-wreck of their Frye appearances, could constitute a win-win-win for Judge Nelson.
Boy, it would be tough to go through life with so cynical a perspective.
Next step: Jury selection on Monday.
Andrew F. Branca is a MA lawyer with a long-standing interest in the law of self defense. He authored the seminal book “The Law of Self Defense” (second edition shipping June 22–save 30% and pre-order TODAY!), and manages the Law of Self Defense web site and blog. Many thanks to the Professor for the invitation to guest-blog on the Zimmerman trial here on Legal Insurrection!