Having covered speaker recognition and voice ID, intelligibility, and stealth recording in my past forensics audio posts, I’m going to close out the series with the topic of authentication. In forensics audio, authentication means making sure that a recording used as evidence is authentic and hasn’t been tampered with, as well as determining whether it is a duplicate or a modified version of the original. This is super important because if the integrity or originality of a piece of audio evidence is in question, the jury may doubt the reliability or trustworthiness of the evidence presented.
Tape Authentication and the “Watergate Scandal”
We live in a day and age where almost everything exists digitally. However, there was a time when tapes were the dominant form of audio media. Tape authentication is a long and hard process that involves looking at the magnetic tape under a microscope, using a magnetic developing solution, to figure out whether there are any discrepancies based on the characteristics of the recorder used. Tape authentication was actually what brought forensics audio to fame in 1973 in President Nixon’s “Watergate Scandal” case.
On June 17, 1972, five men were arrested for planting audio surveillance bugs in the Democratic National Committee offices at the Watergate complex in Washington DC. Following a tip from an informant (later revealed to be a senior FBI official), it was discovered that these men were directly associated with President Nixon. President Nixon secretly taped his conversations and phone calls in the Oval Office, and those recordings contained information regarding this incident. The prosecution subpoenaed the tape recordings. Nixon refused to hand them over, citing executive privilege, but provided transcripts of the audio.
It then came to light that there was an 18.5-minute gap in the recording of a June 20, 1972 conversation between Nixon and his Chief of Staff, H.R. Haldeman. The president’s loyal secretary claimed responsibility for mistakenly erasing that portion of the tape. The US District Court assigned 6 technical audio experts to evaluate how the erasure occurred and whether any of the lost audio could be recovered. Upon inspection, the panel found that the speech recording was authentic, made on a Sony 800B recorder as reported by the Executive Office, and “the tape showed no signs of splicing, tampering, or copying.” The buzz sound on the recording appeared to be the result of recording over the original audio with a type of noise known as electric network frequency interference. The recorder used for the erasure was a Uher 5000. They further deduced that the buzz was not caused by a malfunction of the recorder but required someone to manually press keys on the recorder, and that the erasure was so strong that it made “recovery of the original conversation virtually impossible.”
A detailed look into their findings (87 pages total) can be found here.
Though this was not conclusive proof of President Nixon’s wrongdoing, it cast doubt in the public’s mind on his claimed lack of knowledge of the coverup and cost him a lot of popular support. In August 1974, more tapes of Nixon’s recorded conversations surfaced. One of them, known as the “Smoking Gun” tape (listen here for the original recording and transcript), showed that President Nixon was indeed aware of the Democratic “burglaries” and approved of having the CIA halt the FBI’s investigation of the Watergate case by claiming that it was a national security operation. Soon thereafter, on August 9, 1974, Nixon resigned (later receiving a pardon from President Ford), and Haldeman was found guilty of perjury, conspiracy, and obstruction of justice and imprisoned for 18 months.
Electric Network Frequency
Notice how “Electric Network Frequency” was mentioned earlier when we talked about the re-recording of noise in the Watergate scandal. This is actually a very important concept that is key to authentication. When a recorder records something, it also picks up the Electric Network Frequency, or ENF, a subtle hum from the power grid that is sometimes referred to as a “trace.” The ENF stays close to its nominal value because the power companies keep very close tabs on it. In the UK, the frequency fluctuates slightly around 50Hz and in the US, it centers around 60Hz. So for instance, at 3:34pm, the ENF might be 49.6Hz and at 3:35pm, the ENF might be 50.4Hz. If you create a detailed database of the minute-by-minute ENF fluctuations over a period of time at a certain location, you can compare the ENF trace in a recording against it and identify the time and date that the recording was made. This helps with authentication because a recording that has been tampered with will have an inconsistent ENF trace, which signals that it has been modified.
Below, Detective Phil Manchester talks about how to isolate the ENF from a recording:
The process of extracting the ENF signature from a recording involves band‑pass filtering between the range 49‑51 Hz, without any resampling of the material, to separate the ENF waveform from the original recording. The results may then be plotted and analysed against the database to prove or disprove the recording’s integrity and qualify when the actual recording took place, thus providing evidential and scientific authentication of the material in question.
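To make the extraction step a little more concrete, here is a minimal Python sketch. It is not Detective Manchester’s procedure: instead of a true band-pass filter, it simply scans the 49‑51 Hz band in small steps and keeps the frequency with the most energy in each window. The test recording, the 2-second window, the 0.05 Hz step, and all function names are my own assumptions for illustration.

```python
import math

def synth_recording(hum_freqs, sample_rate=1000, window_s=2.0):
    """Made-up test signal: a loud 220 Hz tone standing in for speech, plus a
    quiet mains hum whose frequency changes every `window_s` seconds."""
    samples, phase = [], 0.0
    for hum_hz in hum_freqs:
        for _ in range(int(window_s * sample_rate)):
            t = len(samples) / sample_rate
            samples.append(math.sin(2 * math.pi * 220 * t) + 0.05 * math.sin(phase))
            phase += 2 * math.pi * hum_hz / sample_rate
    return samples

def power_at(windowed, freq_hz, sample_rate):
    """Energy of the samples at one single frequency (a one-bin DFT)."""
    step = 2 * math.pi * freq_hz / sample_rate
    re = sum(x * math.cos(step * i) for i, x in enumerate(windowed))
    im = sum(x * math.sin(step * i) for i, x in enumerate(windowed))
    return re * re + im * im

def extract_enf(samples, sample_rate, window_s=2.0, lo=49.0, hi=51.0, step=0.05):
    """Return one ENF estimate in Hz per non-overlapping window."""
    n = int(window_s * sample_rate)
    # A Hann window keeps the much louder "speech" from leaking into the band.
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]
    candidates = [lo + k * step for k in range(int(round((hi - lo) / step)) + 1)]
    trace = []
    for start in range(0, len(samples) - n + 1, n):
        windowed = [w * x for w, x in zip(hann, samples[start:start + n])]
        best = max(candidates, key=lambda f: power_at(windowed, f, sample_rate))
        trace.append(round(best, 2))
    return trace

if __name__ == "__main__":
    recording = synth_recording([49.9, 50.0, 50.1])
    print(extract_enf(recording, sample_rate=1000))
```

For a US recording you would scan around 60 Hz instead. Real casework uses longer recordings and finer frequency resolution than this toy example.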
Note though that building such a database requires ENF data to begin with. While European power companies have provided ENF data and created databases, American power companies have yet to do this.
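Once you have both an ENF trace from the recording and a reference database, the comparison itself can be sketched as a sliding match. The snippet below is only an illustration under my own assumptions: the reference values are invented, the function name is hypothetical, and mean squared error is just one simple way to score a match.

```python
def locate_trace(reference, trace):
    """Slide `trace` along `reference` and return (best_offset, error), where
    error is the mean squared difference in Hz^2 at the best position."""
    if not trace or len(trace) > len(reference):
        raise ValueError("trace must be non-empty and no longer than the reference")
    best_offset, best_err = 0, float("inf")
    for offset in range(len(reference) - len(trace) + 1):
        err = sum((reference[offset + i] - v) ** 2
                  for i, v in enumerate(trace)) / len(trace)
        if err < best_err:
            best_offset, best_err = offset, err
    return best_offset, best_err

if __name__ == "__main__":
    # Invented minute-by-minute reference values, NOT real grid data.
    reference = [50.00, 50.02, 49.97, 49.95, 50.01, 50.04, 49.99, 49.96, 50.03, 50.00]
    # Pretend the recording's trace is minutes 3 to 6 with a little measurement error.
    trace = [v + 0.001 for v in reference[3:7]]
    print(locate_trace(reference, trace))
```

The offset tells you where in the database the recording fits, which maps to a time and date. If even the best error is large, no stretch of the database lines up cleanly with the recording, which is the kind of inconsistency that would call for a closer look.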
Digital Authentication – Noise Floor and Frequency Analysis
Today, digital authentication is the most common form of authentication. What is the process like? One thing forensics audio experts listen for in a recording is the noise floor, to see if there are any inconsistencies in what would otherwise be a consistent background (e.g. is there a break in the hum, AC, wind, or any other background ambience?). Another thing they listen for is inconsistency in the pacing of words or speech, which might suggest that the words have been edited. A more detailed and objective way of evaluating this is through spectrograms and frequency charts.
Generating a spectrogram that shows the frequencies of people’s voices, the noise floor, and the overall recording is helpful in determining whether edits have been made. An edit to someone’s speech can cause a sudden jump in the frequency of the person’s voice. For noise floors, sometimes you will see two noise floors at the same time, or the noise floor will be absent or different for a stretch, which shows up on the spectrogram as a black gap signifying the noise floor discrepancy.
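The “black gap” idea can also be checked numerically. Here is a rough Python sketch that measures the level of each short frame and flags frames that fall far below the typical level. The frame length, the 20 dB threshold, and the function names are assumptions of mine, and this only catches a missing noise floor, not a swapped one.

```python
import math
import random
import statistics

def frame_levels_db(samples, frame_len):
    """RMS level in dB for each non-overlapping frame of `frame_len` samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, 1e-12)))  # floor avoids log(0)
    return levels

def find_noise_floor_gaps(samples, frame_len, drop_db=20.0):
    """Return indices of frames sitting more than `drop_db` below the median level."""
    levels = frame_levels_db(samples, frame_len)
    typical = statistics.median(levels)
    return [i for i, level in enumerate(levels) if level < typical - drop_db]

if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up background hiss with one stretch of digital silence cut into it.
    hiss = [rng.gauss(0, 0.01) for _ in range(8000)]
    hiss[3000:4000] = [0.0] * 1000
    print(find_noise_floor_gaps(hiss, frame_len=1000))
```

On a real recording you would run this on passages without speech, since speech itself moves the level around a lot.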
Visually inspecting the audio file’s waveform closely, zooming in, and checking for phasing can also often indicate whether an edit has been made.
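One thing a zoomed-in waveform can reveal is an abrupt jump between neighbouring samples where two pieces of audio were butted together. As a sketch of that single idea (my own simplification, not a standard forensic test), the Python below flags sample-to-sample steps that are much larger than the typical step. The factor of 5 and the function name are assumptions.

```python
import math
import statistics

def find_sample_jumps(samples, factor=5.0):
    """Return indices i where |samples[i] - samples[i-1]| is more than
    `factor` times the median sample-to-sample step."""
    steps = [abs(b - a) for a, b in zip(samples, samples[1:])]
    typical = max(statistics.median(steps), 1e-12)  # guard against all-zero input
    return [i + 1 for i, s in enumerate(steps) if s > factor * typical]

if __name__ == "__main__":
    # A clean 100 Hz tone at 8 kHz, then the same tone with 30 samples cut out.
    tone = [math.sin(2 * math.pi * 100 * i / 8000) for i in range(800)]
    spliced = tone[:400] + tone[430:]
    print(find_sample_jumps(tone))
    print(find_sample_jumps(spliced))
```

Sharp natural sounds such as clicks or consonants can trip a check like this too, so treat a flagged index as a place to zoom in rather than as proof of an edit.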
What software is good for audio authentication? SpeechPro SIS II, a common tool experts use in the field as mentioned in my speaker recognition post, has a unique and super cool plugin called EdiTracker. It uses 6 different authentication methods that directly relate to much of what we have discussed. It calculates the recorder’s parameters and compares them to the characteristics of a signal allegedly recorded with the same device to see if they match. The parameters that are used to evaluate the recorder include frequency response, total harmonic distortion, detonation, amplitude modulation, and speed.
It also searches for signs of previous digital processing by looking for traces of anti-aliasing filters. It evaluates the phase shift of harmonics in the recording by scanning the audio for narrow band filters, either from ENF or from other electrical appliances that may be in the recording. It also scans the background noise for changes in the noise floor and performs auditory-linguistic analysis. EdiTracker requires a minimum signal duration of 10 seconds. In short, EdiTracker adds confidence and accuracy to a forensics audio expert’s analysis and results.
Chain of Custody
As part of a forensics audio expert’s due diligence, it’s important to acquire a full log of the chain of custody for the audio evidence you are evaluating. Who first had possession of the evidence and what equipment did they use to create it? Was it transferred to someone else, and if so, when? Having a timeline established from the chain of custody will provide points of reference if there is any missing information in the metadata of the audio file. Forensics audio expert Ed Primeau states:
Without a complete chain of custody, it can become very easy for the opposing attorney or prosecutor to challenge or dismiss the evidence presented. Having a complete chain of custody form, as well as any other accompanying forms and including any visual proof of retrieval, such as pictures or video, greatly helps prove the authenticity and admissibility of the evidence in the courtroom.
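If you keep the custody log in a structured form, basic gaps in the timeline are easy to spot automatically. The Python below is a hypothetical sketch: the record fields, the function name, and the people in the example are all made up, and a real chain of custody form carries far more detail.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class CustodyEvent:
    holder: str          # who took possession
    received_from: str   # who handed it over ("origin" for the first entry)
    when: datetime       # when the handover happened

def custody_gaps(events):
    """Return a list of human-readable problems found between consecutive entries."""
    problems = []
    for prev, cur in zip(events, events[1:]):
        if cur.received_from != prev.holder:
            problems.append(f"{cur.holder} received the evidence from "
                            f"{cur.received_from}, but {prev.holder} was the last holder")
        if cur.when < prev.when:
            problems.append(f"entry for {cur.holder} is dated before "
                            f"the entry for {prev.holder}")
    return problems

if __name__ == "__main__":
    log = [
        CustodyEvent("Officer A", "origin", datetime(2020, 1, 5, 9, 0)),
        CustodyEvent("Evidence Room", "Officer A", datetime(2020, 1, 5, 17, 30)),
        CustodyEvent("Audio Examiner", "Officer B", datetime(2020, 1, 7, 10, 0)),
    ]
    for problem in custody_gaps(log):
        print(problem)
```

Here the last entry names someone who never appears as a holder, which is exactly the kind of gap an opposing attorney would ask about.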
This concludes my dive into the exciting topic of Forensics Audio. Thank you for going down this rabbit hole with me. 🙂 I hope that it was an enjoyable and educational read that sheds some light on how audio applies to our everyday world and can make a direct impact in changing the lives of others!