Quantitative Integrity

The Ghost in the Data

Why I distrust every plot without error bars and the dangerous fiction of “clean” results.

The phone rang at exactly 5:09 AM. In the pre-dawn grayness of a Tuesday that felt like a Monday, that sound is a jagged blade. I fumbled for the device, my thumb sliding across the glass with the clumsiness of the half-awake, expecting a crisis.

Instead, a voice (raspy, energetic, and entirely too loud for the hour) asked if “Bernice” was ready for the garage sale. I told the caller there was no Bernice here, only a very tired man and a pile of unread manuscripts. He apologized, but the damage was done. I was awake. And when I’m awake and caffeinated by 5:29 AM, I start looking for reasons to be annoyed with the world of quantitative data.

The Art of Smoothing

I spent 19 years teaching financial literacy, and if there is one thing I have learned, it is that people use numbers to hide the truth at least as often as they use them to reveal it. In finance, we call it “smoothing.” In materials science, apparently, we call it “leaving out the error bars.”

I was staring at a PDF on my monitor, a paper published in a fairly reputable journal that I’d been meaning to get through for 9 days. The authors were claiming a 14.9% increase in surface hardness for a specific grade of stainless steel, achieved through a new, “optimized” heat treatment protocol.

Fig 1: “Standard” vs. “Optimized” bar chart, no error bars. The seductive lie of the “clean” comparison. The optimized result lacks the whiskers of reality.

The chart was beautiful. It was clean. It featured two bars: “Standard” and “Optimized.” The “Optimized” bar was visibly taller. It looked like a victory. But as I squinted at the screen, rubbing the sleep from my eyes, I realized the chart was missing something fundamental. There were no whiskers. No confidence intervals. No standard deviation. Just two solid blocks of color standing in a vacuum.

This is where the distrust begins. A chart like that doesn’t tell you what happened; it tells you what the author wants you to believe happened. In my world, if an investment fund tells you they had a 19% return without showing you the volatility, you run for the hills. In metallurgy, if a paper claims a 14.9% hardness jump without showing the scatter, you should run just as fast.
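For readers who want to see what those missing whiskers would contain, here is a minimal Python sketch. It assumes nothing more than a list of repeated readings from one sample. The helper name whiskers() and the nine hardness readings are hypothetical, invented for illustration; they are not data from the paper. The sketch reports the standard deviation (the scatter between individual indents) separately from the standard error of the mean (how tightly the average itself is pinned down), because an honest chart should say which of the two its bars show.

```python
import math
import statistics
from typing import NamedTuple


class Whiskers(NamedTuple):
    n: int       # number of readings
    mean: float  # arithmetic mean
    sd: float    # sample standard deviation (n - 1 denominator)
    sem: float   # standard error of the mean, sd / sqrt(n)


def whiskers(readings: list[float]) -> Whiskers:
    """Summarize repeated readings the way an error bar should."""
    if len(readings) < 2:
        raise ValueError("need at least two readings to estimate scatter")
    mean = statistics.fmean(readings)
    sd = statistics.stdev(readings)
    return Whiskers(len(readings), mean, sd, sd / math.sqrt(len(readings)))


# Hypothetical hardness readings (HV) from one sample, for illustration only.
readings = [210.0, 198.0, 225.0, 204.0, 217.0, 191.0, 230.0, 208.0, 213.0]
s = whiskers(readings)
print(f"{s.mean:.1f} ± {s.sd:.1f} HV (mean ± SD), SEM {s.sem:.1f} HV, n = {s.n}")
```

The SD describes the material and the measurement. The SEM describes the average. A bar chart that shows neither leaves the reader unable to tell them apart.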

I know a guy, a metallurgist named Elias, who has been in the industry for 29 years. He’s the kind of guy who carries a magnifying glass in his pocket and smells like machine oil and old library books. He recently sent an email to the lead author of a similar paper. Elias wanted to see the raw data.

He wanted to see how many indentations were made, what the surface preparation looked like, and most importantly, the variance across the 49 points they supposedly measured.

“The underlying scatter data are proprietary due to an ongoing patent application.”

– Lead Author’s Response

Elias, being the stubborn soul he is, emailed the journal editor. He pointed out that claiming a 14.9% increase is meaningless if the measurement uncertainty is 19%. The difference they were celebrating could very well be noise: a literal ghost in the machine.

The editor’s response was even more depressing: “While we encourage the inclusion of error analysis, it is unfortunately common in this sub-field to omit them if the trend is deemed significant by the reviewers.”
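Elias’s objection can be checked with a few lines of arithmetic. The sketch below is a minimal Python illustration built on two assumptions, neither of which comes from the paper:

- “19%” means an absolute uncertainty equal to 19% of each reported value.
- The two uncertainties are independent, so they add in quadrature.

The function name compare_with_uncertainty() is hypothetical, and the baseline is normalized to 100.

```python
import math


def compare_with_uncertainty(
    base: float, base_unc: float, new: float, new_unc: float
) -> tuple[bool, float]:
    """Check a claimed change against the uncertainty of both values.

    Each uncertainty is treated as a half-width (value ± unc) in the same
    units as the value. Returns (intervals_overlap, ratio), where ratio is
    the difference divided by the quadrature-combined uncertainty.
    """
    overlap = max(base - base_unc, new - new_unc) <= min(base + base_unc, new + new_unc)
    combined = math.sqrt(base_unc**2 + new_unc**2)
    return overlap, (new - base) / combined


base, new = 100.0, 114.9  # "Standard" normalized to 100; claimed +14.9%
overlap, ratio = compare_with_uncertainty(base, 0.19 * base, new, 0.19 * new)
print(f"intervals overlap: {overlap}; difference / uncertainty = {ratio:.2f}")
```

Under those assumptions the two intervals overlap, and the claimed gain is only about half of the combined uncertainty. That is what “could very well be noise” looks like in numbers.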

The “Deemed Significant” Trap

There it is. We have a peer review system that acts as a filter, but it’s a filter that catches typos and formatting errors while letting massive statistical sinkholes slide right through. The reviewers were looking at the language, the citations, and the perceived “impact.” They weren’t looking at the math. Or rather, they were looking at the math and seeing what they expected to see.

I’ve made mistakes like this myself. About 9 years ago, I built a spreadsheet for a client projecting a 9% growth rate from a single year of “clean” data. I ignored the 19% swing in the previous quarter because I thought it was an outlier.

The Outlier Was The Truth

9% projected growth vs. 19% actual volatility.

I was wrong. The outlier was the truth, and my clean data was the lie. I had to apologize to a room full of 29 angry board members. It was the most embarrassing 49 minutes of my career, and I haven’t forgotten the sting of it.
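The mismatch in that callout, a 9% projection against 19% swings, is easy to quantify. This minimal sketch assumes growth is normally distributed, with the projected rate as its mean and the observed volatility as its standard deviation. Real returns have fatter tails, so the model is optimistic. chance_of_loss() is a hypothetical helper built on Python’s standard statistics.NormalDist.

```python
from statistics import NormalDist


def chance_of_loss(mean_pct: float, volatility_pct: float) -> float:
    """Probability that a single period's growth comes in below zero.

    Assumes normally distributed growth with the given mean and standard
    deviation, both expressed in percent.
    """
    return NormalDist(mu=mean_pct, sigma=volatility_pct).cdf(0.0)


print(f"P(negative year) = {chance_of_loss(9.0, 19.0):.0%}")
```

Under that simple model, roughly one period in three would shrink instead of grow. A single “clean” year of data cannot reveal that.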

Messy Science vs. Clean Marketing

The problem with indentation work specifically is that it is inherently messy. You are pushing a diamond tip into a material and measuring how it resists. But materials aren’t perfectly uniform. They have grains, inclusions, and residual stresses.

The same indentation tester can give you a different reading if you move the indent just 49 microns to the left. If you don’t report that variance, you aren’t doing science; you’re doing marketing.

When you look at a plot with no error bars, you are looking at a hypothesis presented as a fact. The author is essentially saying, “Trust me, the average is the reality.” But in engineering, the average is rarely what kills you. It’s the tail of the distribution.
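To make “the tail of the distribution” concrete, the sketch below assumes normally distributed scatter. It estimates what fraction of readings would fall below a minimum specification for two hypothetical lots with the same 210 HV average but different spreads. The helper name, the 190 HV spec limit, and the spreads are all invented for illustration.

```python
from statistics import NormalDist


def fraction_below(spec_min: float, mean: float, sd: float) -> float:
    """Expected fraction of readings below spec_min, assuming normal scatter."""
    return NormalDist(mu=mean, sigma=sd).cdf(spec_min)


# Two hypothetical lots: identical averages, different scatter.
for sd in (5.0, 15.0):
    share = fraction_below(190.0, 210.0, sd)
    print(f"mean 210 HV, SD {sd:.0f} HV -> {share:.3%} of readings below 190 HV")
```

The averages are identical; only the whiskers differ. Yet the share of material that misses the spec changes by more than three orders of magnitude.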

I think about the folks over at Zhanghua Pharmaceutical Equipment. They build massive, complex machines: agitated nutsche filter dryers and the like. These are machines where “close enough” isn’t a phrase that gets used.

$999,000

Batch Value at Stake

In high-stakes pharmaceutical processing, material integrity isn’t an academic preference; it’s a near-million-dollar necessity.

If a pharmaceutical company is processing a batch of life-saving drugs worth $999,000, they need to know that the materials of construction are exactly what they claim to be. They don’t want a “clean” chart. They want the messy, honest truth of the material’s limits.

Laziness and the Pressure to Publish

Why has this become so common? I suspect it’s a mix of laziness and the “publish or perish” pressure. It takes time to do 29 indentations instead of 9. It takes even more time to calculate the 99.9% confidence interval and explain why the data looks like a shotgun blast instead of a straight line.

And let’s be honest: a chart with big, overlapping error bars doesn’t look as “significant.” It looks uncertain. And journals, like investors, hate uncertainty.
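The extra work of doing 29 indentations instead of 9 buys something measurable: a tighter interval. This minimal sketch assumes a hypothetical 12 HV scatter between individual indents. As a simplification it uses the normal critical value. For samples this small a Student-t value is larger, so real intervals are wider still. ci_half_width() is a hypothetical helper.

```python
import math
from statistics import NormalDist


def ci_half_width(sd: float, n: int, confidence: float) -> float:
    """Approximate half-width of a two-sided confidence interval for a mean.

    Uses the normal critical value z; for small n the Student-t critical
    value is larger, so this understates the true half-width.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return z * sd / math.sqrt(n)


for n in (9, 29):
    for conf in (0.95, 0.999):
        print(f"n = {n:>2}, {conf:.1%} interval: ± {ci_half_width(12.0, n, conf):.1f} HV")
```

The interval narrows only with the square root of n. Going from 9 indents to 29 shrinks it by a factor of about 1.8. Raising the confidence level from 95% to 99.9% widens it by a factor of about 1.7.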

I remember a 5 AM call I got years ago: not a wrong number, but a panic call from a student. He was worried because his portfolio had dipped by 1.9% in a day. He thought the world was ending.

I had to explain to him that a 1.9% dip was well within the daily “error bar” of the market. It was noise. If he’d had a chart that showed the historical variance, he wouldn’t have been awake and sweating.

The Young’s Modulus Silence

I’m currently looking at a table in this paper: Table 3, to be exact. It lists the Young’s Modulus of three different samples: 199 GPa, 209 GPa, and 219 GPa. There isn’t a single plus-minus sign in the entire column.

Sample A: 199 GPa
Sample B: 209 GPa
Sample C: 219 GPa

Are those values the average of three tests? Or 29? Was the temperature in the lab 19 degrees Celsius or 29? Did they account for the tip rounding of the indenter, which can drift by 49 nanometers over the course of a long study?

When I see this lack of rigor, I don’t just distrust the data; I distrust the author’s entire worldview. If you are willing to skip the most basic requirement of quantitative reporting (telling us how much you might be wrong), then I have to assume you don’t actually know.
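Reporting how much you might be wrong takes one extra function. This minimal sketch is a formatter that writes a value with its standard deviation and sample count. It rounds the uncertainty to two significant figures and the mean to the same decimal place. That is a common convention, not one this paper’s journal is known to require. The function name and the example numbers are hypothetical; Table 3 gives no scatter to plug in.

```python
import math


def format_measurement(mean: float, sd: float, n: int, unit: str, sig: int = 2) -> str:
    """Render 'mean ± SD unit (mean ± SD, n = N)' for a table cell.

    The SD is rounded to `sig` significant figures and the mean is rounded
    to the same decimal place, so neither claims more precision than the other.
    """
    if sd <= 0:
        raise ValueError("an honest value needs a positive uncertainty")
    decimals = sig - 1 - math.floor(math.log10(sd))
    places = max(decimals, 0)  # digits to print after the decimal point
    return (
        f"{round(mean, decimals):.{places}f} ± {round(sd, decimals):.{places}f} "
        f"{unit} (mean ± SD, n = {n})"
    )


print(format_measurement(199.24, 4.31, 9, "GPa"))
```

Every cell then answers the questions in the paragraph above: how big is the scatter, and how many tests stand behind the number.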

Consequences of the Shaved Whisker

This isn’t just an academic gripe. It has real-world consequences. Imagine an engineer designing a high-pressure vessel based on that 14.9% “improvement.” They trim the safety factor. They save 9% on material costs.

They look like a hero to the accounting department. But because the original paper didn’t show the error bars, the engineer didn’t realize that the “improvement” was actually just a lucky sample. Under real-world stress, the material behaves like the “Standard” grade, or worse.

The vessel fails. People get hurt. The cost of that missing error bar suddenly becomes much higher than the price of the journal subscription.
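The arithmetic of that scenario is short. This minimal sketch rests on two loudly stated assumptions:

- It treats the claimed 14.9% hardness gain as a proportional gain in allowable strength. Hardness and strength are related but not interchangeable.
- The part was sized so that the assumed strength divided by the working stress equals the design safety factor.

The function name and the design factor of 2.0 are hypothetical.

```python
def realized_safety_factor(design_sf: float, assumed_gain: float, real_gain: float) -> float:
    """Safety factor actually delivered when a design leans on a strength gain.

    Hypothetical model: the part is sized so that (baseline strength *
    (1 + assumed_gain)) / working stress == design_sf. If the real gain is
    smaller, the delivered margin scales by (1 + real_gain) / (1 + assumed_gain).
    """
    return design_sf * (1.0 + real_gain) / (1.0 + assumed_gain)


# Design counted on +14.9%; the material behaves like the "Standard" grade (+0%).
print(f"{realized_safety_factor(2.0, 0.149, 0.0):.2f}")
```

Under those assumptions, a nominal factor of 2.0 quietly becomes about 1.74. That is roughly 13% less margin than anyone signed off on, before any trimming of the nominal factor itself.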

We need a culture shift. We need to stop rewarding “clean” results and start rewarding “honest” ones. I want to see papers where the error bars are so large they overlap. I want to see researchers admit that their data is inconclusive.

That is where real progress happens. When you realize the difference isn’t significant, you stop wasting time on a dead end and start looking for the real variable that matters.
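Realizing a difference isn’t significant starts with a test that uses the scatter and the sample count, not just the averages. This minimal sketch computes Welch’s t statistic from summary statistics. The inputs are hypothetical: they reuse the 100 vs. 114.9 comparison and assume a per-indent standard deviation of 19 units in both groups. The function returns the statistic and the Welch-Satterthwaite degrees of freedom. Converting those to a p-value needs a Student-t distribution (SciPy has one), which is left out to keep this to the standard library.

```python
import math


def welch_t(mean_a: float, sd_a: float, n_a: int,
            mean_b: float, sd_b: float, n_b: int) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    Inputs are each group's mean, sample standard deviation, and count;
    the two groups are not assumed to share a variance.
    """
    va = sd_a**2 / n_a  # squared standard error of group A's mean
    vb = sd_b**2 / n_b  # squared standard error of group B's mean
    t = (mean_b - mean_a) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va**2 / (n_a - 1) + vb**2 / (n_b - 1))
    return t, df


for n in (9, 49):
    t, df = welch_t(100.0, 19.0, n, 114.9, 19.0, n)
    print(f"n = {n:>2} per group: t = {t:.2f}, df = {df:.0f}")
```

With 9 indents per group, t comes out near 1.66 on 16 degrees of freedom. That falls short of the roughly 2.12 needed for a two-sided 5% test. With 49 indents per group, the same averages give t near 3.9. The averages never changed; only the evidence behind them did.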

It’s now 6:29 AM. The sun is starting to peek over the horizon, and the birds are making a racket. I’m still annoyed about the 5 AM call, and I’m still annoyed about this paper.

I think I’ll write a letter to the editor. Not a mean one, just a pointed one. I’ll ask for the error bars. I’ll ask for the standard deviation. And I’ll probably mention that I’ve been reading their journal for 19 years and I expect better.

In my financial literacy classes, I always told my students: “If someone shows you a line that only goes up, they are either lying to you or they are lying to themselves.” The same applies to materials science. If the bars are perfectly level and the trends are perfectly smooth, someone is hiding the mess.

I’ll keep my distrust. It’s served me well for 59 years. I’ll keep looking for the whiskers, the scatter, and the noise. Because in the end, the only thing you can really trust is the acknowledgement of uncertainty.

The next time you’re reading a technical report or a peer-reviewed paper, and you see a beautiful, clean plot with no error bars, do me a favor. Close the tab. Delete the PDF. Go get a cup of coffee.

You’d be better off talking to a guy named Elias or even a wrong-number caller named Bernice. At least they aren’t pretending that the world is a series of perfect, unvarying integers. We deserve the full picture, even if it’s ugly. Especially if it’s ugly.

Current Bank Balance

$1,499.09

“I hope the balance ends in a 9. It usually does when I’m the one doing the math.”

I realize now that I’ve spent the last 49 minutes ranting about a single paper. But that’s the thing about numbers. They matter. They are the bedrock of everything we build, from the smallest medical implant to the largest industrial filter dryer.

If the bedrock is made of ghosts, the whole structure is a haunting waiting to happen. I’m going to go make another pot of coffee. It’s going to be a long day. 29 more papers to go, and I’m betting at least 19 of them are missing their whiskers.

Wish me luck. I’m going to need it, or at least a very high confidence interval. Actually, I don’t need luck. I just need people to stop lying with their charts. Is that too much to ask at 6:59 AM? Probably. But I’ve never been one for low expectations.

One last thing: if you are a researcher reading this, and you are currently hovering over the “save” button on a plot with no error bars-stop. Do the math. Show the scatter. Be the person who values the truth more than the “significance.”

I think I’ll call that wrong number back and tell them Bernice moved. To a lab. Somewhere with very, very small error bars. It feels like the right thing to do. 99.9% sure of it.