Turning Correlation into Causation – How Deeper Knowledge and Insight Are Generated

My chief writing partner – Mac

Correlation implies causation? We’ve heard it 1000 times: don’t believe it. And it’s true, DON’T believe it. Well, at first glance. It’s so easy to come up with funny examples. All you really have to do is match one upward (or downward) trend with another, and if the rate of change/slope/timescale for the change is the same, ta-da! Instant high correlation! Buzzfeed walks through some funny ones, like the increase in global average temperature being indexed to an increasing pirate shortage.

You can lay these examples out at your next round of drinking games and speculate exactly WHY decreasing numbers of pirates might be behind Anthropogenic Global Warming (not enough shipping sunk?), but hopefully you’ll maintain some healthy level of skepticism and scrutiny.

Before we sink into the deeper knowledge AND empathy structure analysis, here is the most basic rule of thumb for deciding whether correlation actually IMPLIES causation: identify a physical mechanism or dynamic that involves both topics. You’ve got to at least get to the Legalistic/Algorithmic value set to have a hope of understanding a real connection. Is one of these really what mathematicians call an independent variable, with the other depending on it?

This is really an offshoot of the fun parlor game I just recommended above. But it does involve synthesizing knowledge from outside the field, and it really takes apart conspiratorial thinking relatively quickly. Does organic food cause autism? Can you link a mechanism in the brain that causes autism to some lack of pesticide consumption? Or can you draw two or three causal links that take you from your incipient need to eat pesticides to protect your unborn children from autism? If you can’t, well, you have to STFU.

There’s a deeper way to understand correlation vs. causation, though, and it gets us back to the Knowledge Structure stack and one of the core concepts of this blog: Reliability vs. Validity. Pure correlation consists of taking two different data streams, each with an attached temporal or spatial scale (or some other independent index; look, gang, don’t ‘gotcha’ an old digital signal processing expert with that shit!), and then, well, correlating them.

If you need the math to feel comfortable, well, start here! Do not be denied!
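For reference, the most widely used version is the Pearson correlation coefficient. For two paired data streams x and y, each with n readings and with means x̄ and ȳ:

```latex
r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
```

It runs from -1 to +1. Values near +1 or -1 mean the two streams rise and fall together (or in opposition); nothing in the formula asks where the numbers came from or why they move.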

But that’s the end of the math for us. Let’s get down to business.

Correlation, as expressed above, is itself an algorithm that poops out a number showing how well two data streams match. Below is the Knowledge Structure understanding that flows from social structure, so you can remind yourself where all this comes from, empathy- and human interaction-wise.

Basic Social Structure/Knowledge Structure Diagram

Someone walks into your office with two columns of numbers. You have no idea where those two columns came from — they’re just two columns.

So… you accept, on the Authority of the person trotting into your life with them, that these two columns of numbers actually mean something. Susie says “I’ve been doing research on pirates and Anthropogenic Global Warming (AGW), and there are some extremely disturbing trends I’ve observed in the data. From my vantage point, we’d better start recruiting pirates, stat!”

You don’t really know Susie very well. She seems nice enough, and she DOES have the official title of Data Collection Master, given to her (ostensibly) after a long process of certification/education. So you take that AGW and pirate data and feed it into your Excel spreadsheet Algorithm (you ARE, after all, titled Data Analyst Master) and poop out a near-perfect correlation of the two streams. All this “makes sense” to you. After all, Susie has an impressive title. And so do you.

You have no reason to believe that Susie has fudged the data (let’s set aside the psychopathic distortion angle here). She’s acting in good faith, and so are you. If she walks in with the same data and asks you to analyze it again, you’ll get the same answer. Both of you know how to create a data set and analyze it.

What this means is that the analysis is REPEATABLE, as well as RELIABLE. That sounds pretty good. And that’s exactly what both of these social structures (Authority and Legalistic/Algorithmic) are known for. REPEATABLE and RELIABLE sound good to scientists; they don’t want to hear that the data can change its mind. This knowledge structure maps well to their social structures, and so everything makes sense.
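To make the REPEATABLE point concrete, here’s a from-scratch Pearson correlation in Python. It’s a deterministic function of the two columns: hand it the same numbers and you get the same answer every time, no matter who asks. Notice also what it never looks at: where the numbers came from. `pearson_r` is a hypothetical helper, and the pirate and temperature columns are invented for illustration, not real census or climate data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation of two equal-length columns of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two columns of the same length (at least 2 rows)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations from each column's mean
    dy = [y - mean_y for y in ys]
    cov = sum(a * b for a, b in zip(dx, dy))
    norm = math.sqrt(sum(a * a for a in dx)) * math.sqrt(sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("a constant column has no defined correlation")
    return cov / norm


# Hypothetical columns "Susie" might hand you: made-up numbers, not real data.
pirates = [45000, 20000, 15000, 5000, 400, 17]
temperature_c = [14.2, 14.4, 14.5, 14.8, 15.1, 15.6]

first = pearson_r(pirates, temperature_c)
second = pearson_r(pirates, temperature_c)
assert first == second  # same columns in, same number out: repeatable
print(round(first, 2))
```

The number comes out strongly negative, and it will come out the same tomorrow. That’s reliability. Nothing in the function can tell you whether the result is valid.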

But the problem with the social structures producing the data (as we’ve represented them) is that they are CLOSED systems. You’re either inside the organization making/recording the data, or you aren’t. Someone can’t just walk through the door and start handing you pirate population numbers or AGW temperature records. Which means, in our theoretical example, the data is not GROUNDED outside the implied experience of either Susie or yourself. It’s subject to your beliefs (Pirates are a GREAT solution for AGW!) and really not much else. And like as not, the two data streams were also collected INDEPENDENTLY. The Pirate Census organization went out and counted pirates. The AGW researchers recorded ocean temperature just as separately.

What that means is that, while no one can walk in with new data, someone can walk through the door and influence your beliefs (they might show you some pirate atrocity that makes you rethink your earlier support for increasing the number of pirates!), and it might gross you out enough to change the result. Or something else: you might see the data and remember that your Correlation Organization binds you to a code of honor that says you’ll just push the buttons and give Susie back the magic number. There are many potential scenarios.

But if we want CAUSATION, we’re going to have to walk up the Knowledge Structures that emerge from Relational Structures that are also valid. Above our heads are four relevant Knowledge Structures, all of which might complicate things, but in the process of that complexifying, will increase (or decrease) the VALIDITY of the conclusion.

Causation might be established by a high-Performance observer in the field, noting that when a pirate ship sails through a bay, the ocean temperature drops. Such an observer would be more believable if they were trained, say, in pirate identification and census, or in ocean temperature measurement. They would be more believable because, once again, they would be a more RELIABLE observer. The process of pirate observation certification would certainly help, even though it comes from the lower Value Sets/Relational Structures. But it’s the boots on the ground, watching the connected phenomena happen, that would lead to a better appreciation of causation. That observer would use their own judgment (hence the need for agency, and a functional heuristic) in interpreting various data streams to position their Pirate Observation Ship (POS!) and their temperature probes to establish a meaningful connection.

And if there were a larger Community of POSs, they could increase both reliability and validity. Or they could blend a different set of perspectives to lend credence to the correlation.

If there’s a takeaway here, it’s that additive perspectives matter. When you go out and interview people, you have to integrate their personal experience into a larger understanding of how pirates and ocean temperature function. When I think about Nora Bateson’s “Warm Data” construct, these two levels fall into that category. Often we can’t get to a generalized equation relating overall ocean temperature to lack of pirate passage. But we can combine the testimonials of lots of people to get at some aggregate sense of the truth. (“Arrr, we were just hoistin’ the Jolly Roger when the temperature in the ocean dropped 5 degrees!”)
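A quick simulation shows why combining many testimonials helps, and where it stops helping. In this Python sketch (all numbers invented; `observer_reports` is a hypothetical helper), each observer reports the temperature drop they saw, colored by a personal bias and some measurement noise. Averaging more reports tightens the estimate, which is reliability. But the average only homes in on the true value when the observers’ biases are independent, i.e. when they bring genuinely different perspectives. Give everyone the same blind spot and no amount of averaging removes it.

```python
import random
import statistics

TRUE_DROP = 5.0  # hypothetical "true" temperature drop (degrees) when a ship passes


def observer_reports(n_observers, shared_bias, seed):
    """One report per hypothetical observer: truth + shared bias + personal bias + noise.

    shared_bias is added to everyone, modeling a blind spot the whole community
    shares; personal biases are drawn independently for each observer.
    """
    rng = random.Random(seed)
    reports = []
    for _ in range(n_observers):
        personal_bias = rng.gauss(0, 1.0)  # differs from observer to observer
        noise = rng.gauss(0, 0.5)          # ordinary measurement scatter
        reports.append(TRUE_DROP + shared_bias + personal_bias + noise)
    return reports


for n in (5, 50, 5000):
    diverse = statistics.mean(observer_reports(n, shared_bias=0.0, seed=n))
    same_blind_spot = statistics.mean(observer_reports(n, shared_bias=2.0, seed=n))
    print(f"{n:5d} observers: diverse={diverse:.2f}  shared-bias={same_blind_spot:.2f}")
```

As the community grows, the "diverse" average settles near 5 while the "shared-bias" average settles near 7: more voices buy reliability either way, but only varied voices buy validity.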

What the next two levels of social structure (Global Systemic and Global Holistic) offer are Knowledge Structure constructions that are far more overarching than Warm Data, or anything we’re collecting from grounded heuristics of varying validity with different observers. We’re either getting a methodical system laid out to actually validate our correlation (Global Systemic), or an overarching set of mathematical equations (like Einstein’s Theory of Relativity) that can tell us what ocean temperature and pirate density are around the globe, and that matches the data: true Global Holistic thinking. The Holy Grail of pirate effects on climate.


Let’s do a simpler comparison (a real one!) to show how all this actually matters. Say I have three scientists together at a conference. They’re all specialists in measuring the force of gravity. But let’s pretend they haven’t figured out anything BUT how to measure the downward force on an object falling to the Earth. No Newton’s Law, which is pretty close to Global Holistic (we use it to calculate spacecraft trajectories to Jupiter, after all!).

All our scientists are in Legalistic Hierarchies, which means methodical data collection is part of their core knowledge structure. They all belong to the Downward Force Measuring Society, and have been trained to follow exquisite procedures to come up with their results. No agency required! And no relational trust either. So no empathy.

Scientist #1 stands up, and says “We’ve been doing a fine job measuring this downward force on balls in our lab in East Skeezix, NY! Our test objects are accelerating toward the ground at 9.81 m/s² consistently!”

Scientist #2 stands up, and says “We ALSO have been doing a tremendous job measuring gravity in West Windy, OK! We as well have been dropping small balls, and have recorded an acceleration of 9.81 m/s² consistently!”

Scientist #3 stands up, and says “I’ll bet if you drop a ball over in the Walmart parking lot on the other side of town, you’ll record an acceleration of 9.81 m/s² as well!”

What do the other scientists say? Stuck in the lower value sets/social structures, they pronounce “You CAN’T say that. You DIDN’T make the measurement!”

Reductionist science at its finest.
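Scientist #3 is doing what a Global Holistic theory lets you do: predict a measurement nobody has made yet. Here’s a short Python sketch of that prediction using Newton’s law of universal gravitation, g = GM/R², with standard textbook values for the constants. It assumes a uniform, non-rotating spherical Earth, so it ignores rotation, altitude, and local geology; `surface_gravity` is a hypothetical helper.

```python
# Newton's law of universal gravitation predicts surface gravity g = G*M / R**2
# without dropping a single ball. Values are standard textbook constants.
G = 6.674e-11        # gravitational constant, m^3 kg^-1 s^-2
M_EARTH = 5.972e24   # mass of the Earth, kg
R_EARTH = 6.371e6    # mean radius of the Earth, m


def surface_gravity(mass_kg, radius_m):
    """Acceleration at the surface of a uniform sphere (ignores rotation and altitude)."""
    return G * mass_kg / radius_m ** 2


g = surface_gravity(M_EARTH, R_EARTH)
print(f"predicted g = {g:.2f} m/s^2")  # predicted g = 9.82 m/s^2
```

That agrees with the scientists’ 9.81 m/s² to within about a tenth of a percent, for East Skeezix, West Windy, and the Walmart parking lot alike.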

The first season of the series ‘Genius’ covers the life of Albert Einstein, and the episodes I watched actually cover the various conflicts in the Value Sets pretty well. The German empiricists were not so far off from the gravity scientists above when Einstein announced his Theory of Relativity. Compounding the hatred was the fact that Einstein was Jewish, and the Nazis were coming to power. That certainly didn’t help.

And it was none other than Max Planck, Einstein’s early champion, who said, in effect, that science advances one funeral at a time. Mostly, anyway. It also advances through large, connected communities, or in the heads of vanishingly rare singular geniuses. And, let’s face it, most of us are not Einsteins.

What’s the takeaway? We establish RELIABILITY with the lower social structures. We establish VALIDITY through grounding with the outside world with the higher social structures. And we had better have a method that supplants our innate tendency to jump to conclusions that support our beliefs. Understanding the role of case studies, as well as larger deep theory, helps us make sure we fill in the blanks in both arenas.

And the core of valid case studies, or trusting the right people? Empathetic development. It always comes back to that.

What’s the potential for future peril if we can’t learn this lesson? The disconnected example I used above (pirates and AGW) seems pretty silly. The problem is that AI is moving rapidly into the space where seemingly distant outcomes can be supported convincingly by pretty sophisticated analyses. Advanced biometric analyses of faces (which are heavily race-dependent!) are now possible. Similar correlative mechanisms are being used to identify potential shoplifters and such. One can see relatively quickly that a deeper understanding of correlation vs. causation is going to be at the root of a lot of ground wars across our society. There’s no guarantee that we’re going to evolve a deeper understanding of how faces work before we identify the superficial characteristics associated with race and ethnicity.

