Kauai Sunset, 2007
One of the latest analytical tools making news at the forefront of tech is Big Data. What is Big Data, and how is it different from ‘Little Data’? The primary characteristic of Big Data is complexity: more facets, with more correlates, requiring different types of analysis tools than ever before. Big Data, basically by definition, strains the capacities of the relational database management systems available today.
And to read the press, Big Data is going to solve all our problems in the world. It’s going to show us how everything in our world is related. Here’s a link from the famous consulting company, McKinsey, extolling the abilities of Big Data, while at the same time warning folks to get on the Big Data bus.
Make no mistake. Big Data can be a powerful tool. The very idea of collecting and understanding data from a variety of sources heretofore unavailable gives the potential for insight into how we execute many tasks in our daily lives. Last October, for example, I was in the PACCAR/Kenworth truck plant in Renton, taking a tour. In that facility, every time a screw is drilled into a truck frame, its seating and position are recorded. Think about that from a quality control perspective. You can know, not just for a given model but for an individual unit, whether quality parameters have been met: whether literally every screw in the design of a truck has been mounted onto the truck frame. Incredible.
Big Data is also intrinsically tied up with the Internet of Things (IoT). What might be a paradigm for understanding that? IoT is actually a combination of three things: the various sensors that detect conditions in the natural world; the fiber optic nerve bundles that carry those signals back to the computers/brains; and the computers/brains themselves, which make sense of the sensory inputs, process the data, and send commands to modify the behavior of the larger system through actions at the interface between the IoT device and reality. Once one grasps the implications of this model, one can understand that IoT, combined with Big Data, is going to create the larger, distributed nervous system for the world we will occupy. As such, the whole field of Big Data becomes very important, because how we implement Big Data will be, at some level, how we want the Internet Overmind to work.
Note that I said “want”. Whatever we do will be a complex system, with emergent behavior that will be unpredictable. That doesn’t necessarily mean ‘unpredictable’ as in bad, nor does it mean ‘unpredictable’ as in good. You might consider your limbic reaction to get a head check on your morning v-Meme! It just means that with however many billions of interactions there are, a number that will continue to grow in the future, we had better be prepared to accept that there are some serious Unknown Unknowns out there.
But there are some things that we can know about Big Data. Namely, we can understand the v-Meme of the lenses that we use to look at the data, if we choose. And therein lies the rub. What we want to do with the data will dictate, at least in part, how we view it, and how we view our actions regarding its transformation. And what we want to do will largely depend on our own perspective, summed up in our own v-Meme.
Why does this matter? This one concept is the first step along a path toward establishing the validity of our observations of Big Data. Let’s tear this idea apart and see if we can make sense of it.
First off, let’s start with our understanding of cognition and metacognition. Cognition is knowing what we know. Metacognition is being aware of our cognition — in other words, knowing what we know, knowing what we don’t know, and also being aware of not knowing all the things we don’t know.
How this applies to Big Data is as follows: if we look at a Big Data set, structured in some kind of a schema, or a pattern of data matching, we can certainly pull inferences out by sampling some of the lines in our Big Data set. Here’s a simplified example:
Our schema might look like this:
Name | Age | Recent Purchase
Bill Jones | 42 | Tennis Balls
etc., with millions more rows of data.
An Authoritarian perspective on this data might be “I’ve looked at this data set, and it clearly shows that 42-year-old men like to buy tennis balls.”
When asked why he said that, our pure v-Meme Authoritarian (let’s call him Jim) might say “well, I’m a 42-year-old man. I like to buy tennis balls. And here, right here in this database on what people like to buy, is a line that shows I’m not the only one!” If you remind him that there are millions of lines in the schema, if he were a pure Authoritarian, he might respond “Why are you telling me I’m wrong?” Jim’s an Authoritarian, also prone to dichotomous thinking and egocentric projection. “I know what I know, and right here is the evidence to support it!” One can see you’re already down the rabbit hole for convincing Jim otherwise. The data, even though it appears connected through the schema, is still a knowledge fragment, able to be processed by Authoritarian Jim. Further, there’s only one solution. Jim has said so, and since the veracity of information is controlled inside his head, he’s sure he’s right. And to him, he’s also supported his answer. He’s using a Big Data set, after all, and we all know that these are indeed the latest methods! McKinsey said so! We now see also how status plays into Jim’s conclusions. Never mind that the world is a more complex place. And if Jim’s doing the ordering, you can bet it’s going to be tennis balls, or something else Jim likes. Because if Jim likes it, everyone’s going to like it. And it’s backed up by data!
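To make the gap between Jim's reading and the data concrete, here is a minimal Python sketch. It assumes a handful of made-up rows in the schema above: only the Bill Jones row comes from the example, and every other name, age, and purchase is invented purely for illustration. The function names are likewise hypothetical.

```python
# Hypothetical rows in the (name, age, recent_purchase) schema.
# Only the Bill Jones row appears in the example above; the rest are invented.
ROWS = [
    ("Bill Jones", 42, "Tennis Balls"),
    ("Ann Lee", 42, "Horseshoes"),
    ("Raj Patel", 42, "Golf Balls"),
    ("Sam Ortiz", 42, "Horseshoes"),
    ("Kim Novak", 35, "Tennis Balls"),
]


def jims_reading(rows, age, item):
    """Jim's method: a single confirming row settles the question."""
    return any(r[1] == age and r[2] == item for r in rows)


def share_buying(rows, age, item):
    """Fraction of people of the given age whose recent purchase is `item`."""
    same_age = [r for r in rows if r[1] == age]
    if not same_age:
        return 0.0
    return sum(1 for r in same_age if r[2] == item) / len(same_age)


print(jims_reading(ROWS, 42, "Tennis Balls"))  # True
print(share_buying(ROWS, 42, "Tennis Balls"))  # 0.25
```

In this toy data Jim finds his confirming line, yet only one in four of the 42-year-olds bought tennis balls. Both functions read the same rows; the difference is in the question being asked of them.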
Let’s move on up to a Legalistic/Absolutistic v-Meme framework. Tina is now looking at the data, and she has a background in statistics. In fact, she’s the Chief Statistician in the department, and has three Assistant Statisticians reporting to her. Tina has data transformation tools at her disposal. So Marlene, a manager, comes to Tina, and says “Tina, I’d like some help placing orders for next year. Can you use your Big Data magic to figure out what we should order?” Tina says “coming right up!”
So Tina goes to the database, looks at the same schema, and starts applying some set of algorithmic transformations to the data, in the hopes that it will tell her what to tell Marlene. Same schema, lots of data. Applying algorithms (and they could be sophisticated algorithms), Tina may even do things like code objects into classes: balls of all sorts might get one Bin Number, horseshoes might get another. The schemes (and schemas) may become more complex. There may be a legacy object coding. Whatever. The key is that Tina runs her algorithms, and then she looks at the results. She figures out what item sold the most. And then she reports to Marlene. “Order more balls,” she says. “They’re our best-seller.”
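A rough sketch of the kind of pipeline Tina might run, in Python. The bin numbers, item names, and purchase list are all assumptions invented for illustration; the post does not specify Tina's actual coding or tools.

```python
from collections import Counter

# Hypothetical object coding: each item is assigned a Bin Number.
BIN_CODES = {
    "Tennis Balls": 1,  # bin 1: balls of all sorts
    "Golf Balls": 1,
    "Basketballs": 1,
    "Horseshoes": 2,    # bin 2: horseshoes
}
BIN_NAMES = {1: "balls", 2: "horseshoes"}


def best_selling_bin(purchases):
    """Code each purchase into its bin, count the bins, return the top one."""
    counts = Counter(BIN_CODES[item] for item in purchases)
    bin_number, _ = counts.most_common(1)[0]
    return BIN_NAMES[bin_number]


# Invented purchase column, standing in for millions of rows.
purchases = ["Tennis Balls", "Horseshoes", "Golf Balls", "Horseshoes", "Basketballs"]
print(best_selling_bin(purchases))  # balls
```

Note that `BIN_CODES` is an input to the analysis, not an output of it. Someone had to write that table before any counting could happen.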
Now things get interesting. If Marlene asks Tina WHY she should order more balls, there are a number of responses Tina may give. But likely, the core of her argument will be “well, that’s what the data tells us to do. We have to trust the data.” The lens that Tina views the data through is what I call a meta-linear transform. Regardless of the complexity of the analysis done, the same algorithms applied to the same data set will yield the same answers. And here’s the rub. Even though Tina says “well, that’s what the data tells us,” implicit in all this analysis are characteristics of the Legalistic v-Meme. We know what we know. There is no awareness that there may be things we DON’T know. Tina’s a no-nonsense statistician. “I’m perfectly data driven, and rational,” she’ll tell you. And if she does tell you this, it may well be true that she is logical. But she likely doesn’t understand the meta-dynamic that created the schema, or how and why anyone generated the object codes that grouped objects into balls and horseshoes. Buried back in history, someone made a decision to look at the data that way, and that implicit mental model is built into the way we set up collecting the data in the first place. In so many ways, it’s not only the data that’s talking. It’s also the way we set up the grouping of the data.
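One way to see how much the grouping is doing the talking is to run the same counting procedure over the same purchases under two different codings. The Python sketch below is an illustration under assumed data: the purchase list and both coding functions are hypothetical, not anything from a real system.

```python
from collections import Counter

# Invented purchase column; the same list is used for both analyses.
PURCHASES = ["Tennis Balls", "Horseshoes", "Golf Balls", "Horseshoes", "Basketballs"]


def by_item(item):
    """Coding 1: every distinct item is its own category."""
    return item


def by_bin(item):
    """Coding 2: a legacy-style scheme that lumps every kind of ball together."""
    return "balls" if item.lower().endswith("balls") else "horseshoes"


def top_category(purchases, coding):
    """Deterministic transform: same data and same coding give the same answer."""
    counts = Counter(coding(item) for item in purchases)
    return counts.most_common(1)[0][0]


print(top_category(PURCHASES, by_item))  # Horseshoes
print(top_category(PURCHASES, by_bin))   # balls
```

Each call is perfectly repeatable, which is what makes the result feel objective. But the recommendation flips from horseshoes to balls depending on a coding decision made before the analysis started.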
And here’s the other thing — because we have data, we KNOW it. From the perspective of the community, the answer has arisen straight out of the data. By the very definition of objectivity, these people are objective! If you, on the outside of the system, attempt to question those people on the inside, their immediate response is always the same. “Where’s the data to back up your theory? We didn’t make any assumptions.”
One can now start to see the difference between cognition and the larger issues of metacognition, or for lack of better terms, between knowing and wisdom: knowing what you don’t know. Without metacognition, one can’t recognize what one doesn’t know, or interpret it. Any larger metacognition is actually set in the way the schema for the data is constructed. And the person on the inside, in this case Tina, can’t know. Her Legalistic/Absolutistic v-Meme brain wiring prevents it. And it doesn’t matter whether it is Data or Big Data. With all the v-Memes at Legalistic and below, one’s awareness is limited to the things one knows, no matter the level of data transformation. That doesn’t mean that there aren’t higher understandings built into the system. But these Guiding Principles are implicitly buried in the schema, with few clues on how to divine them. Tina isn’t aware of them.
This kind of thinking can be insightful and productive in many situations. If you’re a company selling sports equipment, and you need to make sure your stock levels are adequate, various meta-linear transformations may give you exactly the types of answers that you are looking for. Also implicit in this type of system is the notion of slow change (if any), or rather, predictable change. Your company might decide it needs to stock up on tennis balls seasonally, and this periodic depletion of tennis ball stocks might just be part of the plan. The business environment, more or less, doesn’t change year to year. So this might just be the ticket.
But it’s not going to work very well if people start using tennis balls for things other than their intended purpose. Let’s say there’s an increase in tennis ball sales, and the real reason is that people are buying more pit bulls, taking them to the park, and playing fetch. Because the pit bulls have powerful jaws, the dogs are popping more tennis balls, inflating demand. The real reason the 42-year-old contingent is buying tennis balls is dog tennis ball consumption. Big Data will not tell you that.
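The pit bull problem can be put in code terms: two different worlds can produce identical records once they are squeezed through the schema. The Python sketch below assumes a hypothetical "reason" field that exists in reality but that the schema never stores; the field name and values are invented for illustration.

```python
# Hypothetical ground truth, including a "reason" the schema never records.
players_world = [
    {"name": "Bill Jones", "age": 42, "purchase": "Tennis Balls", "reason": "tennis"},
]
dogs_world = [
    {"name": "Bill Jones", "age": 42, "purchase": "Tennis Balls", "reason": "dog fetch"},
]

# The columns the database actually collects.
SCHEMA = ("name", "age", "purchase")


def record(world):
    """Project each real-world event onto the columns the schema stores."""
    return [tuple(event[col] for col in SCHEMA) for event in world]


# The two worlds differ, but their recorded data is indistinguishable.
print(players_world == dogs_world)                  # False
print(record(players_world) == record(dogs_world))  # True
```

No transformation applied downstream of `record` can recover the difference, however sophisticated, because the distinguishing information was dropped at collection time.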
How can we understand how to get more out of Big Data? That’s the subject of the next post. But here’s the rub: what we get out of Big Data will necessarily correspond to the v-Meme level of the mental model that we apply. And the v-Meme level will by definition dictate the individuality, validity, and connectivity of both the data and the insights that we draw from it. If we choose that mental model explicitly, then we stand a much better chance of understanding how Big Data should inform our decisions, and when we need to start asking more questions.