The ‘Adjacent Possible’ of Big Data: What Evolution Teaches About Insights Generation

Originally published on WIRED


In 2002, Stuart Kauffman introduced the “adjacent possible” theory. This theory proposes that biological systems are able to morph into more complex systems by making incremental, relatively low-energy changes in their makeup. Steven Johnson uses this concept in his book “Where Good Ideas Come From” to describe how new insights can be generated in previously unexplored areas.

The theory of “adjacent possible” extends to the insights generation process. In fact, it offers a highly predictable and deterministic path to generating business value through insights from data analysis. For enterprises struggling to get started with analysis of their big data, the theory of “adjacent possible” offers an easy-to-adopt and easy-to-implement framework for incremental data analysis.

Why Is the Theory of Adjacent Possible Relevant to Insights Generation?

Enterprises often embark on their big data journeys with the hope and expectation that business-critical insights will be revealed almost immediately, simply by virtue of being on a big data journey and building out their data infrastructure. The expectation is that insights can be generated quickly, often within the same quarter in which the infrastructure and data pipelines have been set up. In addition, the insights generation process is typically driven by analysts who report up through the usual management chain. This puts undue pressure on analysts and managers to show predictable, regular delivery of value, and it forces the process of insights generation to fit into project scope and delivery. However, the insights generation process is so ambiguous and experimental that it rarely fits within the bounds of a committed project.

Deterministic delivery of insights is not what enterprises find on the other side of their initial big data investment. What enterprises almost always find is that data sources are in disarray, multiple data sets need to be combined but are not primed for blending, data quality is low, analytics generation is slow, derived insights are not trustworthy, the enterprise lacks the agility to implement the insights, or the enterprise lacks the feedback loop to verify the value of the insights. Even when everything goes right, the value of the insights is simply minuscule and insignificant to the bottom line.

This is the time when the enterprise has to adjust its expectations and its analytics modus operandi. If pipeline problems exist, they need to be fixed. If quality problems exist, they need to be diagnosed (data source quality vs. data analysis quality). In addition, an adjacent possible approach to insights needs to be considered and adopted.

The Adjacent Possible for Discovering Interesting Data

Looking adjacently from the data set that is the main target of analysis can uncover other related data sets that offer more context, signals and potential insights when blended with the main data set. Enterprises can inspect the attributes of the records in their main data sets and look for other data sets whose attributes are adjacent to them. These data sets can be found within the walls of the enterprise or outside it, from both public and premium data set sources. They should be imported and harmonized with existing data sets to create new data sets that contain a broader and crisper set of observations, with a higher probability of generating higher-quality insights.
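One way to make "adjacent attributes" concrete is to rank candidate data sets by how many attributes they share with the main data set. The sketch below is only an illustration under stated assumptions: the catalog, the data set names and the column names are all hypothetical, and matching on exact attribute names is a simplification (real catalogs usually need name and type harmonization first).

```python
def attribute_overlap(main_attrs, candidate_attrs):
    """Jaccard similarity between two attribute sets (0.0 to 1.0)."""
    main_attrs, candidate_attrs = set(main_attrs), set(candidate_attrs)
    union = main_attrs | candidate_attrs
    if not union:
        return 0.0
    return len(main_attrs & candidate_attrs) / len(union)


def rank_adjacent_datasets(main_attrs, catalog, min_shared=1):
    """Return (name, shared attributes, score) tuples, most adjacent first."""
    ranked = []
    for name, attrs in catalog.items():
        shared = set(main_attrs) & set(attrs)
        # Data sets with no shared attribute cannot be blended directly.
        if len(shared) >= min_shared:
            score = attribute_overlap(main_attrs, attrs)
            ranked.append((name, sorted(shared), score))
    return sorted(ranked, key=lambda item: item[2], reverse=True)


# Hypothetical main data set and catalog (names are assumptions).
orders = {"order_id", "customer_id", "zip_code", "order_date", "amount"}
catalog = {
    "crm_customers": {"customer_id", "segment", "signup_date", "zip_code"},
    "census_by_zip": {"zip_code", "median_income", "population"},
    "web_logs": {"session_id", "url", "timestamp"},
}

ranked = rank_adjacent_datasets(orders, catalog)
for name, shared, score in ranked:
    print(f"{name}: shared={shared} score={score:.2f}")
```

Here the internal CRM table ranks first because it shares two join keys with the orders data, the external census table follows with one, and the web logs are excluded because nothing links them to the main data set yet.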

The Adjacent Possible for Exploratory Data Analysis

In the process of data analysis, one can apply the principle of adjacent possible to uncover hidden patterns in data. An iterative approach to segmentation analysis, with a focus on attribution through micro-segmentation, root cause analysis, change and predictive analysis, and anomaly detection through outlier analysis, can lead to a wider set of insights and conclusions to drive business strategy and tactics.
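As a small illustration of anomaly detection through outlier analysis, the sketch below flags values that fall outside the common 1.5 x IQR fences. The fence rule is one conventional choice among several, not a method the article prescribes, and the `daily_orders` figures are made up for the example.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return (index, value) pairs lying outside the k * IQR fences."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(i, v) for i, v in enumerate(values) if v < low or v > high]


# Hypothetical daily order counts with one suspicious spike.
daily_orders = [102, 98, 105, 99, 101, 97, 103, 100, 250, 104]
outliers = iqr_outliers(daily_orders)
print(outliers)
```

Each flagged point is a candidate for the next, adjacent question: which segment, location or time window produced it.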

Experimentation with different attributes such as time, location and other categorical dimensions can and should be the initial analytical approach. An iterative approach to incremental segmentation analysis, aimed at identifying the segments to which changes in key KPIs or measures can be attributed, is a good starting point. Applying adjacent possible here means iteratively including additional attributes to fine-tune the segmentation scheme, which can lead to insights into significant segments and cohorts. In addition, adjacent possible theory can help in identifying systemic problems in the business process workflow. This can be achieved by walking upstream or downstream in the business workflow and diagnosing the point of breakdown or slowdown by identifying the attributes that correlate highly with it.
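The incremental segmentation described above can be sketched as a greedy drill-down: at each step, add the one attribute whose segment explains the largest change in the KPI, then repeat inside that segment. This is a minimal sketch under assumptions, not a definitive method: the record layout (`period`, `revenue`, `region`, `channel`) and the sample figures are hypothetical, and a greedy search can miss interactions that a fuller analysis would catch.

```python
from collections import defaultdict


def kpi_change_by_segment(records, attributes, kpi="revenue"):
    """Sum the KPI per segment and period; return after-minus-before deltas."""
    totals = defaultdict(lambda: {"before": 0.0, "after": 0.0})
    for row in records:
        key = tuple(row[a] for a in attributes)
        totals[key][row["period"]] += row[kpi]
    return {key: t["after"] - t["before"] for key, t in totals.items()}


def drill_down(records, candidate_attributes, kpi="revenue"):
    """Add one attribute per step, following the segment with the largest change."""
    path = []
    remaining = list(candidate_attributes)
    current = list(records)
    while remaining and current:
        best = None
        for attr in remaining:
            deltas = kpi_change_by_segment(current, [attr], kpi)
            key, delta = max(deltas.items(), key=lambda kv: abs(kv[1]))
            if best is None or abs(delta) > abs(best[2]):
                best = (attr, key[0], delta)
        attr, value, delta = best
        path.append((attr, value, delta))
        remaining.remove(attr)
        # Narrow the data to the chosen segment before adding the next attribute.
        current = [r for r in current if r[attr] == value]
    return path


# Hypothetical revenue records for two periods (all values are made up).
records = [
    {"period": "before", "region": "East", "channel": "web", "revenue": 100},
    {"period": "before", "region": "East", "channel": "mobile", "revenue": 100},
    {"period": "before", "region": "West", "channel": "web", "revenue": 100},
    {"period": "before", "region": "West", "channel": "mobile", "revenue": 100},
    {"period": "after", "region": "East", "channel": "web", "revenue": 102},
    {"period": "after", "region": "East", "channel": "mobile", "revenue": 101},
    {"period": "after", "region": "West", "channel": "web", "revenue": 98},
    {"period": "after", "region": "West", "channel": "mobile", "revenue": 60},
]

path = drill_down(records, ["region", "channel"])
for attr, value, delta in path:
    print(f"{attr}={value}: change={delta:+.1f}")
```

In this made-up data the overall revenue drop is first attributed to the West region and then, within it, to the mobile channel, which is the kind of segment an analyst would take to the business owners for context.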

The Adjacent Possible for Business Context

The process of data analysis is often fraught with siloed context, i.e., the analyst often does not have the full business context to understand the data, the motivation for a business-driven question, or the implications of their insights. Applying the theory of adjacent possible here means introducing collaboration to the insights generation process: inviting and including team members who each hold a slice of the business context from their own point of view can lead to higher-valued conclusions and insights. Combining the context from each of these team members to design, verify, authenticate and validate the insights generation process and its results is the key to generating high-quality insights swiftly and deterministically.

Making incremental progress in the enterprise’s insights discovery efforts is a significant and valuable method to uncover insights with massive business implications. The insights generation process should be treated as an exercise in adjacent possible, and incremental insights identification should be encouraged and valued. As this theory is put into practice, enterprises will find themselves with a steady stream of incrementally valuable insights with incrementally higher business impact.

The 2+2=5 Principle and the Perils of Analytics in a Vacuum

Published Originally on Wired

Strategic decision making in enterprises playing in a competitive field requires collaborative information seeking (CIS). Complex situations require analysis that spans multiple sessions with multiple participants (that collectively represent the entire context) who spend time jointly exploring, evaluating, and gathering relevant information to drive conclusions and decisions. This is the core of the 2+2=5 principle.

Analytics in a vacuum (i.e., non-collaborative analytics), because of missing or partial context, is highly likely to be of low quality, lacking key and relevant information and fraught with incorrect assumptions. Another characteristic of non-collaborative analytics is the use of general-purpose systems and tools like IM and email that are not designed for analytics. These tools lead to enterprises drowning in a sea of spreadsheets, context lost across thousands of IMs and emails, and an outcome that is almost certain to be suboptimal.

A common but incorrect approach to collaborative analytics is to think of it as a post-analysis activity. This is the approach to collaboration taken by most analytics and BI products. Post-analysis publishing of results and insights is very important; however, pre-publishing collaboration plays a key role in ensuring that the generated results are accurate, informative and relevant. Analysis that terminates at the publishing point has a very short half-life.

Enterprises need to think of analysis as a living and breathing story that gets bigger over time as more people collaborate. That collaboration brings in more data, new data and disparate data, which in turn adds context that negates incorrect assumptions, missing or low-quality data issues and incorrect semantic understanding of the data.

Here are the most common pitfalls we have observed in analytics carried out in a vacuum.

Wasted resources. If multiple teams or employees are seeking the same information or attempting to solve the same analytical problem, a non-collaborative approach leads to wasted resources and suboptimal results.

Collaboration can help the enterprise streamline the work and divide and conquer the problem faster and more efficiently, with less time and manpower. Deconstructing an analytical hypothesis into smaller questions and distributing them across multiple employees leads to faster results.

Siloed analysis and conclusions. If results of analysis, insights and decisions are not shared systematically across the organization, enterprises face a loss of productivity. This lack of shared context between employees tasked with the same goals causes organizational misalignment and a lack of coherence in strategy.

Enterprises need to ensure that there is common understanding of key data driven insights that are driving organizational strategy. In addition, the process to arrive at these insights should be transparent and repeatable, assumptions made should be clearly documented and a process/mechanism to challenge or question interpretations should be defined and publicized.

Assumptions and biases. Analytics done in a vacuum is hostage to the personal beliefs, assumptions, biases, clarity of purpose and comprehensiveness of context in the analyst’s mind. Without collaboration, such biases remain uncorrected and lead to flawed foundations for strategic decisions.

A defined process for, and the freedom to, challenge, inspect and reference the key interpretations and analytical decisions made en route to an insight is critical for enterprises to enable and proliferate high-quality insights in the organization.

Drive-by analysis. When top-down pressure to use analytics to drive strategic decision making is left unchecked, enterprises see an uptick in what we call “drive-by analysis.” In this case, employees jump into their favorite analytical tool, run some analysis to support their argument and publish the results.

This behavior leads to another danger of analytics without collaboration: users who lack full context and understanding of the data, its semantics and so on perform analysis to make critical decisions. Without supervision, these analytics can lead the organization down the wrong path. Supervision, fact checking and corroboration are needed to ensure that correct decisions are made.

Arbitration. Collaboration without a process for challenge and arbitration, and without an arbitration authority, is often found to be littered with misinterpretations and to be factually misaligned with, or deviating from, strategic patterns identified in the past. This is almost always discovered at a later point in time, when it is too late.

Subject matter experts or other employees with the bigger picture, knowledge and understanding of the various moving parts of the organization need to, at every step of the analysis, verify and arbitrate on assumptions and insights before these insights are disseminated across the enterprise and used to affect strategic change.

Collaboration theory has proven that information seeking in complex situations is better accomplished through active collaboration. There is a trend in the analytics industry to think of collaborative analytics as a vanity feature, and simple sharing of results is being touted as collaborative analytics. However, collaboration in analytics requires a multi-pronged strategy, with key processes and a product that delivers those capabilities: an investment in processes that allow arbitration, fact checking, interrogation and corroboration of analytics, and an investment in analytical products that are designed and optimized for collaborative analytics.