Why CIOs Should Turn to Cloud-Based Data Analysis in 2015

Originally Published on DataFloq

CIOs are under tremendous pressure to quickly deliver big data platforms that enable enterprises to unlock the potential of big data and better serve their customers, partners, and internal stakeholders. Early adopter CIOs of big data report clear advantages to seriously considering and choosing the cloud for data analysis. These CIOs make a clear distinction between business-critical and business-enabling systems and processes. They understand the value the cloud brings to data analysis and exploration and how it enables the business arm to innovate, react, and grow the business.

Here are the 5 biggest reported advantages of choosing the cloud for data analysis:

Speed – Faster Time to Market

Be it the speed of getting started with data analysis, the time it takes to stand up a software stack that can enable analysis, or the time it takes to provision access to data, a cloud-based system offers a faster boot time for the data initiative. This is music to the business's ears, as it can extract value from data sooner rather than later.

The cloud also offers faster exploration, experimentation, action, and reaction based on data analysis. For example, a cloud-based system can be made to auto-scale based on the number of users querying the system, the number of concurrent ongoing analyses, the data entering the system, and the data being stored or processed in it. Without long hardware procurement cycles, the cloud can often be the difference between critical data analysis that drives business growth and missed opportunities.
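
As a rough illustration of that kind of auto-scaling decision, here is a minimal sketch; the metric names, per-node capacities, and thresholds are hypothetical assumptions, not taken from any specific cloud provider.

```python
# Hypothetical auto-scaling sketch: derive a target node count from current
# analysis load. Metric names and per-node capacities are illustrative.

def target_nodes(concurrent_queries: int, ingest_gb_per_hour: float,
                 queries_per_node: int = 20, ingest_gb_per_node: int = 50,
                 min_nodes: int = 2, max_nodes: int = 100) -> int:
    """Return the number of analysis nodes to provision for the current load."""
    needed_for_queries = -(-concurrent_queries // queries_per_node)      # ceiling division
    needed_for_ingest = -(-int(ingest_gb_per_hour) // ingest_gb_per_node)
    return max(min_nodes, min(max_nodes, max(needed_for_queries, needed_for_ingest)))

# Example: 130 concurrent analyst queries and 400 GB/hour of new data.
print(target_nodes(concurrent_queries=130, ingest_gb_per_hour=400))  # -> 8
```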

Another consideration mentioned by CIOs is the opportunity cost of building out full scale analytics systems. With limited budgets and time, focusing on generating core business value turns out to be more beneficial than spending those resources on reinventing a software stack that has already been built by a vendor.

Extensibility – Adjusting to Change

A unique advantage of operating in the cloud is the ability to adjust to changes in the business, the industry, or the competition. Dynamic enterprises introduce new products, kill underperforming products, and invest in mergers and acquisitions. Each such activity creates new systems, processes, and data sets. Having a cloud-based stack that not only scales but also offers a consistent interface reduces the problem of combining (and securing and maintaining) this data from an O(n²) problem to an O(n) problem, making it a much cheaper proposition.
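
To make that arithmetic concrete, here is a tiny sketch comparing the two growth rates; the system counts are arbitrary examples.

```python
# Point-to-point integrations between n systems grow roughly quadratically
# (n*(n-1)/2 pairs), while integrating each system once against a shared,
# consistent interface grows linearly (n integrations).

def point_to_point(n: int) -> int:
    return n * (n - 1) // 2

def via_shared_interface(n: int) -> int:
    return n

for n in (5, 20, 50):
    print(n, point_to_point(n), via_shared_interface(n))
# 5 systems: 10 vs 5; 20 systems: 190 vs 20; 50 systems: 1225 vs 50.
```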

Cost – Lower to Build and Operate

CIOs love the fact that cloud-based data analysis stacks are cheaper to build and operate. With no initial investment required, CIOs pay only for what they use, and if the cloud auto-scales, capacity growth plans become simpler and long-term planning becomes easier to perform without the danger of over-provisioning. Required data analysis capacity is often spiky (it varies sharply over time with planning and competitive activities), is shaped by how prevalent the data-driven culture is in the enterprise (and how that culture changes over time), and depends on the volume and variety of data sources (which change as the enterprise grows and maneuvers). All of this makes capacity very hard for the CIO to predict, and imperfect estimates lead to wasted resources and unused capacity.

Risk Mitigation – Changing Technological Landscape

Data analysis technologies and options are in flux. Especially in the area of big data, technologies are growing and maturing at different rates, with new technologies introduced regularly. In addition, the growth of these modern data processing and analysis tools and the recent activity of analytics and BI vendors make it clear that the current capabilities available to the business are not addressing its pain points. There is a danger of moving in too early: adopting and depending on a particular stack might turn out to be the wrong decision, or leave the CIO with a high cost to upgrade and maintain the stack at the rate it is changing. Investing in a cloud-based data analysis system hedges this risk for the CIO. The options available in the cloud include Infrastructure as a Service, Platform as a Service, and Analytics as a Service, and the CIO can choose the optimal solution depending on bigger tradeoffs and decisions beyond the data analysis use cases.

IT as the Enabler

Tasked with the security and health of data and processes, CIOs see their role changing to that of an enabler who ensures that data and processes are protected while still maintaining control in the cloud. For example, identifying and tasking employees as data stewards ensures that a single person or team understands the structure and relevancy of various data sets and can act as the guide and central point of authority enabling employees to analyze and collaborate. The IT team can now focus on acting as the data management team, ensuring that feedback and business pain points are quickly addressed and that the learnings are incorporated into the data analysis pipeline.

A cloud based data analysis system also offers the flexibility to let the analysis inform the business process and workflow design. A well designed cloud based data analysis solution and its insights should be pluggable into the enterprise’s business workflow through well defined clean interfaces such as an insight export API. This ensures that any lessons learnt by IT can be easily fed back as enhancements to the business.

Similarly, a cloud-based data analysis solution is better designed for harmonization with external data sources, both public and premium. The effort required to integrate external data sources and build a refresh pipeline for them is sometimes not worth the initial cost, given that the business needs to iterate with multiple such sources in its quest for critical insights. A cloud-based analytics solution offers a central point for such external data to be collected. This frees up IT to focus on procuring external data sources and making them available for analysis, as opposed to providing procurement and infrastructure services to provision them.

A cloud-based solution also enables IT to serve as a deal maker of sorts by enabling data sharing through data evangelism. IT does not have to focus on many-to-many data sharing between the multiple sub-organizations and arms of the enterprise; it can instead serve as a data and insight publisher, focusing on the proliferation of data set knowledge and insights across the enterprise and filling a critical gap: missed data connections and insights that would otherwise go undiscovered.

The 2+2=5 Principle and the Perils of Analytics in a Vacuum

Originally Published on Wired

Strategic decision making in enterprises playing in a competitive field requires collaborative information seeking (CIS). Complex situations require analysis that spans multiple sessions with multiple participants (that collectively represent the entire context) who spend time jointly exploring, evaluating, and gathering relevant information to drive conclusions and decisions. This is the core of the 2+2=5 principle.

Analytics in a vacuum (i.e., non-collaborative analytics), due to missing or partial context, is highly likely to be of low quality, lacking key and relevant information and fraught with incorrect assumptions. Another characteristic of non-collaborative analytics is the use of general-purpose systems and tools like IM and email that are not designed for analytics. These tools lead to enterprises drowning in a sea of spreadsheets, context lost across thousands of IMs and emails, and an outcome that is guaranteed to be suboptimal.

A common but incorrect approach to collaborative analytics is to think of it as a post-analysis activity. This is the approach to collaboration taken by most analytics and BI products. Post-analysis publishing of results and insights is very important; however, pre-publishing collaboration plays a key role in ensuring that the generated results are accurate, informative, and relevant. Analysis that terminates at the publishing point has a very short half-life.

Enterprises need to think of analysis as a living, breathing story that grows over time as more people collaborate. Collaboration brings in more data, new data, and disparate data, which adds context and negates incorrect assumptions, missing or low-quality data, and incorrect semantic understanding of the data.

Here are the most common pitfalls we have observed when analytics is carried out in a vacuum.

Wasted resources. If multiple teams or employees are seeking the same information or attempting to solve the same analytical problem, a non-collaborative approach leads to wasted resources and suboptimal results.

Collaboration can help the enterprise streamline the problem and divide and conquer it more efficiently, with less time and manpower. Deconstructing an analytical hypothesis into smaller questions and distributing them across multiple employees leads to faster results.

Siloed analysis and conclusions. If the results of analysis, insights, and decisions are not shared systematically across the organization, enterprises face a loss of productivity. This lack of shared context between employees tasked with the same goals causes organizational misalignment and a lack of coherence in strategy.

Enterprises need to ensure that there is common understanding of key data driven insights that are driving organizational strategy. In addition, the process to arrive at these insights should be transparent and repeatable, assumptions made should be clearly documented and a process/mechanism to challenge or question interpretations should be defined and publicized.

Assumptions and biases. Analytics done in a vacuum is hostage to the personal beliefs, assumptions, biases, clarity of purpose, and comprehensiveness of the context in the analyzer's mind. Without collaboration, such biases remain uncorrected and lead to flawed foundations for strategic decisions.

A process around, and the freedom to, challenge, inspect, and reference the key interpretations and analytical decisions made en route to an insight is critical for enterprises that want to enable and proliferate high-quality insights across the organization.

Drive-by analysis. When left unchecked, with top-down pressure to use analytics to drive strategic decision making, enterprises see an uptick in what we call “drive-by analysis.” In this case, employees jump into their favorite analytical tool, run some analysis to support their argument, and publish the results.

This behavior leads to another danger of analytics without collaboration: instances where users, without full context and understanding of the data and its semantics, perform analysis to make critical decisions. Without supervision, these analytics can lead the organization down the wrong path. Supervision, fact checking, and corroboration are needed to ensure that correct decisions are made.

Arbitration. Collaboration without a process for challenge and arbitration, and without an arbitration authority, is often found (almost always later, when it is too late) to be littered with misinterpretations, factually misaligned, or deviating from strategic patterns identified in the past.

Subject matter experts or other employees with the bigger picture, knowledge and understanding of the various moving parts of the organization need to, at every step of the analysis, verify and arbitrate on assumptions and insights before these insights are disseminated across the enterprise and used to affect strategic change.

Collaboration theory has proven that information seeking in complex situations is better accomplished through active collaboration. There is a trend in the analytics industry to treat collaborative analytics as a vanity feature, with the simple sharing of results touted as collaborative analytics. However, collaboration in analytics requires a multi-pronged strategy with key processes and a product that delivers those capabilities: an investment in processes that allow arbitration, fact checking, interrogation, and corroboration of analytics, and an investment in analytical products that are designed and optimized for collaborative analytics.

It’s the End of the (Analytics and BI) World as We Know It

Originally Published on Wired

“That’s great, it starts with an earthquake, birds and snakes, an aeroplane, and Lenny Bruce is not afraid.” –REM, “It’s the End of the World as We Know It (and I Feel Fine)”

REM’s famous “It’s the End of the World…” song rode high on the college radio circuit back in the late 1980s. It was a catchy tune, but it also stands out because of its rapid-fire, stream-of-consciousness lyrics and — at least in my mind — it symbolizes a key aspect of the future of data analytics.

The stream-of-consciousness narrative is a tool used by writers to depict their characters’ thought processes. It also represents a change in approach that traditional analytics product builders have to embrace and understand in order to boost the agility and efficiency of the data analysis process.

Traditional analytics products were designed for data scientists and business intelligence specialists; these users were responsible for not only correctly interpreting the requests from the business users, but also delivering accurate information to these users. In this brave new world, the decision makers expect to be empowered themselves, with tools that deliver information needed to make decisions required for their roles and their day to day responsibilities. They need tools that enable agility through directed, specific answers to their questions.

Decision-Making Delays

Gone are the days when the user of analytics tools shouldered the burden of forming a question and framing it according to the parameters and interfaces of the analytical product. This would be followed by a response that would need to be interpreted, insights gleaned and shared. Users would have to repeat this process if they had any follow up questions.

The drive to make these analytics products more powerful also made them difficult for business users to use. This led to a vicious cycle: the tools appealed only to analysts and data scientists, leading to these products becoming even more adapted to their needs. Analytics became the responsibility of a select group of people, and the limited population of these experts caused delays in data-driven decision making. Additionally, these experts were isolated from the business context that should inform their analysis.

Precision Data Drill-Downs

In this new world, the business decision makers realize that they need access to information they can use to make decisions and course correct if needed. The distance between the analysis and the actor is shrinking, and employees now feel the need to be empowered and armed with data and analytics. This means that analytics products that are one size fits all do not make sense any more.

As decision makers look for analytics that makes their day-to-day jobs successful, they will look to these new analytics tools to offer the same capabilities and luxuries that having a separate analytics team provides, including the ability to ask questions repeatedly based on responses to a previous question.

This is why modern analytics products have to support the user’s “stream of consciousness” and offer the ability to repeatedly ask questions to drill down with precision and comprehensiveness. This enables users to arrive at the analysis that leads to a decision that leads to an action that generates business value.

Stream-of-consciousness support can only be offered through new lightweight mini analytics apps that are purpose-built for specific user roles and functions and deliver information and analytics for the specific use cases that users in a particular role care about. Modern analytics products have to become combinations of apps that empower users and make their jobs decision- and action-oriented.

Changes in People, Process, and Product

Closely related to the change in analytics tools is a change in the usage patterns of these tools. There are generally three types of employees involved in the usage of traditional analytics tools:

  • The analyzer, who collects, analyzes, interprets, and shares analyses of collected data
  • The decision maker, who generates and decides on the options for actions
  • The actor, who acts on the results

These employees act separately to lead an enterprise toward becoming data-driven, but it’s a process fraught with inefficiencies, misinterpretations, and biases in data collection, analysis, and interpretation. The human latency and error potential makes the process slow and often inconsistent.

In the competitive new world, however, enterprises can’t afford such inefficiencies. Increasingly, we are seeing the need for the analyzer, decision maker, and actor to converge into one person, enabling faster data-driven actions and shorter time to value and growth.

This change will force analytics products to be designed for the decision maker/actor as opposed to the analyzer. They’ll be easy to master, simple to use, and tailored to cater to the needs of a specific use case or task.

Instant Insight

The process of analytics in the current world tends to be after-the-fact analysis of data that drives a product or marketing strategy and action.

However, in the new world, analytics products will need to provide insight into events as they happen, driven by user actions and behavior. Products will need the ability to change or impact the behavior of users, their transactions, and the workings of products and services in real time.

Analytics and BI Products and Platforms

In the traditional analytics world, analytics products tend to be bulky and broad in their flexibility and capabilities. These capabilities range from “data collection” to “analysis” to “visualization.” Traditional analytics products tend to offer different interfaces to the decision makers and the analyzers.

However, in the new world of analytics, products will need to be minimalistic. Analytics products will be tailored to the skills and needs of their particular users. They will directly provide recommendations for specific actions tied directly to a particular use case. They will provide, in real time, the impact of these actions and offer options and recommendations to the user to fine tune, if needed.

The Decision Maker’s Stream of Consciousness

In the context of the changing people, process, and product constraints, analytics products will need to adapt to the needs of decision makers and their process of thinking, analyzing, and arriving at decisions. For every enterprise, a study of the decision maker’s job will reveal a certain set of decisions and actions that form the core of their responsibilities.

As we mentioned earlier, yesterday’s successful analytical products will morph into a set of mini analytics apps that deliver the analysis, recommendations, and actions that need to be carried out for each of these decisions and actions. Such mini apps will be tuned and optimized individually for each use case and each enterprise.

These apps will also empower the decision maker’s stream of consciousness. This will be achieved by emulating the decision maker’s thought process as a series of analytics layered to offer a decision path to the user. In addition, these mini apps will enable the exploration of tangential questions that arise in the user’s decision making process.

Analytics products will evolve to become more predictive, recommendation-based, and action oriented; the focus will be on driving action and reaction. This doesn’t mean that the process of data collection, cleansing, transformation, and preparation is obsolete. However, it does mean that the analysis is pre-determined and pre-defined to deliver information to drive value for specific use cases that form the core of the decision maker’s responsibility in an enterprise.

This way, users can spend more time reacting to their discoveries, tapping into their streams of consciousness, taking action, and reacting again to fine-tune the analysis.

The Importance of Making Your Big Data System Insightful

Originally Published on Wired

 

With all the emphasis these days that’s placed on combing through the piles of potentially invaluable data that resides within an enterprise, it’s possible for a business to lose sight of the need to turn the discoveries generated by data analysis into valuable actions.

Sure, insights and observations that arise from data analysis are interesting and compelling, but they really aren’t worth much unless they can be converted into some kind of business value, whether it’s, say, fine tuning the experience of customers who are considering abandoning your product or service, or modeling an abuse detection system to block traffic from malicious users.

Digging jewels like these out of piles of enterprise data might be viewed by some as a mysterious art, but it’s not. It’s a process of many steps, considerations, and potential pitfalls, but it’s important for business stakeholders to have a grip on how the process works and the strategy considerations that go into data analysis. You’ve got to know the right questions to ask. Otherwise, there’s a risk that data science stays isolated, instead of evolving into business science.

The strategic considerations include setting up an “insights pipeline,” which charts the path from hypothesis to insight and helps ensure agility in detecting trends, building new products, and adjusting business processes; ensuring that the analytical last mile, which spans the gap from analysis to a tangible business action, is covered quickly; building a “data first” strategy that lays the groundwork for new products to produce valuable data; and understanding how partnerships can help enterprises put insights to work to improve user experiences.

The Insights Pipeline

You can visualize an insights pipeline as a kind of flow chart that encompasses the journey from a broad business goal, question or hypothesis to a business insight.

The questions could look something like this: Why are we losing customers in the European market? Or, how can revenue from iOS users be increased? This kind of query is the first step in open-ended data exploration, which, as the name implies, doesn’t usually include deadlines or specific expectations, because they can suppress the serendipity that is a key part of the open-ended discovery process.

Data scientists engage in this kind of exploration to uncover business-critical insights, but they might not know what shapes these insights will take when they begin their research. These insights are then presented to business stakeholders, who interpret the results and put them to use in making strategic or tactical decisions.

The broad nature of open-ended exploration carries potential negatives. Because of the lack of refinement in the query, the insights generated might be unusable, not new, or even worthless, leading to low or no ROI. Without specific guidance, a data scientist could get lost in the weeds.

Closed-loop data exploration, on the other hand, is much more refined and focused on a specific business function or question. For example, a data scientist might pursue this: Are there any customers who do more than $100 of business each day with an online business? If so, flag them as “very important customers” so they can receive special offers. There is very little ambiguity in the query.
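
A closed-loop rule like that one can be sketched in a few lines; the $100 threshold, field names, and flagging behavior below are illustrative assumptions rather than a description of any real system.

```python
# Minimal sketch of a closed-loop rule: flag customers whose daily spend
# exceeds a fixed threshold so downstream systems can send special offers.
# Field names and the 100-dollar threshold are hypothetical.
from collections import defaultdict

DAILY_SPEND_THRESHOLD = 100.00

def flag_very_important(transactions):
    """transactions: iterable of (customer_id, amount) for a single day."""
    daily_totals = defaultdict(float)
    for customer_id, amount in transactions:
        daily_totals[customer_id] += amount
    return {cid for cid, total in daily_totals.items() if total > DAILY_SPEND_THRESHOLD}

todays_transactions = [("c1", 60.0), ("c1", 55.0), ("c2", 30.0), ("c3", 120.0)]
print(flag_very_important(todays_transactions))  # flags c1 and c3
```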

In the insights pipeline, successful open-ended explorations can eventually be promoted to closed loop dashboards, once business stakeholders ratify the results.

Closed-loop analysis implements systems based on models or algorithms that slot into business processes and workflow systems. As the example above suggests, these kinds of questions enable fast, traffic-based decision-making and end-user servicing. They also don’t add development costs once they are put in place.

But the very specificity of the queries that define closed-loop data analysis can produce insights of limited value. And once the query is set up, the possibility of “insights staleness” arises. Revisiting the “very important customer” example, what if inflation makes the $100-per-day customer less valuable? The insight becomes outdated; this highlights the need to consistently renew and verify results.

This illustrates the importance of consistently retuning the model and, sometimes, forming new questions or hypotheses to plug back into an open-ended exploration. For example, a system that filters incoming emails for spam can quickly become outdated as spammers change tactics or use new technologies. A closed-loop system like this often needs to be revamped entirely to reflect changes in spammer behavior.
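
One way to guard against staleness is to monitor how far a closed-loop rule's output drifts from its historical baseline and flag it for review; the baseline and tolerance values in this sketch are entirely hypothetical.

```python
# Hypothetical staleness check: if the share of customers a closed-loop rule
# flags drifts far from its historical baseline, mark the rule for review
# (re-examine the threshold, or send the question back to open-ended
# exploration). Baseline and tolerance values are illustrative.

def needs_review(flagged_today: int, customers_today: int,
                 baseline_rate: float = 0.02, tolerance: float = 0.5) -> bool:
    """Return True if today's flag rate deviates from baseline by more than the tolerance."""
    if customers_today == 0:
        return False
    rate = flagged_today / customers_today
    return abs(rate - baseline_rate) > tolerance * baseline_rate

# Inflation pushes many more customers past a fixed $100/day threshold:
print(needs_review(flagged_today=450, customers_today=10_000))  # True (4.5% vs 2% baseline)
```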

The Analytical Last Mile

Making decisions is one of the most challenging parts of doing business. In IT, employees are very comfortable delivering reports or assembling dashboards, but deciding on an action plan based upon that information isn’t easy, and lots of insights with few decisions introduces a lag that in turn erodes business value.

The analytical last mile represents the time and effort required to use analytics insights to actually improve the state of a business. You might have invested heavily in big data technologies and produced all kinds of dashboards and reports, but this adds up to very little if interesting observations aren’t converted into action.

The value of analytics and a data-driven culture is only realized when the analytical last mile is covered quickly and efficiently. The inability to do this often results in lost business efficiency and unrealized business value.

More often than not, human latency is to blame. It’s defined as the time it takes employees to collect the required information, perform analysis, and disseminate the resulting insight to decision makers, and, then, the time it takes decision makers to collaborate and decide on a course of action.

Covering the analytical last mile efficiently requires an investment in and emphasis on setting up streamlined data collection, analysis and decision-making processes.

A “Data First” Strategy

When you define, design, and introduce a new product or service, data generation, collection and analysis, and product optimization might be the last thing you’re thinking of. It should be the first.

A “data first” strategy ensures that the right kind of technology is in place to deliver insights that can improve the end user experience. Thinking through what kinds of user data might be collected ensures that the enterprise isn’t caught off guard when the new product or service begins to gain momentum.

Some of the data you should think about gathering includes:

  • Data generated by user actions and interactions, such as monetary transactions, information requests, and navigation
  • Data that defines the profile attributes of the user, including information available from the user, the enterprise, or enterprise partners
  • Contextual data about the user’s social network activity triggered by the product or service, the user’s location in relation to use of the product or service, or the channels through which the product or service is being used or accessed

Instead of losing critical time scrambling to set up methodologies to gather this data, you’ll be prepared to do some fine-tuning to the product to boost the end user’s experience.

Partnerships

A lot of skills and capabilities are required to take a data-driven effort to optimize the user experience and turn it into an actual, tangible improvement in your customer’s experience and, ultimately, a boost to the enterprise’s bottom line.

Many of these skills are not traditionally part of a business’ core competencies, so partnerships are a great way to bring in outside expertise to help polish the customer experience. Some areas where enterprises look to partners for help include: the ability to reach customers with content, offers, deals, and ads across multiple channels, devices or platforms; the ability to access user transaction history across multiple services and products; and the capability to know users’ locations at any point in time.

There’s a reason that big data analysis has become such a catchphrase. It’s an amazingly powerful tool that can improve user experiences and boost the bottom line.

But it’s critical that business stakeholders have an awareness of the process, think about the right strategic considerations, and realize the importance of moving quickly and decisively once insights are delivered. Otherwise, it’s all too easy for a business to get mired in data science, instead of transforming a valuable insight into an even more valuable action.

Virtual Sensors and the Butterfly Effect

Originally Published on Wired.

In the early 1960s, chaos theory pioneer Edward Lorenz famously asked, “Does the flap of a butterfly’s wings in Brazil set off a tornado in Texas?” Lorenz theorized that small initial differences in an atmospheric system could result in large and unexpected future impacts.

Similar “butterfly effects” can surface in the increasingly interconnected and complex universe of enterprise partnerships and supply-chain and cross-product relationships. It’s a world where new or evolving products, services, partnerships, and changes in demand can have unexpected and surprising effects on users and other products, services, traffic, and transactions in a company’s ecosystem.

Monitoring these complex relationships and the potentially important changes that can reverberate through an enterprise’s network calls for an interconnected system of virtual “sensors,” which can be configured and tuned to help make sense of these unexpected changes. As enterprises increasingly interface with customers, partners, and employees via apps and application programming interfaces (APIs), setting up a monitoring network like this becomes a particularly important part of data analysis.

What are Sensors?

Traditional sensors are often defined as “converters” that transform a physically measured quantity into a signal that an observer can understand. Sensors are defined by their sensitivity and by their ability to have a minimal effect on what they measure.

Physical sensors can capture aspects of the external environment like light, motion, temperature, and moisture. They’re widely used in business, too. Retailers can employ them to measure foot traffic outside or inside their stores, in front of vending machines, or around product or brand categories. Airlines use physical sensors to measure how weather patterns affect boarding and take-off delays. Using a diversity of sensors enables the definition of an environment around the usage of a product or service in the physical world.

Besides investing in traditional data processing technologies, cutting-edge enterprises map their digital world by defining and building so-called virtual sensors. Virtual sensors collect information from the intersection of the physical and digital worlds to generate and measure events that define the usage of a digital product or service. A virtual sensor could be a data processing algorithm that can be tuned and configured to generate alerts that are relevant to the enterprise. These alerts notify the enterprise of a change in the environment or ecosystem in which a user is using a product or service.
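
As a loose illustration of the idea, a virtual sensor can be as simple as a configurable check over a stream of usage events; the event fields, threshold, window size, and alert shape below are assumptions made for the sketch.

```python
# Minimal virtual-sensor sketch: a configurable threshold check over a stream
# of usage events that emits an alert when the rolling signal crosses it.
# Event fields, thresholds, and the alert format are hypothetical.
from dataclasses import dataclass

@dataclass
class VirtualSensor:
    name: str
    metric: str          # which field of the event to watch
    threshold: float     # sensitivity: alert when the rolling value exceeds this
    window: int = 60     # number of recent events to consider

    def __post_init__(self):
        self.values = []

    def observe(self, event: dict):
        """Ingest one event; return an alert dict if the sensor trips, else None."""
        self.values.append(float(event.get(self.metric, 0.0)))
        self.values = self.values[-self.window:]
        rolling = sum(self.values) / len(self.values)
        if rolling > self.threshold:
            return {"sensor": self.name, "metric": self.metric, "rolling": rolling}
        return None

sensor = VirtualSensor(name="partner_api_latency", metric="latency_ms", threshold=250.0)
for latency in (120, 180, 900, 820):
    alert = sensor.observe({"latency_ms": latency})
    if alert:
        print("ALERT:", alert)
```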

How to Build a Virtual Sensor Network

Building a network of virtual sensors for your business calls for requirements similar to those of a physical sensor system:

  • Sensitivity, or the ability to detect events and signals with configurable thresholds of severity
  • Speed, or the ability to speedily collect and process signals to generate business-critical events
  • Diversity, or the ability to collect, collate, and combine signals from multiple sensors with the goal of generating business-critical events

To begin charting the web of relationships that impacts the demand for and usage of an enterprise’s products and services, businesses should determine which other products and services in the marketplace are complements, supplements, and substitutes to their own. A deep understanding of such evolving and complex relationships can help enterprises plan partnerships.

  • Supplementary products and services enhance the experience of another product or service. For example, flat panel TVs are enhanced by wall mounts, stands, warranty services, cable services, and streaming movie services.
  • Complementary products and services work in concert with other products and services to complete the experience for the end user. Demand for car tires, for example, tends to generate demand for gasoline.
  • Substitute products and services have an inverse effect on each other’s demand. For example, two retailers offering the same selection of products targeted to the same consumer.

Understanding these relationships is the starting point for creating a network of sensors to monitor the impact of changes in the traffic or transactions of an outside product or service on an enterprise’s own products and services. Detecting such a change with the appropriate sensitivity can often be the difference between an enterprise’s failure and success.

Take, for example, a web portal that aggregates content from several content providers. This portal uses APIs to connect to these third parties. In many cases, the content providers are automatically queried by the aggregator, regardless of whether an end user is interested in the content. If for any reason there is a spike in usage of the portal on a particular day, it will automatically trigger spikes in traffic for each of the content providers. Without understanding the complementary connection to the portal and the shifting demand properties of that connection, the content providers will find it difficult to interpret the traffic spike, which will eat up resources and leave legitimate traffic unserviced.

Here’s a similar example. Let’s say a service can support 100 API calls spread among 10 partners. If this service receives an unexpected and unwanted spike in traffic from one partner that eats up half of its capacity, then it will only have 50 API calls left to distribute among the other nine partners. This in turn can lead to lost transactions and dissatisfied users.

With an awareness of the network, however, the service would understand that this one partner routinely only sends 10 calls on a normal day, and would be able to put restrictions in place that wouldn’t let the extra 40 calls eat up the capacity of other partners.
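
A network-aware service could enforce those per-partner expectations with something like the following sketch; the total capacity and baselines mirror the example above, while the headroom factor and structure are illustrative assumptions.

```python
# Sketch of per-partner throttling for the example above: total capacity of
# 100 calls, 10 partners, each with a known baseline of about 10 calls.
# A partner that spikes is capped near its baseline so it cannot starve
# the others. Numbers and the headroom factor are illustrative assumptions.

TOTAL_CAPACITY = 100
BASELINE = {f"partner_{i}": 10 for i in range(10)}  # normal daily calls per partner
HEADROOM = 1.2  # allow modest growth above baseline before throttling

def admit(partner: str, calls_so_far: dict) -> bool:
    """Return True if the partner's next call should be accepted."""
    if sum(calls_so_far.values()) >= TOTAL_CAPACITY:
        return False
    limit = BASELINE.get(partner, 0) * HEADROOM
    return calls_so_far.get(partner, 0) < limit

usage = {"partner_0": 12}            # partner_0 is already spiking
print(admit("partner_0", usage))     # False: capped near its baseline of 10
print(admit("partner_1", usage))     # True: capacity preserved for the others
```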

In these kinds of situations, virtual sensors can provide the awareness and insights into this web of interdependency, and help make sense of traffic spikes that otherwise might seem incomprehensible.

Sensor-Aware Data Investments

Building a network of physical and virtual sensors entails collecting diverse signals from a complex map of data sources and processing them to generate events that can help enterprises understand the environments around their end users. Investing in these networks enables enterprises to track and monitor external signals generated from sources that have the ability to impact the enterprise’s traffic, transactions, and overall health.

This ability, in turn, helps digitally aware businesses negate potential troubles caused by the digital butterfly effect, and take advantage of the opportunities presented by a strong grasp of what’s happening in user and partner ecosystems.

How Data Analysis Drives the Customer Journey

Originally Published on Wired

Driving down Highway 1 on the Big Sur coastline in Northern California, it’s easy to miss the signs that dot the roadside. After all, the stunning views of the Pacific crashing against the rocks can be a major distraction. The signage along this windy, treacherous stretch of road, however, is pretty important — neglecting to slow down to 15 MPH for that upcoming hairpin turn could spell trouble.

Careful planning and even science goes into figuring out where to place signs, whether they are for safety, navigation, or convenience. It takes a detailed understanding of the conditions and the driving experience to determine this. To help drivers plan, manage, and correct their journey trajectories, interstate highway signs follow a strict pattern in shape, color, size, location, and height, depending on the type of information being displayed.

Like the traffic engineers and transportation departments that navigate this process, enterprises face a similar challenge when mapping, building, and optimizing digital customer journeys. To create innovative and information-rich digital experiences that provide customers with a satisfying journey, a business must understand the stages and channels that consumers travel through to reach their destination. Customer journeys are multi-stage and multi-channel, and users require information at each stage to make the right decisions as they move toward their destination.

Signposts on the Customer Journey

To understand what kind of information must be provided — and when it must be supplied — it’s important to understand the stages users travel through as they form decisions to purchase or consume products or services.

  • Search: The user starts on a path toward a transaction by searching for products or services that can deliver on his or her use case
  • Discover: The user narrows down the search results to a set of products or services that meet the use case requirements
  • Consider: The user evaluates the short-listed set of products and services
  • Decide: The user makes a decision on the product or service
  • Sign up/set up: The user completes the setup or sign up required to begin using the chosen product or service
  • Configure: The user configures and personalizes the product or service, to the extent possible, to best deliver on the user’s requirements
  • Act: The user uses the product or service regularly
  • Engage: The user’s usage peaks, with significant levels of activity, transaction value, time spent on the product, and willingness to recommend the product or service to their professional or personal networks
  • Abandon: The user displays diminishing usage of the product or service compared to the configuring, active, and engaged levels
  • Exit: The user ceases use of the product or service entirely

Analyzing how a customer uses information as they navigate their journey is key to unlocking more transactions and higher usage, and also to understanding and delivering on the needs of the customer at each stage of their journey.

At the same time, it’s critical to instrument products and services to capture data about usage and behavior surrounding a product or service, and to build the processes to analyze the data to classify and detect where the user is on their journey. Finally, it’s important to figure out the information required by the user at each stage. This analysis determines the shape, form, channel, and content of the information that will be made available to users at each point of their transactional journey.
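
As a rough sketch of what that stage detection could look like, the rules, field names, and thresholds below are hypothetical; in practice they would be derived from the instrumented usage data described above rather than hand-coded.

```python
# Hypothetical journey-stage classifier: map simple usage signals to a stage.
# Thresholds and field names are illustrative assumptions.

def classify_stage(profile: dict) -> str:
    if profile.get("churned"):
        return "exit"
    if not profile.get("signed_up"):
        return "search/discover/consider"
    if not profile.get("configured"):
        return "sign up/set up"
    sessions = profile.get("sessions_last_30d", 0)
    referrals = profile.get("referrals", 0)
    if sessions == 0:
        return "abandon"
    if sessions >= 20 or referrals > 0:
        return "engage"
    return "act"

print(classify_stage({"signed_up": True, "configured": True,
                      "sessions_last_30d": 25, "referrals": 2}))  # engage
```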

The highway system offers inspiration for designing an information architecture that guides the customer on a successful journey. In fact, there are close parallels between the various types of highway signs and the kind of information users need when moving along the transaction path.

  • Regulatory: Information that conveys the correct usage of the product or service, such as terms of use or credit card processing and storage features
  • Warning: Information that offers “guardrails” to customers to ensure that they do not go off track and use the product in an unintended, unexpected way; examples in a digital world include notifications to inform users on how to protect themselves from spammers
  • Guide: Information that enables customers to make decisions and move ahead efficiently; examples include first-run wizards to get the user up and running and productive with the product or service
  • Services: Information that enhances the customer experience, including FAQs, knowledge bases, product training, references, and documentation
  • Construction: Information about missing, incomplete, or work-in-progress experiences in a product that enable the user to adjust their expectations; this includes time-sensitive information designed to proactively notify the user of possible breakdowns or upcoming changes in their experience, including maintenance outages and new releases

Information Analytics

Information analytics is the class of analytics designed to derive insights from data produced by end users during their customer journey. Information analytics provides two key insights into the data and the value it creates.

First, it enables the identification of the subsets of data that drive maximum value to the business. Certain data sets in the enterprise’s data store are more valuable than others and, within a data set, certain records are more valuable than others. Value in this case is defined by how users employ the information to make decisions that eventually and consistently drive value to the business.

For example, Yelp can track the correlation between a certain subset of all restaurant reviews on their site and the likelihood of users reading them and going to the reviewed restaurants. Such reviews can then be automatically promoted and ranked higher to ensure that all users get the information that has a higher probability of driving a transaction—a restaurant visit, in this case.
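
A simplified version of that kind of promotion logic might rank reviews by an observed read-to-visit conversion rate; the record format below is invented for illustration and does not reflect Yelp's actual system.

```python
# Illustrative ranking of reviews by read-to-visit conversion. The record
# format and numbers are invented; this is not Yelp's actual algorithm.

reviews = [
    {"id": "r1", "reads": 5000, "visits_attributed": 40},
    {"id": "r2", "reads": 1200, "visits_attributed": 36},
    {"id": "r3", "reads": 300,  "visits_attributed": 1},
]

def conversion_rate(review: dict) -> float:
    return review["visits_attributed"] / max(review["reads"], 1)

promoted = sorted(reviews, key=conversion_rate, reverse=True)
for r in promoted:
    print(r["id"], f"{conversion_rate(r):.2%}")
# r2 (3.00%) ranks above r1 (0.80%) and r3 (0.33%).
```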

Secondly, information analytics enables businesses to identify customer segments that use information to make decisions that drive the most business transactions. Understanding and identifying such segments is extremely important, as it enables the enterprise to not only adapt the information delivery for the specific needs of the customer segment but also price and package the information for maximum business value.

For example, the information in a weather provider’s database in its raw form is usable by different consumers for different use cases. However, its use by someone planning a casual trip is very different from its use by a commodities trader betting on future commodity prices. Understanding the value a user derives from the enterprise’s information is key to appropriate pricing and value generation for the enterprise.

Information Delivery

Mining and analyzing how users access information is critical to identifying, tracking, and improving key performance indicators (KPIs) around user engagement and user retention. If the enterprise does not augment the product experience with accurate, timely, and relevant information (according to the user’s location, channel and time of usage), users will be left dissatisfied, disoriented, and disengaged.

At the same time, a user’s information access should be mined to determine the combination of information, channel, and journey stage that drives value to the enterprise. Enterprises need to identify such combinations and promote them to all users of the product and service and subsequently enable a larger portion of the user base to derive similar value.

Mining the information access patterns of users can enable enterprises to build a map of the various touch points on their customer’s journey, along with a guide to the right information required for each touchpoint (by the user or by the enterprise) in the appropriate form delivered through the optimal channel. Such a map, when built and actively managed, ends up capturing the use of information by customers in their journey and correlates this with their continued engagement with — or eventual abandonment of — the product.

Enabling successful journeys for customers as they find and use products and services is critical to both business success and continued customer satisfaction. Contextual information, provided at the right time through the right channel to enable user decisions, is almost always the difference between an engaged user and an unsatisfied one — and a transaction that drives business value.

Four Common Mistakes That Can Make For A Toxic Data Lake

Originally Published on Forbes

Data lakes are increasingly becoming a popular approach to getting started with big data. Simply put, a data lake is a central location where all applications that generate or consume data go to get raw data in its native form. This enables faster application development, both transactional and analytical, as the application developer has a standard location and interface to write data that the application will generate and a standard location and interface to read data that it needs for the application.

However, left unchecked, data lakes can quickly become toxic, turning into a cost to maintain while the value delivered from them shrinks or simply never materializes. Here are some common mistakes that can make your data lake toxic.

Your big data strategy ends at the data lake.

A common mistake is to treat a data lake as the implementation of the big data strategy. This is a common choice because building a data lake is a deterministic project that IT can plan for and deliver given a budget. However, the assumption that “if you build it, they will come” is not correct. Blindly hoping that the data lake will be filled with data from the various applications and systems that already exist or will be built or upgraded in the future, and that the data in the lake will be consumed by data-driven applications, is a common mistake.

Enterprises need to ensure that the data lake is part of an overall big data strategy in which application developers are trained and mandated to use the data lake as part of their application design and development process. In addition, applications (existing, in development, or planned) need to be audited for their data needs, and usage of the data lake needs to be planned for and incorporated into the design.

Enterprises need to ensure that their business strategy is bound to the data lake and vice versa. Without this, a data lake is bound to be stunted into an IT project that never really lives up to its potential of generating incremental business value.

In addition, enterprises need to ensure that the organization does not use the data lake as a dumping ground. Data that enters the data lake should be of high quality and generated in a form that makes it easier to understand and consume in data driven or analytic applications. Data that gets generated without any thought given to how it would be consumed often ends up being dirty and unusable.

The data in your data lake is abstruse.

If attention is not paid to it, data in a data lake can easily become hard to discover, search or track. Without thinking through what it means to discover and use data, enterprises filling up the data lake will simply end up with data sets that are either unusable or untrustworthy.

A best practice to avoid unusable data in the data lake is to capture, alongside the data, metadata that includes the data’s lineage, i.e., how it was created, where it was created, its acceptable and expected schema, the column types, how often the data set is refreshed, and so on. In addition, each data set should have an owner (application, system, or entity), categorization, tags, access controls, and, if possible, the ability to preview a sample. This metadata organization ensures that application developers or data scientists looking to use the data can understand the data source and use it correctly in their applications.

All data sets in your data lake should have an associated “good” definition. For example, every data set should have a definition of an acceptable data record, including the data generation frequency, acceptable record breadth, expected volume per record and per time interval, expected and acceptable ranges for specific columnar values, any sampling or obfuscation applied, and, if possible, the acceptable use of the data.
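
One lightweight way to keep this kind of metadata alongside each data set is a structured record like the sketch below; every field name and value here is a hypothetical example, not a prescribed standard.

```python
# Hypothetical metadata record for a data lake data set, capturing lineage,
# ownership, schema expectations, and a "good record" definition.
# Field names and values are illustrative examples only.

clickstream_metadata = {
    "name": "web_clickstream_events",
    "owner": "web-platform-team",
    "lineage": {
        "created_by": "edge-logging-service",
        "created_in": "us-east ingestion pipeline",
        "derived_from": None,
    },
    "schema": {
        "user_id": "string",
        "event_type": "string",
        "timestamp": "ISO-8601 datetime",
        "page_url": "string",
    },
    "refresh_frequency": "hourly",
    "tags": ["clickstream", "web", "raw"],
    "access_controls": ["analytics-readers"],
    "good_record_definition": {
        "expected_volume_per_hour": (500_000, 2_000_000),
        "required_fields": ["user_id", "event_type", "timestamp"],
        "acceptable_event_types": ["view", "click", "purchase"],
        "sampling": "none",
        "acceptable_use": "aggregate analysis only; no per-user export",
    },
}

def is_good_record(record: dict, meta: dict) -> bool:
    """Check a single record against the data set's 'good' definition."""
    good = meta["good_record_definition"]
    return (all(field in record for field in good["required_fields"])
            and record.get("event_type") in good["acceptable_event_types"])

print(is_good_record({"user_id": "u1", "event_type": "click",
                      "timestamp": "2015-01-01T00:00:00Z"}, clickstream_metadata))  # True
```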

The data in your data lake is entity-unaware.

Often, when data gets generated, attention is not paid to carefully recording the entities that were part of an event. For example, the identifiers for the user, the service, and the partner that came together in an event might not be recorded. This can severely restrict the data use cases that can be built on top of the data set. It is much easier to aggregate or obfuscate identifiers that were recorded than to recover identifiers that were never captured.

Similarly, data that is not generated and stored at the highest possible granularity carries the risk of having its applicability and value diminished. This often happens when less data is preferable due to storage or compute concerns. It can also happen when the logging of data is not asynchronous, i.e., when logging impacts the transaction processing of the system.

The data in your data lake is not auditable.

Data lakes that do not track how their data is being used, and that cannot produce, at any point in time, the users that access the data, the processes that use or enhance it, the redundant copies of the data and how they came to be, and the derivations of data sets, can quickly become a nightmare to maintain, upgrade, and adapt.

Without such auditability built into the data lake, enterprises end up stuck with large data sets that consume disk and increase the time it takes to process data records, while increasing the probability that data is misused or misinterpreted.

In addition, if the data lake does not offer additional services that make it easier for consumers of the data to evaluate and actually use the data, the expected value from the data lake can be severely restricted. Enterprises should consider building and maintaining application directories that track contributors and readers (applications) of the data sets in the data lake, as well as an index of data sets organized by categories, tags, sources, applications, and so on, including the ability to quickly surface related data sets and data sets with parent-child relationships.

As the volume of data grows, the number of disparate data sets grows, and the number of consumers that interact with and impact these data sets increases, enterprises will increasingly face a data lake management nightmare and will be forced to set aside more IT resources to track and maintain their data lakes. Some simple guidelines and best practices on how data (and its use) is generated, stored, and cataloged can ensure that the data lake does not become toxic and delivers on the promised value that was the reason for its creation in the first place.