Why CIOs Should Turn To Cloud Based Data Analysis in 2015

Originally Published on DataFloq

CIOs are under tremendous pressure to quickly deliver big data platforms that enable enterprises to unlock the potential of big data and better serve their customers, partners and internal stakeholders. Early adopter CIOs report clear advantages to seriously considering and choosing the cloud for data analysis. These CIOs make a clear distinction between business critical and business enabling systems and processes. They understand the value that the cloud brings to data analysis and exploration and how it enables the business arm to innovate, react and grow the business.

Here are the five biggest reported advantages of choosing the cloud for data analysis.

Speed – Faster Time to Market

Be it the speed of getting started with data analysis, the time it takes to stand up a software stack that can enable analysis or the time it takes to provision access to data, a cloud based system offers a faster boot time for the data initiative. This is music to the business's ears, as they are able to extract value from data sooner rather than later.

The cloud also offers faster exploration, experimentation, action and reaction based on data analysis. For example, a cloud-based system can be made to auto scale with the number of users querying the system, the number of concurrent analyses in flight, and the volume of data entering, being stored in or being processed by the system. With no long hardware procurement cycles, the cloud can often be the difference between critical data analysis that drives business growth and a missed opportunity.
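
As a rough sketch of the kind of scaling decision such a system can automate, the snippet below derives a worker count from concurrent users, queued analyses and ingest volume. The `ClusterMetrics` structure, the per-worker thresholds and the limits are hypothetical illustrations, not any particular cloud provider's autoscaling API.

```python
from dataclasses import dataclass
import math

@dataclass
class ClusterMetrics:
    # Hypothetical metrics a cloud analytics service might expose.
    concurrent_users: int
    queued_analyses: int
    ingest_gb_per_hour: float

def desired_workers(m: ClusterMetrics,
                    users_per_worker: int = 25,
                    analyses_per_worker: int = 4,
                    ingest_gb_per_worker: float = 50.0,
                    min_workers: int = 2,
                    max_workers: int = 100) -> int:
    """Scale to whichever dimension (users, jobs, ingest) needs the most capacity."""
    need = max(
        math.ceil(m.concurrent_users / users_per_worker),
        math.ceil(m.queued_analyses / analyses_per_worker),
        math.ceil(m.ingest_gb_per_hour / ingest_gb_per_worker),
    )
    return max(min_workers, min(max_workers, need))

# Example: a spike in concurrent analyses drives the cluster from 2 to 13 workers.
print(desired_workers(ClusterMetrics(concurrent_users=40, queued_analyses=50, ingest_gb_per_hour=120)))
```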

Another consideration mentioned by CIOs is the opportunity cost of building out full scale analytics systems. With limited budgets and time, focusing on generating core business value turns out to be more beneficial than spending those resources on reinventing a software stack that has already been built by a vendor.

Extensibility – Adjusting to Change

A unique advantage of operating in the cloud is the ability to adjust to changes in the business, the industry or the competition. Dynamic enterprises introduce new products, kill underperforming products and invest in mergers and acquisitions. Each such activity creates new systems, processes and data sets. Having a cloud based stack that not only scales but also offers a consistent interface reduces the problem of combining (and securing and maintaining) this data from roughly an O(n²) web of point-to-point integrations to an O(n) set of connections; with ten data sets, that is the difference between maintaining 45 pairwise integrations and 10 connections to a common interface, making it a much cheaper proposition.

Cost – Lower and More Predictable

CIOs love the fact that cloud based data analysis stacks are cheaper to build and operate. With no initial investment required, CIOs pay only for what they use, and if the cloud auto scales, capacity growth plans become simpler and long term planning becomes easier to perform without the danger of over provisioning. Required data analysis capacity is often spiky (it varies sharply over time with planning and competitive activities), is impacted by how prevalent the data driven culture is in the enterprise (and how that culture changes over time) and depends on the volume and variety of data sources (which can change as fast as the enterprise grows and maneuvers), so it is very hard for the CIO to predict required capacity. Imperfect estimates lead to wasted resources and/or unused capacity.

Risk Mitigation – Changing Technological Landscape

Data analysis technologies and options are in flux. Especially in the area of big data, technologies are growing and maturing at different rates, with new technologies being introduced regularly. In addition, the growth of these modern data processing and analysis tools and the recent activity of analytics and BI vendors make it clear that the current capabilities available to the business are not addressing its pain points. There is a danger in moving in too early: adopting and depending on a certain stack might turn out to be the wrong decision, or leave the CIO with a high cost to upgrade and maintain the stack at the rate it is changing. Investing in a cloud based data analysis system hedges this risk for the CIO. Among the options available in the cloud are Infrastructure as a Service, Platform as a Service and Analytics as a Service, and the CIO can choose the optimal solution depending on bigger tradeoffs and decisions beyond the data analysis use cases.

IT as the Enabler

Tasked with the security and health of data and processes, CIOs see their role changing to that of an enabler: ensuring that data and processes are protected while still maintaining control in the cloud. For example, identifying and tasking employees as data stewards ensures that a single person or team understands the structure and relevance of the various data sets and can act as a guide and central point of authority, enabling other employees to analyze and collaborate. The IT team can now focus on acting as the data management team, ensuring that feedback and business pain points are quickly addressed and that the learnings are incorporated into the data analysis pipeline.

A cloud based data analysis system also offers the flexibility to let the analysis inform the business process and workflow design. A well designed cloud based data analysis solution and its insights should be pluggable into the enterprise’s business workflow through well defined clean interfaces such as an insight export API. This ensures that any lessons learnt by IT can be easily fed back as enhancements to the business.

Similarly, a cloud based data analysis solution is better designed for harmonization with external data sources, both public and premium. Building in-house integration and refresh pipelines for these sources is often not worth the initial cost, given that the business needs to iterate with multiple such sources in its quest for critical insights. A cloud based analytics solution offers a central point where such external data can be collected. This frees IT to focus on procuring external data sources and making them available for analysis, rather than on the procurement and infrastructure work needed to provision them.

A cloud based solution also enables IT to serve as a deal maker of sorts by enabling data sharing through data evangelism. Instead of brokering many-to-many data sharing between the various sub-organizations and arms of the enterprise, IT can serve as a data and insight publisher, spreading knowledge of data sets and insights across the enterprise and filling a critical gap: the missed data connections and insights that otherwise go undiscovered.

4 Strategies for Making Your Product ‘Smarter’

Originally Published on Entrepreneur.com

“Smart” is the dominant trend in entrepreneurship and innovation. In recent times, a plethora of new products have arrived that make an existing product “smarter” by incorporating sensors, connecting the product to a backend or embedding intelligence in the product. Reimagining existing products to be smarter and better for the end user is a gold mine for innovation. Here are four ways to rethink your products and make them smarter.

1. Understand user intent and motivations.

Make your products smarter by making them listen to and understand the intent of your users. What is the user trying to do at a given time, at a given location, on a specific channel? By listening for the signals that motivate the usage of your product, and accounting for how variations in these signals change how your product is used, you can predict and influence how your product should adjust to better serve the end user.

For example, a smart refrigerator can detect its contents, match them against the ingredients required for a planned dinner menu and remind the user to restock any missing ingredients.
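
A minimal sketch of that check, assuming hypothetical inventory and menu data rather than a real appliance API:

```python
# Hypothetical data: what the refrigerator's sensors report vs. what tonight's menu needs.
fridge_contents = {"eggs": 6, "milk_l": 0.2, "butter_g": 150}
menu_needs = {"eggs": 4, "milk_l": 1.0, "parmesan_g": 80}

def restock_reminders(contents, needs):
    """Return the ingredients that are missing or below the required quantity."""
    return {item: required - contents.get(item, 0)
            for item, required in needs.items()
            if contents.get(item, 0) < required}

print(restock_reminders(fridge_contents, menu_needs))
# {'milk_l': 0.8, 'parmesan_g': 80}
```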

2. Reach users at the right time.

You can make your products smarter by reaching the user at the right time with the right message, even if the user is not using the product at a given point in time. Making the product aware of the user’s environment offers the opportunity to craft a personalized message to enhance the user experience. You can then motivate and influence the user to use the product at the opportune time in the manner that is most beneficial for both the user and the product.

For example, a smart app can detect the user's location in a particular grocery aisle and alert them that an item they need to replace is on sale.

3. Enable good decisions.

Smart products help the user make the best decisions. By understanding the user’s context and their current environment, you can suggest alternatives, recommend choices or simply notify them of changes in their environment they might otherwise not have noticed. This capability enables the user to make informed choices and decisions, thus enhancing their experience and satisfaction from the product.

For example, by integrating live traffic data into a navigation system, the user can be notified of alternate routes when there are problems on their usual route.

4. Enhance user experience.

You can make your products smarter by enhancing the user’s experience, regardless of where they are in their journey with your product. If they are a new user, your product should help them onboard. If they are an active user, your product should make them more productive. If they are a dissatisfied user, your product should detect their dissatisfaction and offer the appropriate support and guidance to help them recover. In parallel, the product should learn from their situation and use this feedback in redesigning or refactoring the product.

For example, a product company that performs sentiment analysis on its Twitter stream can swiftly detect user discontent and feed it into its support ticketing system for immediate response and follow up.
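
A sketch of that feedback loop: the tweets, the keyword-based `sentiment_score` heuristic and the `create_ticket` function are all hypothetical placeholders, not a real Twitter or ticketing API.

```python
NEGATIVE_WORDS = {"broken", "crash", "crashing", "refund", "terrible", "unusable", "cancel"}

def sentiment_score(text: str) -> float:
    """Crude keyword-based score: fraction of words that signal discontent, negated."""
    words = text.lower().split()
    return -sum(w.strip(".,!?") in NEGATIVE_WORDS for w in words) / max(len(words), 1)

def create_ticket(tweet: dict) -> None:
    # Placeholder for a call into the company's support ticketing system.
    print(f"Ticket opened for @{tweet['user']}: {tweet['text']!r}")

tweets = [
    {"user": "happy_dev", "text": "Loving the new release, great work"},
    {"user": "upset_user", "text": "App keeps crashing, totally unusable, I want a refund!"},
]

for tweet in tweets:
    if sentiment_score(tweet["text"]) < -0.2:   # threshold chosen for illustration
        create_ticket(tweet)
```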

The ability to collect telemetry on how your product is being used, to use sensors to detect the environment in which it is being used, and to use customer usage history in the backend to understand user intent has the potential to reinvigorate your existing products and make them smarter and more beneficial for their users. Similarly, reimagining or innovating using the above principles offers entrepreneurs the opportunity to disrupt current products and markets and ride the “smart” wave to success.

The ‘Adjacent Possible’ of Big Data: What Evolution Teaches About Insights Generation

Originally published on WIRED

Stuart Kauffman in 2002 introduced the “adjacent possible” theory, which proposes that biological systems are able to morph into more complex systems by making incremental, relatively low energy changes in their makeup. Steven Johnson uses this concept in his book “Where Good Ideas Come From” to describe how new insights can be generated in previously unexplored areas.

The theory of the “adjacent possible” extends to the insights generation process. In fact, it offers a highly predictable and deterministic path to generating business value through insights from data analysis. For enterprises struggling to get started with analysis of their big data, the theory of the “adjacent possible” offers an easy to adopt and implement framework of incremental data analysis.

Why Is the Theory of the Adjacent Possible Relevant to Insights Generation?

Enterprises often embark on their big data journeys with the hope and expectation that business critical insights will be revealed almost immediately, simply by virtue of being on a big data journey and building out their data infrastructure. The expectation is that insights can be generated within the same quarter in which the infrastructure and data pipelines are set up. In addition, the insights generation process is typically driven by analysts who report up through the usual management chain. This puts undue pressure on the analysts and managers to show predictable, regular delivery of value, and forces the process of insights generation to fit into project scope and delivery. However, the insights generation process is too ambiguous and too experimental to fit reliably into the bounds of a committed project.

Deterministic delivery of insights is not what enterprises find on the other side of their initial big data investment. What they almost always find is that data sources are in disarray, multiple data sets need to be combined even though they are not primed for blending, data quality is low, analytics generation is slow, derived insights are not trustworthy, the enterprise lacks the agility to implement the insights, or it lacks the feedback loop to verify their value. Even when everything goes right, the value of the insights is often minuscule and insignificant to the bottom line.

This is the time when the enterprise has to adjust its expectations and its analytics modus operandi. If pipeline problems exist, they need to be fixed. If quality problems exist, they need to be diagnosed (data source quality vs. data analysis quality). In addition, an adjacent possible approach to insights needs to be considered and adopted.

The Adjacent Possible for Discovering Interesting Data

Looking adjacently from the data set that is the main target of analysis can uncover other related data sets that offer more context, signals and potential insights through their blending with the main data set. Enterprises can introspect the attributes of the records in their main data sets and look for other data sets whose attributes are adjacent to them. These datasets can be found within the walls of the enterprise or outside. Enterprises that are looking for adjacent data sets can look at both public and premium data set sources. These data sets should be imported and harmonized with existing data sets to create new data sets that contain a broader and crisper set of observations with a higher probability of generating higher quality insights.
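
As a minimal sketch of that blending step, the snippet below joins a hypothetical internal orders data set with an adjacent public weather feed on the attributes they share (date and city); the data and column names are invented for illustration.

```python
import pandas as pd

# Main data set: the enterprise's own transaction records.
orders = pd.DataFrame({
    "order_date": ["2015-01-05", "2015-01-05", "2015-01-06"],
    "city": ["Seattle", "Austin", "Seattle"],
    "revenue": [120.0, 80.0, 200.0],
})

# Adjacent data set: a public weather feed keyed on the same attributes.
weather = pd.DataFrame({
    "date": ["2015-01-05", "2015-01-06"],
    "city": ["Seattle", "Seattle"],
    "rain_mm": [12.5, 0.0],
})

# Harmonize column names, then blend the two sources into a richer observation set.
blended = orders.merge(
    weather.rename(columns={"date": "order_date"}),
    on=["order_date", "city"],
    how="left",
)
print(blended)
```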

The Adjacent Possible for Exploratory Data Analysis

In the process of data analysis, one can apply the principle of the adjacent possible to uncovering hidden patterns in data. An iterative approach to segmentation analysis, with a focus on attribution through micro-segmentation, root cause analysis, change and predictive analysis, and anomaly detection through outlier analysis, can lead to a wider set of insights and conclusions to drive business strategy and tactics.

Experimentation with different attributes such as time, location and other categorical dimensions can and should be the initial analytical approach. An iterative approach to incremental segmentation analysis, identifying the segments to which changes in KPIs or other measures can be attributed, is a good starting point. Applying the adjacent possible means iteratively including additional attributes to fine tune the segmentation scheme, which can surface significant segments and cohorts, as sketched below. In addition, the adjacent possible can also help in identifying systemic problems in the business process workflow. This can be achieved by walking upstream or downstream in the business workflow and diagnosing the point where the process breaks down or slows down, by identifying the attributes that correlate most highly with the breakdown or slowdown.
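
A sketch of that iterative widening of the segmentation scheme, using pandas and a small hypothetical KPI table; the attributes, weeks and numbers are invented for illustration.

```python
import pandas as pd

# Hypothetical data with a KPI (conversion rate) and candidate segmentation attributes.
df = pd.DataFrame({
    "week":    ["W1"] * 4 + ["W2"] * 4,
    "channel": ["web", "web", "mobile", "mobile"] * 2,
    "region":  ["US", "EU", "US", "EU"] * 2,
    "conversion_rate": [0.10, 0.09, 0.12, 0.11, 0.10, 0.09, 0.07, 0.11],
})

def kpi_shift(data, dims):
    """Week-over-week change in the KPI for each segment defined by `dims`."""
    pivot = data.groupby(dims + ["week"])["conversion_rate"].mean().unstack("week")
    return (pivot["W2"] - pivot["W1"]).sort_values()

# Step 1: a coarse segmentation by channel points at mobile.
print(kpi_shift(df, ["channel"]))

# Step 2: the adjacent step adds one more attribute and localizes the drop to mobile/US.
print(kpi_shift(df, ["channel", "region"]))
```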

The Adjacent Possible for Business Context

The process of data analysis is often fraught with siloed context, i.e., the analyst often does not have the full business context to understand the data, the motivation for a business driven question or the implications of their insights. Applying the theory of the adjacent possible here means introducing collaboration into the insights generation process: inviting and including team members who each hold a slice of the business context from their own point of view leads to higher value conclusions and insights. Combining the context from each of these team members to design, verify, authenticate and validate the insights generation process and its results is the key to generating high quality insights swiftly and deterministically.

Making incremental progress in the enterprise's insights discovery efforts is a significant and valuable method for uncovering insights with massive business implications. The insights generation process should be treated as an exercise in the adjacent possible, and incremental insights identification should be encouraged and valued. As this theory is put into practice, enterprises will find themselves with a steady stream of incrementally valuable insights with incrementally higher business impact.

The 2+2=5 Principle and the Perils of Analytics in a Vacuum

Published Originally on Wired

Strategic decision making in enterprises playing in a competitive field requires collaborative information seeking (CIS). Complex situations require analysis that spans multiple sessions with multiple participants (that collectively represent the entire context) who spend time jointly exploring, evaluating, and gathering relevant information to drive conclusions and decisions. This is the core of the 2+2=5 principle.

Analytics in a vacuum (i.e., non-collaborative analytics) is, due to missing or partial context, highly likely to be of low quality, lacking key and relevant information and fraught with incorrect assumptions. Another characteristic of non-collaborative analytics is the use of general purpose systems and tools like IM and email that are not designed for analytics. These tools leave enterprises drowning in a sea of spreadsheets, with context lost across thousands of IMs and emails, and an outcome that is guaranteed to be suboptimal.

A common but incorrect approach to collaborative analytics is to think of it as a post-analysis activity. This is the approach to collaboration taken by most analytics and BI products. Post-analysis publishing of results and insights is very important; however, pre-publishing collaboration plays a key role in ensuring that the generated results are accurate, informative and relevant. Analysis that terminates at the publishing point has a very short half life.

Enterprises need to think of analysis as a living, breathing story that gets bigger over time as more people collaborate. New and disparate data brings in more context, which negates incorrect assumptions, exposes missing or low quality data, and corrects misunderstandings of the data's semantics.

Here are the most common pitfalls we have observed of analytics carried out in a vacuum.

Wasted resources. If multiple teams or employees are seeking the same information or attempting to solve the same analytical problem, a non collaborative approach leads to wasted resources and suboptimal results.

Collaboration can help the enterprise streamline and divide and conquer the problem more efficiently, with less time and manpower. Deconstructing an analytical hypothesis into smaller questions and distributing them across multiple employees leads to faster results.

Siloed analysis and conclusions. If the results of analysis, insights and decisions are not shared systematically across the organization, enterprises face a loss of productivity. This lack of shared context between employees tasked with the same goals causes organizational misalignment and a lack of coherence in strategy.

Enterprises need to ensure that there is common understanding of key data driven insights that are driving organizational strategy. In addition, the process to arrive at these insights should be transparent and repeatable, assumptions made should be clearly documented and a process/mechanism to challenge or question interpretations should be defined and publicized.

Assumptions and biases. Analytics done in a vacuum is hostage to the personal beliefs, assumptions, biases, clarity of purpose and comprehensiveness of context in the analyzer's mind. Without collaboration, such biases remain uncorrected and lead to flawed foundations for strategic decisions.

A process for, and the freedom to, challenge, inspect and reference the key interpretations and analytical decisions made en route to an insight is critical for enterprises to enable and proliferate high quality insights across the organization.

Drive-by analysis. When left unchecked, with top down pressure to use analytics to drive strategic decision making, enterprises see an uptick in what we call “drive-by analysis.” In this case, employees jump into their favorite analytical tool, run some analysis to support their argument and publish the results.

This behavior leads to another danger of analytics without collaboration: users who lack full context and understanding of the data and its semantics perform analysis that is used to make critical decisions. Without supervision, such analytics can lead the organization down the wrong path. Supervision, fact checking and corroboration are needed to ensure that correct decisions are made.

Arbitration. Collaboration without a process for challenge and arbitration, and without an arbitration authority, is often found (almost always later, when it is too late) to be littered with misinterpretations, factually misaligned, or deviating from strategic patterns identified in the past.

Subject matter experts or other employees with the bigger picture, knowledge and understanding of the various moving parts of the organization need to, at every step of the analysis, verify and arbitrate on assumptions and insights before these insights are disseminated across the enterprise and used to affect strategic change.

Collaboration theory has shown that information seeking in complex situations is better accomplished through active collaboration. There is a trend in the analytics industry to treat collaborative analytics as a vanity feature, with simple sharing of results touted as collaborative analytics. However, collaboration in analytics requires a multi-pronged strategy with key processes and a product that delivers those capabilities: an investment in processes that allow arbitration, fact checking, interrogation and corroboration of analytics, and an investment in analytical products that are designed and optimized for collaborative analytics.

Pragmatic Big Data for the App Economy

Meeting key business objectives (in the context of a platform strategy) such as increasing market share or profitability in the enterprise's niche can be achieved by increasing the enterprise's user base, increasing user engagement and improving the product mix.

Strategic Objectives

Big data analysis of the large volumes of data being generated in the app economy is key to designing and executing strategies that can help enterprises meet their strategic objectives.

Entities in the API ecosystem such as end users, apps, developers, APIs and backend systems continuously generate streams of data, both within the API value chain and outside it. For example, app users are not only using their apps and APIs but also sharing opinions on social media, looking for content on the internet and interacting with products and services in the physical world. These continuously growing streams of data contain hidden signals that hold the key to meeting the enterprise's strategic objectives.

Enterprises looking to gain a competitive edge in the app economy have the opportunity to harness the power of big data and contextual analytics to solve three key problems that directly drive increased profitability and market share through higher usage of their products and services and higher user satisfaction.

Understanding End Users

End users drive value in the API value chain. It is critical that the enterprise understand the behavior and actions of its end users and why they act the way they do. The enterprise must be able to segment its end users by product usage, by the value they drive to the bottom line and by how engaged they are with its products and services.

Offering sticky products and services requires that the enterprise understand the value that end users are looking for in the products and services. Enterprises need to understand their desired profile of the end user. The desired profile is an intersection of the end users who find value in the enterprise’s products and services and the end users that are profitable for the enterprise.

Attracting the Best Developers

Developers are key to building great apps across a diverse use case set. A diverse app set attracts a broader set of end users directly translating to more end users, higher usage and higher profitability for the enterprise. Enterprises do not retain control over the end user experience on apps written by third party developers. This makes it critical for the enterprise to attract, detect, nurture and promote developers and apps that offer the best user experience and the best value for the enterprise.

Attracting the best developers requires enterprises to understand their desired developer profile, their current developer profile and emerging trends in the developer world being promoted and embraced by the “early adopter” developers. The ability to adapt and embrace these emerging developer trends can increase the attractiveness of an enterprise’s platform.

Enterprises need to understand how their developers communicate with them and how and where developers hang out or seek support. The ability to keep a tab on all developer communication avenues, quickly gather areas of dissatisfaction and move to address and quell any unhappiness can be the difference between retaining and losing the best developers, who build the best, most profitable and desirable apps.

Monetizing Data

Understanding the value of an enterprise in the eyes of end users, developers, partners and other enterprises is critical for building new and innovative products and services, improving existing products and services and building new business models around data monetization. Every time a request is made for an enterprise's data, the metadata generated around the context of the request event can offer deep strategic insights into where value is concentrated in the enterprise's data set.

Understanding and monetizing data requires enterprises to extract and process this metadata and build comprehensive accounting across all data access channels enabled through APIs, apps and other data transfer mechanisms. Enterprises need to understand end user intent and request type metadata to determine their highest value data and data-enabled use cases.

Conclusion

The app economy, fueled by the shifting enterprise edge, is producing, and is expected to continue producing, increasingly diverse and disparate streams of data that provide a wealth of opportunity for enterprises to apply big data and contextual analytics principles to solving the key problems in the app economy.

An enterprise's competitive edge depends on its ability to uncover deep insights through platforms like Apigee Insights that offer the ability to gather, model and analyze app economy data, generate insights, act on those insights, observe the change and adjust if necessary, making strong progress towards the enterprise's strategic objectives.

Signals and Insights: Value, Reach, Demand

Published Originally on Apigee

The mobile and apps economy means that the interaction between businesses and their customers and partners happens in an ever broader context, meaning that the amount of data that enterprises gather is exploding. Business is being done on multiple devices, and through apps, social networks, and cloud services.

It is important to think about signals when thinking about the value hidden in your enterprise's data. Signals point towards insights. The ability to uncover, identify, and enhance these signals is the only way to make your big data work for you and to succeed in the app economy.

Types of Signals

There are three types of signals that an enterprise should track and utilize in its decision making and strategic planning.

Value Signals

When customers use an enterprise’s products or services, they generate value signals. The actions that are part of searching, discovering, deciding, and purchasing a product or service offer signals into the perceived value of the product or service. These signals examined through the lens of user context (such as their profile, demographics, interests, past transaction history, and locality in time and space to interesting events and locations) deliver insights into business critical customer segments and their preference, engagement, and perceived value.

Reach Signals

When developers invest in the enterprise's API platform and choose its APIs to create apps, they create reach signals. These are signals of the attractiveness and perceived value of the enterprise's products and services. Developers take dependencies on APIs because they believe those dependencies will help them create value for the end users of their apps and, ultimately, for themselves. Developer adoption and engagement is a leading indicator of, and offers insight into, the value and delivery of an enterprise's products and services.

Demand Signals

When end users request information and data from the enterprise's core data store, they generate demand signals on the enterprise's information. These demand signals, within the user context, deliver insights into the perceived value of the enterprise's information along with context around the information (such as its source, type, freshness, quality, comprehensiveness and cache-ability). These insights offer a deep understanding of the impact of information on end user completed transactions and engagement.

Apigee Insights offers the expertise, mechanisms, and capabilities to extract and understand these signals from the enterprise data that sits within, at the edge, and outside the edge of the enterprise. Apigee Insights is built from the ground up to identify, extract and accentuate the value, reach and demand signals that drive business critical insights for the enterprise.

All (Big Data) Roads Lead To Your Customers

Originally Published on DataFloq

A large number of enterprises report a high level of inertia around getting started with Big Data. Either they are not sure which problems they need to solve using Big Data, or they get distracted by the question of which Big Data technology to invest in rather than the business value they should be focusing on. This is often due to a lack of understanding of what business problems need to be solved and can be solved through data analysis, and it causes enterprises to spend their valuable initial time and resources evaluating new Big Data technologies without a concrete plan to deliver customer or business value through such investments. For enterprises that find themselves in this trap, here are some trends and ideas to keep in mind.

Commoditization and maturation of Big Data technologies

Big Data technologies are going to get commoditized in the next couple of years. New technologies like Hadoop and HBase will mature, with their skills and partner ecosystems becoming more diverse and stable. An increasing number of vendors will offer very similar capabilities, and we will see these vendors compete increasingly on operational efficiency along the pivots of speed and cost. Enterprises that are not competing on “data efficiency”, i.e. their ability to extract exponentially greater value from their data than their competitors (notably AMZN, GOOG, YHOO, MSFT, FB and Twitter), should be careful not to overinvest in an in-house implementation of Big Data technologies. Enterprises whose core business runs on data analysis need to continuously invest in data technologies to extract the maximum possible business value from their data. However, for enterprises that are still at the beginning or in the infancy of their Big Data journey, investing in a cutting edge technological solution is almost always the wrong strategy. Such enterprises should focus on small wins using as many off-the-shelf components as possible to quickly reach the point of Big Data ROI. When possible, they should offload infrastructure operation and management to third party vendors while experimenting with applications and solutions that utilize these Big Data technologies. This ensures that critical resources are spent on solving real customer problems while critical feedback is collected to inform future technology investments.

Technology Choices Without Business Impetus Are Not Ideal

The Big Data technology your business needs varies by the problem you are trying to solve. The needs of your business, and the type of problems you must solve to offer simple, trustworthy and efficient products and services to your customers, should determine and lead you to the right Big Data technologies and vendors. Enterprises need to focus on the business questions that need to be answered rather than on the technology choice. Enterprises without this business focus will spend crucial resources on optimizing their technology investments instead of solving real business problems, and will end up with little ROI. Planning and implementing Big Data technology solutions in a vacuum, without clear problems and intended solutions in mind, not only can lead to incorrect choices but also to wasted effort spent prematurely optimizing for, and committing to, a specific technology.

Evangelize Analytics Internally To Better Understand Technology Requirements

Appropriate Big Data technology decisions can only be made by ensuring that the needs and requirements of the various parts of the organization are correctly understood and captured. Ensuring that the culture in the enterprise promotes the use of data to answer strategic questions and track progress can only happen if analytical thinking and problem solving are used by all functions in the organization, from support to marketing to operations to products and engineering. Having these constituents represented in the technology stack decision process is critical to ensure that the eventual technology is usable and useful for the entire organization and does not get relegated to a very small subset of employees. In addition, the specific needs of certain users, such as data exploration, insights generation, data visualization, analytics and reporting, experimentation, integration or publishing, often require a combination of one or more technologies. Defining and clarifying the decision making process in an enterprise is needed to identify the sets of technologies that must be put together to build a complete data pipeline designed to enable decisions and actions.

All (Big Data) Roads Lead to Your Customers

For enterprises that are struggling to get started with Big Data analysis, or have moved past the initial exploration stage in Big Data technology adoption, deciding which problems to tackle first for the highest ROI can be a daunting task. In addition, there is often pressure from management to showcase the value of the Big Data investment to the business, customers and users of the products and services.

Almost always, focusing on improving customer and user satisfaction, increasing engagement with and use of your products and services, and preventing customer churn is the most important problem an enterprise can focus on, and it represents a class of problems that is both universal and perfect for Big Data analysis. As customers and end users interact with the enterprise's products and services, they generate data, or records of their usage. Customer actions can almost always be divided into two sets: transactional actions, which represent completed monetary or financially beneficial actions by the user for the enterprise (e.g. purchasing a product or printing directions to a restaurant), and non-transactional, leading indicator actions, which are not by themselves monetarily beneficial to the enterprise but are leading indicators of upcoming transactions (e.g. searching for a product and adding it to a cart, or reviewing a list of restaurants).

Tagging the data generated by your users with this metadata produces an extremely rich data set that is primed for Big Data analysis (see the sketch below). Understanding the frequency of actions, the time spent, when and where the actions occur, on what channel and in what environment, and the demographic profile of the user who carries them out is critical. At a minimum, enterprises need to understand the actions of their users that correlate most highly with transactions, the attributes and behavior patterns of engaged and profitable users, and the leading indicators of user dissatisfaction and abandonment.

There are other very obvious applications of Big Data in areas such as security, fraud analysis, support operations and performance; however, each of these applications can be traced directly or indirectly back to customer dissatisfaction or disengagement problems. Focusing your Big Data investments on a holistic solution to track and remedy customer dissatisfaction, and to improve engagement and retention, is a sure-fire way to not only design the best possible Big Data solution for your needs but also to extract maximum value from investments that impact your business's bottom line.
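
A minimal sketch of that tagging step, with a hypothetical event log and action-to-class mapping; the action names and fields are illustrative, not a prescribed schema.

```python
# Hypothetical mapping from raw user actions to the two classes described above.
TRANSACTIONAL = {"purchase", "print_directions", "subscribe"}
LEADING_INDICATOR = {"search", "add_to_cart", "view_reviews", "browse_list"}

events = [
    {"user": "u1", "action": "search",      "channel": "web",    "ts": "2014-06-01T10:02"},
    {"user": "u1", "action": "add_to_cart", "channel": "web",    "ts": "2014-06-01T10:05"},
    {"user": "u1", "action": "purchase",    "channel": "web",    "ts": "2014-06-01T10:09"},
    {"user": "u2", "action": "browse_list", "channel": "mobile", "ts": "2014-06-01T11:00"},
]

def tag(event: dict) -> dict:
    """Attach the transactional / leading-indicator label used for downstream analysis."""
    action = event["action"]
    if action in TRANSACTIONAL:
        label = "transactional"
    elif action in LEADING_INDICATOR:
        label = "leading_indicator"
    else:
        label = "other"
    return {**event, "label": label}

tagged = [tag(e) for e in events]
for e in tagged:
    print(e["user"], e["action"], e["label"])
```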

What’s your API’s Cachiness Factor?

Published Originally on Apigee

“Cachiness factor” is the degree to which your API design supports the caching of responses. Low cachiness means that a higher than optimal number of requests is forwarded to the back end to retrieve data; a high cachiness factor means that more requests are serviced from the cache layer, reducing and optimizing the number that reach the back end.

Every time a request is sent to the API provider endpoint, the provider incurs the cost of servicing the request. Investing in a good caching mechanism reduces the number of requests that hit the endpoint, leading to a faster response time, lower servicing costs and saved bandwidth. Resources can then be spent on servicing requests that otherwise would have had to compete with cacheable requests.

Cachiness in an API design refers to understanding how a piece of retrieved data can be reused to serve other API requests. Such an understanding can be transformed into a set of actions that store the retrieved copy of the data in an optimal form for reuse. This, coupled with insights from API usage analytics, can provide direct benefits in terms of app performance and operational costs.

An API proxy can be designed to do a number of things when a request arrives:

Determine the quality or fidelity of the data requested by the app or end user

This information can then be used to
– Transform the API request to retrieve the data from the endpoint data store at the highest possible fidelity and breadth
– Save the retrieved data in the proxy cache
– Extract the appropriate fidelity and breadth (determined by the original request) and send as the response to the app/end user

For example, if the request is for weather patterns for a city, the system can potentially map the response to a response for all zipcodes in that city and store it accordingly in the cache.
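
A sketch of that broaden-then-narrow pattern at the proxy. The in-memory dictionary stands in for the proxy cache, and `fetch_city_weather` and the city-to-zipcode mapping are hypothetical placeholders for the real backend and reference data.

```python
# In-memory stand-in for the proxy cache; a real proxy would use a shared cache with TTLs.
cache = {}

# Hypothetical reference data and backend call.
CITY_ZIPCODES = {"seattle": ["98101", "98102", "98103"]}

def fetch_city_weather(city: str) -> dict:
    print(f"backend hit for {city}")           # the cost we are trying to avoid repeating
    return {"city": city, "forecast": "rain", "temp_f": 52}

def get_weather_for_zip(zipcode: str, city: str) -> dict:
    key = ("weather", zipcode)
    if key not in cache:
        # Broaden the request: fetch once for the whole city...
        city_weather = fetch_city_weather(city)
        # ...then fan the single response out to every zipcode in that city.
        for z in set(CITY_ZIPCODES.get(city.lower(), [])) | {zipcode}:
            cache[("weather", z)] = {**city_weather, "zipcode": z}
    # Narrow back down to exactly what the caller asked for.
    return cache[key]

print(get_weather_for_zip("98101", "Seattle"))  # backend hit
print(get_weather_for_zip("98103", "Seattle"))  # served from the cache
```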

Pre-fetch based on temporal and spatial locality
Predict based on usage patterns what the next request is likely to be and pre-fetch this data from the endpoint for storing in the cache.

For example, given a request for browsing a list of plasma TVs on sale at a retailer, it might make sense to cache the entire response set and serve subsequent requests for more data (e.g. the next set of TVs) from the cache.
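
One way that pagination prefetch could look at the proxy, with a hypothetical `fetch_page` standing in for the retailer's backend and a plain dictionary standing in for the cache:

```python
PAGE_SIZE = 10
cache = {}

def fetch_page(category: str, page: int) -> list:
    """Hypothetical backend call returning one page of products."""
    print(f"backend hit: {category} page {page}")
    start = page * PAGE_SIZE
    return [f"{category}-tv-{i}" for i in range(start, start + PAGE_SIZE)]

def get_page(category: str, page: int) -> list:
    key = (category, page)
    if key not in cache:
        cache[key] = fetch_page(category, page)
    # Temporal locality: the next page is very likely to be requested soon, so warm it now.
    next_key = (category, page + 1)
    if next_key not in cache:
        cache[next_key] = fetch_page(category, page + 1)
    return cache[key]

get_page("plasma", 0)   # two backend hits: page 0 plus the prefetched page 1
get_page("plasma", 1)   # served from the cache; only page 2 is fetched as the new prefetch
```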

Pre-fetch based on similarity
Use the idea of similarity in data sets to predict the next request and retrieve the data for storing at the proxy cache.

For example, for the scenario in which our user requests a list of TVs from one manufacturer, it might make sense to pre-fetch a list of TVs from another manufacturer with a similar product line and store this information in the cache.

Parameters-based selection
If your API supports “select” on your data through parameters, another option to optimize the cachiness of your API is to retrieve the entire data set (within certain bounds) from the backend, store it in your cache, and return only the appropriate data set for the request. Similarly, filtering of data can be performed at the proxy as opposed to the end point, increasing the cachiness factor for the API.
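
A sketch of applying the request's parameters at the proxy instead of the endpoint; `fetch_full_catalog` and the query parameters are hypothetical.

```python
cache = {}

def fetch_full_catalog(category: str) -> list:
    """Hypothetical backend call that returns the whole (bounded) data set once."""
    print(f"backend hit for {category}")
    return [
        {"sku": "tv-1", "brand": "Acme",   "price": 499},
        {"sku": "tv-2", "brand": "Acme",   "price": 899},
        {"sku": "tv-3", "brand": "Globex", "price": 649},
    ]

def query(category: str, brand: str = None, max_price: float = None) -> list:
    # Retrieve and cache the full set once; apply the request's "select" parameters locally.
    if category not in cache:
        cache[category] = fetch_full_catalog(category)
    items = cache[category]
    if brand is not None:
        items = [i for i in items if i["brand"] == brand]
    if max_price is not None:
        items = [i for i in items if i["price"] <= max_price]
    return items

print(query("tvs", brand="Acme"))    # one backend hit, then filtered at the proxy
print(query("tvs", max_price=700))   # no backend hit: different filter, same cached set
```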

Using Data Analysis to improve cachiness

You can also use data analysis techniques to understand request patterns for your data and use this information to pre-fetch or over-fetch data from the endpoint to increase the cachiness factor of your API.

Caching Diffs
Another possible technique is building a mechanism where updated data is automatically sent from the end point data store to the cache as new updates are generated in the backend. At the cache level, instead of expiring the entire data set, the part of the data set that is least likely to be relevant is automatically expired and the new updated “diff” is appended to the cache data set.
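
A simplified sketch of the diff-append idea, using a bounded deque so that appending a new update automatically expires the oldest entries rather than invalidating the whole cached data set (a real implementation would expire by relevance, not just age).

```python
from collections import deque

# Bounded cache: appending a new "diff" automatically expires the oldest entries
# instead of flushing the entire data set.
MAX_ENTRIES = 5
price_cache = deque(maxlen=MAX_ENTRIES)

def apply_diff(update: dict) -> None:
    """Called when the backend pushes an update; appends instead of expiring everything."""
    price_cache.append(update)

# Initial cached data set.
for update in [{"sku": f"tv-{i}", "price": 500 + i * 10} for i in range(5)]:
    apply_diff(update)

# A new update arrives from the backend: the oldest entry (tv-0) is expired automatically.
apply_diff({"sku": "tv-5", "price": 799})
print([u["sku"] for u in price_cache])   # ['tv-1', 'tv-2', 'tv-3', 'tv-4', 'tv-5']
```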

The technique that will work for an API will vary from API to API. You might need to experiment with various techniques to identify the one that makes sense for your scenario and API.

Data as Currency & Dealmaker

Published Originally on Apigee

The amount of data collected by companies in all sectors and markets has grown exponentially in the past decade. Businesses increasingly interact with customers through social and business networks and through apps (and APIs), and are therefore collecting data from new and diverse locations outside the walls of their enterprises and systems of record.

The following is a perspective on five ways in which data is changing how we do business in 2012 and beyond.

Data as currency and dealmaker

Similar to the discovery of oil in Texas at the turn of the last century, enterprises that have been collecting and storing data will be the ones primed to leverage their data for new opportunities and striking new business deals.

Add to this new data sources like the explosion of social data, which provides a window into your real-world and real-time customers’ behavior. The data accumulates quickly and changes frequently, and the ability to capture, analyze and derive insights from it will be key to offering true customer-centric value across companies, and even entire industries.

Data is fast becoming the de-facto currency for facilitating business deals. Enterprises will be able to command monetary and opportunistic conditions in return for providing access to their data. Google’s custom search for websites is an example. By providing indexed data and search functionality to websites, Google in return has the ability to show ads and generate revenue on the website.

We will also see the emergence of data network effects: enterprises will be able to enrich their existing data sets by requiring that other enterprises who purchase their data return (or feedback) the enriched data to augment the original data set. Enterprises sitting on the most desired data will be able to continuously add value to their existing data sets.

Collaborations through data

I believe a new model of collaboration based on data is emerging. Enterprises realize that they can partner with other enterprises and create innovative, new business value by sharing and operating with shared or semi-shared data stores.

Apart from shared data storage and processing costs, enterprises will be able to leverage faster time-to-market and build enhanced data-driven features using such collaboration models. They could use shared data stores as primary operating data backends thereby realizing near real-time data updates and availability for their products and services.

The academic world has several examples and parallels to this notion where data sets are frequently generated, updated and shared between various research projects leading to better quality and more expansive research and insights.

Data marketplaces

While the Internet is full of open data, there's plenty of data that companies will be willing to pay for – particularly if it's timely, curated, well aggregated, and insightful. As a result, data marketplaces are a burgeoning business. Witness Microsoft's data platform, Thomson Reuters' content marketplace strategy, Urban Mapping and many more.

Data can be defined by attributes such as “Latency of delivery for processing”, “quality”, “sample ratio”, “dimensions”, “context”, “source trustworthiness”, and so on. As data becomes a key enabler of business, it becomes an asset that can be bid upon and acquired by the highest bidder. The price associated with acquiring data will be determined by the data attributes. For example, a real-time feed of a data source might cost more than historical data sets. Data sets at 100% sample ratio might cost more than data at lower fidelity.

Ability to access and synthesize data is a competitive edge. To gain and maintain this edge, enterprises will have to add the cost of acquiring data to their variable operating costs. At the same time, enterprises will have to protect their data as they protect other corporate assets. Protection (and insurance) against loss, theft and corruption will be required to ensure continued success.

End users will stake claim to their data

With the rise of social networks and even with the consumerization of IT, data is also becoming more personal. We trade our personal data for services every day as we interact with Facebook and other sites.

End users, who are the generators of the data that enterprises collect and use to improve their businesses, will stake claim to their data and demand a share of the value it creates. In addition, end users will demand and gravitate towards enterprises that give them the ability to track, view and control the data they generate. Enterprises may have to either “forget” users because users demand it, or compensate them for their data.

The jury is still out, but the tide may already have turned in this direction in Europe. Data protection regulations may allow for a “right to be forgotten” law through which users will have the right to demand that data held on them be deleted if there are “no legitimate grounds” for it to be kept. This includes cases where a user leaves a service or social network, like Google or Facebook – the company will have to permanently delete any data it retains.

Data disintermediation

The concept of disintermediation – removing the middlemen and vendors and giving consumers direct access to information that would otherwise require a “mediator” – has been an active topic in the information industry and gains momentum in 2012 as data becomes currency.

We will see more and more enterprises exposing their data schemas, formats and other related capabilities publicly through a common data description language and data explorer capabilities accessible by both humans and machines.

Enterprises (or their automatic agents) will be able to crawl the web (or some other data network) and discover new data sources that serve their needs. Enterprises will have the ability to walk the data models and understand the structure and schema of various data sets and understand the intricacies of using these data sources.

The Three ‘ilities’ of Big Data

Published Originally on Big Data Journal

When talking about Big Data, most people talk about numbers: speed of processing and how many terabytes and petabytes the platform can handle. But deriving deep insights with the potential to change business growth trajectories relies not just on quantities, processing power and speed, but also three key ilities: portability, usability and quality of the data.

Portability, usability, and quality converge to define how well the processing power of the Big Data platform can be harnessed to deliver consistent, high quality, dependable and predictable enterprise-grade insights.

Portability: Ability to transport data and insights in and out of the system

Usability: Ability to use the system to hypothesize, collaborate, analyze, and ultimately to derive insights from data

Quality: Ability to produce highly reliable and trustworthy insights from the system

Portability
Portability is measured by how easily data sources (or providers) as well as data and analytics consumers (the primary “actors” in a Big Data system) can send data to, and consume data from, the system.

Data Sources can be internal systems or data sets, external data, data providers, or the apps and APIs that generate your data. A measure of high portability is how easily data providers and producers can send data to your Big Data system as well as how effortlessly they can connect to the enterprise data system to deliver context.

Analytics consumers are the business users and developers who examine the data to uncover patterns. Consumers expect to be able to inspect their raw, intermediate or output data to not only define and design analyses but also to visualize and interpret results. A measure of high portability for data consumers is easy access – both manually or programmatically – to raw, intermediate, and processed data. Highly portable systems enable consumers to readily trigger analytical jobs and receive notification when data or insights are available for consumption.
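
As an illustration of what that programmatic access might look like for a consumer, the sketch below triggers a job, polls for completion and pulls the output over a hypothetical REST surface; the `/jobs` and `/data` endpoints and payload fields are invented for the example, not any particular platform's API.

```python
import time
import requests

BASE = "https://analytics.example.com/api"   # hypothetical Big Data platform endpoint

def run_job_and_fetch_output(job_spec: dict) -> dict:
    """Trigger an analytical job, wait until it finishes, then pull the processed data."""
    job = requests.post(f"{BASE}/jobs", json=job_spec).json()

    # Poll for completion; a highly portable system could equally push a notification.
    while True:
        status = requests.get(f"{BASE}/jobs/{job['id']}").json()
        if status["state"] in ("SUCCEEDED", "FAILED"):
            break
        time.sleep(10)

    if status["state"] == "FAILED":
        raise RuntimeError(f"job {job['id']} failed")

    # Consume the output data set programmatically; raw and intermediate data sets
    # would be reachable the same way in a highly portable system.
    return requests.get(f"{BASE}/data/{status['output_dataset']}").json()

# Example (hypothetical job spec):
# result = run_job_and_fetch_output({"query": "weekly_active_users", "window": "28d"})
```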

Usability
The usability of a Big Data system is the largest contributor to the perceived and actual value of that system. That's why enterprises need to consider whether their Big Data analytics investment provides functionality that not only generates useful insights but is also easy to use.

Business users need an easy way to:

  • Request analytics insights
  • Explore data and generate hypotheses
  • Self-serve and generate insights
  • Collaborate with data scientists, developers, and business users
  • Track and integrate insights into business critical systems, data apps, and strategic planning processes

Developers and data scientists need an easy way to:

  • Define analytical jobs
  • Collect, prepare, pre-process, and cleanse data for analysis
  • Add context to their data sets
  • Understand how, when, and where the data was created, how to interpret it, and who created it

Quality
The quality of a Big Data system is dependent on the quality of input data streams, data processing jobs, and output delivery systems.

Input Quality: As the number, diversity, frequency, and format of data channel sources explode, it is critical that enterprise-grade Big Data platforms track the quality and consistency of data sources. This also informs downstream alerts to consumers about changes in quality, volume, velocity, or the configuration of their data stream systems.

Analytical Job Quality: A Big Data system should track and notify users about the quality of the jobs (such as map reduce or event processing jobs) that process incoming data sets to produce intermediate or output data sets.

Output Quality: Quality checks on the outputs from Big Data systems ensure that transactional systems, users, and apps offer dependable, high-quality insights to their end users. The output from Big Data systems needs to be analyzed for delivery predictability, statistical significance, and access according to the constraints of the transactional system.
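
A sketch of such an output gate; the `InsightBatch` record and the thresholds are hypothetical stand-ins for whatever delivery, significance and access constraints a given transactional system imposes.

```python
from dataclasses import dataclass

@dataclass
class InsightBatch:
    # Hypothetical summary of one batch of output from the Big Data system.
    name: str
    row_count: int
    expected_rows: int          # baseline from previous deliveries
    p_value: float              # statistical significance of the headline metric shift
    delivered_minutes_late: int

def passes_output_quality(batch: InsightBatch,
                          min_row_ratio: float = 0.8,
                          max_p_value: float = 0.05,
                          max_delay_minutes: int = 30) -> bool:
    """Gate an insight batch before it reaches transactional systems and end users."""
    checks = [
        batch.row_count >= min_row_ratio * batch.expected_rows,   # delivery predictability
        batch.p_value <= max_p_value,                             # statistical significance
        batch.delivered_minutes_late <= max_delay_minutes,        # access within constraints
    ]
    return all(checks)

batch = InsightBatch("churn_risk_scores", row_count=9200, expected_rows=10000,
                     p_value=0.01, delivered_minutes_late=12)
print(passes_output_quality(batch))   # True
```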

Though we’ve explored how portability, usability, and quality separately influence the consistency, quality, dependability, and predictability of your data systems, remember it’s the combination of the ilities that determines if your Big Data system will deliver actionable enterprise-grade insights.