Big data analytics in E-commerce: a systematic review and agenda for future research

There has been an increasing emphasis on big data analytics (BDA) in e-commerce in recent years. However, it remains poorly explored as a concept, which obstructs its theoretical and practical development. This position paper explores BDA in e-commerce by drawing on a systematic review of the literature. The paper presents an interpretive framework that explores the definitional aspects, distinctive characteristics, types, business value and challenges of BDA in the e-commerce landscape. The paper also triggers broader discussions regarding future research challenges and opportunities in theory and practice. Overall, the findings of the study synthesize diverse BDA concepts (e.g., definition of big data, types, nature, business value and relevant theories) that provide deeper insights into cross-cutting analytics applications in e-commerce.

Introduction

In the past few years, an explosion of interest in big data has occurred from both academia and the e-commerce industry. This explosion is driven by the fact that e-commerce firms that inject big data analytics (BDA) into their value chain experience 5–6 % higher productivity than their competitors (McAfee and Brynjolfsson 2012). A recent study by BSA Software Alliance in the United States (USA) indicates that BDA contributes to 10 % or more of the growth for 56 % of firms (Columbus 2014). Therefore, 91 % of Fortune 1000 companies are investing in BDA projects, an 85 % increase from the previous year (Kiron et al. 2014a). While the use of emerging internet-based technologies provides e-commerce firms with transformative benefits (e.g., real-time customer service, dynamic pricing, personalized offers or improved interaction) (Riggins 1999), BDA can further solidify these impacts by enabling informed decisions based on critical insights (Jao 2013). Specifically, in the e-commerce context, “big data enables merchants to track each user’s behavior and connect the dots to determine the most effective ways to convert one-time customers into repeat buyers” (Jao 2013, p.1). BDA enables e-commerce firms to use data more efficiently, drive a higher conversion rate, improve decision making and empower customers (Miller 2013). From the perspective of transaction cost theory in e-commerce (Devaraj et al. 2002; Williamson 1981), BDA can benefit online firms by improving market transaction cost efficiency (e.g., buyer-seller interaction online), managerial transaction cost efficiency (e.g., process efficiency, as in Amazon’s recommendation algorithms) and time cost efficiency (e.g., searching, bargaining and after-sale monitoring).
Drawing on the resource-based view (RBV) (Barney 1991), we argue that BDA is a distinctive competence of the high-performance business process to support business needs, such as identifying loyal and profitable customers, determining the optimal price, detecting quality problems, or deciding the lowest possible level of inventory (Davenport and Harris 2007a). In addition to the RBV, this research views BDA from the relational ontology of sociomaterialism perspective, which puts forward the argument that different organizational capabilities (e.g., management, technology and talent) are constitutively entangled (Orlikowski 2007) and mutually supportive (Barton and Court 2012) in achieving firm performance. Finally, service marketing offers the perspective of improving service innovation models, which has been reflected by firms such as Rolls Royce (Barrett et al. 2015), Amazon, Google and Netflix (Davenport and Harris 2007a). As such, the extant literature identifies BDA as the platform for “growth of employment, increased productivity, and increased consumer surplus” (Loebbecke and Picot 2015, p.152); the “next big thing in innovation” (Gobble 2013, p.64); “the fourth paradigm of science” (Strawn 2012); “the next frontier for innovation, competition, and productivity” (p. 1) and the next “management revolution” (p. 3) (McAfee and Brynjolfsson 2012); or as “bringing a revolution in science and technology” (Ann Keller et al. 2012). Due to its high impact in e-commerce, notably in generating business value, BDA has recently become the focus of academic and industry investigation (Fosso Wamba et al. 2015c). As shown in Table 1, there is steady growth in the BDA market, in the number of global e-commerce customers and in their per capita sales.

Table 1 Global growth in e-commerce and big data analytics (BDA)

Although an increasing amount of published materials has focused on practitioners in this domain, the literature remains largely anecdotal and fragmented. There is a paucity of research that provides a general taxonomy from which to explore the dimensions and applications of big data in e-commerce. The purpose of this research therefore is to identify different conceptual dimensions of big data in e-commerce and their relevance to business value. This paper focuses on e-commerce firms that capture business value through using big data analytics (BDA). The extant literature shows that BDA could allow an e-commerce firm to achieve a range of benefits, such as: enhanced pricing strategies for products and services (Christian 2013); targeted advertising; better communication between research and development (R&D) and product development; improved customer service; improved multi-channel integration and coordination; enhanced global sourcing from multiple business units and locations, and, overall, suggesting models and ways to capture greater business value (Beath et al. 2012; Fosso Wamba et al. 2015a; Sharma et al. 2014). The research question that has driven this study focuses on: how is “big data analytics” different from traditional analytics in the e-commerce environment in creating business value? To answer this research question, the paper aims to provide a general taxonomy to broaden the understanding of BDA and its role in creating business value. More specifically, the aims of this paper are:

Overall, this paper intends to provide a thorough representation of the meaning of big data in the e-commerce context. We have organized this paper into five main parts. First, in section 2, we explain the methodological gestalt and present the results of our systematic review. Second, in section 3, we collate this information to define the role of big data in e-commerce and identify alternative definitional perspectives. Third, in section 4, we analyze the distinctive attributes and types of big data within e-commerce. Fourth, in section 5, we recommend different types of business value that can be derived using BDA in the e-commerce domain. Finally, we identify the challenges and provide solutions to tackle them in order to foster the growth of BDA in e-commerce.

Research approach

The study was grounded in a literature review to identify and appraise the current knowledge on the definitional aspects, attributes, types and business value of BDA in e-commerce. In defining e-commerce, Kalakota and Whinston (1997) focused on four perspectives: online buying and selling, technology-driven business process, communication of information and customer service. However, this definition does not provide adequate focus on transaction cost and other aspects of e-commerce (e.g., B2B, B2G, C2C). Illuminating these critical aspects, Frost and Strauss (2013) extend the definition by focusing on buying and selling online, digital value creation, virtual marketplaces and storefronts, and new distribution intermediaries. However, this definition focuses heavily on e-marketing and fails to integrate other important e-business processes. As such, this study puts forward a more holistic definition of e-commerce in a big data environment, which aims to achieve both transaction value (i.e., cost savings, improving productivity and efficiency) and strategic value (i.e., competitive advantages, firm performance) in digital markets by transforming production, inventory, innovation, risk, finance, knowledge, relationship and human resource management with the help of analytics-driven insights (Wixom et al. 2013).

This study explores ‘big data’ in the e-commerce environment, which refers to the huge quantities of transaction, click-stream, voice, and video data in the e-commerce landscape (Davenport et al. 2012). The study embraced a systematic approach to establish rigor throughout the review, based on a similar approach used by Ngai and Wat (2002) and Vaithianathan (2010) in e-commerce research and Benedettini and Neely (2012) in service systems research. The review process adopted a protocol that described the criteria, scope, and methodology at each step. Due to the subjective nature of the study, the systematic approach was adapted to the specific objectives of the study. The study applied a scientific and transparent process throughout the protocol in order to make the review process more precise and less biased.

The review process was driven by the following research question: what are the definitional perspectives, distinctive characteristics, types, business value aspects and challenges of BDA in e-commerce? These aspects of the research question guided the review process by correctly identifying the subject areas, relevant studies, sources of materials, and the inclusion and exclusion criteria. The review aimed to provide pragmatic solutions to the research question by capturing concrete and meaningful aspects with the support of empirical evidence. Therefore, major components of e-commerce (e.g., product development; operations; marketing, finance, and human resources management; and information systems) were studied in relation to BDA and business value. We have excluded disciplines which are not directly linked to our research interests, such as biology, chemistry, geology, physics, or politics. As research on BDA is an emerging field, a search within the time frame from 2006 to 2014 was considered to be representative. We set the lower boundary at 2006 because in this year the first seminal paper “competing on analytics” was published by Davenport (2006) in the Harvard Business Review (cited >500 times). The systematic review of the study identified this paper as a trigger for subsequent big data analytics research.

The study identified relevant publications by forming search strings that combined the key words ‘big data analytics*’ with a different range of terms and phrases. Using wildcard symbols, the study reduced the number of search strings because, for example, ‘big data analytics*’ could return hits for ‘big data analytic’ and ‘big data analysis’. Initially, the search focused on e-commerce research as the source of material most relevant to the big data and analytics experienced by e-commerce firms. The study conducted the database search combining the key words ‘big data analytics’ with the terms ‘electronic commerce*’, ‘e-commerce*’, ‘big data analytics* AND e-commerce*’, and ‘big data analytics* AND electronic commerce*’. Overall, the study identified and submitted a total of 33 search strings to a panel of experts (n = 10) from different streams within e-commerce studies (i.e., marketing, operations management, strategic management, human resources management, e-commerce and information systems). The panel represented a team of experts consisting of an academic and an analytics practitioner from each stream to validate the review protocol.
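The way such search strings are assembled can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four strings it produces are the ones quoted above, the full list of 33 strings is not given in the text, and the helper name `build_search_strings` is hypothetical.

```python
def build_search_strings(base, terms):
    """Return each term on its own plus the base keyword AND-ed with it.

    Hypothetical helper for illustration; the trailing '*' is the
    wildcard symbol that the databases expand to word variants.
    """
    strings = []
    for term in terms:
        strings.append(term)
        strings.append(f"{base} AND {term}")
    return strings


BASE = "big data analytics*"
TERMS = ["electronic commerce*", "e-commerce*"]

search_strings = build_search_strings(BASE, TERMS)
for s in search_strings:
    print(s)
```

Generating the strings from one base keyword and a list of terms keeps the combinations consistent across databases, which is the main point of recording them in a review protocol.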

Our search started on November 01, 2014 and ended on December 20, 2014. The study reviewed scholarly peer reviewed journals, periodicals, and quality web content by exploring five databases: Scopus (Elsevier); Web of Knowledge (Thomson ISI); ABI/Inform Complete (ProQuest); Business Source Complete (EBSCO Host); and Emerald, IEEE Xplore and ScienceDirect (Taylor & Francis). In addition, a similar search was conducted within the Association of Information Systems (AIS) basket of top journals. By adding the basket of top journals, the study incorporated the key databases used by prior studies that had a similar approach (Fosso Wamba et al. 2013; Lim et al. 2013b; Ngai et al. 2008; Ngai et al. 2009) as well as important findings from leading information systems (IS) journals.

The searches were limited to the abstract field, with the exception of the Web of Knowledge database in which the topic (i.e., abstract, title, and key words) was used. A total of 121 papers were downloaded and reviewed. As the study’s focus was on capturing the maximum number of views on BDA in e-commerce, a quality appraisal was established regarding the clarity of the papers’ contributions to the research questions (Birnik and Bowman 2007). At this stage, 32 papers were identified. A further seven papers were deemed relevant as they were clearly focused on BDA in various sectors including e-commerce. Cross-referencing yielded five more papers that were suitable for inclusion. At this stage, the study manually included four more papers, yielding the final list of 48 papers. Overall, the criteria used to select each paper contained an explicit or implicit indication of BDA in the e-commerce landscape.
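The selection counts reported above can be tallied as a quick consistency check. The stage labels in this sketch are shorthand introduced here for illustration; the numbers are those given in the text.

```python
# Papers retained or added at each selection stage, as reported in the text.
stages = [
    ("retained after quality appraisal", 32),
    ("BDA in various sectors including e-commerce", 7),
    ("found through cross-referencing", 5),
    ("manually included", 4),
]

running_total = 0
for label, added in stages:
    running_total += added
    print(f"{label}: +{added} -> {running_total}")

final_total = running_total
```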

We adopted a thematic analysis of the literature review (Ezzy 2002), guided in particular by Braun and Clarke (2006). The extensive review generated a set of five initial codes. Although the open coding was informed by the codes derived from the literature review, the coders remained open to identifying additional dimensions not currently present in the literature (Spiggle 1994). The coders then established an initial correspondence between the literature and the five identified categories (i.e., needs identification, market segmentation, decision making and performance improvement, new product/market/business model innovation and creating infrastructure and transparency). At this stage, to establish further rigor in the content analysis, we estimated Krippendorff’s alpha (Kalpha), which is a robust reliability measure irrespective of the number of observers, levels of measurement, sample sizes, and presence or absence of missing data (Krippendorff 2004, 2007). To estimate the Kalpha, first, each subsample of the 48 articles was coded independently by two judges using a nominal scale ranging from 1 to 5 (i.e., 1 = needs identification, 2 = market segmentation, 3 = decision making and performance improvement, 4 = new product/market/business model innovation, 5 = creating infrastructure and transparency). Second, we uploaded the coded data into the IBM SPSS Statistics package (version 21) and conducted the analysis following the guidelines of Hayes (2011) and De Swert (2012) to judge the inter-rater reliability of the coded variables (Hayes and Krippendorff 2007). Finally, the results yielded a Kalpha value of 0.82, which exceeds the cut-off value of 0.80 (De Swert 2012) and provides adequate evidence of reliability in the content analysis.
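For readers unfamiliar with the measure, the following is a minimal sketch of Krippendorff’s alpha for nominal data, written in plain Python. It is not the SPSS procedure used in the study: the function name and the sample codes are hypothetical, and the example data are invented purely to show the input format (one tuple of judge codes per article, with `None` for a missing code).

```python
import math
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal codes.

    units: iterable of per-unit code sequences, one entry per judge;
    None marks a missing code. Units with fewer than two codes are skipped.
    """
    coincidences = Counter()
    for codes in units:
        values = [v for v in codes if v is not None]
        m = len(values)
        if m < 2:
            continue
        # Each ordered pair of codes from different judges contributes 1/(m-1).
        for i, j in permutations(range(m), 2):
            coincidences[(values[i], values[j])] += 1.0 / (m - 1)

    totals = Counter()
    for (c, _k), weight in coincidences.items():
        totals[c] += weight
    n = sum(totals.values())

    observed = sum(w for (c, k), w in coincidences.items() if c != k)
    expected = sum(
        totals[c] * totals[k] for c in totals for k in totals if c != k
    ) / (n - 1)
    if expected == 0:
        return math.nan  # undefined when only one category is ever used
    return 1.0 - observed / expected


# Hypothetical codes (1-5) from two judges for ten articles.
example_units = [
    (3, 3), (1, 1), (5, 5), (3, 3), (2, 2),
    (4, 4), (3, 1), (5, 5), (4, 4), (3, 3),
]
alpha = krippendorff_alpha_nominal(example_units)
print(f"Kalpha = {alpha:.2f}")
```

A value at or above the 0.80 cut-off cited above would be read as acceptable agreement between the two judges.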

The overall distribution of literature that covers the five aspects of BDA in e-commerce is presented in Table 2. It is worth noting that many articles appeared more than once as they captured multiple aspects. Clearly, the largest share of the publications was in the category of ‘decision making and performance improvement’ (48 articles or 36 % of all publications). Indeed, the ultimate success of e-commerce depends on real-time business decision making, which may be one explanation for the high level of publications focused on this category. The study found that most studies support the association between sophisticated analytics and robust decision making based on actionable insights. For example, using robust analytics, LinkedIn made the decision to introduce new features, such as ‘People You May Know’, ‘Jobs You May Be Interested In’, ‘Groups You May Like’, ‘Companies You May Want to Follow’ and achieved a 30 % higher click-through rate (Barton and Court 2012). This was followed by ‘needs identification’ and ‘creating infrastructure and transparency’ with 24 articles for each category (or 18 % for each domain). ‘Needs identification’ refers to the identification of precise customer needs by exploring big transaction data, whereas ‘infrastructure and transparency’ focuses on making relevant information available through networks to make right-time decisions. For instance, Amazon’s recommendation engine generates ‘you might also want’ prompts to identify possible needs of customers based on the analysis of transaction history and book views (Manyika et al. 2011). As part of infrastructure and transparency, Google uses big data to refine core search and ad-serving algorithms (Davenport and Patil 2012). Similarly, Macy’s, an e-retailer in the US, has developed an analytics infrastructure that can optimize pricing of 73 million items in just over one hour (Davenport et al. 2012).
Macy’s also analyzes data at the stock keeping unit (SKU) level to ensure that products of different assortments are readily available. Finally, the review identified ‘new product/market/business model innovation’ with 22 articles (or 17 % of all publications), and ‘market segmentation’ with 14 articles (or 11 % of all publications). For example, Netflix Inc. created various customer segments (e.g., adventures, crime movies, family features, movies based on books) by analysing over one billion reviews in categories such as liked, loved and hated (Davenport and Harris 2007b). Using a visualization and demand analytics tool, Netflix introduced the “House of Cards” program in the United States (USA), which became a huge success as a new product (Ramaswamy 2013). Overall, this study identifies these five broad aspects in which organizations can create business value by harnessing big data analytics (see Table 2).
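The percentages quoted above follow from dividing each category count by the total number of category assignments rather than by the 48 papers, since an article can appear under more than one category. A short sketch using only the counts reported in the text:

```python
# Article counts per category, as reported in the text.
counts = {
    "decision making and performance improvement": 48,
    "needs identification": 24,
    "creating infrastructure and transparency": 24,
    "new product/market/business model innovation": 22,
    "market segmentation": 14,
}

# Articles are counted once per category they address, so the total
# is the number of category assignments, not the number of papers.
total = sum(counts.values())
shares = {name: round(100 * n / total) for name, n in counts.items()}

for name, n in counts.items():
    print(f"{name}: {n} articles, {shares[name]} %")
```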

Table 2 Big data analytics (BDA) applications in e-commerce

Defining big data analytics in the e-commerce environment

E-commerce firms are among the fastest adopters of BDA due to their need to stay on top of their game (Koirala 2012). In most cases, e-commerce firms deal with both structured and unstructured data. Whereas structured data focuses on demographic data including name, age, gender, date of birth, address, and preferences, unstructured data includes clicks, likes, links, tweets, voices, etc. In the BDA environment, the challenge is to deal with both types of data in order to generate meaningful insights to increase conversions. Schroeck et al. (2012) found that the definition of big data incorporated various dimensions including: greater scope of information; new kinds of data and analysis; real-time information; non-traditional forms of media data; new technology-driven data; a large volume of data; the latest buzz word; and social media data. In defining big data, IBM (2012), Johnson (2012a), and Davenport et al. (2012) focused more on the variety of data sources, while other authors, such as Rouse (2011), Fisher et al. (2012), Havens et al. (2012), and Jacobs (2009), emphasized the storage and analysis requirements of dealing with big data. As defined by Gantz and Reinsel (2012), big data focuses on three main characteristics: the data itself, the analytics of the data, and the presentation of the results of the analytics that allow the creation of business value in terms of new products or services. Overall, the study defines big data in terms of five Vs: volume, velocity, variety, veracity, and value (White 2012). The ‘volume’ refers to the quantities of big data, which are increasing exponentially. The ‘velocity’ is the speed of data collection, processing and analysis in real time. The ‘variety’ refers to the different types of data collected in the big data environment. The ‘veracity’ represents the reliability of data sources. And, finally, the ‘value’ represents the transactional, strategic and informational benefits of big data (Fosso Wamba et al. 2015b; Wixom et al. 2013).

The sheer volume of academic and industry research provides evidence on the importance of big data in many functional areas of e-commerce including marketing, human resources management, production and operation, and finance (Agarwal and Weill 2012; Bose 2009; Davenport 2006; Davenport 2010, 2012; Davenport et al. 2012). In e-commerce, a large amount of customer-related information is available simply when customers ‘sign in’: these data are of great interest to business decision makers. While the significance of big data in making strategic decisions is recognized and understood, there is still a lack of consensus on the operational definition of big data analytics (BDA). It is thus prudent to analyze the definitions of BDA mentioned in previous studies in order to identify their common themes. For example, Davenport (2006) indicated that BDA refers to the quantitative analysis of big data with a view to making business decisions. In addition, this decision-usefulness aspect of analytics has been the focus in other studies such as those by Davenport and Harris (2007b); Davenport (2010), and Bose (2009). Whereas Davenport and Harris (2007b) explained BDA with the help of mechanisms such as statistical analysis and the use of an explanatory and predicting model, Bose (2009, p.156) described BDA as the “group of tools” used to extract, interpret information as well as predict the outcomes of decisions.

In defining BDA, one stream of research has focused on strategy-led analytics, or analytics that create sustainable value for business. For example, LaValle et al. (2011) explained that the application of business analytics (or the ability to use big data) for decision making must essentially be connected with the organization’s strategy. Indeed, strategy-driven analytics has received much attention due to its role in better decision making. Studies have also focused on “competitive advantages” and “differentiation”, while applying analytics to analyze real-time data (Schroeck et al. 2012). In a similar vein, Biesdorf et al. (2013) highlighted that it is important to create an environment where big data, process optimization, frontline tools and people are well-aligned in order to achieve competitive advantages.

The other stream of research defines BDA from the perspective of identifying new opportunities with big data (see Table 3). For example, Davenport (2012) explained that BDA attempts to explore new products and value-added activities. Similar arguments have also been offered in another study by Davenport et al. (2012) in scanning the external environment and identifying emerging events and opportunities. In addition, studies have also highlighted the role of behavioral elements in the BDA definition (Agarwal and Weill 2012; Ferguson 2012), such as empathy, which they believe to be essential in considerably enhancing the analytical ability of firms. They explained BDA as being a combination of three things, that is, business processes, technology optimization, and emotional connection with the use of data. In a similar spirit, Ferguson (2012) stated that BDA refers to a multidimensional behavioral analysis that covers both internal and external factors.

Table 3 Definitional aspects of big data analytics (BDA) in e-commerce

Overall, it is evident that statistical, contextual, quantitative, predictive, cognitive, and other models are necessary prerequisites for big data analytics (BDA) (Kiron et al. 2012a). As such, the study defines BDA in e-commerce as a holistic process that involves the collection, analysis, use, and interpretation of data for various functional divisions with a view to gaining actionable insights, creating business value, and establishing competitive advantage. While the notion reflected in this definition, that analytical techniques can be used to produce actionable insights, is rooted in the origin of statistics and can be traced back to the 18th century, the clear difference today is the vast amount of electronic transactions in the digital economy and the associated data deluge (Agarwal and Dhar 2014). From the transaction cost theory and the new institutional economics viewpoint (Williamson 1979, 1981, 2000), when it comes to the economic performance of e-commerce firms in the emerging data economy, institutional structure can play a pivotal role in defining BDA. In addition, relationship-based e-commerce theories in terms of trust, loyalty and privacy (Dinev and Hart 2006; Gefen 2000, 2002) or classic information systems theories, such as IS success (Delone 2003; DeLone and McLean 1992), IT quality (Wixom and Todd 2005), IS continuance (Bhattacherjee 2001) and IT capability, sociomateriality and business value theories (Kim et al. 2012), can be utilized in defining BDA based on the objectives and scope of the study. As a result, there are clear opportunities for theoretical and practical inquiry to define BDA in a particular e-commerce context, ranging from marketing promotions and customer relationships to supply chain management. By developing interesting research questions, this new frontier of data science can create new knowledge and scientific possibilities by leveraging data, technology, analytics, business and society. Table 3 summarizes the definitional aspects and potential research areas on BDA in e-commerce.

Big data and their distinctive characteristics in the e-commerce environment

The e-commerce landscape today abounds with big data that are being used to solve business problems. According to Kauffman et al. (2012, p.85), the use of big data is skyrocketing in e-commerce “due [to] the social networking, the internet, mobile telephony and all kinds of new technologies that create and capture data”. With the help of cost-effective storage and processing capacity, and cutting-edge analytical tools, big data now enable e-commerce firms to reduce costs and generate benefits more easily. However, the big data captured by analytics differ from traditional data in many respects. Specifically, owing to the elements of their unique nature (i.e., volume, variety, velocity, and veracity), big data can be easily distinguished from the traditional form of data used in analytics (see Table 4). The next sections discuss these elements in turn, along with their implications for e-commerce.

Table 4 Nature of big data used in business analytics

Voluminous

With the emergence of web technologies, there is an ever-increasing growth in the amount of big data in the e-commerce environment (Beath et al. 2012). These mass quantities of data that e-commerce firms are trying to harness to improve their decision-making process are defined as voluminous (McAfee and Brynjolfsson 2012). As illustrated by Russom (2011), BDA takes a large volume of data that require a massive amount of storage and entail a large number of records. In fact, BDA encompasses large volumes of data (commonly expressed in petabytes and exabytes) that are used by decision makers for making strategic decisions. Data collected in the big data environment are often unstructured and can incorporate video, image, or data generated from mobile technology. As such, it is unlikely that big data will be clean and free from any errors. While this poses an extra challenge for decision makers to get data ready for use, big data enable real-time decision making for e-commerce firms (Kang et al. 2003). For example, using the massive amount of structured and unstructured data, Amazon developed sophisticated recommendation engines that deliver over 35 % of all sales, automated customer service systems to ensure superior customer satisfaction and dynamic pricing systems that adjust pricing against competing sites every 15 s (Goff et al. 2012). Similarly, Netflix, the online movie retailer, analyzes over 1 billion reviews to determine the customer’s movie tastes and inventory conditions (Davenport and Harris 2007b). Many e-commerce firms (e.g., Amazon, eBay, Expedia, Travelocity) use massive volume of social media data (e.g., photos, notes, blog posts, web links, and news stories) to tap into the opportunity of real time promotional offers (Manyika et al. 2011). 
In addition to opportunities, the volume of big data brings challenges, especially integrating big data from different sources and formats, introducing new “agile” analytical methods and machine-learning techniques, and increasing the speed of data processing and analysis. As such, e-commerce firms must have the ability to embed analytics and optimization into their operational and decision processes to enhance their speed and impact (Davenport 2013a).

Variety

The word ‘variety’ denotes the fact that big data originate from numerous sources which can be structured, semi-structured or unstructured (Schroeck et al. 2012). Variety is another critical attribute of big data as they are generated from a wide variety of sources and formats including text, web, tweet, audio, video, click-stream, log files, etc. (Russom 2011). This variety of data requires the use of different analytical and predictive models which can enable information about different functional areas to be used. Biesdorf et al. (2013) explained, for example, that the analytic model used by e-commerce firms could comprise a variety of customer information, such as: customer profiles and historical data on buying behavior; regional and seasonal buying patterns; optimizing of supply chain operations; and, above all, the retrieval of any unstructured data from social media to predict buying by product, store, and advertising activities. For example, Manyika et al. (2011) showed that an e-retailer provided real-time responses in marketing campaigns, amending them as and when necessary by conducting sentiment analysis. Overall, the variety of big data has the potential to add business value to firms. However, top management commitment in terms of improving business processes and defining workflows is very significant in order for the benefits from such data to be realized (Beath et al. 2012).

Velocity

Velocity refers to the frequency of data generation and/or the frequency of data delivery (Russom 2011). It is important to understand the velocity of big data which needs to be prioritized and synced into business processes, decision making and improvements in performance (Beulke 2011). As described by Gentile (2012), the term ‘velocity’ is the rate of change in big data and how quickly big data should be used in business decisions in order to add value. In fact, given that greater data velocity is assured, data have the potential to open up new opportunities for organizations. As shown by Davenport and Patil (2012), the high velocity of BDA can enable analysts to conduct consumer sentiment analysis and provide a clear picture about choices of brands and/or products.

To capitalize on this high pace of data, many e-commerce firms have used various techniques to add value to their business. For example, Amazon has been able to maintain a constant flow of new products by right-time communication with its stakeholders (Davenport 2006). eBay Inc. has performed thousands of experiments using data velocity with different aspects of its website, which has resulted in better layout and website features ranging from navigation to the size of its images (Bragge et al. 2012). To utilize the high velocity of data, many e-commerce firms now use sophisticated systems to capture, store, and analyze the data to make real-time decisions and retain their competitive advantages.

Veracity

Another essential attribute of big data relates to the uncertainty associated with certain types of data. These data demand rigorous verification, requiring full compliance with quality and security issues. High data quality is an important requirement of BDA for better predictability in the e-commerce environment (Schroeck et al. 2012). Therefore, verification is necessary to generate authenticated and relevant data, and to have the capability to screen out bad data (Beulke 2011). In fact, verification is essential in the data management process because the existence of bad data might hinder management in making cognizant decisions. Likewise, bad data would have little relevance in adding business value. Beulke (2011) explained that the information technology (IT) units of e-businesses can play a key role in this regard by setting up an automatic verification system so that the big data used for business decisions have been authenticated and have passed through strict quality compliance procedures.

In this regard, Schroeck et al. (2012) argued for the use of data fusion, which combines various less reliable data sources in order to create a more precise and worthwhile data point (such as social comments affixed to geospatial location information). Ferguson (2012) highlighted that Montage Analytics has developed a tool which is particularly useful for predicting ‘black swans’ (see Note 1) in organizations, and any other types of risk that originate from human behavior and motivations. The reason is that the inherent unpredictability of some data is caused by factors such as technology failure, human lack of truthfulness and economic factors.
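
The idea of data fusion can be illustrated numerically. The sketch below uses inverse-variance weighting, one common fusion technique, chosen here as an assumption; Schroeck et al. (2012) do not prescribe a specific method. Each source supplies an estimate together with a variance that expresses how unreliable it is.

```python
def fuse_estimates(estimates):
    """Fuse (value, variance) pairs by inverse-variance weighting.

    Less reliable sources (larger variance) receive smaller weights, and
    the fused variance is never larger than that of the best single source.
    """
    if not estimates:
        raise ValueError("need at least one estimate")
    if any(var <= 0 for _, var in estimates):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in estimates]
    total = sum(weights)
    fused_value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance


if __name__ == "__main__":
    # Two equally noisy sources: the fused value is their mean and the
    # fused variance is half the variance of either source.
    print(fuse_estimates([(10.0, 4.0), (14.0, 4.0)]))
```

With two equally noisy sources the result is simply their average with half the variance, which is the sense in which several weak sources can yield a more precise data point.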

Types of big data used in E-commerce

E-commerce refers to online transactions: selling goods and services on the internet, either in one transaction (e.g., Amazon, Zappos, eBay, Expedia) or through an ongoing transaction (e.g., Netflix, Match.com, LinkedIn) (Frost and Strauss 2013). E-commerce firms ranging from Amazon to Netflix capture various types of data (e.g., orders, baskets, visits, users, referring links, keywords, catalogue browsing, social data), which can be broadly classified into four categories: (a) transaction or business activity data, (b) click-stream data, (c) video data and (d) voice data (see Table 5). In e-commerce, data collected over time from consumers’ browsing and transaction points are the key to tracking shopping behaviour and personalizing offers. This section discusses different types of big data along with their implications for e-commerce.

Table 5 Types of big data

Transaction or business activity data

Transaction or business activity data evolve as a result of exchanges between the customer and company over time. These data are structured in nature and originate from many sources ranging from customer relationship programs (e.g., customer profiles maintained by the company, the occurrence of customer complaints) through to sales transactions. A recent study by Chandrasekaran et al. (2013) provided the example of an e-retailer that analyzes data from its loyalty program (i.e., its Clubcard loyalty program), entailing 1.6 billion data points, 10 million customers, 50,000 stock keeping units (SKUs), and 700 stores, which has resulted in the thorough coordination of big data with consumer insights. In the context of e-retailing, Kiron et al. (2014b) reported that StyleSeek, the online recommendation engine in the US, makes massive revenue by analyzing customers’ tastes and preferences and driving consumers toward its retail partners with the help of a sophisticated analytics platform. Overall, it is evident that e-retailers can derive numerous benefits across the value chain using transaction data.

Click-stream data

Click-stream data originate from the web and online advertisements, and from social media content such as the tweets, blogs, Facebook wall postings, etc. of e-commerce businesses. In today’s connected environment, social media and online advertisements play a key role in the ongoing promotional strategy of firms, such as the use of click-stream data that are very important for management in making informed, strategic, and tactical decisions. Earlier studies have found that many e-commerce firms worldwide (e.g., Amazon, eBay, Zappos, Alibaba etc.) rely on click-stream data in their efforts to capture data. Click-stream data can be applied to predict customer preferences and tastes. As highlighted by Davenport and Harris (2007a), Netflix, a world-famous internet TV network, captures and analyzes more than one billion web data related to reviews of movies that are liked, loved, hated, etc. in order to understand customers’ tastes.

Another recent study by Davenport et al. (2012) reported that credit card firms, through relying on website and call center data, maintain databases (named as ready-to-market) to offer customer-tailored products within milliseconds and also optimize offers by following up responses from customers. Some companies use such databases not only to approach customers but also to offer online services. For example, Biesdorf et al. (2013) explained that by analyzing web data, e-retailers receive a red flag alert when the prices of their competitors’ products are below their own price level. Therefore, retailers can adjust their prices to remain competitive.
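
The red-flag alert described by Biesdorf et al. (2013) can be sketched in a few lines. The data layout (dictionaries keyed by SKU) and the function name are hypothetical; a production system would draw competitor prices from crawled or licensed web data rather than from an in-memory dictionary.

```python
def price_alerts(own_prices, competitor_prices):
    """Return the SKUs for which at least one competitor undercuts our price.

    own_prices: {sku: our price}
    competitor_prices: {sku: [prices observed at competitors]}
    """
    alerts = {}
    for sku, own in own_prices.items():
        lowest = min(competitor_prices.get(sku, []), default=None)
        if lowest is not None and lowest < own:
            alerts[sku] = {"own": own, "lowest_competitor": lowest}
    return alerts


if __name__ == "__main__":
    own = {"A": 10.0, "B": 5.0, "C": 7.0}
    observed = {"A": [9.5, 11.0], "B": [5.0, 6.0]}
    # Only SKU "A" is undercut; "B" is matched but not beaten, and "C"
    # has no competitor observations.
    print(price_alerts(own, observed))
```

The alert only reports the gap; whether and how far to adjust the price remains a separate business decision.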

Video data

Video data are live data that come from capturing live images. Currently, e-commerce firms are keen to use not only either click-stream data or transaction data but, in association with image analysis software, they tend to also capture video data. As indicated by Schroeck et al. (2012), e-commerce firms have the necessary competencies to analyze extremely unstructured data, such as video or voice data. These data have the potential to add value for e-commerce firms. For example, Ramaswamy (2013) reported that Netflix uses video data to predict viewing habits and evaluate the quality of experiences. In addition, the visualization and demand analytics tool based on the type of movie consumption help Netflix understand preferences, which led them to achieve success in their “House of Cards” program in the US. Thus, the use of video data is essential for firms in making better decisions than their competitors.

Voice data

Another type of data attached to the big data family is voice data, that is, data typically originating from phone calls, call centers, or customer service. As evidenced in recent research, voice data are advantageous for analyzing consumer-buying behavior or targeting new customers. As described by Davenport et al. (2012), credit card companies, for example American Express, use and track data related to call center activities so that personalized offers can be given in milliseconds. In Schroeck et al.’s (2012) survey, e-commerce firms were found to use advanced capabilities to analyze text and transcripts converted from call center conversations. In addition, numerous nuances of language, such as sentiment, slang and intentions, can be read and recognized by means of BDA in the context of e-commerce.

Since the nature and types of big data are unique and come from various networks of digital platforms, there is scope for new theory that addresses new problems. The data economy also indicates that big data are “relational” and “networked”, which necessitates new developments in IT capability and algorithms, system and data quality, privacy and ethical implications, strategic alignment and corporate culture.

Business value of big data analytics for E-commerce firms

The ultimate challenge of BDA is to generate business value from this explosion of big data (Beath et al. 2012). The term ‘value’ in the context of big data implies the generation of economically worthy insights and/or benefits by analyzing big data through extraction and transformation. Aligned with Wixom et al. (2013), we define business value of BDA as the transactional, informational and strategic benefits for the e-commerce firms. Whereas transactional value focuses on improving efficiency and cutting costs, informational value sheds light on real time decision making and strategic value deals with gaining competitive advantages. For example, by injecting analytics into e-commerce, managers could derive overall business value by serving customer needs (79 %); creating new products and services (70 %); expanding into new markets (72 %); and increasing sales and revenue (76 %) (Columbus 2014). Table 6 shows that many e-commerce firms worldwide are able to enhance business value in the form of transactional, informational and strategic benefits by using big data analytics.

Table 6 Business value of big data analytics (BDA) in e-commerce

Amazon, the online retail giant, is a classic example of enhancing business value and firm performance using big data. Indeed, the firm was able to generate about 30 % of its sales through analytics (e.g., through its recommendation engine) (The Economist 2011). Similarly, Kiron et al. (2012b) reported that Match.com was able to earn over 50 % increase in revenue in the past two years while the company subscriber base for its core business reached 1.8 million. The IBM case study (IBM 2012) illustrated that greater data sharing and analytics could improve patient outcomes. For example, Premier Healthcare Alliance was able to reduce expenditure by US$2.85 billion. Schroeck et al. (2012) found that Automercados Plaza’s grocery chain was able to earn a nearly 30 % rise in revenue and a total of US$7 million increase in profitability each year by implementing information integration throughout the organization. Furthermore, the company avoided losses on over 30 % of its products by scheduling price reductions to sell perishable products on time. In addition to adding value for business in financial terms, the use of big data can add benefit in non-financial parameters such as customer satisfaction, customer retention, or improving business processes. As presented by Davenport (2006), United Parcel Service (UPS) examines usage patterns and complaints data to accurately predict customer defections. This process has resulted in a significant increase in customer retention for the firm. In a similar spirit, LaValle et al. (2011) reported that an online automobile company was able to develop accurate customer retention strategies by creating a customer sample from big data, applying analytic algorithms to forecast attrition probabilities, and identifying at-risk customers. This retention strategy has consequently opened up prospects for the firm to yield hundreds of millions of dollars merely from a single brand.
Because e-commerce firms have opportunities to interact in real time with customers more frequently than firms that do not have an e-commerce platform, they need to use big data for various purposes. Therefore, drawing on the relevant theories in e-commerce, the current study presents six mechanisms to enhance practical business value in the data economy as follows.

Personalization

The first application of big data for e-commerce firms is the provision of personalized service or customized products (Koutsabasis et al. 2008). Studies have argued that consumers typically like to shop with the same retailer using diverse channels, and that big data from these diverse channels can be personalized in real time (Kopp 2013; Mehra 2013; Miller 2013). Real-time data analytics enables firms to offer personalized services comprising special content and promotions to customers. In addition, these personalized services assist firms to separate loyal customers from new customers and to make promotional offers accordingly (Mehra 2013). According to Liebowitz (2013), personalization can increase sales by 10 % or more and provide five to eight times the ROI on marketing expenditures. Bloomspot, in this regard, explored customer credit card data to track the spending records of the most loyal customers and to offer them rewards via follow-up offers and benefits which assisted in increasing customer loyalty (Miller 2013). Wine.com achieved a massive increase in their sales using personalized email marketing (Zhao 2013). Bikeberry.com is an example of an e-commerce firm that is now leveraging BDA (e.g., using data from customers’ browsing patterns, login counts, past purchases) to send each customer a tailored offer: this has led to a sales increase of 133 % and user on-site engagement increase of approximately 200 % (Jao 2013).

Dynamic pricing

In today’s extremely competitive market environment, customers are considered ‘king’. Therefore, to attract new customers, e-commerce firms must be active and vibrant while setting a competitive price (Kung et al. 2013). Amazon.com’s dynamic pricing system monitors competing prices and alerts Amazon every 15 s, which has resulted in a 35 % increase in all sales. To offer competitive prices to customers on the eve of possible increases in sales (such as at Christmas or other festive times), Amazon processes big data by taking into account competitors’ pricing, product sales, actions of customers, and any regional or geographical preferences (Kopp 2013). Access to this information through the use of big data is likely to enable e-commerce firms to establish dynamic pricing (Leloup and Deveaux 2001).

Customer service

Another key area in which e-commerce firms can use big data is customer service. Customer grievances communicated through contact forms in online stores and through tweets enable e-commerce firms to make customers feel valued when they call the service center, resulting in prompt service delivery (Mehra 2013). Similarly, Miller (2013) explained that, by offering proactive maintenance (i.e., taking preventive measures before a failure takes place or is even detected) using big data obtained from sensors embedded in products, e-commerce firms are able to offer innovative after-sales service.

Supply chain visibility

When customers place an order on an online platform, it is logical for them to expect that companies would provide the service of tracking the order while the goods are still in transit. Kopp (2013) explained that customers expect key information, such as the exact availability, current status, and location of their orders. E-commerce firms often face difficulty in addressing these expectations from customers as various third parties such as warehousing and transportation are involved in the supply chain process (Kopp 2013). Big data analytics (BDA) plays a key role in this context by collecting information from multiple parties on multiple products (Mehra 2013) and then advising customers of the precise expected delivery date.

Security and fraud detection

Fraud-related losses, on average, amount to US$9000 for every US$1 million in revenue (Mehra 2013). This significant amount of loss can be prevented by identifying relevant insights through the use of big data. With the help of the right infrastructure, such as Hadoop, e-commerce firms can analyze data at an aggregated level to identify fraud relating to credit cards, product returns and identity theft (Mehra 2013). In addition, e-commerce firms are able to identify fraud in real time by combining transaction data with customers’ purchase history, web logs, social feed, and geospatial location data from smartphone apps. For example, Visa has installed a big data-enabled fraud management system that allows the inspection of 500 different aspects of a transaction, with this system saving US$2 billion in potential losses annually.
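
To make the figures above concrete, the sketch below first scales the reported average loss rate (US$9000 per US$1 million in revenue, i.e., 0.9 %) to a given revenue, and then shows a deliberately simple rule-based screen that combines a transaction with purchase history and location. The field names, the threshold of three times the historical mean and the two rules are hypothetical; real systems such as the one attributed to Visa inspect far more attributes.

```python
def expected_fraud_loss(revenue, loss_per_million=9000.0):
    """Scale the average fraud loss per US$1 million of revenue to `revenue`."""
    return revenue / 1_000_000 * loss_per_million


def fraud_signals(txn, history_mean, home_country, amount_factor=3.0):
    """Return the names of the illustrative rules that a transaction triggers.

    txn: {"amount": float, "country": str}
    history_mean: the customer's mean past purchase amount
    home_country: the country usually seen for this customer
    """
    signals = []
    if history_mean > 0 and txn["amount"] > amount_factor * history_mean:
        signals.append("unusual_amount")
    if txn["country"] != home_country:
        signals.append("unusual_location")
    return signals


if __name__ == "__main__":
    print(expected_fraud_loss(50_000_000))  # loss implied for US$50 million revenue
    print(fraud_signals({"amount": 400.0, "country": "FR"}, 100.0, "AU"))
```

Rules of this kind are easy to audit but coarse; they are shown only to illustrate how transaction data can be combined with purchase history and geospatial data.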

Predictive analytics

Predictive analytics refers to the identification of events before they take place through the use of big data (Kopp 2013). The application of predictive analytics depends on robust data mining (Cherif and Grant 2013). In this context, Loveman (2003), CEO and President of Caesar’s Entertainment, stated that: “[t]he best way to engage in … data-driven marketing is to gather more and more specific information about customer preferences, run experiments and analyses on the new data, and determine ways of appealing to [casino game] players’ interests. We realized that the information in our database, coupled with decision science tools that enabled us to predict individual customer’s theoretical value to us, would allow us to create marketing interventions that profitably addressed players’ unique preferences.” Therefore, predictive analytics helps firms to prepare their revenue budgets. The preparation of these budgets assists e-commerce firms to recognize future sales patterns from past sales data (e.g., yearly or quarterly). This, in turn, helps firms to better forecast and determine inventory requirements, thus leading to the avoidance of product stockouts and lost customers (Mehra 2013). Similarly, the application of a visualization and demand analytics tool at Netflix helped in the accurate prediction of consumer behavior and preferences when airing the “House of Cards” program in the United States (USA) (Ramaswamy 2013).
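
The link between past sales, forecasts and inventory requirements can be sketched as follows. A moving average is used purely as an assumed, minimal forecasting method; the studies cited above do not specify which models firms use, and the window length and safety stock are hypothetical parameters.

```python
def moving_average_forecast(sales, window=4):
    """Forecast next-period sales as the mean of the last `window` periods."""
    if window <= 0 or len(sales) < window:
        raise ValueError("need at least `window` past observations")
    recent = sales[-window:]
    return sum(recent) / window


def reorder_quantity(forecast, on_hand, safety_stock=0):
    """Units to order so that stock covers the forecast plus a safety buffer."""
    return max(0, forecast + safety_stock - on_hand)


if __name__ == "__main__":
    quarterly_sales = [100, 120, 110, 130, 140]
    forecast = moving_average_forecast(quarterly_sales, window=4)
    print(forecast, reorder_quantity(forecast, on_hand=80, safety_stock=20))
```

In practice the forecast would come from a richer model that accounts for seasonality and promotions, but the inventory calculation that follows it has the same shape.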

E-commerce firms increasingly extract business value from BDA insights either to solve business problems or make decisions. This new development in the realm of data-driven e-commerce triggers the development of new theories in the context of tangible (e.g., productivity improvement) and intangible (e.g., strategic business understanding) business value using people, process and technology. For example, Wixom et al. (2013) identified ‘drivers of pervasive use’ and ‘drivers of speed to insight’ as new constructs for possible theoretical extension in BDA adoption and value. In a similar spirit, Fosso Wamba et al. (2015a) highlighted transactional, strategic and transformative business values of BDA as fertile grounds for knowledge creation. Also, Sharma et al. (2014) put forward new research questions exploring the roles of resource allocation and resource orchestration processes in organizations in order to gain a deeper understanding of how BDA influences business value. Similar arguments have been put forward by Beath et al. (2012) to explore BDA and business value in practice to capitalize on the current opportunities of big data. Overall, we suggest conducting innovative, non-traditional research in electronic markets that leverages the power of big data towards theory building in the form of capturing novel social and economic activity.

Discussion and future research agenda

While the use of big data tends to add value for business throughout the entire value chain, there are a few challenges that organizations should confront and resolve before pay-offs from big data will flow into their business. Indeed, any innovative way of performing jobs always brings challenges: big data is no exception to this reality. Many researchers have argued that, while big data have great potential to improve business performance/add value, decision makers need to address various business challenges in order to reap the benefits (Davenport 2012; Schroeck et al. 2012; Shah et al. 2012). As shown in Table 7, the current study highlights some of these challenges with theoretical and practical implications, thus laying the ground for potential research on BDA in the e-commerce landscape.

Table 7 Future research questions for big data analytics (BDA) in e-commerce

One of the biggest challenges in the big data environment is that it does not give clear direction on how to reach business targets by aligning with the existing organizational culture and capabilities (Kiron et al. 2014a). In this regard, Barton (2012) highlighted that the key challenge for managers is to make big data trustworthy and understandable to frontline employees, with the example that frontline employees are typically reluctant to use big data as they either did not trust a big data-based model or did not have the capabilities to understand how it worked. Therefore, in the process of gaining greater acceptance by employees and other end-users, managers should present big data in an understandable format such as through a dashboard, reports or a visualization system (Bose 2009). Indeed, an innovative capability always leads toward sustained long-term advantages (Porter and Millar 1985) and superior firm performance through the characteristics of rarity, appropriability, non-reproducibility, and non-substitutability (Barney 1991). Therefore, the required training, discussion with relevant employees and managers, the active role of top management (a leader), and incentives could work as catalysts to facilitate the adoption by employees and managers of big data (Kiron et al. 2014a). In this regard, McAfee and Brynjolfsson (2012) argued that it is very unlikely for a firm to be a top performer through the use of big data unless there is a clear goal and strategy in place.

Marketing within e-commerce firms is grappling with the massive amount of data arriving through multiple channels. Therefore, the biggest challenge is to find the right information about each customer from the large amount of data (Agarwal and Dhar 2014; Miller 2013). This huge amount of data empowers customers more than ever before and creates an opportunity for marketers to establish relationships in e-commerce based on trust and loyalty (Gefen 2000, 2002). Although organizations such as Facebook, Google, and Twitter have an enormous amount of person-specific information, the challenge is how they and other e-commerce firms can inject BDA into marketing practices to create personalized offers, set dynamic prices, and use the right channels to provide consumer value. In addition to marketing, it is also important to process big data in production and operations management using sophisticated database management tools. In relation to e-commerce, big data advocates argue for the integration of talent, technology and information to reduce overall transaction costs (Williamson 1979, 1981). In this regard, Chang et al. (2014, p.12) stated that: “[s]ince big data are now everywhere and most firms can acquire it, the key to competitive advantage is to accelerate managerial decision-making by providing managers with implementable guidelines for the application of data analytics skills in their business processes”.

The availability of good quality big data is the key to adding value to the organization (Kiron et al. 2014a). Poor quality data might arise from redundant applications and databases, which add to data storage costs and make data more difficult to access and use (Beath et al. 2012). Although big data can be leveraged to improve business value, there is always the risk of redundant, inaccurate, and duplicate data which might undermine the decision-making process (Nelson et al. 2005). As argued by Schroeck et al. (2012), poor data quality or ineffective data governance is a key challenge for BDA. It is noteworthy that the use of even the most sophisticated analytics would be meaningless if inappropriate data are in place or poor quality data are used (Bose 2009).

In addition to good quality data, the safe handling of individual and organizational privacy and data security (e.g., names and addresses, social security numbers, credit card numbers, and financial information) could possibly be another challenge for big data management (Bose 2009; Smith and Shao 2007). Although the unprecedented growth in BDA makes it alluring to use data without consent, non-consenting respondents might jeopardise the advancement of research. As such, informed consent is a big challenge in the big data environment (Bialobrzeski et al. 2012). Indeed, informed consent processes need to be streamlined in BDA by embracing a flexible, refined, simplified but informative consent process (Beskow et al. 2010; Ioannidis 2013; White 2012) to encourage participatory community research (Bouhaddou et al. 2011). Undoubtedly, big data provides actionable insights for e-commerce firms; however, it creates a “privacy paradox” because consumers, on the one hand, want to protect their privacy and, on the other hand, regularly trade their personal information for free apps, promotional offers and social media incentives (Hull 2015). In this case, Nunan and Di Domenico (2013, p.6) claim that “while privacy concerns have been raised over the use and creation of big data, these have been outpaced by individuals’ use of social networks”. Although consumers increasingly share personal information on e-commerce sites or in social networks, firms are expected not to breach consumers’ privacy because consumers disclose information expecting it to be confidential under ‘terms of use’ (Martin 2015). Despite consumers’ expectations for anonymous data to protect their privacy, the extant review on BDA identifies a new wave of technologies and tools (e.g., facial recognition software) to de-anonymize and re-identify people in the data economy. This raises serious concerns about the use of so-called big public data (Boyd and Crawford 2012).
As a result, there is an urgent research call to protect privacy both technologically and legally in the era of biometric and genomic big data research (Kaplan 2014).

In addition to privacy, big data creates serious security challenges as consumers are completely unaware of how their data are being used, by whom and for what purposes. In this context, Vaidhyanathan and Bulock (2014) raised questions about the validity of monitoring buying behaviour online and the extent to which consumers are knowledgeable and aware of this sort of monitoring. Firms within the big data industry are often involved in creating an aggregated negative externality because they jointly contribute to the development of a large system of surveillance (Martin 2015). Any such surveillance and potential revelations (e.g., Edward Snowden’s revelation of the PRISM program and its involvement with big corporations such as Yahoo, Google, Facebook, Microsoft and Apple) call attention to the security of private information (Bankston and Soltani 2014; Schneier 2013). In this context, Google has recently introduced a “right to be forgotten” policy in the European Union, which allows an individual to have irrelevant personal data removed from its search results. The realm of big data-sharing agreements still remains informal, poorly structured, manually enforced, and linked to isolated transactions (George et al. 2014; Pantelis and Aija 2013). Thus, to succeed in the emerging big data environment, e-commerce firms need to handle autonomy and informed consent responsibly and ensure the privacy and security of data.

Another key challenge of the big data environment is to find the skills, such as technical, analytical, and governance skills as well as the networked relationships needed to operationalize big data (Davenport et al. 2012; Kiron et al. 2014a; Schroeck et al. 2012). However, it is not easy to find all these skills in one person. As argued by McAfee and Brynjolfsson (2012) and Kiron et al. (2014a), the enormous amount of big data needs to be captured, integrated, cleaned, and visualized; therefore, the technical and analytical skills of the data scientist (e.g., statistical, contextual, quantitative, predictive, and cognitive skills, and other related knowledge) are critical. In addition, these scientists should be conversant with business and governance issues, and should have the skills to communicate in the language of business. According to the sociomaterialism theory in IT (Orlikowski and Scott 2008), organizational (i.e., BDA management), technological (i.e., IT infrastructure), and talent (e.g., analytics skill or knowledge) dimensions of analytics are so interwoven that it is difficult to measure their individual contribution in isolation (Orlikowski and Scott 2008). Therefore, it is essential to develop big data analytics capability focusing on sophisticated technology, robust talent and analytics driven management culture.

Overall, the lack of organizational ability to articulate a solid and compelling business case is likely to be an overarching challenge for BDA in e-commerce. Both in academia and practice, researchers (e.g., Hayashi 2014; Kiron et al. 2014b; Wamba et al. 2015) reported that making a compelling case for business opportunities with measurable benefits is the key challenge of big data. As most organizations are facing the dilemma of how to use big data, we advise that the first step is to spend time in creating a simple plan for how data, analytics, frontline tools, and people can come together to create business value (Biesdorf et al. 2013). Specifically, in a similar spirit to Davenport (2013b), we suggest developing a research blueprint by recognizing the problem, reviewing past findings, identifying the variables and developing the model, collecting and analyzing the data and making decisions on actionable insights. Overall, there is a wealth of possibilities to address exciting, non-trivial questions in e-commerce and we offer some illustrations in Table 7 focusing on specific research streams. However, the assessment of exciting, blue ocean research questions in e-commerce should be based on fit, rigor, story, theory and economic and social significance.

Conclusion

Big data analytics (BDA) has emerged as the new frontier of innovation and competition in the wide spectrum of the e-commerce landscape due to the challenges and opportunities created by the information revolution. Big data analytics (BDA) increasingly provides value to e-commerce firms by using the dynamics of people, processes, and technologies to transform data into insights for robust decision making and solutions to business problems. This is a holistic process which deals with data, sources, skills, and systems in order to create a competitive advantage. Leading e-commerce firms such as Google, Amazon, eBay, ASOS, Netflix and Facebook have already embraced BDA and experienced enormous growth. Through its systematic review and creation of a taxonomy of the key aspects of BDA, this study presents a useful starting point for the application of BDA in emerging e-commerce research. The study presents an approach for encapsulating all the best practices that build and shape BDA capabilities. In addition, the study suggests that once BDA and its scope are well defined, the distinctive characteristics and types of big data are well understood, and the challenges are properly addressed, the BDA application will maximize business value by facilitating the pervasive usage and speedy delivery of insights across organizations.

Notes

A ‘black swan’ is also known as an extreme outlier: it is basically the disproportionate quality of a high-impact, hard-to-predict, rare event that, in general terms, people do not predict to happen, an event that is beyond the scope of detection by traditional analytics (Ferguson 2012).

References

Author information

Authors and Affiliations

  1. University of Wollongong, NSW, 2500, Australia Shahriar Akter
  2. NEOMA Business School, Rouen, 76825, France Samuel Fosso Wamba