What Big Data is – Part I

Organizations constantly aim to grow in revenue and reputation. In search of new growth opportunities, they rely increasingly on insights drawn from everyday business operations, internal activities, business transactions, and the feedback their customers give about their products or services.

Uncovering these insights generates colossal and complex sets of data. Highly skilled professionals are needed to manage, analyze, manipulate, and process these data into meaningful values that drive organizational growth. Data of this scale and complexity is what is termed Big Data.

How Big Data is perceived

Many professionals consider data volume the benchmark for the label: if the volume is measured in multiple terabytes or petabytes, it is deemed Big Data. Others argue that what counts as Big Data today, at terabyte or petabyte scale, will become ordinary data before long as technology advances day by day. Still others define Big Data contextually: any data that cannot be handled at the required pace with the available human and technical infrastructure is labeled Big Data.

Definition of Big Data

Perceptions give an idea of what Big Data is, but other factors are also taken into account when defining it. Big Data is not just about volume; the complexity of the data matters as well. For instance, several datasets classified as Big Data do not occupy much physical space, but their complexity makes them "big." Conversely, some datasets that occupy enormous physical space are not deemed Big Data because they are not complex in nature. For a better understanding, Big Data is usually defined by the three V's: Volume, Variety, and Velocity.

  • Volume – Many factors contribute to data volume. Data generated by transaction-based processes over years, unstructured data streamed from social media such as Facebook, Twitter, and VK, machine-to-machine data, and sensor data are some of the contributors to Big Data. Storage used to be a major obstacle, but with diminishing storage costs it has largely been overcome. Other issues emerged rapidly in its place, such as identifying what is relevant within large data volumes and creating value from the relevant data with the help of analytics.

  • Variety – Data streams are not limited to particular formats. Data can be structured, numeric, and regular, or unstructured: text documents with special characters, images, email, audio and video files, stock ticker data, financial transactions, and much more. Managing these datasets and extracting value from them with analytics remains a struggle for the majority of organizations.

  • Velocity – Data streams flood in at unprecedented speed and must be dealt with quickly and appropriately, matching that velocity as the data arrives, which most organizations find highly challenging. RFID tags, sensors, and smart metering are some of the technologies driving the need to handle these data torrents in near real time.
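Of the three V's, variety is the easiest to make concrete in code: an ingestion pipeline rarely knows in advance whether an incoming record is structured or free text. The sketch below is a minimal, hypothetical illustration, not any vendor's API; the record formats and field names are assumptions made for the example.

```python
import csv
import io
import json

def parse_record(raw):
    """Classify and parse one incoming record by format.

    Structured records (JSON, CSV) are parsed; anything else is
    kept as unstructured free text for later analysis.
    """
    raw = raw.strip()
    # JSON: structured and self-describing
    if raw.startswith("{"):
        try:
            return ("json", json.loads(raw))
        except json.JSONDecodeError:
            pass
    # CSV: structured but positional (column meanings assumed known)
    if "," in raw:
        row = next(csv.reader(io.StringIO(raw)))
        return ("csv", row)
    # Fallback: unstructured text (log lines, tweets, feedback, ...)
    return ("text", raw)

# A toy mixed-format stream, as one might receive from several sources
stream = [
    '{"user": "alice", "amount": 42.5}',  # a transaction event
    'bob,19.99,2024-01-15',               # a CSV export line
    'Loved the new checkout flow!',       # free-text feedback
]

for kind, payload in (parse_record(r) for r in stream):
    print(kind, payload)
```

In a real pipeline each branch would feed a different downstream store or analytics job; the point here is only that format detection and parsing must happen before any value can be extracted.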

Apart from the three V's, some more attributes like Veracity, Variability and Complexity are also used to define Big Data.

  • Veracity – An indication of data integrity and of an organization's ability to trust its data. Only data the organization is confident in can be used to extract relevant values that feed crucial decision-making processes.

  • Variability – Data streams are growing in velocity and variety, and the flow is also highly inconsistent, with changing peaks. If a topic is trending on social media, the associated data stream grows exponentially. Such peaks can occur daily, seasonally, or in response to triggered events. The data loads arriving in these bursts are highly challenging to manage, even for structured data, and all the more so when unstructured data is involved.

  • Complexity – Data sources today are immense, and so are the complexities they bring. Linking, matching, cleansing, and transforming data remain a challenge. It is nonetheless of utmost importance to connect and correlate relationships, data linkages, and hierarchies of data; failing to do so means losing control over the data.
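The linking, matching, and cleansing steps just mentioned can be sketched in miniature. The customer names, email addresses, and order amounts below are invented for illustration; a real pipeline would use far more robust matching than simple string normalization.

```python
def normalize(name):
    """Cleansing step: trim whitespace, lowercase, collapse spaces."""
    return " ".join(name.strip().lower().split())

# Two hypothetical sources recording the same customers differently
crm = {"Alice Smith ": "alice@example.com", "BOB  JONES": "bob@example.com"}
orders = [("alice smith", 42.50), ("bob jones", 19.99)]

# Linking step: match order records to CRM entries via the cleansed key
emails = {normalize(k): v for k, v in crm.items()}
linked = [(name, amount, emails.get(normalize(name)))
          for name, amount in orders]
print(linked)
```

Even in this toy case, skipping the cleansing step would leave every order unmatched, which is exactly the "losing control over the data" failure the Complexity attribute warns about.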

The following facts help convey how much data is being generated and managed by Big Data systems:

  • IBM estimates that 2.5 quintillion bytes of data are created every day, which implies that 90% of the data in the world today has been produced in the last two years alone.

  • Two billion accounts across the globe are protected by the credit card fraud detection system currently in place, says FICO.

  • Walmart handles more than one million customer transactions every hour, feeding a database of more than 2.5 petabytes of information.

  • The human genome, the complete set of genetic material present in a cell or organism, can now be decoded in less than one week, a feat that originally took ten years.

  • Facebook currently holds more than 45 billion photos in its user database, and this number grows rapidly every day.

  • UPS tracks data on 16.3 million packages per day for 8.8 million customers and handles an average of 39.5 million tracking requests from customers per day. The company stores nearly 20 petabytes of data.

What Big Data actually means

Big Data is a behemoth in size, but size alone is not what matters to anyone who is about to handle it; what can be done with the data is what counts. The goal is for organizations to obtain data from any possible source, harness it, and analyze it to find solutions that lead to:

  • Reducing costs

  • Reducing processing time

  • Developing new products and services

  • Optimizing offerings

  • Making smarter business decisions

The following can be achieved by combining Big Data and high-powered analytics:

  • Determining the root causes of failures, issues, and defects, potentially saving billions of dollars annually

  • Analyzing millions of SKUs to set prices that maximize profit and clear inventory

  • Using customers' purchase histories to explain trends and generate coupons and vouchers that draw customers to new sales

  • Optimizing routes for thousands of delivery vehicles while they are on the road

  • Sending recommendations about attractive offers to customers' mobile devices while they are in the right area

  • Recalculating entire risk portfolios within minutes

  • Identifying the most valued customers in no time

  • Detecting fraudulent behavior using data mining and clickstream analysis
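To give one of these a concrete shape, fraud detection at its simplest can be reduced to flagging transactions that deviate sharply from a customer's normal spending. The sketch below is a deliberately tiny stand-in, assuming a plain z-score threshold rather than the machine-learning models real systems such as FICO's employ, with made-up transaction amounts.

```python
import statistics

def flag_outliers(amounts, threshold=2.5):
    """Flag amounts far above the mean of the history.

    Toy anomaly rule: anything more than `threshold` population
    standard deviations above the mean is considered suspicious.
    """
    mean = statistics.mean(amounts)
    stdev = statistics.pstdev(amounts)
    if stdev == 0:
        return []  # no variation in spending, nothing stands out
    return [a for a in amounts if (a - mean) / stdev > threshold]

# Hypothetical transaction history with one suspicious spike
history = [20, 25, 22, 30, 18, 24, 21, 26, 23, 900]
print(flag_outliers(history))  # the 900 stands out
```

Real Big Data fraud systems run far richer models over billions of events, but the shape is the same: a statistical baseline per account, and a stream of incoming transactions scored against it fast enough to block a payment in flight.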

With these facts in mind, the sources from which Big Data is obtained, the handling and management of Big Data, and other issues will be discussed in the next part of this article. Stay tuned.

Subscribe to get updates from us