BLOG Published on 2017/01/08 by Asitha De Silva in Tech-Tips

What is Big Data?

The amount of data in the world is multiplying rapidly. As technology develops, more data is generated than ever before, and big organizations are planning to change their businesses using this massive amount of data. First it will change the shape of businesses, and then people's lives. This large amount of data, generated from multiple sources, can be identified as big data. Big data is generated from both traditional and new digital sources. Sometimes we contribute to creating big data without even noticing it. When we log into a website or mobile application, we usually use the data available in that particular website or app. But while browsing, we simultaneously create a new set of data as well. For instance, when you log into an e-commerce website like eBay or Amazon, you generate new data such as customer location, payment details, search interests etc.

Big data consists of both unstructured and structured data. Structured data can be stored in traditional relational database management systems (RDBMS), spreadsheets etc. Unlike unstructured data, structured data can be retrieved from databases simply by using SQL queries. On the other hand, unstructured data does not fit naturally into an RDBMS. Data stored in emails, video and audio files, images, and Twitter and Facebook posts can be treated as unstructured data. Big data is too large and complex to manipulate with conventional software tools and technologies. It also doesn't fit into the columns and rows of an RDBMS, because the data is too large and too diverse. Organizations collect big data from various sources, such as computers, smartphones, tablets, sensors installed in vehicles, machinery etc. Much of the data collected by organizations, especially unstructured data, cannot be used in full because of its complexity. For an organization, what really matters is not the amount of data it owns, but what it can do with that data. Therefore, a new set of skills and technologies is required to manage big data.
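To make the difference concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names and sample values are made up for illustration. It shows how a single SQL query answers a question over structured data, while the same kind of fact inside free text has no columns to query.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [("alice", "LK", 20.0), ("bob", "US", 35.5), ("carol", "LK", 12.5)],
)


def total_by_country(connection):
    """Return {country: total amount} using a plain SQL aggregate."""
    rows = connection.execute(
        "SELECT country, SUM(amount) FROM orders GROUP BY country ORDER BY country"
    )
    return dict(rows.fetchall())


totals = total_by_country(conn)

# Unstructured data, by contrast, has no columns to query. Similar facts are
# buried in free text and would need custom parsing or text analytics.
review = "Ordered from Sri Lanka, paid about twenty dollars, arrived late."
```

Here `totals` holds one total per country, obtained without writing any parsing logic, whereas nothing comparable can be run against `review` as it stands.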

Gartner, a research and advisory firm, describes big data as high-volume, high-velocity and/or high-variety information assets that demand cost-effective, innovative forms of information processing that enable enhanced insight, decision making, and process automation. Analyst Doug Laney, later of Gartner, introduced the famous 3Vs model back in 2001.

Volume - refers to the amount of data. When it comes to big data, volume means a massive amount of data. For instance, Facebook users upload 300 million photos every day and Twitter users create 500 million tweets per day. Likewise, all big organizations generate large amounts of data every day.


Velocity - refers to the speed at which data is generated. Data is generated continuously, every second and every minute, across numerous sources such as e-commerce applications, social media sites, machinery, online gaming systems etc. For instance, 698,445 Google searches are made per minute.



Variety - refers to the types of data available. Big data can vary from numbers, text and dates to audio and video files, images, geographical locations, transaction log files etc. Most of this data is treated as unstructured data.





Importance of Big Data

Organizations accumulate and store big data from traditional and new digital sources. Mostly, however, they use only a small amount of structured data in their business operations and leave a large amount of data unused. With current market trends and competition, organizations need to consider using that unanalysed, unstructured big data to survive in the market and grow the business. Therefore, organizations have to think about how they utilise their data resources in order to make smart management decisions and maximise business potential. Here are some of the benefits of big data.


  • Unlike before, customers do more research before they buy any product or service. They ask questions about the product they want to buy on customer review sites, social media, online forums etc. Organizations can observe this customer behaviour and use this big data to learn what customers want and streamline their production according to demand.
  • Organizations can obtain information such as live traffic updates, weather predictions and road construction works from news sites, television etc. They can analyse this unstructured data and use it to manage their supply chain more efficiently. It also helps to optimise customer services.
  • Big data can be used in the healthcare sector to provide personalised medicine. Medical institutes generate diagnostic images, genetic test results and biometric information and store them in electronic health records. Doctors can use this data to better understand individual patients and provide more personalised medicine. For instance, a cancer patient could be prescribed medication based on their genetic test results. Personalised medicine powered by big data leads to more efficient clinical decisions and, eventually, reduces healthcare costs as well.
  • Risk management is an important part of every organization. Big data can be used to avoid financial risks before they affect the organization. A large amount of data makes it possible to understand new market developments and trends, so organizations can take the necessary actions based on market behaviour in order to avoid future risks and also enhance the business. Big data can also be used for fraud detection with the help of pattern recognition.
  • Big data sets such as traffic statistics, energy consumption rates and GPS mapping can be used for better city management. Analysis of energy consumption data makes it possible to reduce energy usage. A few years ago, IBM introduced three cloud-based smarter city management centres for transportation, water and emergency management, which help to provide good citizen services and improve decision making.






Big Data Technologies


Big data is vast and diverse; it can consist of unstructured, semi-structured or structured data formats. These large data sets are not a good fit for RDBMSs, mainly because of the high volume and diverse nature of the data. RDBMSs use structured query language (SQL) for database operations. But big data comes in the terabyte or petabyte range, and RDBMSs cannot cope with that volume of data using traditional SQL operations. Organizations that plan to use big data would have to spend a lot of money if they decided to store it in RDBMSs. Therefore, organizations need to adopt new technologies to gain better benefits from big data. Forrester, a market research company, has identified 22 big data technologies that promise to enhance business growth. Here are some popular technologies used to handle big data.


NoSQL Databases


NoSQL (Not only SQL) databases are non-relational database systems used for database operations such as storage and retrieval on large volumes of unstructured, semi-structured or structured data. NoSQL databases, also referred to as big data databases, are able to handle operations that relational databases handle poorly. Therefore, NoSQL databases have become the first alternative to relational databases such as Oracle, Microsoft SQL Server, DB2 etc. There are four types of NoSQL database, namely key-value store, document database, graph database and column store.
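The four types differ mainly in how a record is shaped and looked up. The sketch below models each one with plain Python structures rather than a real database product. All names and sample records are hypothetical, and the column store is modelled as a wide-column layout (row key, then column family, then columns), which is one common reading of the term.

```python
import json

# Key-value store: an opaque value fetched by exactly one key.
key_value_store = {"user:42": json.dumps({"name": "Asha", "city": "Colombo"})}

# Document database: self-describing records whose fields may differ.
document_store = [
    {"_id": 42, "name": "Asha", "city": "Colombo", "tags": ["premium"]},
    {"_id": 43, "name": "Ben"},  # no "city" field, and that is allowed
]

# Column store (wide-column): row key -> column family -> columns.
column_store = {
    "user:42": {"profile": {"name": "Asha", "city": "Colombo"}},
    "user:43": {"profile": {"name": "Ben"}},
}

# Graph database: nodes connected by labelled edges.
graph_edges = [(42, "FOLLOWS", 43), (43, "FOLLOWS", 42), (42, "BOUGHT", 900)]


def get_value(key):
    """Key-value lookup: the store does not interpret the value."""
    return json.loads(key_value_store[key])


def find_documents(field, value):
    """Document query: filter on any field, tolerating missing ones."""
    return [doc for doc in document_store if doc.get(field) == value]


def neighbours(node, label):
    """Graph traversal: follow edges with a given label from one node."""
    return [dst for src, lbl, dst in graph_edges if src == node and lbl == label]
```

For example, `find_documents("city", "Colombo")` returns only the first document, and `neighbours(42, "FOLLOWS")` follows a relationship directly instead of joining tables.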


Hadoop


Hadoop is an open source software framework designed to store and analyse big data across several thousand nodes. MapReduce, its data processing model, is a core component of the Hadoop framework, alongside the Hadoop Distributed File System (HDFS) for storage. Hadoop is considered the most popular implementation of MapReduce. Hadoop can be used to store large amounts of data and it also enables fast processing of that data. It can store any amount of unstructured data, such as text, images, video and audio files, without pre-processing, and the data can be manipulated later. Since Hadoop is open source, operating costs can be reduced.
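The MapReduce idea can be illustrated without a cluster. The following is a single-process sketch in plain Python, not Hadoop's own API: the function names are made up, and a real job would run the map and reduce steps in parallel on many nodes. It counts words in three steps: map each line to (word, 1) pairs, group the pairs by word, then reduce each group to a total.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())
```

Because each line is mapped independently and each word is reduced independently, both phases can be spread over many machines, which is what lets the model scale to very large inputs.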


Data Virtualization


Data virtualization gives you access to integrated data that is abstracted from various underlying data sources. The integrated data produced through data virtualization can be used for analytics and operations. Data virtualization allows organizations to extract value from large data volumes efficiently.
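As a rough illustration of the idea, the sketch below exposes one combined view over two separate sources, a CSV export and an in-memory lookup, without copying either into a new store. The source names, fields and values are hypothetical, and real data virtualization products add query optimisation, caching and security on top of this basic pattern.

```python
import csv
import io

# Source 1: a CSV export from a hypothetical CRM system.
CRM_CSV = "id,name\n1,Asha\n2,Ben\n"

# Source 2: balances held by a hypothetical billing service.
BILLING = {1: 120.0, 2: 75.5}


def crm_rows():
    """Read the CRM source on demand, one row at a time."""
    return csv.DictReader(io.StringIO(CRM_CSV))


def customer_view():
    """Yield combined records lazily; the sources stay where they are."""
    for row in crm_rows():
        customer_id = int(row["id"])
        yield {
            "id": customer_id,
            "name": row["name"],
            "balance": BILLING.get(customer_id),
        }


view = list(customer_view())
```

A consumer of `customer_view()` sees a single record shape and does not need to know that the name and the balance come from different systems.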




Regardless of size, big data affects every type of business. Nowadays every business, from small shops to multinational companies, generates big data, and that data will become an asset to the organization. Good use of data helps to reduce operational costs and increase the revenue of an organization. Therefore, no organization can plan the future growth of its business without considering big data trends. The potential of big data is vast, so organizations need big data experts and the know-how to unlock it.




Asitha De Silva

Consultant Cloud Solutions

Expert in architecting and implementing cloud-based infrastructure solutions.


Copyright © 2024 Terminalworks. All Rights Reserved