Big data applications are vital for storing and processing large amounts of data. Depending on the type of organisation, big data sets can be used to better understand customer needs and preferences. Amazon, for example, uses big data applications to process its customers' data and provide them with recommendations for future purchases. Its advertisements are also tailored to customers' purchasing trends and history.
Hadoop is one of the most popular applications for big data analysis, known in particular for its storage system, the Hadoop Distributed File System (HDFS). It is a Java-based open-source framework that stores big datasets in its distributed file system and processes them using the MapReduce programming model. Over the last decade, the size of big datasets has grown exponentially, reaching into the exabytes. Even in a small organisation, big datasets can range from hundreds of gigabytes to hundreds of petabytes (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes). As datasets grow, it becomes more difficult for traditional applications to analyse them. That is where frameworks like Hadoop and its storage file system come into play (Taylor, 2010).
With the increasing use of big data applications across industries, Hadoop has gained popularity in data analysis over the last decade. It is an open-source framework that provides a distributed file system for big data sets. This allows users to process and transform big data sets into useful information using the MapReduce programming model (White, 2009).
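To make the MapReduce programming model more concrete, here is a minimal in-memory sketch of its three phases (map, shuffle, reduce) in plain Python. It does not use Hadoop at all. The function names, the `(customer, amount)` record layout, and the sample data are assumptions made up for illustration.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and collect the emitted (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle_phase(pairs):
    """Group all values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and the list of values collected for it."""
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical job: total purchase amount per customer.
def purchase_mapper(record):
    customer, amount = record
    return [(customer, amount)]


def sum_reducer(key, values):
    return sum(values)


def run_job(records):
    pairs = map_phase(records, purchase_mapper)
    groups = shuffle_phase(pairs)
    return reduce_phase(groups, sum_reducer)


if __name__ == "__main__":
    sample = [("alice", 30), ("bob", 12), ("alice", 5)]  # made-up sample data
    print(run_job(sample))  # {'alice': 35, 'bob': 12}
```

In a real cluster the map and reduce steps run in parallel on many machines and the shuffle moves data across the network. The sketch only shows how the work is divided between the phases.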
Most of the Hadoop framework is written in Java, with some code written in C, and it exposes a Java-based API. However, programs written in other languages, such as Python, can also use the framework through a utility known as Hadoop Streaming.
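As a rough illustration of how a Python program can plug into Hadoop Streaming, the sketch below implements a word count as a mapper and a reducer that exchange tab-separated `key<TAB>value` lines, which is the default text convention Streaming uses over standard input and output. The script layout and the `map`/`reduce` command-line argument are assumptions for this example, not something Hadoop requires.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit one 'word<TAB>1' line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Sum the counts per word. The input lines must already be sorted by key."""
    parsed = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(parsed, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Hypothetical convention: choose the stage with a "map" or "reduce" argument.
    if sys.argv[1:] == ["map"]:
        sys.stdout.writelines(out + "\n" for out in mapper(sys.stdin))
    elif sys.argv[1:] == ["reduce"]:
        sys.stdout.writelines(out + "\n" for out in reducer(sys.stdin))
```

Assuming the script is saved as `wordcount.py` (a made-up name), the pipeline can be tried locally with `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for the framework's shuffle step. On a cluster, the same two commands would be passed to the streaming utility through its `-mapper` and `-reducer` options.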
"Big data" and "big data software" have been buzzwords in computing for over a decade now. Big data is a term used for large and complex data sets that are difficult to process and analyse with traditional data processing software. These large data sets can be structured or unstructured. The data comes from various sources such as:
- social media,
- scientific applications,
- sensors, surveillance,
- video and image archives.
These large data sets are analysed to find hidden patterns, relationships between pieces of data, market trends, or other information. In the end, however, it is not about the amount of data; it is about what the organisation does with the data (Boyd & Crawford, 2011).