Text mining as a better solution for analyzing unstructured data

Text mining is a subfield of data mining used to recognize hidden patterns and correlations in large amounts of data. It is also known as text data mining, intelligent text analysis and knowledge discovery in text, and it is concerned with extracting useful information from unstructured text data. Gupta & Lehal (2009) regard text mining as a new interdisciplinary area that combines data mining, information retrieval, machine learning, computational linguistics and statistics. Text mining has many applications: it is a valuable resource in social networking and blogging, customer relationship management, tracking public opinion and text filtering (Mostafa, 2013). Text mining is also popular in the biomedical field, where it deals with text related to biology, medicine and chemistry, and practitioners have developed several bioinformatics data mining toolboxes for computational biology.
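
To make "extracting useful information from unstructured text" concrete, here is a minimal sketch of one of the simplest text-mining steps: tokenizing free text and counting the most frequent meaningful terms. It assumes naive regular-expression tokenization and a small, hypothetical stop-word list; real text-mining pipelines usually add steps such as stemming, n-grams or TF-IDF weighting.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list used for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is", "it", "for", "was"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into purely alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_terms(documents: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Count non-stop-word terms across all documents and return the n most frequent."""
    counts: Counter[str] = Counter()
    for doc in documents:
        counts.update(token for token in tokenize(doc) if token not in STOP_WORDS)
    # Ties are returned in the order the terms were first encountered.
    return counts.most_common(n)


if __name__ == "__main__":
    # Hypothetical customer reviews: unstructured text with no fixed schema.
    reviews = [
        "The delivery was late and the package was damaged.",
        "Fast delivery, but the package arrived damaged.",
        "Great price; delivery took a week.",
    ]
    print(top_terms(reviews, n=3))  # [('delivery', 3), ('package', 2), ('damaged', 2)]
```

Even this crude count surfaces a pattern (delivery problems and damaged packages) that is not visible in any single review, which is the kind of hidden pattern text mining aims to find at much larger scale.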

Importing data into Hadoop Distributed File System (HDFS)

Hadoop is a widely used framework for big data analysis, known in particular for its storage system, the Hadoop Distributed File System (HDFS). It is a Java-based, open-source framework that stores big datasets in its distributed file system and processes them using the MapReduce programming model. Over the last decade, the size of big datasets has grown exponentially, reaching exabytes. Even in a small organisation, big datasets can range from hundreds of gigabytes to hundreds of petabytes (1 petabyte = 1,000 terabytes = 1,000,000 gigabytes). As datasets grow, it becomes more difficult for traditional applications to analyse them. That is where frameworks like Hadoop and its file system come into play (Taylor, 2010).
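
A common way to import a local file into HDFS is the HDFS shell: `hdfs dfs -mkdir -p` creates the target directory and `hdfs dfs -put` copies the file into it. The following is a minimal sketch that builds and (optionally) runs those two commands from Python. It assumes the `hdfs` client is installed and on the PATH of the machine running the script; the file name and HDFS path in the example are hypothetical, and `dry_run` defaults to printing the commands instead of executing them.

```python
import shlex
import subprocess
from pathlib import PurePosixPath


def build_import_commands(local_path: str, hdfs_dir: str, overwrite: bool = False) -> list[list[str]]:
    """Return the HDFS shell commands that create hdfs_dir and copy local_path into it."""
    target_dir = str(PurePosixPath(hdfs_dir))  # normalise, e.g. drop a trailing slash
    put_cmd = ["hdfs", "dfs", "-put"]
    if overwrite:
        put_cmd.append("-f")  # replace the destination file if it already exists
    put_cmd += [local_path, target_dir]
    return [
        ["hdfs", "dfs", "-mkdir", "-p", target_dir],  # create parent directories as needed
        put_cmd,
    ]


def import_into_hdfs(local_path: str, hdfs_dir: str, overwrite: bool = False, dry_run: bool = True) -> None:
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in build_import_commands(local_path, hdfs_dir, overwrite):
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit


if __name__ == "__main__":
    # Hypothetical local file and HDFS directory.
    import_into_hdfs("sales_2023.csv", "/user/analyst/raw")
```

In dry-run mode the example prints `hdfs dfs -mkdir -p /user/analyst/raw` followed by `hdfs dfs -put sales_2023.csv /user/analyst/raw`. Once the file is in HDFS, the framework splits it into blocks and replicates them across the cluster's DataNodes, which is what lets MapReduce jobs process it in parallel.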

Major functions and components of Hadoop for big data

With the increasing use of big data applications across industries, Hadoop has gained popularity for data analysis over the last decade. It is an open-source framework that provides a distributed file system for big datasets. This allows users to process and transform big datasets into useful information using the MapReduce programming model of data processing (White, 2009).
Most of the Hadoop framework is written in Java, with some code written in C, and its primary API is Java-based. However, programs written in other languages, such as Python, can also use the framework through a utility known as Hadoop Streaming.
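
Hadoop Streaming runs any executable as the mapper or reducer: each reads lines from standard input and writes tab-separated key/value lines to standard output, and Hadoop sorts the mapper output by key before it reaches the reducer. The sketch below is the classic MapReduce word count written for that contract in Python. The script name `wordcount.py` and its `map`/`reduce` arguments are hypothetical conventions chosen for this example, not part of Hadoop itself.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Map step: emit 'word<TAB>1' for every word in the input lines."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(lines):
    """Reduce step: sum the counts for each word.

    Assumes the input is sorted by key, as Hadoop guarantees between the
    map and reduce phases, so identical words arrive on adjacent lines.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__":
    # Hypothetical invocation: "python3 wordcount.py map" or "python3 wordcount.py reduce".
    if len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
        step = mapper if sys.argv[1] == "map" else reducer
        for out in step(sys.stdin):
            print(out)
    else:
        print("usage: wordcount.py map|reduce", file=sys.stderr)
```

The same pipeline can be tried locally without a cluster, since a shell sort stands in for Hadoop's shuffle-and-sort phase: `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`. On a cluster, the script is passed to the Hadoop Streaming JAR through its `-mapper` and `-reducer` options; the JAR's exact location varies by Hadoop version and installation.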

Preferred big data software used by different organisations

Big data, and the software built to handle it, has been a buzzword in computing for over a decade now. The term refers to data sets so large and complex that traditional data processing software has difficulty processing and analysing them. These large data sets can be structured or unstructured, and the data comes from various sources such as:

  • social media,
  • scientific applications,
  • sensors, surveillance,
  • video and image archives.

These large data sets are analysed to find hidden patterns, relationships between pieces of data, market trends or other useful information. Ultimately, however, what matters is not the amount of data but what the organization does with it (Boyd & Crawford, 2011).

Importance of big data in the business environment of Amazon

Supply chain management and logistics are crucial parts of business processes. Together they manage the distribution, storage, transportation and packaging of items as well as their delivery. Big data plays an important role in managing logistics and the supply chain (Ghosh 2015). This article aims to highlight the importance of big data in supply chain management and logistics, taking Amazon as a special case.

Difference between traditional data and big data

Because of the challenges faced by traditional data systems, it has become important to create new platforms that meet the demands of organisations. Even when organisations leverage the talent, collaborative effort and resources of their people, managing massive amounts of data has become a tedious job. This need can be met by implementing big data tools, which can store, analyse and process large amounts of data much faster than traditional data processing systems (Picciano 2012). Big data has become a game changer in today’s world. The major differences between traditional data and big data are discussed below.

Understanding big data and its importance

Complex or massive data sets that are impractical to manage using traditional database systems and software tools are referred to as big data. Organizations use big data in one way or another, and it is technology that makes it possible to realize big data’s value. Big data consists of voluminous amounts of both multi-structured and unstructured data. Unstructured data is data that is not organized in a predefined way and therefore cannot readily be interpreted by software or traditional databases (Sawant & Shah 2013).

Security threats and legal issues related to cloud computing

The most common buzzword of the Information Technology era over the last decade has been “Cloud Computing”, with many world-market players shaping the field, such as Amazon Elastic Compute Cloud (Amazon EC2), Skype, Box.com, Dropbox, Twitter, Facebook and chatter.com.
