Big data, and the software built to handle it, has been a buzzword in computing for over a decade. The term refers to data sets so large and complex that they are difficult to process and analyse with traditional data processing software. These data sets can be structured or unstructured, and the data comes from various sources, such as:
- social media,
- scientific applications,
- sensors, surveillance,
- video and image archives.
These data sets are analysed to find hidden patterns, relationships between pieces of data, market trends and other useful information. In the end, however, what matters is not the amount of data but what the organisation does with it (Boyd & Crawford, 2011).
Various platforms have emerged since the introduction of big data; some are similar to the Hadoop framework, while others take a completely different approach.
Using Statistical Analysis System (SAS) as big data software
Statistical Analysis System (SAS) is a software suite that provides products for analytics, business intelligence and data management. It was developed at North Carolina State University in 1966 and is now managed by SAS Institute in North Carolina (SAS Institute Inc., 2014). It is one of the most popular applications in big data analytics because of the following advantages (Salkind, 2010):
- A wide variety of products for different types of data handling.
- Runs on multiple platforms.
- Reasonably priced compared with other tools.
One possible concern for users is that SAS is not updated as frequently as other analytics platforms; at the time of writing, the last update was about three years ago. The software also lacks sophisticated graphics, but its wide range of features largely makes up for this. Companies that use SAS for data analytics include:
- Genpact and
Using Sisense for big data with a modern user interface
Sisense is an analytics and business intelligence software product developed and launched in 2010 by the software company of the same name in New York City, USA (Sisense Inc., 2010). It offers many useful features and a modern user interface (UI) that most analytics platforms lack. Advantages of using Sisense as big data software include:
- Modern, attractive graphical user interface (GUI).
- Easy to use for people without much technical knowledge.
- Drag and drop joining of data from multiple sources.
- Create data visualizations easily.
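To make the idea of joining data from multiple sources concrete, the sketch below performs an in-memory inner join (a simple hash join) in plain Python. It only illustrates the kind of operation a drag-and-drop join configures; it is not how Sisense implements joins internally, and the function name `inner_join` and the sample customer and order data are hypothetical.

```python
# Minimal in-memory inner join (hash join) of two hypothetical data sources.
# Illustration only: not Sisense's internal implementation.
from typing import Dict, List


def inner_join(left: List[Dict], right: List[Dict], key: str) -> List[Dict]:
    """Return merged rows for every pair of records whose `key` values match."""
    # Build a hash index on the right-hand source so each lookup is fast.
    index: Dict[object, List[Dict]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)

    joined = []
    for row in left:
        for match in index.get(row[key], []):
            # Merge both records; right-hand values win on column-name clashes.
            joined.append({**row, **match})
    return joined


# Hypothetical sources: customers from a CRM export, orders from a sales database.
customers = [
    {"customer_id": 1, "name": "Acme"},
    {"customer_id": 2, "name": "Globex"},
]
orders = [
    {"customer_id": 1, "order_id": 101, "amount": 250.0},
    {"customer_id": 1, "order_id": 102, "amount": 75.5},
    {"customer_id": 3, "order_id": 103, "amount": 40.0},
]

if __name__ == "__main__":
    for row in inner_join(customers, orders, "customer_id"):
        print(row)
```

Only rows whose key appears in both sources survive an inner join, so customer 2 (no orders) and order 103 (no matching customer) are dropped.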
Sisense provides a host of features, but its high price appears to be its major disadvantage; a 30-day free trial is available. Large companies that use Sisense for data analytics include:
- Henry Schein and
- Sony.
Sisense has also received the following awards:
- Frost & Sullivan Customer Service Excellence Recognition (2016).
- Industry Excellence Award from Dresner Advisory Services (2016).
- Total Cost of Ownership (TCOO) Supplier Award from Celestica (2015).
- Best IT partner (Business Wire Inc., 2016).
High-performance computing cluster (HPCC) systems
HPCC Systems is an open-source, massively scalable platform for big data. It was released by LexisNexis Risk Solutions, a subsidiary of RELX Group, in 2011 after ten years of development. It is used by many well-known enterprises, such as:
- Cognizant and
Organisations choose HPCC for the following advantages (Middleton, 2011):
- Many benefits over Hadoop.
- Supports both parallel batch data processing and high-performance online query applications using indexed data files.
- Real-time performance.
Unlike Hadoop, which runs on the Java Virtual Machine (JVM), HPCC's native binaries are written in C++ (LexisNexis Risk Solutions, 2011). The platform's major disadvantage is licensing: it is offered under the GNU Affero General Public License (AGPL), which is not well regarded by many users.
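The distinction in HPCC's advantages between parallel batch processing and online queries over indexed data files can be illustrated with a minimal single-machine Python sketch. It is a deliberate simplification under stated assumptions: HPCC performs these roles on distributed clusters (Thor for batch processing, Roxie for indexed queries) with jobs written in ECL, whereas the functions `batch_total_by_city`, `build_index` and `query` and the sample records here are hypothetical.

```python
# Batch scan versus indexed lookup, on hypothetical in-memory records.
# Single-machine illustration only; not HPCC or ECL code.
from collections import defaultdict
from typing import Dict, List

# Hypothetical records standing in for a large distributed data file.
records = [
    {"person_id": 1, "city": "Boston", "spend": 120.0},
    {"person_id": 2, "city": "Denver", "spend": 80.0},
    {"person_id": 3, "city": "Boston", "spend": 45.0},
]


def batch_total_by_city(rows: List[Dict]) -> Dict[str, float]:
    """Batch-style job: one full pass over every row to build an aggregate."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["city"]] += row["spend"]
    return dict(totals)


def build_index(rows: List[Dict], key: str) -> Dict[object, List[Dict]]:
    """Build the index once (itself a batch step) so later queries skip full scans."""
    index: Dict[object, List[Dict]] = defaultdict(list)
    for row in rows:
        index[row[key]].append(row)
    return dict(index)


def query(index: Dict[object, List[Dict]], value: object) -> List[Dict]:
    """Online-style query: a direct keyed lookup instead of scanning all records."""
    return index.get(value, [])


if __name__ == "__main__":
    print(batch_total_by_city(records))  # aggregate over the whole data set
    city_index = build_index(records, "city")
    print(query(city_index, "Boston"))  # fast lookup of matching records
```

The design point is the trade-off: the batch job touches every record each time it runs, while the index costs one up-front pass but lets each subsequent query return quickly.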
Big data software by Talend
Talend is a software company headquartered in Redwood City, California, USA. It provides a wide range of products, including big data software, cloud data integration, data preparation and master data management. Its open-source ETL (Extract, Transform and Load) tool is one of the most widely used, by companies such as:
- Lenovo and
- Louvre Hotels.
It comes with the following advantages (Talend Inc., 2005):
- Easy to use for those who already know Java.
- Provides flexible software solutions.
- User-friendly graphical modelling environment, plus functionality not available in other ETL tools.
Its main disadvantage is scale: the volume of data to be processed grows exponentially every year, yet Talend is not well suited to processing very large amounts of data.
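Talend models ETL jobs in a graphical environment, but the three stages themselves are simple to state. As a rough illustration, here is a hand-written Python sketch of an extract, transform and load pipeline using only the standard library; it is not code generated by Talend, and the CSV contents, the `orders` table and the cleaning rules are hypothetical.

```python
# Minimal extract-transform-load pipeline over hypothetical CSV data.
# Hand-written illustration, not output of any ETL tool.
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this could come from a file, API or database.
RAW_CSV = """order_id,country,amount
1,us,19.99
2,DE,5.00
3,us,
4,fr,12.50
"""


def extract(text: str) -> list:
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list) -> list:
    """Transform: drop incomplete rows, normalise country codes, cast types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip records with a missing amount
        cleaned.append(
            (int(row["order_id"]), row["country"].upper(), float(row["amount"]))
        )
    return cleaned


def load(rows: list, conn: sqlite3.Connection) -> None:
    """Load: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, country TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")  # in-memory database as the target
    load(transform(extract(RAW_CSV)), conn)
    print(conn.execute("SELECT country, SUM(amount) FROM orders GROUP BY country").fetchall())
```

Keeping each stage a separate function mirrors how ETL tools present a pipeline as discrete steps that can be tested and rearranged independently.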
Using Pentaho for big data without coding
Pentaho is another company, founded in 2004, that provides an open-source ETL (Extract, Transform and Load) tool, alongside tools for business intelligence, data integration, data mining and big data analytics. Like Talend and Hadoop, Pentaho is based on Java, but its transformations are defined and stored in XML format. The ETL suite, also known as Pentaho Kettle, comes with the following advantages (Pentaho Corporation, 2004):
- Graphical ETL designer to simplify the creation of data pipelines.
- Very reliable, and one of the fastest-growing ETL tools.
- Requires less expertise in data integration.
- Big data integration with zero coding required.
- Strong community support.
Although Pentaho is growing fast among businesses and has strong support from the open-source community, the suite offers only basic functionality and lacks many features that other ETL tools, such as Talend, provide. In short, it is a basic but easy-to-use ETL tool. Major companies that use the Pentaho BI (business intelligence) suite include:
- Caterpillar Marine Asset Intelligence,
- Logitech and
- The New York Times.
Scope for improvement in big data software
Although these platforms offer strong feature sets, many challenges remain in analysing big data across various fields. Vendors should focus on overcoming those challenges so that their platforms help businesses understand their customers better and, in turn, make better business decisions. Future developments in big data may bring more features, whether with fewer challenges or with new ones.
References
- Boyd, D., & Crawford, K. (2011). Six Provocations for Big Data. A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society, 21, 1–17.
- Business Wire Inc. (2016). Sisense Takes Home Four Prestigious Industry Awards, Further Validating Technology Innovation and Market Leader Position. Retrieved from http://www.businesswire.com/news/home/20160824005214/en/Sisense-Takes-Home-Prestigious-Industry-Awards-Validating.
- LexisNexis Risk Solutions. (2011). HPCC Systems vs Hadoop Detailed Comparison. Retrieved from https://hpccsystems.com/why-hpcc-systems/hpcc-hadoop-comparison-big-data-software
- Middleton, A. M. (2011). HPCC Systems: Introduction to HPCC (High-Performance Computing Cluster).
- Pentaho Corporation. (2004). Big Data, Business Analytics & Data Integration | Pentaho. Retrieved from http://www.pentaho.com/product/product-overview.
- Salkind, N. J. (2010). Encyclopedia of Research Design. Retrieved from http://dx.doi.org/10.4135/9781412961288.
- SAS Institute Inc. (2014). About SAS. Retrieved from https://www.sas.com/en_in/company-information.html#history.
- Sisense Inc. (2010). About Sisense Business Intelligence Software & Data Analytics. Retrieved from https://www.sisense.com/company/.
- Sisense Inc. (2014). Sisense Features. Retrieved from https://www.sisense.com/business-analytics-software/.
- Talend Inc. (2005). Data Integration Platform & Products: Talend Products. Retrieved from https://www.talend.com/products/.