
What are the best methods for testing big data applications?

8 Answers
Sateesh Rai

Testing a Big Data application is more about verifying its data processing than testing the individual features of the software product. When it comes to Big Data testing, performance and functional testing are key.

In Big Data testing, QA engineers verify the successful processing of terabytes of data using a commodity cluster and other supporting components. It demands a high level of testing skill because the processing is very fast. Processing may be of three types:

  1. Batch
  2. Real Time
  3. Interactive

Along with this, data quality is an important factor in Big Data testing. Before testing the application, it is necessary to check the quality of the data, and this should be considered part of database testing. It involves checking characteristics like conformity, accuracy, duplication, consistency, validity, and data completeness.
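
A few of these quality checks are straightforward to automate. Here is a minimal sketch using PySpark; the staging path and the column names (customer_id, email) are hypothetical placeholders for your own schema.

```python
# Minimal data-quality sketch with PySpark. The staging path and the
# column names (customer_id, email) are hypothetical placeholders.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("data-quality-checks").getOrCreate()
df = spark.read.parquet("hdfs:///staging/customers")  # hypothetical path

total = df.count()

# Completeness: mandatory columns must not contain nulls.
null_ids = df.filter(F.col("customer_id").isNull()).count()
assert null_ids == 0, f"{null_ids} rows are missing customer_id"

# Duplication: the primary key must be unique.
distinct_ids = df.select("customer_id").distinct().count()
assert distinct_ids == total, f"{total - distinct_ids} duplicate customer_id rows"

# Conformity/validity: values must match an expected pattern.
bad = df.filter(~F.col("email").rlike(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")).count()
assert bad == 0, f"{bad} rows have malformed email values"
```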

Big Data Testing can be broadly divided into three steps:

Step 1: Data Staging Validation

The first step of Big Data testing, also referred to as the pre-Hadoop stage, involves process validation.

  1. Data from various sources like RDBMS, weblogs, etc. should be validated to make sure the correct data is pulled into the system.
  2. Compare source data with the data pushed into the Hadoop system to make sure they match (a sketch of this check follows the list).
  3. Verify that the right data is extracted and loaded into the correct HDFS location.
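
The source-versus-Hadoop comparison in point 2 can be scripted. Below is a minimal PySpark sketch; the JDBC connection details, table name, and HDFS landing path are hypothetical placeholders.

```python
# Staging-validation sketch: compare the row count and a column checksum of
# the source table against the data landed in HDFS. The JDBC URL, credentials,
# table name, and HDFS path are hypothetical placeholders.
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.appName("staging-validation").getOrCreate()

source = (spark.read.format("jdbc")
          .option("url", "jdbc:mysql://source-db:3306/sales")
          .option("dbtable", "orders")
          .option("user", "qa")
          .option("password", "secret")
          .load())
landed = spark.read.parquet("hdfs:///landing/orders")

# Row counts must match between the source and HDFS.
assert source.count() == landed.count(), "row count mismatch after ingestion"

# A cheap content check: the sum of a numeric column must agree on both sides.
src_sum = source.agg(F.sum("amount")).first()[0]
dst_sum = landed.agg(F.sum("amount")).first()[0]
assert src_sum == dst_sum, f"amount checksum mismatch: {src_sum} != {dst_sum}"
```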

Step 2: "Map Reduce"Validation

The second step is validation of "MapReduce". In this stage, the tester verifies the business logic on a single node and then validates it after running against multiple nodes, ensuring that:

  1. The MapReduce process works correctly
  2. Data aggregation or segregation rules are applied to the data
  3. Key-value pairs are generated
  4. The data is valid after the MapReduce process (the sketch after this list shows how the logic can be exercised in isolation)
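
One practical way to cover these points is to exercise the map and reduce logic as plain functions on a small, hand-computed fixture before running on the cluster. The word-count logic below is a hypothetical stand-in for your actual business rules.

```python
# MapReduce-validation sketch: run the map and reduce logic locally against
# a known fixture. The word-count logic stands in for real business rules.
from collections import defaultdict

def mapper(line):
    """Emit a (word, 1) key-value pair for each word in a line."""
    for word in line.split():
        yield word.lower(), 1

def reducer(key, values):
    """Aggregate the counts for one key."""
    return key, sum(values)

def run_local(lines):
    """Tiny local harness mimicking the shuffle between map and reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in grouped.items())

# Known-answer test: the output must match hand-computed aggregates.
result = run_local(["big data big testing", "data quality"])
assert result == {"big": 2, "data": 2, "testing": 1, "quality": 1}
```

Once the logic passes on the fixture, the same mapper and reducer can be run against multiple nodes and the cluster output compared with the local result.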

Step 3: Output Validation Phase

The final or third stage of Big Data testing is the output validation process. The output data files are generated and ready to be moved to an EDW (Enterprise Data Warehouse) or any other system, based on the requirement.

Activities in the third stage include:

  1. Checking that the transformation rules are correctly applied
  2. Checking data integrity and the successful load of the data into the target system
  3. Checking that there is no data corruption by comparing the target data with the HDFS data (see the sketch after this list)
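
The HDFS-versus-target comparison in point 3 can be automated along these lines. This is a minimal PySpark sketch; the warehouse connection details, table name, and output path are hypothetical, and it assumes both sides expose the same schema.

```python
# Output-validation sketch: verify that the data loaded into the warehouse
# matches the post-processing HDFS output. Connection details, table name,
# and output path are hypothetical; both sides must share the same schema.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("output-validation").getOrCreate()

hdfs_out = spark.read.parquet("hdfs:///output/daily_aggregates")
edw = (spark.read.format("jdbc")
       .option("url", "jdbc:postgresql://edw:5432/dw")
       .option("dbtable", "daily_aggregates")
       .option("user", "qa")
       .option("password", "secret")
       .load())

# Subtracting each side from the other should leave nothing if the load
# was complete and nothing was corrupted in transit.
missing = hdfs_out.exceptAll(edw).count()
unexpected = edw.exceptAll(hdfs_out).count()
assert missing == 0 and unexpected == 0, (
    f"{missing} rows missing from the EDW, {unexpected} unexpected rows")
```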

Architecture Testing

Hadoop processes very large volumes of data and is highly resource-intensive, so architectural testing is crucial to the success of a Big Data project. A poorly or improperly designed system may lead to performance degradation and fail to meet the requirements. At a minimum, performance and failover testing should be done in a Hadoop environment.

Performance testing includes testing of job completion time, memory utilization, data throughput, and similar system metrics, while the goal of failover testing is to verify that data processing continues seamlessly if data nodes fail.

Performance Testing

Performance testing for Big Data includes the following actions:

  1. Data ingestion and throughput: In this stage, the tester verifies how fast the system can consume data from various data sources. Testing involves identifying the number of messages the queue can process in a given time frame. It also includes how quickly data can be inserted into the underlying data store, for example the insertion rate into a MongoDB or Cassandra database (see the sketch after this list).
  2. Data processing: This involves verifying the speed with which queries or MapReduce jobs are executed. It also includes testing the data processing in isolation when the underlying data store is populated with the data sets, for example running MapReduce jobs on the underlying HDFS.
  3. Sub-component performance: These systems are made up of multiple components, and it is essential to test each of them in isolation, for example how quickly messages are indexed and consumed, MapReduce job times, query performance, search, etc.
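
As an illustration of point 1, here is a minimal sketch that measures insertion rate into MongoDB with pymongo; the host, database, collection, and document shape are hypothetical.

```python
# Ingestion-throughput sketch: measure documents per second inserted into
# MongoDB. The host, database, collection, and document shape are hypothetical.
import time
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
collection = client["perf_test"]["events"]

BATCH, BATCHES = 1_000, 50
docs = [{"event": "click", "seq": i} for i in range(BATCH)]

start = time.perf_counter()
for _ in range(BATCHES):
    # Pass fresh copies: insert_many adds an _id to each document it receives.
    collection.insert_many([dict(d) for d in docs])
elapsed = time.perf_counter() - start

total = BATCH * BATCHES
print(f"inserted {total} docs in {elapsed:.2f}s ({total / elapsed:,.0f} docs/s)")
```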


Source: Guru99

Pratik Patel

Big Data is one of the most-used terms these days, as most organizations deal with large data sets that are quite complex to manage. Big Data usually means data sets whose size is beyond the ability of commonly used software tools to capture, manage, and process within a tolerable elapsed time. Big Data has increased the demand for information management specialists.

Testing Big Data applications is therefore a necessary process, to maintain and manage the important characteristics of Big Data: Volume (data size), Velocity (speed of change), and Variety (of data sources).

One of the best methods for testing Big Data apps is test automation. There are only a limited number of tools for automating Big Data testing; a good example is TestingWhiz, a codeless test automation tool with multiple capabilities such as web services testing, database testing, Big Data testing, and cross-browser testing.

Specifically for Big Data testing, TestingWhiz provides an automated solution that helps you verify structured and unstructured data sets, schemas, and the inherent processes residing at different sources in your application. It also helps you validate the volume, variety, and velocity of data. For a closer look, you can visit the website and download the free trial version of the tool.

A few more tools, like QuerySurge and Tricentis, are also used for Big Data testing.

Vidya Sagar Panati

'The next best thing to testing is developing' ~Self.

I believe testing is the cynosure of any product, service, or process.

There are three ways to approach Big Data Testing.

  1. The Data Test - For quality, quantity, value, veracity.
  2. The Functional Test - For validating the business functionality.
  3. The Performance Test - For speed, scalability, and robustness.

These three can again be split into entry and exit points at various stages of what your Big Data product is trying to achieve. For example, you can have a data test at the entry point to HBase or HDFS and another at the exit point of Elasticsearch.
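
Such an entry/exit data test can be as simple as comparing record counts at both points. Here is a minimal sketch, assuming events land in HDFS and are served from Elasticsearch; the landing path, index name, and hosts are hypothetical.

```python
# Entry/exit data-test sketch: compare the record count at the HDFS entry
# point with the document count at the Elasticsearch exit point. The landing
# path, index name, and hosts are hypothetical.
from pyspark.sql import SparkSession
from elasticsearch import Elasticsearch

spark = SparkSession.builder.appName("entry-exit-test").getOrCreate()
entry_count = spark.read.parquet("hdfs:///landing/events").count()

es = Elasticsearch("http://localhost:9200")
exit_count = es.count(index="events")["count"]

assert entry_count == exit_count, (
    f"entry/exit mismatch: {entry_count} records in HDFS, "
    f"{exit_count} documents in Elasticsearch")
```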

Apart from this, you could look at automating all of this, for the reasons below:

  1. Too much data at stake
  2. Too many scenarios to be covered
  3. Too little time for regression
  4. Too many sensitive, real-time services involved

If you need a one-on-one on Big Data testing, do ping me...
