Companies today need to process multi-petabyte datasets efficiently, and such data often lacks a strict schema. Building reliability into each individual application that processes petabytes of data has become expensive. In clusters at this scale, nodes fail every day: failure is expected rather than exceptional, and the number of nodes in a cluster is not constant. There is therefore a need for a common infrastructure that is efficient, reliable, and available under an open-source Apache license. Hadoop is such a solution. Hadoop is a distributed computing framework built on the Hadoop Distributed File System (HDFS). It is highly fault-tolerant, can be deployed on low-cost hardware, and is well suited to high volumes of data, providing high-speed access to the data of the applications that use it. The Hadoop architecture is cluster based, consisting of nodes (a NameNode and DataNodes) that are, ideally, physically separate from one another. Hadoop's performance can be improved by proper assignment of tasks in the scheduler. In Hadoop, the MapReduce programming model is used to process data according to a query. Because Hadoop is used for huge amounts of data, its scheduling must be efficient to achieve good performance. The objective of this research is to study and analyse the various scheduling techniques used to improve performance in Hadoop.
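The MapReduce model mentioned above can be illustrated with a minimal sketch in plain Python that simulates the three phases of a word-count job. This is not the Hadoop API (real Hadoop jobs are typically written in Java against the `org.apache.hadoop.mapreduce` classes); the function names here are illustrative only, chosen to mirror the map, shuffle, and reduce phases that the framework schedules across cluster nodes.

```python
from collections import defaultdict

def map_phase(records):
    # Map: each input record is turned into (key, value) pairs.
    # For word count, emit (word, 1) for every word in every line.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all intermediate values by key, as Hadoop
    # does between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key;
    # for word count, sum the per-word counts.
    return {key: sum(values) for key, values in groups.items()}

lines = ["big data needs big clusters", "big clusters fail"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts)  # per-word totals across all input lines
```

In a real cluster the scheduler assigns map tasks to the DataNodes holding the input blocks and reduce tasks to nodes that pull the shuffled partitions, which is precisely why efficient task scheduling matters for overall job performance.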
Copyright © 2023 IJRTS Publications. All Rights Reserved.