Companies today need to process multi-petabyte datasets efficiently, and in systems of this scale the data often lacks a strict schema. Building reliability into each individual application that processes petabytes of data has become expensive. In such clusters, nodes fail every day, so failure is expected rather than exceptional, and the number of nodes in a cluster is not constant. There is therefore a need for a common infrastructure that is efficient, reliable, and available as open source under the Apache License. The Hadoop platform was designed to solve problems where you have a lot of data, perhaps a mixture of complex and structured data, that does not fit nicely into tables. It is intended for situations where you want to run analytics that are deep and computationally extensive, such as clustering and targeting. That is exactly what Google was doing when it was indexing the web and examining user behavior to improve its performance algorithms. This article attempts to study the need for Hadoop, its uses, and its applications, and to bring them to the notice of the reader.
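To make the programming model concrete, the sketch below shows the canonical word-count job written against Hadoop's MapReduce API. The framework splits the input across the cluster, reruns failed tasks on healthy nodes, and collects the reduced output, so the application itself contains no reliability logic; the class names and input/output paths are illustrative and are not drawn from this article.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: emits (word, 1) for every token in an input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reducer: sums the counts emitted for each word across all mappers.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each node
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a jar, such a job would be submitted with a command like `hadoop jar wordcount.jar WordCount /input /output`; the same code scales from a single node to thousands because partitioning, scheduling, and failure recovery are handled by the framework rather than by the application.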