The Apache Software Foundation (ASF) announced Apache® Hadoop® v3.0.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing.
“…today Hadoop is the only cost-sensible and scalable open source alternative to commercially available Big Data management packages. It also becomes an integral part of almost any commercially available Big Data solution and de-facto industry standard for business intelligence (BI).” — MarketAnalysis.com/Market Research Media
Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment.
“This latest release unlocks several years of development from the Apache community,” said Chris Douglas, Vice President of Apache Hadoop. “The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud.”
“Hadoop 3 is a major milestone for the project, and our biggest release ever,” said Andrew Wang, Apache Hadoop 3 release manager. “It represents the combined efforts of hundreds of contributors over the five years since Hadoop 2. I’m looking forward to how our users will benefit from new features in the release that improve the efficiency, scalability, and reliability of the platform.”
Apache Hadoop 3.0.0 highlights include:
HDFS erasure coding: halves the storage cost of HDFS while also improving data durability;
YARN Timeline Service v.2 (preview): improves the scalability, reliability, and usability of the Timeline Service;
YARN resource types: enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and
Improved capabilities and performance for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage Service.
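The "halves the storage cost" figure quoted for HDFS erasure coding can be checked with simple arithmetic. The sketch below assumes a baseline of 3x replication and a Reed-Solomon layout with 6 data units and 3 parity units; neither figure is stated in this announcement, and the `StorageScheme` class and its field names are purely illustrative, not part of any Hadoop API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageScheme:
    """Illustrative model of a redundancy scheme (hypothetical, not a Hadoop API)."""
    name: str
    data_units: int       # units that hold the original data
    redundant_units: int  # extra replicas or parity units

    @property
    def raw_per_logical(self) -> float:
        """Raw bytes stored on disk per byte of user data."""
        return (self.data_units + self.redundant_units) / self.data_units

    @property
    def tolerated_failures(self) -> int:
        """How many units can be lost before data becomes unrecoverable."""
        return self.redundant_units


# Assumption: the baseline is 3x replication (1 original + 2 extra copies).
replication_3x = StorageScheme("3x replication", data_units=1, redundant_units=2)
# Assumption: Reed-Solomon coding with 6 data units and 3 parity units.
rs_6_3 = StorageScheme("RS(6,3)", data_units=6, redundant_units=3)

for scheme in (replication_3x, rs_6_3):
    print(f"{scheme.name}: {scheme.raw_per_logical:.1f}x raw storage, "
          f"tolerates {scheme.tolerated_failures} lost units")
```

Under these assumptions the erasure-coded layout needs half the raw storage of the replicated one while surviving one additional lost unit, which is consistent with the storage and durability claims in the list above.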
Hadoop 3.0.0 has already undergone extensive testing and integration with the broader Open Source ecosystem at The Apache Software Foundation. With this release, its community of developers and users promotes the release series out of beta.
Apache Hadoop is widely deployed at numerous enterprises and institutions worldwide, such as Adobe, Alibaba, Amazon Web Services, AOL, Apple, Capital One, Cloudera, Cornell University, eBay, ESA Calvalus satellite mission, Facebook, foursquare, Google, Hortonworks, HP, Hulu, IBM, Intel, LinkedIn, Microsoft, Netflix, The New York Times, Rackspace, Rakuten, SAP, Tencent, Teradata, Tesla Motors, Twitter, Uber, and Yahoo.
Catch Apache Hadoop in action at the Strata Data Conference in San Jose, CA, 5-8 March 2018, and at dozens of Hadoop Meetups held around the world.