What is "hadoop"

Hadoop is an ecosystem for distributed computing, created to handle the processing of large amounts of data (petabytes) at high speed. This ecosystem is composed of several systems and technologies.

Today Hadoop is maintained by the Apache Software Foundation, and its best-known enterprise distributions come from Cloudera and Hortonworks.

Some components that may be part of a Hadoop ecosystem:

HDFS - The Hadoop Distributed File System. It stores data across the cluster in large blocks (typically 128 MB), replicating each block across several nodes.
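
As an illustration, here is a minimal sketch of writing and reading a file through the HDFS Java API. The namenode address and the /user/demo path are hypothetical; in practice the address usually comes from core-site.xml:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical namenode address; normally loaded from core-site.xml.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");
        FileSystem fs = FileSystem.get(conf);

        // Write a small file. HDFS splits large files into big blocks
        // and replicates each block across datanodes.
        Path file = new Path("/user/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("hello, hdfs".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(fs.open(file), StandardCharsets.UTF_8))) {
            System.out.println(in.readLine());
        }
        fs.close();
    }
}
```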

MapReduce - A programming model for large-scale processing, based on two phases: mapping (map) and reduction (reduce).
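
The canonical example is word counting, sketched below with the Hadoop Java API (essentially the WordCount example from the Hadoop documentation); input and output paths are passed as arguments:

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {
    // Map phase: emit (word, 1) for every word in the input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context ctx)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                ctx.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts emitted for each word.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```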

YARN - A resource management platform responsible for managing the cluster's computational resources and for scheduling them across applications.

Hive - Converts SQL-like queries (HiveQL) into MapReduce jobs.
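
For illustration, a sketch of querying Hive from Java over JDBC, assuming a running HiveServer2 instance; the host, credentials, and the access_logs table are hypothetical:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQuery {
    public static void main(String[] args) throws Exception {
        // Register the Hive JDBC driver (auto-registered in newer versions).
        Class.forName("org.apache.hive.jdbc.HiveDriver");

        // Hypothetical HiveServer2 address.
        String url = "jdbc:hive2://hiveserver:10000/default";
        try (Connection con = DriverManager.getConnection(url, "user", "");
             Statement st = con.createStatement();
             // Hive compiles this SQL into one or more MapReduce jobs.
             ResultSet rs = st.executeQuery(
                     "SELECT page, COUNT(*) FROM access_logs GROUP BY page")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```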

Pig - A high-level scripting language (Pig Latin) for creating MapReduce jobs.
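
A sketch of a Pig Latin script embedded in Java through the PigServer API; the input path and field layout are hypothetical:

```java
import org.apache.pig.PigServer;

public class PigExample {
    public static void main(String[] args) throws Exception {
        // "mapreduce" runs on the cluster; "local" runs in-process for testing.
        PigServer pig = new PigServer("mapreduce");

        // Hypothetical input path and schema.
        pig.registerQuery("logs = LOAD '/user/demo/access_logs' "
                + "AS (page:chararray, user:chararray);");
        pig.registerQuery("by_page = GROUP logs BY page;");
        pig.registerQuery("counts = FOREACH by_page GENERATE group, COUNT(logs);");

        // STORE triggers the actual MapReduce job(s).
        pig.store("counts", "/user/demo/page_counts");
        pig.shutdown();
    }
}
```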

HBase - A column-oriented (columnar) NoSQL database that can run on top of HDFS, providing fast, random access to large amounts of data.
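
A minimal sketch using the HBase Java client; the users table and its info column family are hypothetical:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create(); // reads hbase-site.xml
        try (Connection conn = ConnectionFactory.createConnection(conf);
             // Hypothetical table "users" with a column family "info".
             Table table = conn.getTable(TableName.valueOf("users"))) {

            // Write one cell: row key -> family:qualifier = value.
            Put put = new Put(Bytes.toBytes("user-42"));
            put.addColumn(Bytes.toBytes("info"), Bytes.toBytes("name"),
                    Bytes.toBytes("Alice"));
            table.put(put);

            // Random read by row key.
            Result result = table.get(new Get(Bytes.toBytes("user-42")));
            byte[] name = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("name"));
            System.out.println(Bytes.toString(name));
        }
    }
}
```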

Flume - A system for collecting and moving large volumes of log data into HDFS.

Ambari - Management and monitoring of Hadoop clusters.

Sqoop - A tool for transferring data between relational DBMSs and Hadoop. It uses JDBC and generates a Java export class for each table in the relational schema.

Oozie / Control-M - Schedulers and workflow managers for Hadoop jobs.