What is Commodity Hardware?
"Hadoop runs on commodity hardware." This sentence is often heard in discussions about Hadoop, but what precisely does it mean?
Just as the definition of "big" in "Big Data" is relative to a company or industry, the definition of "commodity" in "commodity hardware" is relative to a given point in time and a given industry. Still, several general points can be made.
Commodity hardware in general
- Commodity hardware has an average amount of computing resources; it is not considered a "sports car" in its field.
- "Commodity hardware" does not imply low quality, but rather, affordability.
- A common, though not universal, feature of commodity hardware is that, over time, it is widely used in roles for which it was not specifically designed, in contrast to purpose-built hardware.
Commodity hardware in the context of Hadoop
- Hadoop clusters are run on servers; "commodity" here does not mean desktop-class machines.
- Most commodity servers used in production Hadoop clusters have a balanced (again, what counts as "balanced" changes over time) ratio of disk space to memory and CPU, as opposed to being specialized servers with unusually large amounts of memory or CPU.
- The servers are not designed specifically as parts of a distributed
storage and processing framework, but have been appropriated for this
role in Hadoop.
Examples of Commodity Hardware in Hadoop
An example of suggested hardware specifications for a production Hadoop cluster is:
- four 1TB hard disks in a JBOD (Just a Bunch Of Disks) configuration (see the configuration sketch after this list)
- two quad-core CPUs, running at 2-2.5 GHz or faster
- 16-24 GB of RAM (24-32 GB if you are also running HBase)
- 1 Gigabit Ethernet
(Source: Cloudera)
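To see how the JBOD layout above shows up in Hadoop's configuration, here is a minimal hdfs-site.xml sketch; the mount points /data/1 through /data/4 are hypothetical, one per physical disk:

  <!-- hdfs-site.xml (sketch): list every data disk separately, with no RAID across them -->
  <property>
    <!-- called dfs.data.dir in older Hadoop releases -->
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
  </property>

Listing each disk as its own directory lets HDFS write to all spindles in parallel and rely on block replication, rather than RAID, for durability, which is a large part of why commodity disks are sufficient.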
Or, for a more powerful cluster:
- six 2TB hard disks, with RAID 1 across two of the disks (one possible layout is sketched below)
- two quad-core CPUs
- 32-64 GB of ECC (Error Correcting Code) RAM
- 2-4 Gigabit Ethernet
(Source: OpenLogic, slide 15)
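The slide does not say how the RAID 1 pair is used, but a common arrangement (an assumption here, not something the source states) is to put the operating system, and on a master node the NameNode's metadata directory, on the mirrored pair, leaving the other four disks as JBOD data directories. A rough hdfs-site.xml sketch, with /raid and /data/1 through /data/4 as hypothetical mount points:

  <!-- hdfs-site.xml (sketch): metadata on the mirrored volume, data on the JBOD disks -->
  <property>
    <!-- called dfs.name.dir in older Hadoop releases -->
    <name>dfs.namenode.name.dir</name>
    <value>/raid/dfs/nn</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/data/1/dfs/dn,/data/2/dfs/dn,/data/3/dfs/dn,/data/4/dfs/dn</value>
  </property>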
Additional Links