Setting Up Standalone (Local) Hadoop

Hadoop is designed to run on [[hadoop-installation-tutorial|hundreds to thousands of computers]] inside a cluster. By default, however, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is especially useful for debugging, since distributed debugging is a real nightmare. This post introduces how to set up a standalone Hadoop environment.

1. Hadoop package and software installation

Follow the instructions in the “1. Install needed packages” part of the [[hadoop-installation-tutorial|Hadoop Installation Tutorial]] to install the required packages. Then follow “4. Hadoop Configurations” to configure hadoop-env.sh (this file only).
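For example, if your JDK is installed at /usr/lib/jvm/java-6-openjdk (an example path; adjust it for your system), the essential change in conf/hadoop-env.sh is a single line:

export JAVA_HOME=/usr/lib/jvm/java-6-openjdk

You can then check that Hadoop finds Java correctly by running bin/hadoop version from the Hadoop directory, which should print the version information without errors.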

2. Just run Hadoop!

Now just run Hadoop jobs whose input and output are in local directories. We use a simple example to show how to start a Hadoop job.

The example finds and displays every match of the given regular expression. Output is written to the given output directory.

$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-mapred-examples-0.21.0.jar grep input output '[a-z.]+'
$ cat output/*
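Note that Hadoop by default refuses to start a job if the output directory already exists, so if you want to re-run the example, remove the directory first:

$ rm -r output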

The jar file’s name may differ depending on the version of your Hadoop distribution.
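If you are not sure which jar to use, you can list the example jars shipped with your distribution, for example (in this release the jar sits in the top-level Hadoop directory; other releases may place it elsewhere):

$ ls hadoop-*examples*.jar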

Simple, isn't it? Enjoy it, and go further to try a [[hadoop-installation-tutorial|Fully-distributed Hadoop Installation]].

