Hadoop TeraSort Benchmark
TeraSort is one of Hadoop’s most widely used benchmarks. The Hadoop distribution includes both the input generator and the sorting implementation: TeraGen generates the input and TeraSort sorts it. This is a short tutorial on running the Hadoop TeraSort benchmark.
TeraGen generates random data that can be used as input for a subsequent TeraSort run.
Generate input with TeraGen
The syntax for TeraGen:
$ hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>
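Because TeraGen takes a row count rather than a byte size, it can help to work the number out first. The following is a minimal sketch that assumes a target of 10 GB (decimal) and a hypothetical output directory, /tmp/terasort-input; neither value comes from the Hadoop documentation. It only prints the resulting command and does not run it.

```shell
# Compute how many 100-byte rows TeraGen needs for a target data size.
target_bytes=$((10 * 1000 * 1000 * 1000))   # assumed target: 10 GB (decimal)
row_bytes=100                               # TeraGen rows are 100 bytes each
rows=$((target_bytes / row_bytes))

echo "rows: $rows"
# /tmp/terasort-input is a hypothetical HDFS path; adjust it to your cluster.
echo "hadoop jar hadoop-*examples*.jar teragen $rows /tmp/terasort-input"
```

By the same arithmetic, a full terabyte (10^12 bytes) corresponds to 10,000,000,000 rows.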
Run TeraSort
After the data is generated, run the sort with TeraSort:
$ hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>
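The two steps can be chained in a small script. This is only a sketch: the row count, the two directory paths, the `run` helper, and the `DRY_RUN` variable are all illustrative assumptions, not part of Hadoop. By default the script only prints the commands; set `DRY_RUN=0` to execute them on a machine where `hadoop` and the examples jar are available.

```shell
#!/bin/sh
# Sketch: generate input with TeraGen, then sort it with TeraSort.
ROWS=1000000                      # assumed size: 1,000,000 rows x 100 bytes = 100 MB
INPUT_DIR=/tmp/terasort-input     # hypothetical HDFS path
OUTPUT_DIR=/tmp/terasort-output   # hypothetical HDFS path

# run: print the command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# TeraSort's input directory is the directory TeraGen wrote to.
run hadoop jar hadoop-*examples*.jar teragen "$ROWS" "$INPUT_DIR" &&
run hadoop jar hadoop-*examples*.jar terasort "$INPUT_DIR" "$OUTPUT_DIR"
```

MapReduce jobs generally refuse to write to an output directory that already exists, so both directories should be new or removed before a rerun.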
By: Zhiqiang Ma
Last updated: Jan 3, 2013
Tags: Cloud computing, Hadoop, java, MapReduce, Sort, System, TeraSort