Hadoop TeraSort Benchmark

By Zhiqiang Ma On Jan 3, 2013

TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and sorting implementations: the TeraGen generates the input and TeraSort conducts the sorting. Here, we provide a short tutorial for using the Hadoop TeraSort benchmark.

TeraGen generates random data that can be used as input data for a subsequent running of TeraSort.

Generate input by TeraGen

The syntax for TeraGen:

$ hadoop jar hadoop-*examples*.jar teragen <number of 100-byte rows> <output dir>

Run TeraSort

After the data is generated, run the sort by TeraSort

$ hadoop jar hadoop-*examples*.jar terasort <input dir> <output dir>
By: Zhiqiang Ma Last updated: Jan 3, 2013 764 views
Tags: , , , , , ,

About Zhiqiang Ma

Zhiqiang Ma is a PhD candidate at Dep. of CSE, HKUST. He is interested in system software for cloud computing, virtualization of large-scale distributed system, etc. Find Zhiqiang on Facebook, Twitter, LinkedIn and Google+.

Add your comments, share your thoughts

Be nice. Keep it clean. Stay on topic. No spam.