John Ousterhout is a professor in the Department of Computer Science at Stanford University. One of his recent projects is RAMCloud, a “new class of storage, based entirely in DRAM, that is 2-3 orders of magnitude faster than existing storage systems”. He posts his “Favorite Sayings” on his homepage. These sayings are precious… read more
When working in environments where memory is constrained, it is important to design memory usage intelligently. Whether on embedded systems or supercomputers, memory is always expensive, and storing each boolean value in a full byte wastes a lot of it. If not for addressability in languages like C and C++, booleans could have… read more
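The byte-per-boolean overhead mentioned above can be avoided by packing eight flags into each byte. The following is a minimal sketch in Python, not the approach from the original post; the `BitFlags` class and its method names are hypothetical and chosen only for illustration.

```python
class BitFlags:
    """Fixed-size array of booleans stored one bit each (hypothetical helper)."""

    def __init__(self, size: int) -> None:
        if size < 0:
            raise ValueError("size must be non-negative")
        self.size = size
        # Round up so every flag has a bit: 8 flags share one byte.
        self._bits = bytearray((size + 7) // 8)

    def _check(self, index: int) -> None:
        if not 0 <= index < self.size:
            raise IndexError("flag index out of range")

    def set(self, index: int, value: bool) -> None:
        self._check(index)
        byte, offset = divmod(index, 8)
        if value:
            self._bits[byte] |= 1 << offset
        else:
            # Clear the bit; mask to 8 bits so the value stays a valid byte.
            self._bits[byte] &= ~(1 << offset) & 0xFF

    def get(self, index: int) -> bool:
        self._check(index)
        byte, offset = divmod(index, 8)
        return bool((self._bits[byte] >> offset) & 1)

    def nbytes(self) -> int:
        # Storage actually used for the flags, in bytes.
        return len(self._bits)


flags = BitFlags(1000)
flags.set(3, True)
flags.set(999, True)
flags.set(3, False)
```

Here 1000 flags occupy 125 bytes instead of 1000. The trade-off is that an individual flag is no longer addressable on its own, so every read and write costs a shift and a mask.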
Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. Slides download: Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean. Numbers Everyone Should Know: L1 cache reference, 0.5 ns; branch mispredict, 5 ns; L2 cache reference, 7 ns; mutex lock/unlock, 100 ns; main memory reference, 100 ns; compress 1K bytes… read more
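Numbers like these are most useful for back-of-the-envelope estimates. The sketch below uses only the five latencies listed in the excerpt above; the `estimate_ms` helper and the one-million-operation workload are assumptions made up for illustration, and the model ignores caching effects, overlap, and parallelism.

```python
# Latencies in nanoseconds, copied from the excerpt above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 100,
    "Main memory reference": 100,
}


def estimate_ms(operation: str, count: int) -> float:
    """Rough total time in milliseconds for `count` sequential operations."""
    return LATENCY_NS[operation] * count / 1_000_000


# Hypothetical workload: one million dependent lookups.
lookups = 1_000_000
from_memory_ms = estimate_ms("Main memory reference", lookups)
from_l1_ms = estimate_ms("L1 cache reference", lookups)
slowdown = from_memory_ms / from_l1_ms

print(f"main memory: {from_memory_ms} ms, L1: {from_l1_ms} ms, ratio: {slowdown}x")
```

Under these assumptions, a million lookups served from main memory take about 100 ms, against 0.5 ms when every lookup hits the L1 cache, a factor of 200.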
Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications; it is an open-source variant of MapReduce, which was originally designed and implemented by Google for processing and generating large data sets. HDFS is Hadoop’s underlying data persistency layer, which is loosely… read more
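To make the programming model concrete, here is a single-process sketch of the map, shuffle, and reduce phases using word count as the example. It is an illustration only and does not use the Hadoop API; the function names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical, and a real framework would distribute these phases across machines and read its input from a file system such as HDFS.

```python
from collections import defaultdict


def map_words(line):
    """Map phase: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def reduce_counts(word, counts):
    """Reduce phase: combine all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(records, mapper, reducer):
    """Run mapper over all records, group values by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


lines = ["the quick brown fox", "the lazy dog", "The end"]
word_counts = run_mapreduce(lines, map_words, reduce_counts)
print(word_counts)
```

The user writes only the mapper and the reducer; grouping intermediate pairs by key is the framework's job, which is what lets the same two functions scale from this toy loop to a cluster.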
Here is a list of tutorials for learning how to write MapReduce programs on Hadoop, the open-source MapReduce implementation with HDFS. MapReduce Tutorials: The official tutorial on the Hadoop MapReduce framework: http://hadoop.apache.org/docs/r1.0.4/mapred_tutorial.html. Yahoo! Hadoop Tutorial: A comprehensive tutorial on Hadoop from Yahoo! Developer Network: http://developer.yahoo.com/hadoop/tutorial/. More about MapReduce: To better understand the design… read more
I compiled a list of good systems conferences and their deadlines for my own reference. I share it here in the hope that it helps others who need such a list. The list is kept up to date. A PDF version: Systems Conference and Deadlines. Links to conference websites: Systems Conferences.
Storage Architecture and Challenges, presented at the Faculty Summit, July 29, 2010, by Andrew Fikes, Principal Engineer. Download PDF. These slides introduce some of Google’s storage systems, with insights and discussion of problems.
Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean. Everyone who is interested in large distributed systems should read it: PDF for Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean.
MapReduce is a well-known programming model designed for generating and processing large data sets. There are various MapReduce implementations, of which Hadoop is probably the most widely known and used. Benchmarking MapReduce frameworks is therefore important. Faraz Ahmad et al. developed a benchmark suite: PUMA MapReduce Benchmark. During our work on MapReduce, we developed a benchmark suite… read more
TeraSort is one of Hadoop’s widely used benchmarks. The Hadoop distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort performs the sort. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark. TeraGen generates random data that can be used as input data for a subsequent running… read more