[Jan 3, 2013 | 10,444 views]
Hadoop mainly consists of two parts: Hadoop MapReduce and HDFS. Hadoop MapReduce is a programming model and software framework for writing applications. It is an open-source variant of MapReduce, which was originally designed and implemented by Google for processing and generating large data sets [1]. HDFS is Hadoop’s underlying data ...
read more »
[Feb 5, 2013 | 2,989 views]
The mrcc project’s homepage is here: mrcc project.
Abstract
mrcc is an open source compilation system that uses MapReduce to distribute C code compilation across the servers of a cloud computing platform. mrcc is built to use Hadoop by default, but it is easy to port it to other cloud computing platforms, ...
read more »
[Jan 3, 2013 | 2,212 views]
This post lists important conferences on Cloud Computing in 2012.
OSDI 2012
10th USENIX Symposium on Operating Systems Design and Implementation (OSDI ’12)
October 8–10, 2012, Hollywood, CA
“The tenth OSDI seeks to present innovative, exciting research in computer systems. OSDI brings together professionals from academic and industrial backgrounds in what has become ...
read more »
[Mar 13, 2013 | 1,106 views]
After installing Hadoop, we usually run some benchmark programs to test whether the system works well. In the Hadoop install tutorial post, we showed a very simple example that greps strings from a simple set of files. In this post, we introduce Sort for testing and benchmarking Hadoop. The ...
read more »
[Jan 3, 2013 | 986 views]
TeraSort is one of Hadoop’s widely used benchmarks. Hadoop’s distribution contains both the input generator and the sorting implementation: TeraGen generates the input and TeraSort conducts the sorting. Here, we provide a short tutorial on using the Hadoop TeraSort benchmark.
TeraGen generates random data that can be used as input data ...
read more »
[Jan 3, 2013 | 915 views]
Hadoop’s namenode and datanodes expose a number of TCP ports that Hadoop’s daemons use to communicate with each other or to listen directly for users’ requests. Both Hadoop users and cluster administrators need this port information to write programs or configure firewalls/gateways accordingly.
A post written by Philip Zeyliger from ...
read more »
[Feb 5, 2013 | 820 views]
Colossus is the successor to the Google File System (GFS), as mentioned in the recent paper on Spanner at OSDI 2012. Colossus is also used by Spanner to store its tablets. Information about Colossus is scarce compared with GFS, which was published in a paper at SOSP 2003. There ...
read more »
[Mar 27, 2013 | 737 views]
This post lists important conferences related to Cloud Computing in 2013.
SOSP 2013
SOSP’13: The 24th ACM Symposium on Operating Systems Principles. November 3-6, 2013, Nemacolin Woodlands Resort, Pennsylvania.
The biennial ACM Symposium on Operating Systems Principles is the world’s premier forum for researchers, developers, programmers, and teachers of computer ...
read more »
[Jan 3, 2013 | 681 views]
Hadoop is designed to run on hundreds to thousands of computers inside a cluster. However, by default Hadoop is configured to run in non-distributed mode as a single Java process. This is especially useful for debugging since distributed debugging is really a ...
read more »
[Jan 3, 2013 | 450 views]
This post lists pitfalls and lessons learned when configuring and tuning Hadoop.
Hadoop with IPv6
Hadoop doesn’t support IPv6 currently (up to 0.20.2 and 0.21.0): Hadoop and IPv6. Cluster performance may suffer when IPv6 is turned on: mail archive.
One good practice is to disable IPv6 on servers in ...
read more »
[Jan 3, 2013 | 421 views]
This post lists important conferences on Cloud Computing in 2011.
ACM Symposium on Cloud Computing
October 27 and 28, 2011, Cascais, Portugal
Submission Deadline: April 30, 2011
23rd ACM Symposium on Operating Systems Principles (SOSP)
October 23-26, 2011, Cascais, Portugal
Submission deadline: March 18, 2011, 11:59 PM GMT
EuroSys 2011
April 10-13, 2011. Salzburg, Austria.
CLOUD COMPUTING ...
read more »
[Jan 3, 2013 | 271 views]
MapReduce is a well-known programming model designed for generating and processing large data sets. There are various MapReduce implementations, of which Hadoop is probably the most widely known and used. Benchmarking MapReduce frameworks is becoming important.
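As a rough illustration of the programming model mentioned above, here is a minimal in-memory sketch of the map, shuffle, and reduce steps applied to word counting. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `run_word_count`) are hypothetical and chosen only for this example, and a real framework would distribute each step across many machines.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every input record and collect (key, value) pairs."""
    pairs = []
    for record in records:
        pairs.extend(mapper(record))
    return pairs


def shuffle(pairs):
    """Group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Apply the reducer to each key and its list of values."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_count_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in the line."""
    return [(word, 1) for word in line.split()]


def word_count_reducer(word, counts):
    """Sum the partial counts for one word."""
    return sum(counts)


def run_word_count(lines):
    """Run the three steps in sequence on a list of text lines (hypothetical helper)."""
    pairs = map_phase(lines, word_count_mapper)
    groups = shuffle(pairs)
    return reduce_phase(groups, word_count_reducer)
```

Calling `run_word_count` on a list of text lines returns a dictionary that maps each word to its total count. A benchmark would typically time jobs of this shape on much larger inputs.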
Faraz Ahmad et al. developed a benchmark suite: PUMA MapReduce Benchmark.
During our work on MapReduce, we developed ...
read more »
[Feb 5, 2013 | 247 views]
Storage Architecture and Challenges in Faculty Summit, July 29, 2010, by Andrew Fikes, Principal Engineer.
Download PDF.
These slides introduce some of Google’s storage systems, with insights and discussion of problems.
read more »
[Jan 3, 2013 | 221 views]
I tried to find a ranking on the Internet of the top conferences by average number of citations over the last 5 years, but failed to find one. However, there are many rankings by overall citations and number of publications. Hence, it is not hard to calculate ...
read more »
[Feb 5, 2013 | 181 views]
Research on Cloud Computing has made great progress, and many excellent large-scale systems have been designed in recent years. I compiled a list of some large-scale data storage and processing systems in datacenters as follows.
Storage systems
Google File System (GFS): http://research.google.com/archive/gfs.html
HDFS implementation: http://hadoop.apache.org/docs/stable/hdfs_design.html
Colossus (GFS2): http://www.fclose.com/b/cloud-computing/3202/colossus-successor-to-google-file-system-gfs/
BigTable: http://research.google.com/archive/bigtable.html
Megastore: http://research.google.com/pubs/pub36971.html
Spanner: http://research.google.com/archive/spanner.html
Dynamo: http://dl.acm.org/citation.cfm?id=1294281
RAMCloud: http://dl.acm.org/citation.cfm?id=1965751 and ...
read more »
[Feb 5, 2013 | 129 views]
Understanding the literature is usually the first step in doing research, and systems research on cloud computing is no exception. A reading list can help a lot for those just starting out in cloud computing research.
Prof. Lin Gu, my PhD supervisor, compiles a reading list for system research on ...
read more »
[Feb 5, 2013 | 111 views]
Cosmos is “Microsoft’s internal data storage/query system for analyzing enormous amounts (as in petabytes) of data”.
There is no paper/technical report about Cosmos published yet. I compiled a list of information about Cosmos on the Web as follows.
What is Microsoft’s Cosmos service? by Yaron Y. Goland.
Microsoft Cosmos: Petabytes perfectly processed perfunctorily ...
read more »
[Feb 5, 2013 | 50 views]
Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean.
Everyone who is interested in large distributed systems should read:
PDF for Designs, Lessons and Advice from Building Large Distributed Systems by Jeff Dean.
read more »
[Apr 11, 2013 | 44 views]
I compiled a list of good systems conferences and deadlines for my own reference. Here I share the list and hope it can help others who also need such a list. This list is kept updated.
A PDF version: Systems Conference and Deadlines.
Links to conference websites: Systems Conferences.
read more »