download sourceball as .zipdownload sourceball as .tardownload sourceball as static gem

Hadoop Config Tips

Setup NFS within the cluster

If you’re lazy, I recommend setting up NFS — it makes dispatching simple config and script files much easier. (And if you’re not lazy, what the hell are you doing using Wukong?). Be careful though — used unwisely, a swarm of NFS requests will mount a devastatingly effective denial of service attack on your poor old master node.

Installing NFS to share files along the cluster gives the following conveniences:

First, you need to take note of the internal name for your master, perhaps something like domU-xx-xx-xx-xx-xx-xx.compute-1.internal.

As root, on the master (change compute-1.internal to match your setup):

    apt-get install nfs-kernel-server 
    echo "/home *.compute-1.internal(rw)" >> /etc/exports ;
    /etc/init.d/nfs-kernel-server stop ;

(The *.compute-1.internal part limits host access, but you should take a look at the security settings of both EC2 and the built-in portmapper as well.)

Next, set up a regular user account on the master only. In this case our user will be named ‘chimpy’:

  visudo # uncomment the last line, to allow group sudo to sudo
  groupadd admin 
  adduser chimpy
  usermod -a -G sudo,admin chimpy
  su chimpy                  # now you are the new user
  ssh-keygen -t rsa          # accept all the defaults
  cat ~/.ssh/id_rsa.pub      # can paste this public key into your github, etc
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys2

Then on each slave (replacing domU-xx-… by the internal name for the master node):

    apt-get install nfs-common ;
    echo "domU-xx-xx-xx-xx-xx-xx.compute-1.internal:/home  /mnt/home  nfs  rw  0 0"  >> /etc/fstab
    /etc/init.d/nfs-common restart
    mkdir /mnt/home
    mount /mnt/home
   ln -s /mnt/home/chimpy /home/chimpy

You should now be in business.

Performance tradeoffs should be small as long as you’re just sending code files and gems around. Don’t write out log entries or data to NFS partitions, or you’ll effectively perform a denial-of-service attack on the master node.

Tools for EC2 and S3 Management

Random EC2 notes


Fork me on GitHub