
Thursday, February 28, 2013

We just invented a new game: "Whack a Hadoop namenode"

I just came back from the Strata 2013 BigData conference. A pretty interesting event, considering that the Hadoop wars are apparently over. That doesn't mean the battlefield is calm. On the contrary!

But this year's war banner is different. Now it seems to be about Hadoop stack distributions. If only I had artistic talent, the famous cartoon would be saying something like "Check out how big my Hadoop distro is!"

But judge for yourself: WANdisco announced their WDD about 4 weeks ago, followed yesterday by press releases from Intel and Greenplum. WDD has some uniquely cool stuff in it, like the non-stop NameNode: the only 'active-active' technology for NameNode metadata replication on the market, based on a full implementation of the Paxos algorithm.

And I was having fun during the conference too: we were playing a game of 'whack-a-namenode'. The setup included a rack of Supermicro blade servers running a WDD cluster with three active NameNodes.
While running a stock TeraSort load, one of the NameNodes gets killed dead with SIGKILL. Amazingly, TeraSort couldn't care less and just keeps going without a wince. We played about 100 rounds of this "game" over the course of two days using the live product, with people dropping by all the time to watch.
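The "whack" itself is just an unceremonious SIGKILL. A hypothetical sketch of one round (the `pgrep` pattern depends on how the NameNode JVM was launched on your box, and by default the snippet only echoes instead of killing):

```shell
# One round of "whack-a-namenode" (dry run: RUN=echo prints the command
# instead of executing it; set RUN="" on a box with a live NameNode).
RUN="${RUN:-echo}"
PID=$(pgrep -f NameNode | head -n1)       # hypothetical match pattern
$RUN kill -9 "${PID:-<namenode-pid>}"     # SIGKILL, no graceful shutdown
```

With a single-NameNode HDFS this would take the cluster down; the point of the demo was that with active-active replication it doesn't.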

Looks like it isn't easy to whack an HDFS cluster anymore.

And the nice folks from SiliconAngle and WikiBon stopped by our booth to do an interview with me and my colleagues. Enjoy ;)

Sunday, February 24, 2013

Rooting JB Asus Transformer TF700

I didn't expect this to be so hard, really. Rooting is usually a pretty straightforward process that can be done quickly. However, between the number of issues around upgrading an ICS tablet to JB (with or without rooting it first) and the number of forum posts that refer to the same partially outdated recovery images, it wasn't easy. Here is the easiest way to root your stock JB Transformer device:

  # I am doing everything on Linux
  # You need to either install Android SDK or get adb and fastboot tools for your distribution

1. Push the CWM-SuperSU zip to the device (use the actual name of the zip you downloaded)
    % adb push CWM-SuperSU.zip /sdcard/
2. Download and unpack ClockWorkMod Recovery v6.0.1.4
    rename recovery_jb.img to recovery.img
3. Boot your device to fastboot mode
   % adb reboot bootloader
Use Vol- to scroll to USB icon (fastboot mode);  use Vol+ to select
4. Unlock the bootloader using UnLock_Device_App_V7.apk (google to download the file). Alternatively, you should be able to use
  % fastboot -i <VendorID> oem unlock
(please note that you might be better off running fastboot as the root user). The ASUS VendorID is 0x0B05. To find the id for your device, use lsusb.
5. Flash recovery.img to your device and reboot
  % fastboot -i 0x0B05 flash recovery recovery.img
  % fastboot -i 0x0B05 reboot
6. Boot to fastboot mode again (as in #3 above) and enter Recovery mode
7. Install SuperSU from the zip file on sdcard
8. Reboot once again
9. Install RootChecker from Google Play and make sure your device is rooted.
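The host-side part of the flow (everything except the on-device menu navigation in steps 3 and 6-9) can be sketched as one script. This is a hedged sketch, not a tested one: the zip name is an assumption based on step 1, and by default every command is only echoed.

```shell
#!/bin/sh
# Dry-run sketch of the host-side rooting steps above. RUN=echo prints each
# command instead of executing it; set RUN="" to run for real, at your own risk.
RUN="${RUN:-echo}"
VENDOR_ID=0x0B05                              # ASUS; check yours with lsusb

$RUN adb push CWM-SuperSU.zip /sdcard/        # step 1 (your zip name may differ)
$RUN adb reboot bootloader                    # step 3: enter fastboot mode
$RUN fastboot -i "$VENDOR_ID" oem unlock      # step 4 (alternative to the apk)
$RUN fastboot -i "$VENDOR_ID" flash recovery recovery.img   # step 5
$RUN fastboot -i "$VENDOR_ID" reboot
# steps 6-9 (recovery menu, SuperSU install, RootChecker) are done on-device
```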


Friday, February 22, 2013

Multi-node Hadoop cluster on a single host

If you are running Hadoop for experimental or other purposes, you might face the need to quickly spawn a 'poor man's Hadoop': a cluster with multiple nodes within the same physical or virtual box. One typical use case is working on your laptop without access to the company's data center; another is running low on the credit card, so you can't pay for EC2 instances.

Stop right here if you are well-versed in the Hadoop development environment, tarballs, maven, and all those shenanigans. Otherwise, keep on reading...

I will be describing a Hadoop cluster installation using standard Unix packaging like .deb or .rpm, produced by the great Hadoop stack platform called Bigtop. If you aren't familiar with Bigtop yet, read about its history and conceptual ideas.

Let's assume you installed the Bigtop 0.5.0 release (or a part of it). Or you might go ahead - shameless plug warning - and use a free off-spring of Bigtop just introduced by WANdisco. Either way you'll end up with the standard packaged layout; your mileage might vary if you install more components besides Hadoop. The normal bootstrap process will start a NameNode, a DataNode, perhaps a SecondaryNameNode, and some YARN jazz like the resource manager, node manager, etc. My example will cover only the HDFS specifics, because the YARN side would be a copy-cat, and I leave it as an exercise to the reader.

Now, the trick is to add more DataNodes. With a dev setup using tarballs and such, you would just clone and change some configuration parameters, and then run a bunch of java processes like:
  % hadoop-daemon.sh --config <cloned config dir> start datanode

This won't work in the case of a packaged installation, because of the higher level of complexity involved. This is what needs to be done:
  1. Clone the config directory: cp -r /etc/hadoop/conf /etc/hadoop/conf.dn2
  2. In the cloned copy of hdfs-site.xml, change or add new values for the DataNode ports (dfs.datanode.address, dfs.datanode.ipc.address, dfs.datanode.http.address), and point dfs.data.dir at a separate directory. An easy way to mod the port numbers is to add 1000*<node number> to the default value. So, port 50020 will become 52020, etc.
  3. Go to /etc/init.d and clone hadoop-hdfs-datanode into hadoop-hdfs-datanode.dn2
  4. In the cloned init script add
       export HADOOP_PID_DIR="/var/run/hadoop-hdfs.dn2"
     and modify it to use the cloned config directory and its own pid file
  5. Create the new data and pid directories and make hdfs:hdfs their owner
  6. Run /etc/init.d/hadoop-hdfs-datanode.dn2 start to fire up the second datanode
  7. Repeat steps 1 through 6 if you need more nodes running.
  8. If you need to do this on a regular basis - spare yourself the carpal tunnel and learn Puppet.
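The config-cloning and port-bumping part of steps 1-2 can be sketched with plain cp and sed. The snippet below works on a throwaway copy under /tmp so it can be tried without root; the property names and ports shown are the stock HDFS DataNode defaults. On a real box you would point it at /etc/hadoop/conf instead.

```shell
# Sketch of steps 1-2 for node number N: clone the config directory and
# bump each default DataNode port by 1000*N, per the rule in step 2.
N=2
OFFSET=$((1000 * N))
SRC=/tmp/hadoop-conf-demo
DST=$SRC.dn$N

# A minimal stand-in for the stock hdfs-site.xml with the default ports.
mkdir -p "$SRC"
cat > "$SRC/hdfs-site.xml" <<'EOF'
<configuration>
  <property><name>dfs.datanode.address</name><value>0.0.0.0:50010</value></property>
  <property><name>dfs.datanode.ipc.address</name><value>0.0.0.0:50020</value></property>
  <property><name>dfs.datanode.http.address</name><value>0.0.0.0:50075</value></property>
</configuration>
EOF

rm -rf "$DST"
cp -r "$SRC" "$DST"                  # step 1: clone the config directory
for port in 50010 50020 50075; do    # step 2: add OFFSET to each default port
  sed -i "s/:$port</:$((port + OFFSET))</" "$DST/hdfs-site.xml"
done
grep datanode "$DST/hdfs-site.xml"   # eyeball the rewritten ports
```

With N=2 the IPC port 50020 ends up as 52020, exactly as in the example above.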
Check the logs/HDFS UI/running java processes to make sure that you have achieved what you needed. Don't try this unless your box has a sufficient amount of memory and CPU power. Enjoy!
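The "check the running java processes" part can be sketched as below. It is a dry run by default, since it assumes a live packaged cluster on the box:

```shell
# Dry-run verification sketch; set RUN="" on a box with the cluster running.
RUN="${RUN:-echo}"
$RUN jps                      # expect one DataNode JVM per cloned init script
$RUN hdfs dfsadmin -report    # the NameNode's view: live DataNode count
```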

Wednesday, February 6, 2013

One more Hadoop in the family!

Indeed, "may you live in interesting times". Not so long ago I posted an update of my elephants' genealogy, and it seems to be outdated already. Oh well - I guess it is an exciting thing to be bothered with, because I love all kinds of elephants ;)

This is the birth of another brother of Apache Hadoop! The young dude has definitely been born with a silver spoon in his mouth: Active/Active replication of the NameNode - the very first in the world, to my limited knowledge in the matter. Pretty cool, eh?

WANdisco is releasing their certified version of Hadoop as the base of their own BigData distribution, called WDD. Hence, I need to update the tree again.

And congratulations on the release, guys - the more the merrier!