
Tuesday, December 27, 2011

New blog for Apache BigTop!

We've just kicked off a new blog for the BigTop project - Apache Hadoop data stack creation and validation.

Surprisingly, it got started with a post on BigTop history.

The blog is available from the ASF blog roller - bookmark it!

Sunday, December 11, 2011

Conception and validation of Hadoop BigData stack: putting the record straight.

With more and more people jumping on the big data bandwagon, it is very reassuring to see that Hadoop is gaining momentum by the day.

Even more fascinating is to see how the idea of putting together a bunch of service components on top of Hadoop proper is gaining more and more traction. IT and software development professionals are getting a better understanding of the benefits that a flexible set of loosely coupled yet compatible components provides when one needs to customize a data processing solution at scale.

The biggest problem for most businesses trying to add Hadoop infrastructure to their existing IT is a lack of knowledge, professional support, and/or a clear understanding of what's out there on the market to help you. Essentially, Hadoop exists in one incarnation: the open-source project under the umbrella of the Apache Software Foundation (ASF). This is where all the innovations in Hadoop are coming from. And essentially this is the source of profit for a few commercial offerings today.

What's wrong with this picture, you might ask? Well, the issues with most of these "commercial offerings" are mostly twofold. They are either immature and based on sometimes unfinished or unreleased Hadoop code, or provide no significant added value compared to Hadoop proper, which is available in source form. And no matter whether either of the above (or both together) applies to a commercial solution based on Hadoop, you can be sure of one thing: these solutions will cost you literally tons of money - as much as $1k/node/year in some cases - for what is essentially available for free.

"What about the neat packages I can get from a commercial provider, and perhaps some training too?" one might ask. Well, yeah, if you are willing to pay top dollar per node to get, say, something like this fixed, or to learn how to install packages on a virtual machine - go ahead by all means.

However, keep in mind that you can always get a set of packages for Hadoop produced by another open source project called Bigtop, hosted by Apache. What you essentially get are packages for your Linux distro, which can be easily installed on your cluster's nodes. A great benefit is that you can easily trim your Hadoop stack to include only what you need: Hadoop + Hive, or perhaps Hadoop + HBase (which will automatically pick up ZooKeeper for you).
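
The "pick up ZooKeeper for you" behavior is ordinary transitive dependency resolution, which the package manager does on your behalf. A minimal sketch of the idea follows; the dependency map is a hypothetical illustration, not Bigtop's actual package metadata, and `resolve` is a made-up helper name.

```python
from collections import deque

# Hypothetical dependency map, for illustration only. Real packages
# declare their own dependencies in their package metadata.
DEPENDS_ON = {
    "hadoop": [],
    "zookeeper": [],
    "hive": ["hadoop"],
    "hbase": ["hadoop", "zookeeper"],
}


def resolve(requested):
    """Return the requested components plus everything they depend on."""
    selected = set()
    queue = deque(requested)
    while queue:
        component = queue.popleft()
        if component in selected:
            continue
        selected.add(component)
        # Dependencies of this component must be installed as well.
        queue.extend(DEPENDS_ON[component])
    return sorted(selected)


print(resolve(["hive"]))   # ['hadoop', 'hive']
print(resolve(["hbase"]))  # ['hadoop', 'hbase', 'zookeeper']
```

Asking only for HBase yields a three-component stack, while asking for Hive leaves ZooKeeper out, which is the trimming described above.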

At any rate, the best part of the story isn't a set of packages that can be installed: after all, this is what packages are usually created for, right? The problem with packages or other forms of component distribution is that you don't know in advance whether A-package will work nicely with B-package v.1.2 unless someone has tested this assumption before. Even then, the testing environment might be significantly different from your production environment, and then all bets are off. Unless - again - you're willing to pay through the nose to someone who's willing to get it done for you. And that's where the true miracle of something like BigTop comes to the rescue.

Before I explain more, I want to step back a bit and take a look at some recent history. A couple of years ago Yahoo's Hadoop development team had to address the issue of putting together a working and well-validated Hadoop stack that included a number of components developed by different engineering organizations, each with its own development schedule and integration criteria. The main integration point for all of the pieces was the operations team, which was in charge of a large number of cluster deployments, provisioning, and support. Without their own QA staff they were oftentimes at the mercy of the assumed code or configuration quality coming from all corners of the company. Worse yet, even if all these components happened to be of high quality, there was no guarantee that they would work together as expected once put together on a cluster. And indeed, integration problems were many.

That's where a small team of engineers, including yours truly, put together a prototype of a system called FIT (Final Integration Testing). The system essentially allowed you to pick a packaged component you wanted to validate against your cluster environment and perform the deployment, configuration, and testing with integration scenarios provided either by the component's owner or by your own team.

The approach was so effective that the project was continued and funded further in the form of HIT (Hadoop Integration Testing). At that point two of us left for what seemed like greener pastures back then :(

We thought the idea was really promising, so we continued on the path of developing a less custom and more adoptable technology based on open standards such as Maven and Groovy. Here you can find slides from the talk we gave at eBay about a year ago. The presentation put the concept of a Hadoop data stack in open writing for the first time, along with the stack customization and validation technology. When this presentation was given we already had a well-working mechanism for creating, deploying, and validating both packaged and non-packaged Hadoop components.

BigTop - open-sourced for the second time just a few months ago and based on our project above - has added a package creation layer on top of the stack validation product. This, of course, makes your life even easier. And even more so with a number of Puppet recipes allowing you to deploy and configure your cluster in a highly efficient and automated manner. I encourage you to check it out.

BigTop has been successfully used for validating the release of Apache Hadoop 0.20.205, which has become the foundation of the upcoming Hadoop 1.0.0. Another release of Hadoop - 0.22 - used BigTop for release candidate validation, and so on.

Sunday, October 23, 2011

Pointy-haired boss from Sun Microsystems...

No kidding - there was a manager back at my last job at Sun Microsystems to whom I had to explain four times a week why you can't "just add a link to that page" of the enterprise application and have it available in 15 minutes.

I am kidding you not

Saturday, October 22, 2011

Thursday, September 29, 2011

cos(x) >= 4 ? Possible, but rarely...

Back in the day, in my alma mater, we used to have mandatory reserve officers training (one of the reasons why socialism sucks noodle). The lecturers were the dumb-ass officers who couldn't be kept any longer in the active military forces (among active military dumb-asses).

At some point one of these imbeciles was blabbing about artillery calculations and in the course of "solving" an equation he came to the conclusion that  
  cos(x) = 2.37 
or some such. One of the math. department's students raised his hand and expressed his concern
"Comrade major, we - at math. department - actually learned that  
  -1 <= cos(x) <= 1
Are you sure that your calculations are correct?"

Dumb-ass major (perhaps it was a dumb-ass Lt.-Colonel) looked back, paused for a moment and said "In the time of war cos(x) was reaching 4, but rarely..."

Wednesday, September 14, 2011

Monday, July 25, 2011

Are former Sun executives cursed by a strong spell?

I wonder if this is a curse of some kind that former Sun Microsystems execs are bringing viable companies down one after another?

As many can remember, the great pony-tailed pinhead applied some black magic to become Sun's CEO and then smacked it to the ground.

Now it seems like we are seeing a similar pattern at Yahoo!

I wonder how Google survived through the times with their great defender and proponent of personal privacy?

Friday, July 22, 2011

I swear - I used to have that kind of manager... twice ;(

I am sure most of you have had the same kind from Dilbert

If you want to know what company you should never ever go to work for - leave me a comment: I promise to answer ;)

Saturday, July 9, 2011

Very user friendly MacOS, indeed... ;(

Spent at least an hour trying to mount an SMB share from my Debian server on a 10.6.+ OSX laptop (I have no idea what kind of feline name it carries, nor do I care enough for Apple to figure it out).

As always the best answer comes from Ubuntu forums where a dude mentioned that certain tweaks need to be done to the samba server configuration. Namely, password encryption has to be enabled (even for 'share' level of security)
   encrypt passwords = true
Apparently, Apple doesn't give a shit about its users (as usual, I suspect), at least not enough to ever mention the little fact that the OSX SMB client can't work with an SMB server where password encryption isn't enabled. No help, no web information, no nothing... While they produce pretty good yet overpriced hardware for their laptops, there's still much to do to make the software better. Having cool iPhone features added to the PC OS doesn't make it any better, I guess.
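
For context, a minimal sketch of where that line lives in smb.conf. The two [global] settings are the ones discussed above; the share name and path are hypothetical placeholders, not my actual setup.

```ini
[global]
   # 'share'-level security, as in the setup described above
   security = share
   # the setting the OSX client turned out to require
   encrypt passwords = true

# Hypothetical share definition, for illustration only
[media]
   path = /srv/media
   guest ok = yes
   read only = yes
```

Running testparm after editing should catch syntax problems before the Samba daemons are restarted.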

Sunday, June 19, 2011

Tuesday, June 14, 2011

Confirmation of a suspicion: at least 15% of iPhone'rs are pretty dumb

Interesting analysis of iPhone pass code frequencies. Golly Gee, they are gullible, man!
Besides, iPhone pass code protects nothing at all once iPhone is connected to a Linux system via USB cable ;)

Monday, June 13, 2011

Dilbert nailed it again: I see dumb people (C)

I am fascinated by Scott Adams: he expresses the story of my life very close to its sources... Very often (not all the time though). I noticed some years ago that I am not a people person (despite various suggestions to read a book of Habits of Highly Crippy People).

Contrary to the book, I like people in short bursts - 1-1.5 minutes long max. That's all because I have a very low tolerance for bullshit. Perhaps I shouldn't say the BS word, for the sake of the children.

Monday, June 6, 2011

I'm sure everyone can recognize my late manager...

If someone wants to see a perfect depiction of my late manager (not a pretty, yet a hilarious view), click, scroll to the 'Meeting' title and read away!
The fella is recreated in front of my mind's eye with 100% accuracy!

Oh yes, I am sure you had one like this in your professional life. You didn't? I am envious cause I already had two of those ;(

Tuesday, May 31, 2011

Thank you, Dilbert, for debunking open workspace crap!

Being a Sun Microsystems alumnus of more than 15 years, I love isolated offices. I have one at home, which is a sort of sacred place even for my dog.

In fact, one of the biggest hardships of getting accustomed to the Yahoo! environment was the cubicles. Don't get me wrong - I like Yahoo! a lot: great company and mostly nice engineering culture! However, I hate cubicles with my guts - the noise level is too much sometimes: a lot of people aren't inherently evil, but rather inconsiderate when it comes to things like noise in a workplace. Some pretend to be amicable and laugh too hard, others chat loudly, etc.

However, I found cubicles to be somewhat tolerable after having first-hand experience with an open workspace. The main idea behind one is - apparently - to facilitate collaboration between team members (yeah, right!); however, the reality was nothing but a constant flow of interruptions, annoyance, and irritation, where one could have productive office hours only between 6 and 10 am, while the office was virtually empty. Golly Gee: what a nightmare ;(!

I have found many startups eagerly buying into the open-space fallacy, mimicking, perhaps, some 'hot' companies out there. A whole lot of studies done over the last 40-50 years show a direct correlation between the level of distraction and one's ability to perform activities requiring a high level of concentration (e.g. software engineering, learning, etc.). So, what the proponents of open workspaces perhaps do not realize is that at the end of the day it isn't the doubtful benefits of a hive mind, aka 'group think', but rather the quality and predictability of a company's product that are at stake.

Am I glad to be in such good company on that one!

Wednesday, April 27, 2011

The secret of laughing Buddha is revealed...

For many centuries people all around the world were trying to find out the cause of endless happiness of Buddha (click on a picture to see it in a greater resolution).

I am pretty sure that now I know the actual reason, and it is - as many have suspected before - the great monk's angle of view on existence.

Lady and gentlemen, the great secret of Buddha's happiness!

He sees what many of us have failed to discover.

Aren't you departing being totally illuminated?

Tuesday, April 26, 2011

"Hadoop isn't without faults" said GigaOm

I came across this GigaOm article which among some legitimate points (such as lack of innovations to hide the complexity of MR framework) also says that "Hadoop is the talk of the town when it comes to big data, but it’s not without faults...".

Apparently folks @GigaOm have been looking at Hadoop closely enough to notice that Hadoop has faults. It has a lot of them. In fact, it has a special fault injection framework in it, whose sole purpose is to add faults into Hadoop like there's no tomorrow.

Being an author of this fault injection stuff I feel really flattered that GigaOm has noticed it ;)

Wednesday, March 23, 2011

In your face, concerned scientists! Pi is 3!

The other day I wrote about some concerned scientists who don't know any chemistry. So, bear with me, people! Today I am going to tell you a true blood chilling horror story!

American education sucks badly. US students are mostly indoctrinated morons without any real tangible knowledge. I have seen them making molecule models out of foam balls to pass a chemistry class (I guess it was 9th grade of K12). "...and rankings from the Organisation for Economic Co-operation and Development, rating the United States' 15 year-olds 25th in the world in mathematics." and so on.

There are many hypotheses trying to explain why this is happening and how to fix it. The answer is clear to most sober-minded people, and it is the 'public' part of education. 'Public' means 'belongs to no one' or 'no one gives a damn about it', and the tremendous failures of socialist regimes have demonstrated this simple fact beyond any reasonable doubt.

At any rate... there's a new development in the 'how to fix US education' jazz. Here it is (you can't make such stuff up, my concerned scientific friends) - politicians to the rescue!

"That long-held empirical value of pi, I am not saying it should be necessarily viewed as wrong, but 3 is a lot better," said Roby, the 34-year old legislator representing Alabama's second congressional district, ushered into office in the historic 2010 Republican mid-term bonanza.

Pi has long been defined as the ratio of a circle's area to the square of its radius, a mathematical constant represented by the Greek letter "π," with a value of approximately 3.14159. HR 205 does not change the root definition, per se. The bill simply, and legally, declares pi to be exactly 3.

Roby, raised in Montgomery, Ala., is on the House Committee on Education and the Workforce, and the Subcommittee on Early Childhood, Elementary and Secondary Education.

"It's no panacea, but this legislation will point us in the right direction. Looking at hard data, we know our children are struggling with a heck of a lot of the math, including the geometry incorporating pi," Roby said. "I guarantee you American scores will go up once pi is 3. It will be so much easier." (bold font is mine)

You can read the rest of it here if you can see anything through the tears of pure joy!

What's next? Canceling friction and gravitation laws by a political will?

I need a different globe, please!

Sunday, March 20, 2011

What kind of A-holes are blogging for Forbes?

As taken from this blogpost on Forbes:
"...Because there are only a limited number of potassium iodide tablets in the world, people hoarding them in the U.S. are preventing them from getting those who really need them—like the people in Japan. The Union of Concerned Scientists issued a statement saying that Americans should not stockpile the tablets..." (emphasis is mine)

I am not discussing this particular blogger's post 'cause he's beyond help already (chemically, economically, etc.). What I want to point out is that apparently the so-called concerned scientists are more concerned than scientific. Or they are simply too busy making statements and don't have time to read a school chemistry textbook to find out how to make any amount of KI in one's own kitchen.

Well, I am here to help. Here's the reaction, my concerned friends:
6KOH + 3I2 = 5KI + KIO3 + 3H2O
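
For the skeptics, the equation can be checked by counting atoms on each side. Below is a minimal sketch: `count_atoms` and `is_balanced` are hypothetical helper names, and the parser assumes simple formulas without parentheses, which is enough for this reaction.

```python
import re
from collections import Counter


def count_atoms(side):
    """Count atoms on one side of an equation, e.g. '6KOH + 3I2'."""
    totals = Counter()
    for term in side.split("+"):
        # Split a term such as '3H2O' into coefficient '3' and formula 'H2O'.
        match = re.fullmatch(r"(\d*)(.+)", term.strip())
        coefficient = int(match.group(1) or 1)
        # An element is a capital letter plus an optional lowercase letter,
        # followed by an optional subscript. Parentheses are not supported.
        for element, subscript in re.findall(r"([A-Z][a-z]?)(\d*)", match.group(2)):
            totals[element] += coefficient * int(subscript or 1)
    return totals


def is_balanced(equation):
    """True when both sides contain the same number of each atom."""
    left, right = equation.split("=")
    return count_atoms(left) == count_atoms(right)


print(is_balanced("6KOH + 3I2 = 5KI + KIO3 + 3H2O"))  # True
```

Each side comes out to six K, six I, six O and six H, so the reaction is balanced as written.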

I completely agree with part of the passage above! Some quantities are very limited on this planet... of intelligence, for starters.

Sunday, February 20, 2011

Fox News are shameless sleazeballs

Wow, "Republican" TV station Fox is full of - this is just low.
I see why big fellas are so scared of Ron Paul, but this crap is unbelievably unintelligent even for Fox...

Saturday, February 19, 2011

RPMs by truckload

A pretty big YUM (full of RPMs apparently)

Monday, February 14, 2011

Code and distributed fault injection for Hadoop

I gave this talk at Greenplum last Friday (2/11/2011), and it has been recorded and kindly posted. If you are interested in the topic, it might be of interest to you:

Slides are available here