7. Hive won't let me have my external table and delete it too
If you let Hive manage tables, it automatically deletes the underlying data when you drop the table. If you have an external table, it does not. Why can't there be a "drop external table too" or something? Why do I have to do this outside Hive if I really want to? Also, while Hive is practically evolving into an RDBMS, why doesn't it have Update and Delete?
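For what it's worth, the "do this outside" workaround for an external table comes down to two commands: drop the table definition in Hive, then remove the files from HDFS yourself. Here is a minimal sketch, assuming the `hive` and `hdfs` command-line tools are on the path and the table's location is an absolute HDFS path. The function name, the table name, and the path are all hypothetical, and the script only builds the command strings rather than running them.

```python
import shlex


def drop_external_table_commands(table, location):
    """Build the two shell commands that drop an external table and its data.

    Nothing is executed here; the caller decides whether to run them.
    `table` and `location` are illustrative inputs, not taken from the article.
    """
    # Accept only simple names such as "db.table" to avoid injecting HiveQL.
    if not table.replace("_", "").replace(".", "").isalnum():
        raise ValueError(f"suspicious table name: {table!r}")
    # Assumption: location is an absolute HDFS path, and never the root itself.
    if not location.startswith("/") or location.strip("/") == "":
        raise ValueError(f"refusing to delete location: {location!r}")

    # Step 1: Hive forgets the table but leaves the files alone.
    drop = ["hive", "-e", f"DROP TABLE IF EXISTS {table};"]
    # Step 2: the part Hive won't do for an external table.
    remove = ["hdfs", "dfs", "-rm", "-r", location]
    return [shlex.join(drop), shlex.join(remove)]


if __name__ == "__main__":
    for cmd in drop_external_table_commands("logs.raw_events", "/data/raw_events"):
        print(cmd)
```

Printing the commands instead of running them is deliberate: a recursive delete across a cluster is the sort of thing you want to read twice before you execute it.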
8. Namenode fail
Oozie, Knox, and several other parts of Hadoop do not obey the new Namenode HA stuff. You can have HA Hadoop, so long as you don't use anything else with it.
9. Documentation
It's cliché to complain, but check this out. Line 37 is wrong -- worse, it is wrong in every post all over the Internet. This proves that no one even bothered to run the example before checking it in. The Oozie documentation is even more dreadful, and most of the examples won't pass schema validation on the version they're meant for.
10. Ambari coverage
I have trouble criticizing Ambari; given what I know about Hadoop architecture, it's amazing Ambari works at all. That said, where Ambari has shortcomings, they can be annoying. For example, Ambari doesn't install -- or in some cases, doesn't install correctly -- many items, including various HA settings, Knox, and much, much more. I'm sure it will get better, but "manually install afterward" or "we'll have to create a Puppet script for the rest" shouldn't appear in my emails or documentation any longer.
11. Repository management
Speaking of Ambari, have you ever done an install while the repositories were being upgraded? I have -- it does not behave well. In fact, sometimes it finds the fastest (and most out-of-date) mirror. It doesn't care if what it pulls down is in any way compatible. You can configure your way out of that part, but it's still annoying the first time you install incoherent pieces of Hadoop across a few hundred nodes.
12. Null pointer exceptions
I seem to find them. Often they are parse errors or other faults I've caused. That said, they still should not be exposed as NPEs in Pig, Hive, HDFS, and so on.
The response to any similar list of complaints will of course be "patches welcome!" or "hey, I'm working on it." Hadoop has come a long way and is certainly one of my favorite tools, but boy, those sharp edges annoy me.
What's your favorite Hadoop bug or six-legged feature? What are you doing to make it better?