Andrew's Website
Aunt Rosie — Google Wave Translation Robot
2009-10-10 15:58:10 google

Google Wave used to have a realtime language translation robot named Rosy Etta. Unfortunately it isn't currently available, so I had to write my own. I named my translation bot Aunt Rosie after my aunt and as a tribute to the original Rosy.

If you include a special keyword, Aunt Rosie will reply to your message with a translation in the language you specified. Watch the video for a demonstration.

To get started, add aunt-rosie@appspot.com to your contacts, then invite her to a Wave.

To begin translation, type "/translate:xx" in your blip. The xx is an ISO 639-1 code that specifies the target language. Chinese is the exception: use "zh-CN" for Simplified Chinese and "zh-TW" for Traditional Chinese. Note: Aunt Rosie is in beta and still has some bugs, so you need to add a space and a newline after the language code. Rosie should appear as soon as you type this keyword. If she doesn't, you might be doing something wrong.
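For the curious, here's a minimal Java sketch of how a bot could spot the keyword in a blip. It's hypothetical: the class name and regular expression are illustrative assumptions, not Aunt Rosie's actual source. It accepts a two-letter code or one of the two Chinese variants, and only matches once whitespace follows the code, so it doesn't fire while the code is still being typed.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TranslateKeyword {
    // Hypothetical pattern: "/translate:" plus zh-CN, zh-TW, or a two-letter
    // ISO 639-1 code, followed by whitespace (lookahead, not consumed).
    private static final Pattern KEYWORD =
        Pattern.compile("/translate:(zh-CN|zh-TW|[a-z]{2})(?=\\s)");

    /** Returns the target language code if the blip text contains the keyword. */
    public static Optional<String> targetLanguage(String blipText) {
        Matcher m = KEYWORD.matcher(blipText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(targetLanguage("Hello everyone /translate:fr \n")); // Optional[fr]
        System.out.println(targetLanguage("/translate:zh-TW \nhi"));           // Optional[zh-TW]
        System.out.println(targetLanguage("/translate:fr"));                   // Optional.empty
    }
}
```

Requiring trailing whitespace is a design choice for the sketch; the real bot's parsing rules aren't documented here.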

Rosie automatically detects the language you are writing. However, this means you'll need to type enough text for her to recognize the language. Usually it takes 5-6 words before she starts translation. If you are using short words or proper nouns, it might take even longer. Try not to put too many proper nouns in one sentence or she'll get confused.

Aunt Rosie uses Google's machine translation software as her backend. She speaks all the languages Google supports. She is written in Java and runs on Google's App Engine.

Update 2009-10-13: I just finished making Aunt Rosie a lot easier to use. Instead of making you type special keywords and remember two-letter codes, Aunt Rosie will automatically insert a language drop-down into your blip once you've typed enough for her to recognize your language. Select the language you'd like to translate to and she'll reply with the translation. Easy as pie.

Update 2009-10-23: Occasionally the bot stops working due to a bug in Google Wave. I've mirrored the bot at webmaster@appspot.com, so switch to webmaster if Aunt Rosie is broken. You can track bug 278 on Google Wave's developer site.

New Features in Hadoop 0.20
2009-04-28 01:41:33 hadoop

Hadoop 0.20 was released on April 22nd. I was curious about some of the changes in 0.20 so I did a little research and decided to blog about what I found. These are the changes, features, and improvements that were interesting to me. For a full list of changes, see the Hadoop 0.20 changelog.

My first impression was that 0.20 is less ambitious feature-wise than 0.19. That is not a denigration of Hadoop's committers, but an artifact of Hadoop's quarterly release cycle. You can read about Hadoop 0.19's features in Cloudera's excellent write-up.

Context Object For Mapper and Reducer
The biggest change in 0.20 is a large refactoring of the core MapReduce classes (HADOOP-1230). The commit message lists the changes best:

  1. All of the methods take Context objects that allow us to add new methods without breaking compatibility.
  2. Mapper and Reducer now have a "run" method that is called once and contains the control loop for the task, which lets applications replace it.
  3. Mapper and Reducer by default are Identity Mapper and Reducer.
  4. The FileOutputFormats use part-r-00000 for the output of reduce 0 and part-m-00000 for the output of map 0.
  5. The reduce grouping comparator now uses the raw compare instead of object compare.
  6. The number of maps in FileInputFormat is controlled by min and max split size rather than min size and the desired number of maps.
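Item 6 is easiest to see with numbers. As I read the 0.20 source, the new FileInputFormat sizes each split as max(minSize, min(maxSize, blockSize)), while the old API used a goal size derived from the desired number of maps in place of maxSize. Below is a minimal Java sketch of that arithmetic under that assumption. The 64 MB block size and 1 GB file are made-up inputs, and the split count is a simplification of the real splitting loop.

```java
public class SplitSizeSketch {
    /** Split size as computed (per my reading) by the new-API FileInputFormat:
     *  clamp the block size between the minimum and maximum split sizes. */
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    /** Simplified split count via ceiling division. Hadoop's actual loop, as I
     *  understand it, lets the last split run slightly over splitSize instead. */
    public static long approximateSplitCount(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb;  // assumed HDFS block size
        long fileSize = 1024 * mb; // hypothetical 1 GB input file

        long noLimits = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        long biggerMin = computeSplitSize(blockSize, 128 * mb, Long.MAX_VALUE);
        long smallerMax = computeSplitSize(blockSize, 1, 16 * mb);

        System.out.println(approximateSplitCount(fileSize, noLimits));   // 16: one split per block
        System.out.println(approximateSplitCount(fileSize, biggerMin));  // 8: raise min, fewer maps
        System.out.println(approximateSplitCount(fileSize, smallerMax)); // 64: lower max, more maps
    }
}
```

Raising the minimum or lowering the maximum is how you now nudge the number of maps, instead of asking for a specific count.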

An example of the changes can be seen in map's method signature (before and after):

void map(K1 key, V1 value, OutputCollector<K2, V2> output, Reporter reporter)
protected void map(KEYIN key, VALUEIN value, Context context)

OutputCollector and Reporter are both replaced by the new Context object; for example, context.write(key, value) takes the place of output.collect(key, value).
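To show why a Context parameter plus an overridable run() loop is useful, here's a minimal, self-contained Java sketch of the pattern. It deliberately does not use Hadoop: the Context and Mapper classes below are hypothetical stand-ins that only mimic the shape of the 0.20 API (run() driving setup/map/cleanup, an identity map() by default, and write() in place of OutputCollector). Because map() takes a single Context, a later release could add methods to Context without changing any user's map() signature.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

public class ContextSketch {

    /** Hypothetical stand-in for a Mapper context: bundles input iteration,
     *  output (formerly OutputCollector) and status (formerly Reporter). */
    public static class Context<KI, VI, KO, VO> {
        private final Iterator<Map.Entry<KI, VI>> input;
        private Map.Entry<KI, VI> current;
        private final List<Map.Entry<KO, VO>> output = new ArrayList<>();
        private String status = "";

        public Context(Iterator<Map.Entry<KI, VI>> input) { this.input = input; }

        public boolean nextKeyValue() {
            if (!input.hasNext()) return false;
            current = input.next();
            return true;
        }
        public KI getCurrentKey() { return current.getKey(); }
        public VI getCurrentValue() { return current.getValue(); }
        public void write(KO key, VO value) { output.add(new AbstractMap.SimpleEntry<>(key, value)); }
        public void setStatus(String s) { status = s; }
        public String getStatus() { return status; }
        public List<Map.Entry<KO, VO>> getOutput() { return output; }
    }

    /** Hypothetical mirror of the new-style Mapper: run() owns the control loop
     *  and can be overridden; the default map() is the identity function. */
    public static class Mapper<KI, VI, KO, VO> {
        protected void setup(Context<KI, VI, KO, VO> context) {}

        @SuppressWarnings("unchecked")
        protected void map(KI key, VI value, Context<KI, VI, KO, VO> context) {
            context.write((KO) key, (VO) value); // identity: pass the pair through
        }

        protected void cleanup(Context<KI, VI, KO, VO> context) {}

        public void run(Context<KI, VI, KO, VO> context) {
            setup(context);
            while (context.nextKeyValue()) {
                map(context.getCurrentKey(), context.getCurrentValue(), context);
            }
            cleanup(context);
        }
    }

    /** Example user mapper: emits (word, length) for every word in a line. */
    public static class WordLengthMapper extends Mapper<Long, String, String, Integer> {
        @Override
        protected void map(Long offset, String line, Context<Long, String, String, Integer> context) {
            for (String word : line.split("\\s+")) {
                if (!word.isEmpty()) context.write(word, word.length());
            }
            context.setStatus("processed offset " + offset);
        }
    }

    public static List<Map.Entry<String, Integer>> runWordLength(List<String> lines) {
        List<Map.Entry<Long, String>> input = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            input.add(new AbstractMap.SimpleEntry<>(offset, line));
            offset += line.length() + 1; // character offset of the next line
        }
        Context<Long, String, String, Integer> context = new Context<>(input.iterator());
        new WordLengthMapper().run(context);
        return context.getOutput();
    }

    public static void main(String[] args) {
        System.out.println(runWordLength(Arrays.asList("hello wave", "hi")));
        // prints [hello=5, wave=4, hi=2]
    }
}
```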

This is a large change, so Hadoop added the classes in a new package at org.apache.hadoop.mapreduce. The old classes have been deprecated but can still be found under org.apache.hadoop.mapred.

This feature will allow Hadoop Core to quickly iterate and add new features without breaking end user code every release. This is an important milestone on the path to stability as Hadoop approaches 1.0.

Vaidya
Hadoop 0.20 also includes a new performance tool called Vaidya (HADOOP-4179). Vaidya scans your job logs using a collection of rules to identify potential performance issues. For example, if it notices that your Mapper is writing a lot of data to disk, it'll suggest you create a Combiner. It reminds me a bit of Fortify or FindBugs (or, if you're feeling less generous, Clippy) in that it flags bad practices and suggests improvements, although it doesn't rely on powerful static analysis the way those tools do.
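To make the idea of a rule-based checker concrete, here's a minimal Java sketch. Everything in it is hypothetical: the JobStats fields, the 2x threshold, and the advice text are invented for illustration, and Vaidya's real rules, inputs, and configuration differ. It only shows the general shape: each rule inspects job statistics and optionally returns advice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

public class RuleSketch {
    /** Hypothetical per-job numbers a log parser might extract. */
    public static final class JobStats {
        final long mapInputBytes;
        final long mapOutputBytes;
        final boolean usesCombiner;

        public JobStats(long mapInputBytes, long mapOutputBytes, boolean usesCombiner) {
            this.mapInputBytes = mapInputBytes;
            this.mapOutputBytes = mapOutputBytes;
            this.usesCombiner = usesCombiner;
        }
    }

    /** One diagnostic rule: returns advice if the job looks suspicious. */
    public interface Rule {
        Optional<String> check(JobStats stats);
    }

    /** Hypothetical rule: map output much larger than map input, and no combiner. */
    public static final Rule SUGGEST_COMBINER = stats -> {
        if (!stats.usesCombiner && stats.mapOutputBytes > 2 * stats.mapInputBytes) {
            return Optional.of("Map output is large; consider adding a Combiner.");
        }
        return Optional.empty();
    };

    /** Runs every rule and collects the advice they produce. */
    public static List<String> diagnose(JobStats stats, List<Rule> rules) {
        List<String> advice = new ArrayList<>();
        for (Rule rule : rules) {
            rule.check(stats).ifPresent(advice::add);
        }
        return advice;
    }

    public static void main(String[] args) {
        List<Rule> rules = Arrays.asList(SUGGEST_COMBINER);
        System.out.println(diagnose(new JobStats(100, 500, false), rules));
        // prints [Map output is large; consider adding a Combiner.]
        System.out.println(diagnose(new JobStats(100, 500, true), rules));
        // prints []
    }
}
```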

BloomMapFile
HADOOP-3063 gives us BloomMapFile, a MapFile variant that consults a Bloom filter so that lookups for missing keys fail fast. MapFiles don't seem to get much attention, but having used them for a project in my CSE 490h class with Hadoop 0.7.2, I appreciate the addition.
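The fast-fail idea is simple: before seeking into the sorted file, check a Bloom filter, which answers either "definitely not here" or "maybe here". Below is a minimal, self-contained Java sketch of that pattern. It is not Hadoop's BloomMapFile code: the filter is hand-rolled with double hashing over a BitSet, and a TreeMap stands in for the on-disk MapFile.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.TreeMap;

public class BloomLookupSketch {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;
    // Stand-in for the sorted, on-disk MapFile (assumption for illustration).
    private final TreeMap<String, String> sortedData = new TreeMap<>();

    public BloomLookupSketch(int numBits, int numHashes) {
        this.numBits = numBits;
        this.numHashes = numHashes;
        this.bits = new BitSet(numBits);
    }

    // i-th bit position via double hashing: h1 + i * h2 (mod numBits).
    private int position(String key, int i) {
        return Math.floorMod(key.hashCode() + i * fnv1a(key), numBits);
    }

    // 32-bit FNV-1a hash, used as the second independent hash.
    private static int fnv1a(String key) {
        int hash = 0x811C9DC5;
        for (byte b : key.getBytes(StandardCharsets.UTF_8)) {
            hash ^= (b & 0xff);
            hash *= 0x01000193;
        }
        return hash;
    }

    public void put(String key, String value) {
        sortedData.put(key, value);
        for (int i = 0; i < numHashes; i++) bits.set(position(key, i));
    }

    /** False means definitely absent; true means possibly present. */
    public boolean mightContain(String key) {
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(position(key, i))) return false;
        }
        return true;
    }

    /** Returns null for missing keys, skipping the sorted lookup when the filter says no. */
    public String get(String key) {
        if (!mightContain(key)) return null; // fail fast: no seek into the file
        return sortedData.get(key);
    }

    public static void main(String[] args) {
        BloomLookupSketch file = new BloomLookupSketch(1024, 3);
        file.put("apple", "red");
        file.put("banana", "yellow");
        System.out.println(file.get("apple"));  // red
        System.out.println(file.get("cherry")); // null
    }
}
```

A false positive only costs a wasted lookup in the sorted data; it never returns a wrong value.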

Removal of LZO
Finally, the last change that caught my attention was the removal of the LZO compression codec in HADOOP-4874. The LZO code was infected with GPL (I hear that's been going around recently) and thus had to be removed. However, the LZO code lives on as a side project on Google Code.

Modern Day Witch Hunt
2009-04-13 20:51:08 amazon google

Last week it was #savejon and this week it is #amazonfail. What do these hashtags have in common? They were both markers used in modern day witch hunts.

The basic pattern is similar. Someone posts an unsubstantiated claim on the internet. Thousands of "twactivists" repeat and forward the claims, creating an internet swarm. They write outraged blog posts and send angry e-mails to the supposedly offending company. The few people who actually possess critical thinking skills e-mail the company for more information or start doing research on their own. Eventually the big, slow-moving company responds to the issue, but by then all the short-attention-span sheeple have already moved on to the next big drama.

In #savejon's case, a guy named Jon claimed a stock art company had stolen his art and was now suing him. Turns out that was all a lie.

With #amazonfail, Amazon stopped giving sales rank to some GLBT books. People thought they were pressured by the religious right to remove the books from the sales charts. This was a perfect example of confirmation bias because people would search for other gay books and see that they didn't have a sales rank either. It is easy to think it only affects GLBT books when you are only searching for those books. Additionally, people searched for [homosexuality] and only got books by anti-gay authors. Of course they did! Gay people usually describe themselves as gay, not homosexual!

Google has encountered similar issues with their automated ranking algorithms. A few years ago, searches for [Jew] returned an anti-Semitic website first. Since people don't understand how search engines work, Google had to post an explanation. There is no ulterior motive; big, complicated systems can produce wonky results sometimes.

Hack journalists who think they know computers know that this could never, ever be a computer glitch. No, computers and software always work perfectly. Most people don't realize how hard it is to maintain large ontologies and sets of metadata, especially when that data comes from publishers, third party sellers, and users.

What's even worse is that some people still refuse to believe it was a mistake. These people are as bad as moon landing conspiracy theorists and 9/11 truthers. They maintain their beliefs no matter how much evidence or logic you throw at them. Some people just need to feel persecuted.

I find it interesting that this has happened twice in as many weeks. I'm a little worried about the reactionary witch hunt mentality that people are showing. Instead of taking some random internet claim with a bit of skepticism, people automatically assume the worst. Whatever happened to giving someone the benefit of the doubt or innocent until proven guilty? Drop the persecution complex and just chill, okay?

Kindle Reader for iPhone Released
2009-03-03 20:21:09 amazon

I'd like to offer a huge congratulations to my friends over in Digital who just launched the Kindle Reader for iPhone (iTunes). This app was highly anticipated: bloggers have been requesting it for months. To Kevin, Ian, Cody, Rudd, Guido, and others, congrats, it looks great.

Andrewhitchcock.org 2008 Year In Review
2009-01-14 22:47:26 meta

At the beginning of 2008 I made a resolution to write one blog post a week. I fell short of my goal, clocking in at only 21 posts. Nevertheless, I'm going to highlight some of the posts and provide a recap of what I've been up to here in the last year.

I wrote a couple of posts about cloud computing. For earth day, I talked about the potential environmental and economic benefits of using cloud computing. This is relevant given the recent attention paid to datacenter infrastructure by the popular media and the often incorrect studies they propagate. Microsoft launched Photosynth as Software + Services, which I wasn't too fond of. Finally, I explained Microsoft Azure to some ignorant slashdotters.

Speaking of cloud computing, I wrote two apps for Google App Engine. The first was a BigTable-like RESTful web service. It was intended as a joke, but it gets a fair amount of traffic (the website, not the service). I also ported my album cloud mashup over to App Engine.

I completed one of my life goals by finally getting IPv6 set up. Afterwards I wrote a HOWTO on configuring an IPv6 tunnel using m0n0wall. Hopefully it'll help a few people connect to the future of the internet.

And of course, anything I do wouldn't be complete without rants… lots and lots of rants about Leopard, Safeway, Photosynth, bailouts, slashdotters, and the auto industry.

Here's to a great 2009.

This work is licensed under a Creative Commons Attribution 2.5 License.