Archive for the ‘Technology’ Category

SANOG XVI Conference in Paro, Bhutan

Last month, I traveled to Bhutan to attend the SANOG Conference. Bhutan is a small country nestled in the Himalayas, bordered by India to the east, west and south and by China to the north. It was a good opportunity to meet some like-minded network geeks and also visit an exotic country.

SANOG (South Asian Network Operators Group) is a conference where various stakeholders from the Internet infrastructure ecosystem come together, share operational experiences and learn from each other. SANOG is aimed at the SAARC countries (India, Pakistan, Nepal, Sri Lanka, Bangladesh, Bhutan, Maldives and Afghanistan). SANOG is loosely modeled on the APRICOT conference, with 5 days of workshops, 2 days of tutorials and 2 days of conference.

The 16th edition of SANOG was held in Paro, Bhutan. This was the second time Bhutan hosted SANOG; the earlier edition in Bhutan was held in the capital, Thimphu. This time it was held at the Paro Engineering College beside the Paro river, in a very picturesque setting. This was the second SANOG I attended, the first being the Mumbai edition in 2006.

I attended the workshop on Network Security by Gaurab Raj Upadhaya and Johnny Martin from PCH. It covered the basics of security, specifically for ISPs and large network providers. There were some good discussions on how to manage security audit processes and how to run an incident management program in case of network security breaches. The hands-on part of the workshop concentrated heavily on securing backbone routers and exchanging routing information securely. Some aspects of filtering and verifying network traffic were also covered. The last day had demos of several tools such as Nessus and Nmap. The slides can be downloaded from the SANOG Program page. Also, between the workshop breaks and during one of the days of the workshop, Devdas and I wrote an improved whois server that is hopefully in production now at the NIC website.

In the tutorials part of SANOG, I gave a half-day tutorial on application-level performance measurement [Slides,PDF]. The tutorial drew a small but interested crowd. Since many of the participants were application developers who had written quite a bit of PHP code, I ended up covering a lot more of the web-facing and measurement tools. It was the first time I was giving a tutorial on this topic, and it helped that it was interactive. In addition to the material on the slides, I talked a bit about front-end performance and tools such as YSlow (Yahoo), Page Speed (Google) and WebPagetest (AOL). There was a lot of whiteboarding, and the session veered a little away from the slides. I also spoke about the network measurement work being done in the IPPM, BMWG and PMOL working groups in the IETF. The feedback was pretty good, and I plan to give a longer version of the tutorial at future editions of SANOG/APRICOT. I skipped the second day of the tutorials and went to Chele La pass.

The conference had several talks that I was looking forward to, and I was not disappointed. The standout talks were on long-distance wireless network deployment by Matt Peterson and the F-root update by Peter Losher. Both had interesting networking insights and traffic data. I also gave a talk on my IETF fellowship experience [Slides,PDF]. Some of the slides were liberally lifted from “The Tao of the IETF” written by Paul Hoffman. As (good) luck would have it, I ran into Paul Hoffman at IETF 78 and told him about it :) . There were a few questions about the fellowship after the conference, so I hope it inspires more people to apply for the IETF fellowship.

Installing Lucid Lynx – Ubuntu 10.04 on an Asus EeePC

Lucid Lynx on an Asus EeePC

I had bought an Asus EeePC (model 1005HA) some time back. I use it to take notes and as a portable browsing and storage device whenever I travel. It came pre-installed with Windows and a crippled version of Office. So when Ubuntu 10.04 (Lucid Lynx) came out last week, I decided to install Ubuntu Netbook Edition on it, as I had heard good reviews of it. This would also let me do casual programming on the go. Ubuntu 10.04 is also a long-term support (LTS) release. I have written a short how-to, since installing Linux on a netbook is slightly more involved than popping in a CD and clicking Next (most netbooks do not have a CD-ROM drive).

Step-by-step guide to installing Lucid Lynx (Ubuntu 10.04) on an Asus EeePC 1005HA netbook:
1. Go to the Lucid Lynx release page and download the netbook ISO. I suggest you get the torrent and download it using a BitTorrent client. It is faster, and it saves mirror bandwidth for less tech-savvy users.

2a. If you are on Windows, download the USB disk creator, choose the downloaded ISO image and the amount of read-write space you want, and follow the instructions to create the disk.

2b. If you are on Linux, you can burn the ISO image to a CD and boot from it (by selecting the CD-ROM drive as the primary boot device). Once you are logged in, go to System => Administration and select Startup Disk Creator. Select the ISO image and the USB drive to use, and click Next. You are done in a few minutes.

3. Next, reboot the Asus EeePC and press F2 to enter the BIOS setup screen. Select Boot => Boot Device Priority. Set Removable Device as the first boot device and disable “Boot Booster”. The Boot Booster feature does some caching to speed up booting; since we are changing the boot sequence, it needs to be disabled.

4. Once the BIOS settings are done, plug in the USB drive with the image and boot from it. It will give you an option to install to disk. From there on, it is a normal Ubuntu install.

5. Wireless did not work out of the box. Run the following command to find the packages needed to get it working:
$ apt-cache search linux-backports

Look through the output of the above command and install the wireless x86 packages.

Before doing that, check whether wireless is switched on in the BIOS. I wasted a lot of time before realising that it had been switched off in the BIOS, which is why the keyboard function keys would not turn it on.
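If the search output is long, a minimal sketch of narrowing it down to the wireless packages, assuming a Bash shell, is below. The function name `wireless_backports` is hypothetical, and since the exact package name depends on your kernel, it is left as a placeholder rather than guessed.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read `apt-cache search` output on stdin and print
# only the package names whose line mentions "wireless".
# apt-cache search prints one "name - description" line per package.
wireless_backports() {
  grep -i 'wireless' | cut -d' ' -f1
}

# Usage on the netbook (searching needs no root; installing does):
#   apt-cache search linux-backports | wireless_backports
#   sudo apt-get install <package-name-from-the-list>
```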

Posting this on an ASUS EeePC on Lucid Lynx with wireless working :)
For more info look here.

Cloudcamp Bangalore 2010 and Hadoop Summit

The 2nd CloudCamp Bangalore was held at Dayananda Sagar College of Engineering. It was co-located with the first Hadoop Summit in India. The Hadoop Summit was interesting and more relevant to me, as I am using a Hadoop cluster for analytics at Inmobi. Dave kicked off CloudCamp with his signature “unPanel”. I was on the unPanel this time and answered some questions on mobiles, netbooks and smartphones as access devices for the cloud, and on the impact of Google's patent on MapReduce.

The corridor discussions with a bunch of Hadoop committers were insightful. I also found out more about Mahout. Mahout is an Apache project to build scalable machine learning libraries. It is not restricted to Hadoop implementations, but much of the current activity seems to be around Hadoop.

Notes and embedded slides from the sessions I attended follow:

Hadoop summit Keynote

Data Management on Grid

Notes:

  • Y! uses an HDFS replication factor of 3 (the Hadoop default) in most cases. The exceptions are big clusters with a large number of applications running simultaneously.
  • Y! does not use Avro yet due to large amount of legacy data. Twitter uses Avro.
  • The data ingestion layer uses MapReduce for the heavy lifting and for format conversion before storage.
  • LZO is used for compression. gzip (not ideal because a gzip file cannot be split into blocks that are processed in parallel) and bzip2 are also used. bzip2 decompression is slow, but bzip2 delivers better compression ratios.
  • The data ingestion layer also enforces the policies for data retention and purging.
  • The underlying filesystem is rarely a bottleneck for Hadoop. More often the bottleneck is the synchronization semantics of HDFS: a file operation does not succeed until all the replicas are in sync.
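With a replication factor of 3, every block is stored three times, so the raw disk needed is roughly three times the logical data size. A back-of-the-envelope sketch in Bash follows; the function name is hypothetical, and it deliberately ignores compression, temporary MapReduce output and filesystem overhead.

```bash
#!/usr/bin/env bash
# Hypothetical helper: raw HDFS capacity needed for a given amount of
# logical data. Arguments: logical size in GB, replication factor
# (defaults to 3, the HDFS default). Integer arithmetic only.
raw_hdfs_gb() {
  local logical_gb=$1
  local replication=${2:-3}
  echo $(( logical_gb * replication ))
}

raw_hdfs_gb 500      # prints 1500
raw_hdfs_gb 500 2    # prints 1000
```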

Machine Learning using Hadoop

Notes:

  • There are clear differences between data mining and machine learning.
  • ML is harder to implement efficiently on Hadoop. Improving efficiency is still a research problem.
  • Hadoop creates one map task per block, which can create too many (often empty) output files and also many reducers.
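A rough way to see the one-task-per-block effect: assuming the default behaviour where an input split lines up with an HDFS block, the map-task count is about the sum over input files of ceil(file size / block size), with at least one task per file. The sketch below uses that simplification (it ignores the small slack Hadoop allows before starting a new split), and the function name is hypothetical. It shows why the same data stored as many small files produces far more map tasks.

```bash
#!/usr/bin/env bash
# Hypothetical helper: estimate map tasks as one per block of each
# input file, with at least one task per file.
# Arguments: block size in MB, then one file size in MB per input file.
estimate_map_tasks() {
  local block_mb=$1; shift
  local total=0 size tasks
  for size in "$@"; do
    tasks=$(( (size + block_mb - 1) / block_mb ))   # ceiling division
    if (( tasks < 1 )); then
      tasks=1                                       # even an empty file gets a task
    fi
    total=$(( total + tasks ))
  done
  echo "$total"
}

# One 1024 MB file with 64 MB blocks -> 16 map tasks.
estimate_map_tasks 64 1024

# The same 1024 MB stored as 1024 files of 1 MB -> 1024 map tasks.
small_files=()
for _ in $(seq 1024); do small_files+=(1); done
estimate_map_tasks 64 "${small_files[@]}"
```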

Optimizing and Benchmarking Hadoop

Notes:

  • As a rule of thumb, adding as much memory as money can buy is a good idea for Hadoop.
  • Pay attention to network connectivity, as the shuffle stage does heavy network I/O.
  • Solid state disks might make sense at certain price/performance ratios. They are also more power efficient.

Tuning Hadoop To Deliver Performance To Your Application

Notes:

  • Hadoop has several tuning parameters, but they must be tuned in conjunction with each other.
  • Set the number of map task slots per node slightly higher than the number of cores to ensure better utilization. This makes sure that data is processed in waves. It also gives better network utilization (as the shuffle phase runs in parallel with the map phase) along with better CPU scheduling.
  • Choosing a good HDFS block size is important. The number of map tasks generated is directly proportional to the number of HDFS blocks in the input.
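To see how slot count and block size interact, here is a rough sketch of the "waves" arithmetic: with S map slots across the cluster and T map tasks (about one per block), the map phase runs in roughly ceil(T / S) waves. The cluster size, slots per node and input size below are made-up illustrative values, and the function names are hypothetical. (In Hadoop 0.20, the per-node slot count is set with the `mapred.tasktracker.map.tasks.maximum` property.)

```bash
#!/usr/bin/env bash
# Ceiling division for positive integers.
ceil_div() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: approximate number of map waves.
# Arguments: input size (MB), block size (MB), nodes, map slots per node.
map_waves() {
  local input_mb=$1 block_mb=$2 nodes=$3 slots_per_node=$4
  local tasks slots
  tasks=$(ceil_div "$input_mb" "$block_mb")   # about one map task per block
  slots=$(( nodes * slots_per_node ))         # concurrent map tasks cluster-wide
  ceil_div "$tasks" "$slots"
}

# Illustrative numbers: 100 GB input, 10 nodes with 8 cores each,
# slots set slightly above the core count (10 per node).
map_waves 102400 64 10 10    # 1600 tasks / 100 slots -> 16 waves
map_waves 102400 128 10 10   # 800 tasks / 100 slots  -> 8 waves
```

A larger block size means fewer, longer map tasks. If the task count lands just above a multiple of the slot count, the last wave runs mostly idle, which is one reason the block size and slot settings are best tuned together.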

Links to all presentations

ACM Compute 2010 and ACM India launch

ACM Compute 2010 concluded yesterday. It is the flagship conference of the ACM Bangalore chapter. This year was the 3rd edition of the conference, and more than 500 people attended. The highlight of this year’s conference was the launch of ACM India. ACM wants to increase its reach in India, and the ACM India Council, consisting of 18 leading computer scientists from academia and industry, is heading this initiative.

The ACM India launch was addressed by 3 Turing Award winners: Barbara Liskov, C.A.R. Hoare (Tony Hoare) and Raj Reddy. The ACM Turing Award is “The Nobel Prize for Computing”, and it is rare to see three Turing Award winners address the audience at a single event. Barbara Liskov is the most recent recipient of the Turing Award (and the 2nd woman to win it), and she spoke on the power of abstraction. She described the problems early programmers faced when writing large and complex programs, and how she tried to solve them using abstractions similar to what is now called object-oriented programming. She talked at length about how her insights and experiences with these programming problems led to the design of the CLU language. CLU pioneered iterators and generators, and was an early language with exception handling. Listening to her was a good lesson in computer history. I learned later that she was one of the first women to get a PhD from a computer science department. (Her doctoral advisor was the legendary John McCarthy.) Her presentation and the references mentioned in it make for good reading.

Dr Raj Reddy is the only Indian who has won the Turing Award, for his contributions to the field of Artificial Intelligence. Incidentally, his PhD advisor was also John McCarthy, AI pioneer and Turing Award winner. Dr Raj Reddy spoke about the growth of computing over the years and the challenges of reaching the “bottom of the pyramid”. He explained why there is a need to move from the WIMP paradigm in user interfaces to SILK (Speech, Image, Language and Knowledge) to increase the reach of computing. His Turing Award lecture (“To dream the possible dream”) makes for an interesting read as well.

C.A.R. Hoare (Tony Hoare) was the next speaker. He is a living legend in computer science. I was looking forward to hearing him speak, as I had studied the Quicksort algorithm (which he invented) and his Communicating Sequential Processes paper in college. He was remarkably witty, and his enthusiasm for computer science shone through in his talk. In particular, he spoke about the Verified Software initiative, which he contended was similar in scope and impact (for computer science) to the Hubble Telescope and the Human Genome Project.

Over the following 2 days, we had the ACM Compute 2010 conference, and there were several hands-on tutorials on Cloud Computing, Rich Internet Applications and Web 2.0 apps, Widgets and Mobile Applications. The RIA tutorial was conducted by Mrinal Wadhwa and the Facebook Connect tutorial by Prateek Dayal (of Muziboo).

(Disclosure: I am the secretary of the Bangalore Chapter and am on the program committee for ACM Compute 2010.)

Coders At Work Review

Coders At Work

Once in a while, you read a book that is filled with ‘aha’ moments. If you have written complex software for a while or want to become a good programmer, then ‘Coders at Work’ is a must-read. This fantastic book interviews 15 master programmers. Some of the people interviewed in the book are well-known names such as Don Knuth, Ken Thompson, Jamie Zawinski and Peter Norvig.

Some comments on the content of the book:
Programming languages
Many of the programmers interviewed started with BASIC and considered it an okay language. What is probably more surprising is the universal hatred of C++ in this group. In fact, several people such as Peter Norvig and Ken Thompson (who goes on a tirade against C++) consider it a downright ugly and cumbersome language to work with.

Jamie Zawinski – C++ is just an abomination
Brad Fitzpatrick – The syntax is terrible and totally inconsistent and the error messages, at least from GCC, are ridiculous.
Ken Thompson – By and large I think it’s a bad language. It does a lot of things half well and it’s just a garbage heap of ideas that are mutually exclusive. Everybody I know, whether it’s personal or corporate, selects a subset and these subsets are different. So it’s not a good language to transport an algorithm—to say, “I wrote it; here, take it.” It’s way too big, way too complex. And it’s obviously built by a committee.

On Programming and Curiosity
Almost everyone interviewed still programs (some only occasionally) and enjoys hacking and taking things apart. Many were misfits and took unusual career paths to get to where they are today. There is a rebel and hacker streak in all of them. Most of them stumbled into programming and discovered at some point that they were good at it. Everyone emphasized the practice of writing good, readable code. Everyone laments that you can no longer understand a system from the bottom up, as systems have become more and more complex and layers of abstraction have multiplied.

On categorizing programming and building software
The opinion is pretty much evenly split on whether programming is a science, art, craftsmanship or engineering with a slight bias towards craftsmanship.

On Recommended Books
Among the books recommended, “The Art of Computer Programming” by Don Knuth topped the list for obvious reasons. Another book recommended by several people was “The Psychology of Computer Programming” by Gerald Weinberg.

On the state of computer science
The mood on the state of computer science was fairly pessimistic, and most people pointed out that many of the breakthrough ideas in computer science were conceived in the ’70s (with the notable exception of the Internet and web programming).

The only downside here is the interview of Fran Allen. It should not have made the book. I got the distinct feeling that much of the work she claimed credit for was implemented by others and that she was the manager of those projects (probably a good one, but that is hardly the same as being a good programmer).

I have added some notes (for further reading) and quotes from the book on the wiki.
