By George Herbert, Sr. Technical Consultant
Earlier this week, on June 12, three Taos Senior Technical Consultants (Dan Roncadin, Kristen Stewart, and myself) attended a panel debate in Palo Alto run by the Churchill Club titled “The Elephant in the Enterprise: What Role Will Hadoop Play?” The panel featured Mike Olson (CEO of Cloudera, formerly of Oracle and Sleepycat), Jay Parikh (VP of Infrastructure Engineering at Facebook), John Schroeder (CEO of MapR), Andrew Mendelsohn (SVP of Server Technologies at Oracle), and Michael Driscoll (CEO of Metamarkets), all of whom lead Hadoop companies or Hadoop-using groups within their organizations. Cade Metz (Senior Editor at Wired) moderated. The event was very well attended, with nearly 500 executives and technology leaders from leading Bay Area organizations.
The question for the panel was a very current, topical problem for a lot of enterprises (and Taos customers): What role can Hadoop play for companies, and how do they get there? How is Hadoop working out for companies trying to adopt it? The panelists are in the middle of this transition and brought firsthand experience and strong opinions about what is and isn't working as Hadoop comes into the enterprise. Those experiences and opinions sometimes agreed and sometimes led to interesting, polite clashes: at one point one panelist told another, “This is a ‘Jane, you ignorant slut’ moment” during a discussion of feature adoption, and there were a number of backhanded compliments to Facebook, including an ongoing joke about how many tens of Stanford PhDs it took to repair major Hadoop installations.
All joking aside, there was a very serious point underneath. There was nearly unanimous agreement that Hadoop can make a difference for organizations that have large data sets and real questions to ask of them, and that it is already running successfully in a number of organizations doing exactly that. There was also agreement that figuring out how to ask new questions of the data remains a big gap, and that moving beyond the initial, easy questions is a current industry roadblock. Everyone agreed that Hadoop's open-source model benefits everyone in the room, not only from a freedom point of view but also from a cold, hard business point of view (more eyes on problems, and good engineers on someone else's payroll helping to solve them).
There was also agreement that Hadoop's architecture isn't entirely done yet. MapR's new filesystem work was one avenue of expansion people were looking at. There were discussions about the stability of the single, currently non-redundant NameNode, about making better use of RAM across the cluster, and about caching heavily used data somewhere above the disk. Making Hadoop less batch-oriented and more responsive in real time was also discussed.
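For readers who haven't seen Hadoop code, the sketch below is the classic word-count MapReduce job, written against the standard org.apache.hadoop.mapreduce API (the input and output paths on the command line are illustrative). It is a minimal example of the batch-oriented model the panelists were contrasting with real-time responsiveness: map tasks scan the input in parallel, reduce tasks aggregate the results, and the whole job runs to completion rather than answering queries interactively.

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map phase: emit (word, 1) for every token in every input line.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      for (String token : value.toString().split("\\s+")) {
        if (!token.isEmpty()) {
          word.set(token);
          context.write(word, ONE);
        }
      }
    }
  }

  // Reduce phase: sum the counts for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. an HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. an HDFS output directory
    System.exit(job.waitForCompletion(true) ? 0 : 1);        // block until the batch job finishes
  }
}
```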
All in all, the panel (and commenters in the audience) agreed that Hadoop is meaningful for the enterprise and showing up increasingly across the industry, but still has a ways to go. It continues to be a technology that we are watching closely.