This week I attended the AWS Summit in San Francisco. Things change so fast in the expanding world of AWS that, in addition to the yearly re:Invent conference, Amazon now holds Summits around the world. These Summits serve as mini-conferences to update people on new and relevant developments, and to let partners and vendors showcase their solutions and capabilities.
One thing that shouldn’t surprise anyone is the announcements Amazon made regarding its growth figures. In the early days of AWS it was genuinely amazing how fast things were growing; by now it’s almost expected that every time growth is mentioned, the numbers will have jumped again. In this regard Amazon didn’t disappoint. S3 now holds over 2 trillion objects and has handled a peak rate of 1.1 million requests per second. And the famous, though purposefully vague, statistic that AWS periodically adds as much infrastructure as was required to run all of Amazon when it was a $5 billion business has gone from monthly to daily. Of course, they won’t say exactly how many machines and disks that is, but it’s an impressive figure. It’s a great way to convey the velocity of growth without admitting to the concrete statistic that consumers and competitors really want to know – just how many servers do they have?
On Monday, before the main Summit day, a session was held for partners. AWS started out selling directly to consumers and businesses, but in the past year it has built up its partner program, realizing that many customers need additional services beyond what AWS itself provides. Taos is part of that partner program, and will be offering more AWS-related services in the months to come.
Even in an on-premises environment, data warehouses aren’t easy. They require complex hardware and software configurations to perform adequately, and they are very expensive propositions, often stretching into the millions of dollars at any scale beyond a few TB. They also become obsolete quickly given the rapid pace of hardware innovation we are seeing. A lot of effort and cost goes into designing, implementing, and then especially operating these environments. Data warehouses require many servers working together, high-capacity and high-performance storage, adequately provisioned network connectivity, and often very specialized software. To date, the data warehouse workload has been one of the most difficult things for people to envision running in a cloud environment.
People have been working with large unstructured datasets in the cloud for quite some time now, using the nearly infinite storage capacity of S3 and technologies like Hadoop on EC2 or Amazon’s Elastic MapReduce service. But the structured database offerings on AWS have been limited to quirky configurations running on EC2, or the single-machine offerings of RDS.
Recently, Amazon announced its data warehouse offering, Redshift. Redshift is a hybrid service that includes both hardware and software to deliver a turnkey data warehouse: the customer need only request a certain size and number of machines and storage, design a schema, and then load data. On the query side, Redshift supports JDBC/ODBC and works well with existing business intelligence and reporting tools. As with RDS, Amazon takes care of all maintenance and monitoring activities, including backups. It was beta-tested with large customers for quite some time before being released to the public, and it is a very impressive offering. AWS prices it quite attractively, too.
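To make that workflow concrete, here is a minimal sketch of the SQL a client might send to Redshift over its PostgreSQL-compatible JDBC/ODBC link: create a table with Redshift-specific distribution and sort keys, then bulk-load it from S3 with COPY. The table, column, and bucket names are hypothetical, and the credentials string is a placeholder.

```python
# Sketch of the Redshift load workflow: define a schema, then bulk-load
# from S3. Names here are hypothetical, not from any real deployment.
create_table = """
CREATE TABLE page_views (
    viewed_at  TIMESTAMP,
    url        VARCHAR(2048),
    user_id    BIGINT
)
DISTKEY (user_id)      -- distribute rows across cluster nodes by user
SORTKEY (viewed_at);   -- sort on disk to speed up time-range scans
"""

# Bulk loads come from S3 via COPY rather than row-by-row INSERTs;
# the CREDENTIALS string below is a placeholder, not real keys.
copy_from_s3 = """
COPY page_views
FROM 's3://example-bucket/page_views/'
CREDENTIALS 'aws_access_key_id=...;aws_secret_access_key=...'
DELIMITER '\\t';
"""

print(create_table)
print(copy_from_s3)
```

Any PostgreSQL-capable client library or BI tool could submit these statements; the Redshift-specific parts are just the DISTKEY/SORTKEY clauses and the S3 COPY command.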
The Summit included some very in-depth technical sessions. One of my favorites covered Identity and Access Management (one of Amazon’s most complicated service offerings, since it interacts with almost all the others). It was apparent that despite the advances AWS has made in making the web console easy to use and friendly, and in building command-line tools for most operations, AWS still caters primarily to the developer. There is still a large gap between what can be accomplished via the GUI or command line and what can be done with, for instance, the Java SDK. If you are a user of AWS services, I encourage you to take a look at the actual API and all the functionality available that you may not have known about. This is particularly relevant to IAM, since security and identity are so important when designing and operating services in a cloud environment. While the ease of “3 clicks to spin up a server” is attractive, it is important to realize that using the web console is just scratching the surface; the advanced architectures that can be implemented on the AWS platform often require significant knowledge and experience. The fact remains that to be an effective AWS expert, one needs some coding skills.
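As a small illustration of working with IAM programmatically rather than through the console, here is a sketch of building an IAM policy document in code. The bucket name and helper function are hypothetical; the resulting JSON follows the standard IAM policy grammar and could then be attached to a user or group via an SDK call.

```python
import json

def make_readonly_bucket_policy(bucket_name):
    """Build an IAM policy document granting read-only access to a
    single S3 bucket. The structure (Version, Statement, Effect,
    Action, Resource) follows the standard IAM policy grammar.
    The bucket name passed in is purely illustrative."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    # Bucket-level ARN for ListBucket, object-level for GetObject.
                    "arn:aws:s3:::%s" % bucket_name,
                    "arn:aws:s3:::%s/*" % bucket_name,
                ],
            }
        ],
    }

policy_json = json.dumps(make_readonly_bucket_policy("example-reports"), indent=2)
print(policy_json)
```

Composing policies like this in code makes it easy to generate fine-grained, per-resource permissions at a scale that would be tedious to click through in the console.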
Amazon has posted a video of the Keynote, and slide decks that were presented at the breakout sessions.
Keynote Video: http://aws.amazon.com/live/
Breakout Decks: http://www.slideshare.net/AmazonWebServices/tag/sfsummit2013