A diverse set of real-world Java benchmarks shows that Google is fastest, Azure is slowest, and Amazon is priciest
If the cartoonists are right, heaven is located in a cloud where everyone wears white robes, every machine is lightning quick, everything you do works perfectly, and every action is accompanied by angels playing lyres. The current sales pitch for the enterprise cloud isn’t much different, except for the robes and the music. The cloud providers have an infinite number of machines, and they’re just waiting to run your code perfectly.
The sales pitch is seductive because the cloud offers many advantages. There are no utility bills to pay, no server room staff who want the night off, and no crazy tax issues for amortizing the cost of the machines over N years. You give them your credit card, and you get root on a machine, often within minutes.
To test out the options available to anyone looking for a server, I rented some machines on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure and took them out for a spin. The good news is that many of the promises have been fulfilled. If you click the right buttons and fill out the right Web forms, you can have root on a machine in a few minutes, sometimes even faster. All of them make it dead simple to get the basic goods: a Linux distro running what you need.
At first glance, the options seem close to identical. You can choose from many of the same distributions, and from a wide range of machine configuration options. But if you start poking around, you’ll find differences — including differences in performance and cost. The machines may seem like commodities, but they’re not. This became more and more evident once the machines started churning through my benchmarks.
Fast cloud, slow cloud
I tested small, medium, and large machine instances on Amazon EC2, Google Compute Engine, and Microsoft Windows Azure using the open source DaCapo benchmarks, a collection of 14 common Java programs bundled into one easy-to-start JAR. It’s a diverse set of real-world applications that will exercise a machine in a variety of different ways. Some of the tests will stress the CPU, others will stress RAM, and still others will stress both. Some of the tests will take advantage of multiple threads. No machine configuration will be ideal for all of them.
Some of the benchmarks in the collection will be very familiar to server users. The Tomcat test, for instance, starts up the popular Web server and asks it to assemble some Web pages. The Luindex and Lusearch tests will put Lucene, the common indexing and search tool, through its paces. Another test, Avrora, will simulate some microcontrollers. Although this task may be useful only for chip designers, it still tests the raw CPU capacity of the machine.
I ran the 14 DaCapo tests on three different Linux machine configurations on each cloud, using the default JVM. The instances aren’t perfect “apples to apples” matches, but they are roughly comparable in terms of size and price. The configurations and cost per hour are broken out in the table below.
I gathered two sets of numbers for each machine. The first set shows the amount of time the instance took to run the benchmark from a dead stop. It fired up the JVM, loaded the code, and started to work. This isn’t a bad simulation because many servers start up Java code from command lines in scripts.
To add another dimension, the second set reports the times using the “converge” option. This runs the benchmark repeatedly until consistent results appear. This sometimes happens after just a few runs, but in a few cases, the results failed to converge after 20 iterations. This option often resulted in dramatically faster times, but sometimes it only produced marginally faster times.
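To give a sense of the mechanics, here is a rough sketch of how a “standing start” run can be timed: launch the DaCapo harness in a fresh JVM for each benchmark and time the whole process, JVM startup included. The JAR file name and the benchmark list below are assumptions for illustration, not the exact setup used in these tests, and the harness’s own converge flag would be added to the command line for the second set of numbers.

```java
import java.util.List;

// Sketch of a "standing start" timing loop. Each benchmark gets a fresh JVM,
// so JVM startup and class loading are counted, just as they would be when a
// server script launches Java code from the command line.
public class DaCapoTimer {
    public static void main(String[] args) throws Exception {
        // Hypothetical subset of the 14 DaCapo benchmarks.
        List<String> benchmarks = List.of("avrora", "tomcat", "luindex", "lusearch");
        for (String bench : benchmarks) {
            ProcessBuilder pb = new ProcessBuilder(
                    "java", "-jar", "dacapo-9.12-bach.jar", bench); // JAR name is an assumption
            pb.inheritIO();                                  // pass the harness output through
            long start = System.nanoTime();
            int exit = pb.start().waitFor();                 // wait for the fresh JVM to finish
            double seconds = (System.nanoTime() - start) / 1e9;
            System.out.printf("%s finished in %.1f s (exit code %d)%n", bench, seconds, exit);
        }
    }
}
```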
The results (see charts and tables below) can look like a mind-numbing sea of numbers, but a few patterns stood out:
Google was the fastest overall. The three Google instances completed the benchmarks in a total of 575 seconds, compared with 719 seconds for Amazon and 834 seconds for Windows Azure. A Google machine had the fastest time in 13 of the 14 tests. A Windows Azure machine had the fastest time in only one of the benchmarks. Amazon was never the fastest.
Google was also the cheapest overall, though Windows Azure was close behind. Executing the DaCapo suite on the trio of machines cost 3.78 cents on Google, 3.8 cents on Windows Azure, and 5 cents on Amazon. A Google machine was the cheapest option in eight of the 14 tests. A Windows Azure instance was cheapest in five tests. An Amazon machine was the cheapest in only one of the tests.
The best option for misers was Windows Azure’s Small VM (one CPU, 6 cents per hour), which completed the benchmarks at a cost of 0.67 cents. However, this was also one of the slowest options, taking 404 seconds to complete the suite. The next cheapest option, Google’s n1-highcpu-2 instance (two CPUs, 13.1 cents per hour), completed the benchmarks in less than half the time (193 seconds) at a cost of 0.70 cents.
If you cared more about speed than money, Google’s n1-standard-8 machine (eight CPUs, 82.9 cents per hour) was the best option. It turned in the fastest time in 11 of the 14 benchmarks, completing the entire DaCapo suite in 101 seconds at a cost of 2.32 cents. The closest rival, Amazon’s m3.2xlarge instance (eight CPUs, $0.90 per hour), completed the suite in 118 seconds at a cost of 2.96 cents.
Amazon was rarely a bargain. Amazon’s m1.medium (one CPU, 10.4 cents per hour) was both the slowest and the most expensive of the one-CPU instances. Amazon’s m3.2xlarge (eight CPUs, 90 cents per hour) was the second fastest instance overall, but also the most expensive. However, Amazon’s c3.large (two CPUs, 15 cents per hour) was truly competitive: nearly as fast overall as Google’s two-CPU instance, and faster and cheaper than Windows Azure’s two-CPU machine.
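The cost figures in these comparisons follow from simple arithmetic: the per-run cost is the hourly rate multiplied by the fraction of an hour the suite took. A minimal sketch, using the rounded rates and times quoted above (so the pennies may differ slightly from the tables):

```java
// Per-run cost in cents = hourly rate in cents * elapsed seconds / 3600.
// The rates and times are the rounded figures quoted in the text above.
public class RunCost {
    static double centsForRun(double centsPerHour, double seconds) {
        return centsPerHour * seconds / 3600.0;
    }

    public static void main(String[] args) {
        System.out.printf("Google n1-standard-8: %.2f cents%n", centsForRun(82.9, 101)); // ~2.33
        System.out.printf("Amazon m3.2xlarge:    %.2f cents%n", centsForRun(90.0, 118)); // ~2.95
        System.out.printf("Azure Small VM:       %.2f cents%n", centsForRun(6.0, 404));  // ~0.67
    }
}
```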
These general observations, which I drew from the “standing start” tests, are also borne out by the results of the “converged” runs. But a close look at the individual numbers will leave you wondering about consistency.
Some of this may be due to the randomness hidden in the cloud. While the companies make it seem like you’re renting a real machine that sits in a box in some secret, undisclosed bunker, the reality is that you’re probably getting assigned a thin slice of a box. You’re sharing the machine, and that means the other users may or may not affect you. Or maybe it’s the hypervisor that’s behaving differently. It’s hard to know. Your speed can change from minute to minute and from machine to machine, something that usually doesn’t happen with the server boxes rolling off the assembly line.
So while there seem to be clear performance differences among the cloud machines, your results could vary. These patterns also emerged:
Bigger, more expensive machines can be slower. You can pay more and get worse performance. The three Windows Azure machines started with one, two, and eight CPUs and cost 6, 12, and 48 cents per hour, but the more expensive they were, the slower they ran the Avrora test. The same pattern appeared with Google’s one-CPU and two-CPU machines.
Sometimes bigger pays off. The same Windows Azure machines that ran the Avrora jobs slower sped through the Eclipse benchmark. On the first runs, the eight-CPU machine was more than twice as fast as the one-CPU machine.
Comparisons can be troublesome. The results table has some holes produced when a particular test failed, some of which are easy to explain. The Windows Azure machines didn’t have the right codec for the Batik test; it isn’t included in the default Java installation. I probably could have fixed it with a bit of work, but the machines from Amazon and Google didn’t need it. (Note: Because Azure balked at the Batik test, the comparative times and costs cited above omit the Batik results for Amazon and Google.)
Other failures seemed odd. The Tradesoap routine would generate an exception occasionally. This was probably caused by some network failure deep in the OS layer. Or maybe it was something else. The same test would run successfully in different circumstances.
Adding more CPUs often isn’t worth the cost. While Windows Azure’s eight-CPU machine was often dramatically faster than its one-CPU machine, it was rarely eight times faster, which is disappointing given that it costs eight times as much. This was true even on the tests that recognize multiple CPUs and set up multiple threads. In most of the tests the eight-CPU machine was just two to four times faster. The one test that stood out was the Sunflow raytracing test, which was able to use all of the compute power given to it.
The CPU numbers don’t always tell the story. The companies usually double the price when you move from one CPU to two and multiply it by eight when you move to eight CPUs, but you can often save money by not increasing the RAM along with the CPUs. Just don’t expect performance to keep doubling if you skimp on memory. The Google two-CPU machine in these tests was a so-called “highcpu” machine with less RAM than the standard machine. It was often slower than the one-CPU machine, and when it was faster, it was often only about 30 percent faster.
Thread count can also be misleading. While the performance of the Windows Azure machines on the Sunflow benchmark tracks the number of threads, the same can’t be said for the Amazon and Google machines. Amazon’s two-CPU instance often ran more than twice as fast as the one-CPU machine; on one test, it was almost three times faster. Google’s two-CPU machine, on the other hand, was only 20 to 25 percent faster on Sunflow.
The pricing table can be a good indicator of performance. Google’s n1-highcpu-2 machine is about 30 percent more expensive than the n1-standard-1 machine even though it offers twice as much theoretical CPU power. Google probably used performance benchmarks to come up with the prices.
Burst effects can distort behavior. Some of the cloud machines will speed up for short “bursts.” This is sort of a free gift of the extra cycles lying around. If the cloud providers can offer you a temporary speed up, they often do. But beware that the gift will appear and disappear in odd ways. Thus, some of these results may be faster because the machine was bursting.
The bursting behavior varies. On the Amazon and Google machines, the Eclipse benchmark would speed up by a factor of more than three when using the “converge” option of the benchmark. Windows Azure’s eight-CPU machine, on the other hand, wouldn’t even double.
If all of these factors leave you confused, you’re not alone. I tested only a small fraction of the configurations available from each cloud and found that performance was only partially related to the amount of compute power I was renting. The big differences in performance on the different benchmarks means that the different platforms could run your code at radically different speeds. In the past, my tests have shown that cloud performance can vary at different times or days of the week.
This test matrix may be large, but it doesn’t even come close to exploring all of the variations the different platforms offer. All of the companies sell multiple combinations of CPU, RAM, and storage, and these can have subtle and not-so-subtle effects on performance. At best, these tests can only expose some of the ways that performance varies.
This means that if you’re interested in getting the best performance for the lowest price, your only solution is to create your own benchmarks and test out the platforms. You’ll need to decide which options are delivering the computation you need at the best price.
Calculating cloud costs
Working with the matrix of prices for the cloud machines is surprisingly complex given that one of the selling points of the clouds is the ease of purchase. You’re not buying machines, real estate, air conditioners, and whatnot. You’re just renting a machine by the hour. But even when you look at the price lists, you can’t simply choose the cheapest machine and feel secure in your decision.
The tricky issue for the bean counters is that the performance observed in the benchmarks rarely increased with the price. If you’re intent upon getting the most computation cycles for your dollar, you’ll need to do the math yourself.
The simplest option is Windows Azure, which sells machines in sizes that range from extra small to extra large. The amount of CPU power and RAM generally increase in lockstep, roughly doubling at each step up the size chart. Microsoft also offers a few loaded machines with an extra large amount of RAM included. The smallest machines with 768MB of RAM start at 2 cents per hour, and the biggest machines with 56GB of RAM can top off at $1.60 per hour. The Windows Azure pricing calculator makes it straightforward.
One of the interesting details is that Microsoft charges more for a machine running Microsoft’s operating system. While Windows Azure sometimes sold Linux instances for the same price, at this writing, it’s charging exactly 50 percent more if the machine runs Windows. The marketing department probably went back and forth trying to decide whether to price Windows as if it’s an equal or a premium product before deciding that, duh, of course Windows is a premium.
Google follows the same basic pattern of doubling the size of the machine and doubling the price. The standard machines start at 10.4 cents per hour for one CPU and 3.75GB of RAM, then double in capacity and price until they reach $1.66 per hour for 16 CPUs and 60GB of RAM. Google also offers options with higher and lower amounts of RAM per CPU, and the prices move along a different scale.
The most interesting options come from Amazon, which has an even larger number of machines and a larger set of complex pricing options. Amazon charges roughly double for twice as much RAM and CPU capacity, but it also varies the price based upon the amount of disk storage. The newest machines include SSD options, but the older instances without flash storage are still available.
Amazon also offers the chance to create “reserved instances” by pre-purchasing some of the CPU capacity for one or three years. If you do this, the machines sport lower per-hour prices. You’re locking in some of the capacity but maintaining the freedom to turn the machines on and off as you need them. All of this means it pays to estimate how much you intend to use Amazon’s cloud over the next few years, because committing up front can save real money.
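As a rough illustration of that trade-off, the break-even point is simply the upfront fee divided by the hourly savings. The rates below are made up for the example, not Amazon’s actual reserved pricing; plug in the current numbers from Amazon’s own price tables.

```java
// Illustrative break-even calculation for a reserved instance.
// All three figures are hypothetical placeholders, not real Amazon prices.
public class ReservedBreakEven {
    public static void main(String[] args) {
        double onDemandPerHour = 0.15;   // hypothetical on-demand rate, dollars per hour
        double reservedPerHour = 0.09;   // hypothetical discounted hourly rate, dollars per hour
        double upfrontFee      = 300.0;  // hypothetical one-time reservation fee, dollars

        double breakEvenHours = upfrontFee / (onDemandPerHour - reservedPerHour);
        System.out.printf("Reservation pays off after %.0f hours (about %.0f days running nonstop)%n",
                breakEvenHours, breakEvenHours / 24);
    }
}
```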
In an effort to simplify things, Google created the GCEU (Google Compute Engine Unit) to measure CPU power and “chose 2.75 GCEUs to represent the minimum power of one logical core (a hardware hyper-thread) on our Sandy Bridge platform.” Similarly, Amazon measures its machines with Elastic Compute Units, or ECUs. Its big fat eight-CPU machine, known as the m3.2xlarge, is rated at 26 ECUs while the basic one-core version, the m3.medium, is rated at three ECUs. That’s a difference of more than a factor of eight.
This is a laudable effort to bring some light to the subject, but the benchmark performance doesn’t track the GCEUs or ECUs too closely. RAM is often a big part of the equation that’s overlooked, and the algorithms can’t always use all of the CPU cores they’re given. Amazon’s m3.2xlarge machine, for instance, was often only two to four times faster than the m3.medium, although it did get close to being eight times faster on a few of the benchmarks.
Caveat cloudster
The good news is that the cloud computing business is competitive and efficient. You put in your credit card number, and a server pops out. If you’re just looking for a machine and don’t have hard and fast performance numbers in mind, you can’t go wrong with any of these providers.
Is one cheaper or faster? The accompanying tables show the fastest and cheapest results in green and the slowest and priciest results in red. There’s plenty of green in Google’s table and plenty of red in Amazon’s. Depending on how much you emphasize cost, the winners shift. Microsoft’s Windows Azure machines start running green when you take the cost into account.
The freaky thing is that these results are far from consistent, even across the same architecture. Some of Microsoft’s machines have green numbers and red numbers for the same machine. Google’s one-CPU machine is full of green but runs red with the Tradesoap test. Is this a problem with the test or Google’s handling of it? Who knows? Google’s two-CPU machine is slowest on the Fop test — and Google’s one-CPU machine is fastest. Go figure.
All of these results mean that doing your own testing is crucial. If you’re intent on squeezing the most performance out of your nickel, you’ll have to do some comparison testing and be ready to churn some numbers. The performance varies, and the price is only roughly correlated with usable power. There are a number of tasks where it would just be a waste of money to buy a fancier machine with extra cores because your algorithm can’t use them. If you don’t test these things, you can be wasting your budget.
It’s also important to recognize that there can be quite a bit of markup hidden in these prices. For comparison, I also ran the benchmarks on a basic eight-core (AMD FX-8350) machine with 16GB of RAM on my desk. It was generally faster than Windows Azure’s eight-core machine, just a bit slower than Google’s eight-core machine, and about the same speed as Amazon’s eight-core box. Yet the price was markedly different. The desktop machine cost about $600, and you should be able to put together a server in the same ballpark. The Google machine costs 82 cents per hour or about $610 for a 31-day month. You could start saving money after the first month if you build the machine yourself.
The price of the machine, though, is just part of the equation. Hosting the computer costs money, or more to the point, hosting lots of computers costs lots of money. The cloud services will be most attractive to companies that need big blocks of compute power for short sessions. If they pay by the hour and run the machines for only a short block of time, they can cut the costs dramatically. If your workload appears in short bursts, the markup isn’t a problem because any machine you own will just sit there most of the day waiting, wasting cycles and driving up the air conditioning bills.
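Putting rough numbers on that trade-off, using the desktop price and the Google hourly rate cited above and deliberately ignoring hosting, power, and admin costs (which means the sketch understates the true cost of owning):

```java
// Rough rent-vs-buy comparison using the figures cited in the text: a ~$600 desktop
// versus a cloud machine at roughly 83 cents per hour. Hosting and operating costs
// are left out, so the real break-even point for owning comes later than this.
public class RentVsBuy {
    public static void main(String[] args) {
        double desktopCost  = 600.0;   // one-time hardware cost, dollars
        double cloudPerHour = 0.829;   // eight-CPU Google rate quoted above, dollars per hour

        double breakEvenHours = desktopCost / cloudPerHour;
        System.out.printf("Hardware cost is recouped after about %.0f hours of cloud time%n",
                breakEvenHours);

        // If the workload runs only a few hours a day, renting stays cheaper much longer.
        double hoursPerDay = 2.0;      // hypothetical bursty workload
        System.out.printf("At %.0f hours per day, that's roughly %.0f days before buying wins%n",
                hoursPerDay, breakEvenHours / hoursPerDay);
    }
}
```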
All of these facts make choosing a cloud service dramatically more complicated and difficult than it might appear. The marketing is glossy and the imagery makes it all look comfy, but hidden underneath is plenty of complexity. The only way you can tell if you’re getting what you’re paying for is to test and test some more. Only then can you make a decision about whether the light, airy simplicity of a cloud machine is for you.