[Following from Cloud Revolution]
It’s late at night in January and an email has just arrived that has someone very excited. A medical student has won an auction, but this is no eBay auction; it’s an Amazon EC2 instance auction. She has been preparing for this moment for over a year, and it will make or break her post-doctoral research paper.
She is continuing the valuable work done by her predecessors in the field of Parkinson’s research and its relationship to dopamine levels in the brain. Previous studies were painstakingly conducted by carefully tracking, monitoring and documenting the lives of hundreds of thousands of patients presenting with dopamine-affecting afflictions, such as methamphetamine addiction, over the course of 10-15 years.
Our scientist is hoping to show a relationship to other causes of dopamine depletion, and to do it she has been scouring the huge catalogue of free and paid-for databases available in the Cloud from the likes of the WHO and Science Direct. Bringing together datasets as wide-ranging as meteorology, geography, drug addiction, depression and death rates, she has amassed 20 TB of data that needs to be processed to find correlations. Trial runs on small subsets have shown that it will take around 30,000 hours for one CPU to get through the full set. She can’t afford to wait more than 3 years for it to run on her own machine, and, limited by a $5,000 grant, she can’t buy the hardware needed to do it herself. Instead she puts in a bid for 5,000 Amazon EC2 Spot instances at 4c/hour.
Humming in the darkness under a football-field-sized roof in Ashburn, Virginia, are thousands of computers running Amazon.com, Instagram, Reddit, Quora and Foursquare. But now it is late in the evening, many shoppers and users are asleep, the post-Christmas sales are over and the GFC is still being felt; together they have conspired to bring Internet traffic to a record low. One by one, computers are freed up into a pool until there are 5,000 available and the deal is struck. Six hours and $1,200 later, she has her answer.
This is an example of the new paradigm of data-intensive scientific discovery, and it’s happening right now. It effectively time-shifts 16 years of research into 6 hours of processing by utilising data and computing power that already exist. While many organizations are battling with the concept of how to secure their data in the cloud, others have seen the opportunity and made their data available, either freely or as a chargeable service.
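The numbers in the story can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope estimate: the function name `spot_run_estimate` is made up for illustration, and it assumes the workload splits perfectly across instances, that each instance contributes one CPU, and that the 4c/hour bid is the price actually paid for every hour.

```python
HOURS_PER_YEAR = 24 * 365


def spot_run_estimate(cpu_hours, instances, price_per_instance_hour):
    """Estimate wall-clock time and cost for a perfectly parallel job.

    Assumes ideal scaling: the work divides evenly across instances
    with no startup, data-transfer or coordination overhead.
    """
    wall_clock_hours = cpu_hours / instances
    # Every CPU-hour is billed exactly once, however it is spread out.
    total_cost = cpu_hours * price_per_instance_hour
    return wall_clock_hours, total_cost


# Figures taken from the story above.
cpu_hours = 30_000
hours, cost = spot_run_estimate(cpu_hours, instances=5_000,
                                price_per_instance_hour=0.04)
single_machine_years = cpu_hours / HOURS_PER_YEAR

print(f"One CPU: about {single_machine_years:.1f} years")
print(f"5,000 instances: {hours:.0f} hours for ${cost:,.0f}")
```

Under those assumptions the total cost depends only on the number of CPU-hours and the hourly price; adding instances shortens the wait without changing the bill.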
There are many problems that don’t need to be solved now, or even today. In fact, some problems have remained unsolved for years but may now be tackled using enormous amounts of otherwise idle computing capacity at prices previously unheard of to scientists. This new tool of human evolution will be used to map the neurons in the brain, solve the riddle of Parkinson’s disease, chip away at the list of cancers and discover unexpected relationships and correlations across massive datasets of medical information.
And it’s all available to anyone with a hunch or a hypothesis they want to test.