Prospecting for Predictions: Machine-learning Algorithms in an Information Age

By Michael LeVine, @ThoughtCulture

Steam Punk Robot Miner by Nathaniel Daught © 2009

Data is everywhere. In this new, digital world, huge corporations are buying and selling streams of 0s and 1s at an incredible rate. Every morning, a wide-eyed, up-and-coming entrepreneur fires up his or her laptop and begins coding a new smartphone app with the dream of transforming your mundane, everyday activities into a wealth of electronic gold – not unlike a miner panning for precious metal in the California Gold Rush. While the seemingly endless supply of shiny ore that drove the forty-niners was actually finite, data can be mined continuously, as an unfathomable amount of new currency is generated every second by websites like Facebook and Twitter. But why is it worth so much?

Many of us freely share elements from our lives across these social media platforms, essentially doing the dirty work for market researchers. This makes it increasingly important for us to understand why people want our data and, more importantly, what is being done with it.

Computer scientists have worked hard to develop complex algorithms that turn various types of data into an array of useful predictions. These predictions are often made using a class of computer algorithms called “machine learning algorithms.” Even though the Wikipedia article about these algorithms seems intimidating, the goal is quite simple: to create computer programs that can “learn” from data.

For example, as the computer collects data about your likes on Facebook, it uses that data to deduce what other types of movies and music you might enjoy. These algorithms are designed with a never-ending thirst for data, and won’t stop until they can make the best predictions possible. In addition, they aren’t limited to your browsing habits, and they are certainly not limited to a particular demographic or subject matter. Machine learning techniques are appearing everywhere – they’re our most efficient pans when prospecting for predictions.
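To make the idea of "learning" from likes a little more concrete, here is a minimal sketch in Python. It is not how Facebook or any real service works; it only illustrates one simple approach, scoring items you haven't seen by how much your likes overlap with those of the people who do like them. The users, the liked items, and the `recommend` function are all hypothetical, made up for illustration.

```python
from collections import Counter

# Hypothetical toy data: each user's set of liked items (not real data).
likes = {
    "ana": {"Inception", "Interstellar", "Radiohead"},
    "ben": {"Inception", "Interstellar", "Daft Punk"},
    "cat": {"Inception", "Radiohead", "Daft Punk"},
    "dan": {"Titanic", "Adele"},
}

def recommend(user_likes, all_likes, top_n=3):
    """Rank unseen items by the overlap between user_likes and each fan's likes."""
    scores = Counter()
    for other_likes in all_likes.values():
        overlap = len(user_likes & other_likes)
        if overlap == 0:
            continue  # this person shares no tastes with us, so ignore them
        for item in other_likes - user_likes:
            scores[item] += overlap  # more shared likes means a stronger vote
    # Sort by score (highest first), breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:top_n]]

new_user = {"Inception", "Interstellar"}
print(recommend(new_user, likes))
```

Notice that the more people the program has data on, the more votes it can count, which is one reason these systems are so hungry for data. Someone with no tastes in common with you (like "dan" here) contributes nothing to your recommendations.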

The ability of these algorithms to make predictions can be both awe-inspiring and terrifying. In one case, an algorithm predicted that a teenage girl was pregnant based on her Target shopping habits, and her father became unhappy when she began to receive coupons for baby food. While this sounds like the digital equivalent of asking a heavy-set woman when her baby is due, it turned out the algorithm was right and the girl was pregnant.

However, these algorithms aren’t just socially awkward – they can also be dangerous. Despite being a product of Hollywood imagination, the movie GATTACA portrays a dark and ominous example of a data-driven dystopia. In this movie, we’ve become perilously confident in our ability to predict the future from the genome alone. Entire futures are discarded without any regard for the role of environment, personal choices, or any of the many other factors that work in concert with our genetic makeup.

While GATTACA is just a movie, we’re seeing hints of this reality in the medical world today. Recently, actress Angelina Jolie underwent a double mastectomy after doctors identified that she carried a dangerous BRCA1 gene mutation. Jolie was estimated to have an 87% chance of developing breast cancer and a 50% chance of developing ovarian cancer. In the case of breast cancer, the prediction was alarming – 87% is quite high for a life-threatening disease. Jolie discusses the decision she and her doctor made to get the mastectomy, and the events of the procedure, in an opinion piece in the New York Times. Even though the piece created quite a stir on the Internet, the BRCA1 mutation isn’t new, and women have been opting to take extreme preventative measures for years.

How much faith should we put into medical predictions based on our genetics? We’ve long known that there is a significant enrichment for the BRCA1/BRCA2 mutations in Ashkenazi Jews. After my Ashkenazi mother finished her treatment for ovarian cancer last year, many of the women in our family were tested for the BRCA1 and BRCA2 mutations, but no one underwent a mastectomy based on genetic testing alone. As with Jolie, the conversation with the doctor involves more than a number predicted by a computer, and the combination of man and machine has done great things in the medical world. As the volume of data increases in the biological sciences due to advancements in technologies like whole genome sequencing, computers have become better and better at linking mutations to diseases, opening up the possibilities for personalized medicine. But we should move forward with caution. While a supercomputer can beat a chess Grandmaster, two amateurs with three average computers can beat them both [1], so there is still room for partnership and teamwork between the organic and silicon-based life forms.

Machine learning algorithms and data-driven predictions are undoubtedly revolutionizing the world around us. However, it’s important to remember that these machines can’t see the future – they only try to make good predictions from the data we give them. I know if I put all my trust in Netflix’s recommendations, I would have quit watching TV a long time ago.

[1] Rasskin-Gutman, Diego. Chess Metaphors: Artificial Intelligence and the Human Mind. 2009.

