Two factors in particular, Mason says, will drive the creation of new data products that seem unimaginable today. The first is progress in the field of data science and the building of abstraction layers; the second is the drop in the cost of computation through the use of commodity clusters.
“You can imagine in one year we sort of know what the world will look like,” the data scientist said during a keynote at last month’s AnacondaCON, which Continuum Analytics hosted in its hometown of Austin, Texas. “But in five years, we will all have super powers that today seem hugely expensive in comparison.”
To illustrate her point, Mason related an event that occurred several years ago, when Fast Forward Labs had just taken delivery of a brand spanking new Hadoop cluster, and the team needed a job to burn it in.
“I like dogs. [My coworker] likes cats,” she said. “Wouldn’t it be fun to see…which ones do people share more on social media?”
So Mason and her coworker gathered up three years’ worth of social media data, which included 80 million items shared per day, and crunched the numbers in a massive eight-hour job on the new Hadoop cluster. In the end, canines beat out their feline companions in the most-shared-on-social-media department.
“This was a massive waste of energy,” Mason admitted. “And the idea that we had computational power so cheap that we could apply it to something so absolutely trivial really blew my mind.”
Encourage Frivolous Exploration
Certainly, that Hadoop cluster has gone on to do more meaningful work for clients of Fast Forward Labs, which Mason and her colleagues created as a boutique data science consultancy. The firm specializes in prototyping machine intelligence solutions after doing “deep dives” to explore its clients’ particular situations.
As Mason explains, we’re on the cusp of major breakthroughs in the capability to build data products—that is, products that couldn’t exist without the use of data in their creation. The potential for big data analytics is truly massive, but it will take data scientists poking around in unusual places to discover patterns and anomalies that could give rise to the new data products of the future.
That data can come from Facebook likes and Twitter shares, but it can also come from other sources that may not seem relevant. For example, one of Fast Forward Labs’ clients has “living data” dating back to the 1800s sitting on its mainframe. Being able to pull that data out of the mainframe and use it for analyses is critical to the success of certain data products.
Finding the right questions to ask is also important to getting value out of data. “Having data is not useful unless you have the right problems to apply it to,” Mason said. “The way you find a good question to ask, often is to ask something until you have no good intuition for the answer, and then try to do some analytics to validate it.”
Another key to building a good data product is having the freedom to pursue oddball ideas. “When you go from nothing to be able to do [analytics] so cheaply and you can apply it to completely frivolous exploratory work is equally as important, and it might even matter more,” Mason said. “And that’s where we are today with data products. This is why it’s important to have this conversation.”
Prepare for Failure
Not all data products turn out well, and that’s fine. For every data product champion, like Mason’s favorite, Google Maps, there are ideas that could use some more baking, such as the Pictograph image classification prototype released by Fast Forward Labs last year.
Pictograph used deep learning techniques to automatically categorize an Instagram user’s photos. The system relied on a deep neural network trained on the ImageNet image dataset, but it sometimes misidentified pictures, to hilarious ends.
“Everything it thinks are crabs are pictures of French fries near water,” Mason said. Her own photo collection of New York City sites, which typically include subway stations, was categorized as mostly correctional institutions. “Because the models are not interpretable, there’s no way to go in and edit,” she said.
Whatever you want to call the tools used today – advanced analytics, machine learning, artificial intelligence – what’s clear is that data scientists must be given the freedom to work with the tools and the data in their own way.
People creating data products can’t be shoehorned into the typical development cycle favored by corporate coders and agile startups alike. Mason said data scientists may want to give the work of researching data, building models, and then operationalizing them a name distinct from what software developers are doing, perhaps the “experimental development process.”
“A lot of people are doing this really well, but there’s no standard way to do it, and there’s no one set of best practices,” she said. “Right now we’re in the part of developing this as a community where we all tell our stories and ideas. But if you go from one company to another, you’ll find this is all done really differently.”
Build, Don’t Buy
It would be great to be able to go and buy an awesome data product that you can use in your company to go and win new customers and conquer the industry. But that’s not reality, for better or for worse.
“You can’t buy these things,” Mason said. “This is something that keeps coming up. When somebody says ‘I want that. I want to buy it. I don’t want to build it. I don’t want to understand it. I don’t want to get the data for it.’ Unfortunately this doesn’t work in our domain, yet.”
We may get there at some point. That may be part of how we all obtain the data superpowers that Mason is convinced we all will have at some time in the future. But today, the complexity involved in building generic data products that work for multiple customers’ specific use cases is just too great for our current technological capability.
There’s a silver lining to this, however: that high bar of complexity is providing a window of opportunity for organizations that can overcome the technical obstacles and build compelling data products that elevate them over the competition. Dreaming up those data products, and then building them, is what motivates Mason and her team to continue innovating at the edge of what’s possible.
“We get to imagine a future, imagine what we can actually do with this data once we get all the details in place, and to think about what the future might look like and how we’re going to build it,” she said. “It’s really exciting, and we’re at the beginning. If we do it well, it’s going to be a lot of fun over the next eight years or so.”