Now, suppose you want to take into account external factors such as weather and fashion trends. Do short-sleeved blouses sell better when it is hotter or sunnier than when it is cooler or rainier? Probably. You can test that by including historical weather data in your model, although that can be unwieldy with a time-series statistical model. So you might try decision forest regression and, while you’re at it, the other seven kinds of machine learning models for regression (see screenshot above). Then compare the “cost” (a normalized error function) of each model when tested against last year’s actual results to find the best model.
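To make the comparison step concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the data is synthetic, the "cost" is taken to be root-mean-square error divided by the mean of the actuals (one of several reasonable normalizations), and the two candidate models are simple stand-ins for the regression models mentioned above, not the models themselves.

```python
import math
import random

def normalized_rmse(actual, predicted):
    """Root-mean-square error divided by the mean of the actual values."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse) / (sum(actual) / len(actual))

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic data (made up for this sketch): monthly average temperature in
# degrees C and monthly unit sales that rise with temperature.
random.seed(0)
BASE_TEMPS = [2, 4, 9, 14, 19, 24, 27, 26, 21, 15, 8, 3]

def make_year():
    temps = [t + random.gauss(0, 3) for t in BASE_TEMPS]
    sales = [200 + 6 * t + random.gauss(0, 15) for t in temps]
    return temps, sales

train_temps, train_sales = make_year()   # the year before last
test_temps, test_sales = make_year()     # last year's actual results

slope, intercept = fit_line(train_temps, train_sales)

# Each candidate maps last year's temperatures to predicted monthly sales.
models = {
    # Baseline: predict the same month's sales from the prior year.
    "same_month_prior_year": lambda temps: list(train_sales),
    # Weather-aware: linear regression of sales on temperature.
    "weather_regression": lambda temps: [slope * t + intercept for t in temps],
}

costs = {name: normalized_rmse(test_sales, predict(test_temps))
         for name, predict in models.items()}
best = min(costs, key=costs.get)

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: {cost:.3f}")
print("lowest cost:", best)
```

With real data you would swap in your actual candidate models and several years of history. The pattern stays the same: score every model against the same held-out year and keep the one with the lowest cost.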
Is navy blue going to sell better or worse next month than it did the same time last year? You can look at all monthly sales of navy blue clothing and predict annual fashion trends, and perhaps fold that into your machine learning models. Or you might need to apply a manual correction (a.k.a. “a wild guess”) to your models based on what you hear from the fashion press. (“Let’s bump that prediction up by 20% just in case.”)
Perhaps you want to do even better by creating a deep neural network for this prediction. You might discover that you can improve the regression error by a few percent for each hidden layer that you add, until at some point the next layer doesn’t help any more. The point of diminishing returns might come because there are no more features to recognize in the model, or more likely because there just isn’t enough data to support more refinements.
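The "add layers until it stops helping" rule can be written down as a small stopping check. The sketch below assumes you have already trained networks with 1, 2, 3 and more hidden layers and recorded a validation error for each. The error values shown are invented for illustration, and the function name and the 1% threshold are hypothetical choices, not a standard.

```python
def pick_depth(errors_by_depth, min_relative_gain=0.01):
    """Return the number of hidden layers to keep.

    errors_by_depth[i] is the validation error of a network with i + 1
    hidden layers. Stop at the first extra layer whose relative
    improvement over the previous depth is below min_relative_gain.
    """
    best_depth = 1
    for depth in range(2, len(errors_by_depth) + 1):
        prev = errors_by_depth[depth - 2]
        curr = errors_by_depth[depth - 1]
        if prev <= 0 or (prev - curr) / prev < min_relative_gain:
            break
        best_depth = depth
    return best_depth

# Invented validation errors for 1 to 5 hidden layers (illustration only).
hypothetical_errors = [0.200, 0.190, 0.182, 0.181, 0.183]
chosen = pick_depth(hypothetical_errors)
print("hidden layers to keep:", chosen)
```

In this made-up series the second and third layers each cut the error by about 4% to 5%, while the fourth gains well under 1%, so the check stops at three layers.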
You have enough data scientists
You may have noticed that a person had to build all the models discussed above. No, it isn’t a matter of dumping data into a hopper and pressing a button. It takes experience, intuition, an ability to program and a good background in statistics to get anywhere with machine learning, no matter what tools you use — despite what vendors may claim.
Certain vendors in particular tend to claim that “anyone” or “any business role” can use their pre-trained applied machine learning models. That might be true if the model is for exactly the problem at hand, such as translating written formal Quebecois French to English, but the more usual case is that existing trained machine learning (ML) models aren’t a good fit for your data. Since you have to train the model, you’re going to need data analysts and data scientists to guide the training, which is still more an art than it is engineering or science.
One of the oddest things about hiring data scientists is the posted requirements, especially when compared to the actual skills of those hired. The ads often say “Wanted: Data Scientist. STEM Ph.D. plus 20 years experience.” The first oddity is that the field hasn’t really been around for 20 years. The second oddity is that companies hire 26-year-olds right out of grad school (that is, with no work experience at all outside of academia, much less 20 years) in preference to people who already know how to do this stuff. They do so because they are afraid that senior people will be too expensive, even though they asked for 20 years of experience. Yes, it’s hypocritical, and most likely illegal age discrimination, but that’s what’s been happening.