Yes, great data scientists execute queries and database runs, but they also suggest ways of architecting queries so that they not only return a defined set of results that answers a question someone has already asked, but also reveal insights into questions the organization has not yet asked. This is where the real value of a data scientist will present itself over the coming years.
While some might argue that this is a soft skill that's difficult to interview for, carefully crafted hypothetical scenarios can help. Presented during an interview, they let you understand a candidate's thought process, the approach taken to a problem, the various ways the candidate would try to find the answers, and the other questions the candidate could pose that would add value to the original query. Stress to candidates that outside-the-box thinking is encouraged and that limiting answers to only the problems posed is discouraged.
3. A Good Data Scientist Is Familiar With Database Design and Implementation
It's important for today's data scientists to sit somewhere between an inquisitive university research scientist (which is essentially what the previous point describes) and a software developer or engineer: someone who knows how to tune the lab and operate the machinery well.
Even though much of what falls under the "big data" category is unstructured data, a fundamental understanding of both relational and columnar databases can serve a data scientist well. Many corporate data warehouses are still traditional row-based relational databases. While big data is new and alluring, plenty of actionable data and trends can be teased out of traditional databases.
Data scientists will also play a key role in setting up analytics and production databases to take advantage of new techniques. A history of working with databases would provide great context for setting up new systems in the new role.
Additionally, many big data software developers build SQL-like languages into their products in an attempt to woo traditional database administrators who have no desire to learn a MapReduce-like language. Knowledge of traditional SQL will continue to pay dividends, allowing data scientists to play nicely and integrate well with the other database professionals you already have on staff.
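To make the contrast between the two styles concrete, here is a minimal Python sketch that computes the same per-region total twice: once as a SQL GROUP BY query against an in-memory SQLite database, and once as explicit map, shuffle and reduce steps. The `sales` table, its columns and the sample rows are hypothetical, invented purely for illustration. SQLite stands in for whatever database or big data product is actually in use, and real MapReduce frameworks distribute these steps across many machines.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (region, amount) pairs invented for illustration.
SALES = [("east", 100), ("west", 250), ("east", 50), ("north", 75), ("west", 25)]


def totals_with_sql(rows):
    """Total sales per region using a declarative SQL GROUP BY query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region"
        )
        return dict(cursor.fetchall())
    finally:
        conn.close()


def totals_with_map_reduce(rows):
    """Total sales per region using explicit map, shuffle and reduce steps."""
    # Map: emit a (key, value) pair for every input record.
    mapped = [(region, amount) for region, amount in rows]
    # Shuffle: group all values that share a key.
    grouped = defaultdict(list)
    for region, amount in mapped:
        grouped[region].append(amount)
    # Reduce: collapse each group of values to a single result.
    return {region: sum(amounts) for region, amounts in grouped.items()}


if __name__ == "__main__":
    print(totals_with_sql(SALES))
    print(totals_with_map_reduce(SALES))
```

Both functions return the same totals. The SQL version states what result is wanted and leaves the grouping to the database, while the MapReduce-style version spells out each step, which is part of why SQL-like front ends appeal to staff who already know SQL.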
4. A Good Data Scientist Has Baseline Proficiency in a Scripting Language
Award extra points to candidates who know Python at least somewhat well. Many query jobs over vast quantities of unstructured data are issued from scripts and take quite some time to run.
Python is widely regarded as one of the most compatible and versatile scripting languages for working with columnar databases, MapReduce-style queries and other pieces of the data science puzzle. Python is an open-source language known to be fairly usable and easy to read, so it shouldn't pose much of a hurdle for your pool of data scientist candidates.
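As a rough sketch of what a scripted query job can look like, the Python below runs one query, streams the rows out as CSV and reports how long the job took. It is an illustration under stated assumptions, not a recommended production setup: the `events` table, the `run_query_job` and `build_demo_db` helpers and the sample rows are all hypothetical, and an in-memory SQLite database stands in for a real warehouse or big data store.

```python
import csv
import io
import sqlite3
import time


def run_query_job(conn, sql, out):
    """Run one query, write its rows as CSV to `out`, return (row_count, seconds)."""
    started = time.perf_counter()
    cursor = conn.execute(sql)
    writer = csv.writer(out)
    # Header row comes from the column names the database reports.
    writer.writerow([col[0] for col in cursor.description])
    row_count = 0
    for row in cursor:
        writer.writerow(row)
        row_count += 1
    return row_count, time.perf_counter() - started


def build_demo_db():
    """Create a tiny in-memory database with a hypothetical `events` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (user_id INTEGER, action TEXT)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [(1, "login"), (2, "login"), (1, "purchase"), (3, "login")],
    )
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    buffer = io.StringIO()
    count, seconds = run_query_job(
        conn,
        "SELECT action, COUNT(*) AS n FROM events GROUP BY action ORDER BY action",
        buffer,
    )
    conn.close()
    print(buffer.getvalue())
    print(f"{count} rows in {seconds:.4f}s")
```

A candidate with baseline scripting proficiency should be comfortable reading and adapting something of this size, for example to point it at a different data source or to log the timing for long-running jobs.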