How to become a data scientist
“How to become a data scientist?” is an interesting question, because there is no formal training path to become one yet. Some universities are combining mathematics, computer science, and humanities classes, but nothing formal has been decided in terms of a major or full concentration of study. Berkeley, Stanford, and the other greats offer classes related to data science, but most are nestled within existing information technology or math departments. This is perhaps because the position still isn’t properly defined: “data scientist” is usually a catch-all term for people with a variety of skills, some of which even tend to conflict with each other. Most hard math or science majors treat the world as 1+1=2, end of story. The humanities tend to look at the world more abstractly and accept that there is leeway and that not everything adds up. Data science requires much from both.
The skills asked of data scientists are also somewhat all over the map, with specializations ranging from straightforward large-scale data management and storage to applying analytics, machine learning, or artificial intelligence to make predictions or better target recommendations to consumers, à la Netflix, Amazon, Facebook, and Google.
Nonetheless, there is a specific set of skills you can work to develop and fields of study you can dabble in if you’re interested in working with data. While still somewhat vague, the ultimate purpose of today’s data science is to manage, make sense of, and ask questions of data sets.
Applied data science is all about measurement, so work on increasing your statistical chops. Statistics is also a good general life skill: probability and common statistics can be used in the media to manipulate human behavior, or to fearmonger people into believing false or loosely defined relationships. Knowing even elementary statistics helps you spot bad science.
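To make the point about loosely defined relationships concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the function names, the number of series, and the series length are all made up for this example. It generates purely random, unrelated series and reports the strongest correlation found between any two of them.

```python
import random
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_series=50, n_points=10, seed=0):
    """Largest |correlation| among pairs of unrelated random series.

    The defaults are arbitrary illustrative values, not recommendations.
    """
    rng = random.Random(seed)
    # Every series is independent noise, so any correlation is pure chance.
    series = [[rng.gauss(0, 1) for _ in range(n_points)]
              for _ in range(n_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


if __name__ == "__main__":
    print(round(strongest_chance_correlation(), 2))
```

Fifty short series give 1,225 pairs to compare, so some pair will look strongly related even though nothing connects them. A headline built on one relationship picked out of many comparisons deserves the same suspicion.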
Depending on the type of data science you’re into (management vs. analytics, for example), a good understanding of computers is a strong skill to develop. Even if you’re interested only in mathematical applications, elementary programming classes can familiarize you with a certain logic and problem-solving mindset useful in this space. Being familiar with a database query language like SQL (as used by MySQL), the statistical language R, and even web technologies like HTML and PHP can help you write applications to gather data and make life much easier.
Economics / Biology / Bioinformatics / Physics …
I’ve got a soft spot for my own field of study, economics. But any science, simple or complex, in which you model reality and try to describe it is useful for data science. Economics itself is the study of the efficient allocation of limited resources, so many economic models are built to use data to describe processes and how firms and consumers interact, among many other things. Physics and biology are also concerned with modeling their “ecosystem” and finding relationships between all of its actors. Being fascinated with how changing inputs changes the outputs is a good mindset to have, as long as you approach it with a scientific-method style of hypothesis testing.
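As a small illustration of that hypothesis-testing mindset, here is a sketch of a permutation test in Python. This is one simple technique chosen for the example, not a method named above, and the function name and sample numbers are hypothetical. It asks: if changing the input had no effect at all, how often would a random relabeling of the observations produce a difference in averages at least as large as the one observed?

```python
import random


def permutation_test(control, treatment, n_permutations=10_000, seed=0):
    """Return (observed difference in means, approximate two-sided p-value)."""
    rng = random.Random(seed)
    n_control = len(control)
    observed = sum(treatment) / len(treatment) - sum(control) / n_control

    pooled = list(control) + list(treatment)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the "no effect" hypothesis the group labels are arbitrary,
        # so shuffle the pooled values and split them into two fake groups.
        rng.shuffle(pooled)
        fake_control = pooled[:n_control]
        fake_treatment = pooled[n_control:]
        diff = (sum(fake_treatment) / len(fake_treatment)
                - sum(fake_control) / len(fake_control))
        if abs(diff) >= abs(observed):
            at_least_as_extreme += 1
    return observed, at_least_as_extreme / n_permutations


if __name__ == "__main__":
    # Made-up measurements: output before and after changing one input.
    before = [10, 11, 9, 10, 12, 10, 11, 9]
    after = [15, 16, 14, 15, 17, 15, 16, 14]
    diff, p_value = permutation_test(before, after)
    print(diff, p_value)
```

A small p-value means random relabeling rarely reproduces the observed gap, which is evidence that the change in input mattered. A large one means the gap could easily be noise.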
Beyond university, there is a multitude of resources out there for learning how to play with data. MIT OpenCourseWare has a lot of free courses, many dealing with computer science, math, and other sciences. LinkedIn has lots of groups devoted to people who work with data. Try connecting with them.