Python or R for data science?

The first step on the journey to becoming a data scientist is usually learning a programming language. It’s not compulsory to learn programming first, but it is preferable. Ken Jee, a data scientist specializing in data analytics, wrote an article on how he would learn data science if he could start over. In that article, he chose to start with programming. Two of the best programming languages for data science are R and Python.

R is preferred in data science because statistics is a large and important part of the field, and R provides extensive support for statistics. It can also interface with NoSQL databases and analyze unstructured data. R provides important packages for data wrangling (such as dplyr, purrr, and readxl) and for data visualization (such as ggplot2 and scatterplot3d).

Python is preferred in data science because of its scalability and its flexibility: there are often multiple ways to approach a single problem. It is also easy to learn and has a ton of data science libraries. Python is famous for its community. Community members volunteer to create new data science libraries and are quite helpful when you are stuck on a problem. You can interact with fellow Python programmers on Codementor and Stack Overflow.
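To make the "multiple approaches for a single problem" point concrete, here is a minimal sketch that computes the same average three different ways. The list of scores is made up for illustration, and the example uses only Python's standard library, so nothing extra needs to be installed.

```python
import statistics

# Hypothetical sample data, invented purely for illustration.
scores = [72, 85, 90, 66, 85]

# Approach 1: an explicit loop, the way a beginner might write it.
total = 0
for score in scores:
    total += score
mean_loop = total / len(scores)

# Approach 2: the built-in sum() and len() functions.
mean_builtin = sum(scores) / len(scores)

# Approach 3: the standard library's statistics module.
mean_stats = statistics.mean(scores)

print(mean_loop, mean_builtin, mean_stats)
```

All three give the same answer, so you can pick whichever style reads best to you while you are learning.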

Conclusion

Both programming languages have their pros, and both are useful on your journey to becoming a data scientist. The idea is not to think too hard about it and to start somewhere.

Sources

To learn Python for data science, check out Python for Data Science, AI & Development. If you are new to programming, try the Python for Everybody Specialization by the University of Michigan on Coursera.

To learn R for data science, try Learn R with DataCamp. I am currently learning R from the course R Programming by Johns Hopkins University on Coursera.

Ken Jee’s article: How I Would Learn Data Science (If I Had to Start Over)
