As a more advanced topic, once you start managing bigger data sets and/or aim for reproducibility, you will need to pick a programming language for your analysis. Sooner or later you will also run into the question of whether to choose R or Python.
Why R vs Python?
Why these two languages and not others, especially considering how many other options are out there, such as SQL, SAS, MATLAB, and I could go on for a while? The argument for choosing one of these two languages usually comes down to a few simple points:
- Simply put, they are free. OK, I know open source is not exactly the same as free, but at the end of the day this is a key factor, especially when comparing them to, e.g., SAS.
- A big support community, whether in the form of mailing lists or Stack Exchange threads.
- Packages specialized for data analysis. R as a whole language is optimized for this, but Python also has very capable packages (pandas, NumPy, SciPy, etc.).
But then why is there an R vs Python debate?
I believe the infographic below sums it up pretty well at the end, where it says that the best statistical language for you depends on your needs. Without giving away too much of the information DataCamp collected in it, I would summarize it as follows:
- R is geared heavily towards data analysis, so its language and packages are optimized for taking apart and visualizing data sets. On the other hand, that specialization can make its syntax a bit harder to learn.
- Python is a general-purpose programming language, so it is slightly less focused on quick analysis, but if you need to deploy models or embed your analyses into other applications, you may be better off with it.
Last but not least, do check out the infographic below for a more detailed picture of the R vs Python discussion.