End Of Course Reflections

I am just wrapping up a course (ST558 at NCSU) that covers advanced R programming topics such as parallelization, Shiny, and containerization. I started this blog for the course, and this reflection post is my final assignment.

What has changed about what I think a data scientist is and does?
- About a month ago, I wrote a blog post describing my views on data science. Those views have not changed. Data science, to me, is just applied statistics in industry. The job has non-statistical duties, such as data collection, data wrangling, and possibly some software development, but the core of data science is making data-based inferences and predictions. That is the domain of statistics.
What are my current thoughts in terms of using R for data science and will I continue to use R going forward?
- I used to think of R only as a good language for prototyping analyses and models, but I will be using it for much more going forward. I still think R’s speed is limiting and that other languages are better for production code, but R deserves more credit than I gave it previously. Between R Markdown, Shiny, and ggplot2, R may be the best tool for communicating data analyses.
What things will I do differently in practice after this course?
- I started programming in 2016 with MATLAB and have been using Python for just as long, so I was already used to good programming practices, functional programming, and so on. I will change some practices, though. Going forward, I will automate my R Markdown reports and use Shiny to scale and share analyses. I have already started using R’s optimized looping functions, vectorization, and parallelization to speed up operations that are amenable to those techniques.

Written on August 3, 2021