About the Trainings

Most class sessions have both interactive Modules courtesy of Data Camp[1] and Walkthroughs created by me that you will need to work through after doing the readings and reviewing the corresponding content (if applicable). The lessons are a central part of the class and focus on the tidyverse family of packages, though these approaches are certainly not the only ways to wrangle, clean, analyze, and visualize data in R.

Advice

Carve out some time every day to go through these. If you try to complete everything in one sitting, it will probably be overwhelming! However, if you are already familiar with some modules, please feel free to work ahead.

Grading

The ultimate point of Data Camp is to familiarize you with an environment that you have likely never seen or been exposed to. While you should absolutely go through each module, there is no expectation that you will get everything right. In fact, the points you earn have no bearing on how you are assessed, so please use hints as needed! As with all things data science, you’ll learn by doing. If you tend toward either extreme when it comes to work (i.e., primarily a perfectionist or mostly careless), then the modules will likely prove to be a challenge. You are unlikely to comprehend everything by pushing past your limit or, conversely, by assuming it will just come to you, so please work hard but also take breaks, swear[2], look on the Internet, ask peers, or reach out for help. Your score is predicated on putting in a solid effort rather than getting everything perfect, because perfection is not realistic when it comes to data.

Data Camp Schedule

A tentative schedule is given below. The Course and Chapter names represent Data Camp titles[3]:

Required

Modules that do not require a corresponding task will be assessed only on the successful completion of the Data Camp course.

Modules that require a task will be assessed on both the successful completion of the Data Camp course and the corresponding assessment, to be submitted via eCampus.

| Link | Due | Required | Task | Module | Chapters |
|---|---|---|---|---|---|
| Week 1 | 8/30/22 | | | Introduction to R | Intro to basics; Vectors; Matrices; Factors; Data Frames; Lists |
| Week 2 | 9/6/22 | | | Introduction to the Tidyverse | Data wrangling; Data visualization; Grouping and summarizing; Types of visualizations |
| Week 2 | 9/6/22 | | | Introduction to Data Visualization with ggplot2 | Explore your data; Tame your data; Tidy your data; Transform your data |
| Week 3 | 9/20/22 | | | Intermediate Data Visualization with ggplot2 | Statistics; Coordinates; Facets; Best Practices |
| Week 4 | 10/4/22 | | | Visualization Best Practices in R | Proportions of a whole; Point data; Single distributions; Comparing distributions |
| Week 5 | 10/18/22 | | | Unsupervised Learning in R | Unsupervised Learning in R; Hierarchical clustering; Dimensionality reduction with PCA; Putting it all together with a case study |
| Week 6 | 11/1/22 | | | Introduction to Text Analysis in R | Wrangling Text; Visualizing Text; Sentiment Analysis; Topic Modeling |
| Week 7 | 11/15/22 | | | Communicating with Data in the Tidyverse | Custom ggplot2 themes; Creating a custom and unique visualization; Introduction to Rmarkdown; Customizing your RMarkdown report |
| Week 8 | 11/29/22 | | | Analyzing Social Media Data in R | Understanding Twitter data; Analyzing Twitter data; Visualize Tweet texts; Network Analysis and putting Twitter data on the map |

The following module is optional but highly recommended:

| Required | Task | Module | Chapters |
|---|---|---|---|
| | | Intermediate R | Conditionals and Control Flow; Loops; Functions; The apply family; Utilities |

Extra Credit

The following modules are optional and may count as extra credit, contingent on the successful completion of the Data Camp course and the corresponding assessment, to be submitted via eCampus. Please note that each module builds on the previous one.

| Due | Required | Task | Module | Chapters |
|---|---|---|---|---|
| 12/9/22 | | | Network Analysis in the Tidyverse | The hubs of the network; In its weakness lies its strength; Connection patterns; Similarity clusters |
| 12/9/22 | | | Supervised Learning in R: Classification | k-Nearest Neighbors (kNN); Naive Bayes; Logistic Regression; Classification Trees |
| 12/9/22 | | | Predictive Analytics using Networked Data in R | Introduction, networks and labelled networks; Homophily; Network Featurization; Putting it all together |

R Tasks

In some weeks you will be expected to complete an additional R task; these are marked in the Task column of the table above. Collectively, these tasks serve as the R Data EDA noted on the syllabus.

Working Ahead

By no means do you have to wait for a particular module to be assigned. If you wish to enroll in a training, whether assigned or not, simply search for the name of that course on the Data Camp site. For modules assigned in this course, you will receive credit after the due date has passed.

Need Help?

While I am happy to meet face-to-face, it is just as easy to schedule a Zoom session using the calendar or to notify me on Slack by adding @Dr. Abhik Roy to your message.


  1. Please note that if you have (1) used Data Camp before and (2) are logged in with the same username, then any module that was successfully completed will not have to be done again.

  2. And curse my name if you have to.

  3. Subject to change with notice.