Towards the end of the course, you will work on a live project. You can choose either of the following for your project work:
Project #1: Sentiment Analysis of Twitter Data
Industry: Social Media
Description: A sports gear company is planning to promote its brand by putting its logo on the jersey of an IPL team. We assume that the team that is more popular on Twitter will give a better ROI. So, we evaluate two IPL teams based on their social media popularity, and the team that is more popular on Twitter will be chosen for brand endorsement. The data to be analyzed is streamed live from Twitter, and sentiment analysis is performed on it. The final output is a comparative plot of both teams so that the clear winner can be seen.
The following steps need to be carried out:
1) Set up a connection with Twitter using the twitteR package, and perform authentication using the handshake function.
2) Import tweets from the official Twitter handles of the two teams using the searchTwitter function.
3) Prepare a sentiment function in R that takes the tweet text as its argument and returns a positive or negative score.
4) Calculate the score for each tweet.
5) Compare the scores of the two teams and visualize them.
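The scoring in steps 3 to 5 can be sketched as a simple word-count approach: a tweet's score is the number of positive words minus the number of negative words. This is only an illustrative sketch, written in Python rather than the R used in the course. The word lists, the function names `sentiment_score` and `mean_score`, and the example tweets are all made up for the illustration and are not part of the project material.

```python
import re

# Hypothetical toy lexicons; a real project would load full opinion word lists.
POSITIVE_WORDS = {"win", "great", "love", "best", "amazing"}
NEGATIVE_WORDS = {"lose", "bad", "hate", "worst", "poor"}

def sentiment_score(tweet, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Return (number of positive words) - (number of negative words) for one tweet."""
    words = re.findall(r"[a-z']+", tweet.lower())
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def mean_score(tweets):
    """Average sentiment score over a list of tweets (0.0 for an empty list)."""
    scores = [sentiment_score(t) for t in tweets]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up tweets standing in for the data imported from Twitter.
team_a_tweets = ["What a great win, love this team!", "Worst fielding today"]
team_b_tweets = ["Bad day, we lose again", "Poor batting, hate this"]

print("Team A mean score:", mean_score(team_a_tweets))
print("Team B mean score:", mean_score(team_b_tweets))
```

The two mean scores are the quantities that would be compared in the final plot. A production version would need a much larger lexicon and some handling of negation and slang.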
Project #2: Census Data Analysis
Industry: Government Dataset
Description: Analyze the census data and predict whether a person's income exceeds $50K per year. Follow an end-to-end modeling process involving:
1) Perform Exploratory Data Analysis and establish hypotheses about the data.
2) Test for multicollinearity, handle outliers, and treat missing data.
3) Create training and validation data sets using Stratified Random Sampling (SRS) of the data.
4) Fit a classification model on the training set (Logistic Regression/Decision Tree).
5) Perform validation of the models (ROC curve, Confusion Matrix)
6) Evaluate and freeze the final model.
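The stratified sampling in step 3 can be illustrated with a minimal sketch, written in Python rather than the course's R. It samples each income class separately so that the training and validation sets keep roughly the same class proportions. The function name `stratified_split`, the 20% validation fraction, and the synthetic records are assumptions made for the example.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, validation_fraction=0.2, seed=42):
    """Split rows into (train, validation), sampling each class separately
    so both sets keep roughly the original class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)

    train, validation = [], []
    for members in by_class.values():
        members = members[:]  # copy so the caller's list is not reordered
        rng.shuffle(members)
        n_val = round(len(members) * validation_fraction)
        validation.extend(members[:n_val])
        train.extend(members[n_val:])
    return train, validation

# Synthetic records: 25 with income >50K and 75 with income <=50K.
rows = [{"id": i, "income": ">50K" if i % 4 == 0 else "<=50K"} for i in range(100)]
train, validation = stratified_split(rows, lambda r: r["income"])

print(len(train), len(validation))
print(sum(r["income"] == ">50K" for r in validation), "high-income rows in validation")
```

Because the split is done per class, the validation set here holds the same 1:3 ratio of high-income to low-income records as the full data.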
Additional Resources:
Here is a list of a few additional case studies that you will get at Certhippo for a deeper understanding of R applications.
Study #1: Market Basket Analysis
Industry: Retail - CPG
Description: Market Basket Analysis is done to see if there are combinations of products that frequently co-occur in transactions. The analysis gives clues as to what a customer might have bought if the idea had occurred to them. This is done using “Association Rules” on real-time data. In this case study, you will learn various methods for finding useful associations in large data sets using statistical performance measures. You will also learn how to manage the peculiarities of working with transaction data.
Data-set: The dataset used here comes from a grocery superstore and has 9,835 rows of free-flowing data without any labels.
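Three performance measures commonly used for association rules are support, confidence, and lift. The sketch below shows how they are computed on a handful of made-up baskets. It is in Python rather than the course's R, and the function names `support` and `rule_metrics` and the baskets themselves are assumptions for the illustration, not the case-study data.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rule_metrics(transactions, lhs, rhs):
    """Support, confidence and lift for the rule lhs -> rhs.
    Assumes lhs and rhs each occur in at least one transaction."""
    lhs, rhs = set(lhs), set(rhs)
    s_both = support(transactions, lhs | rhs)
    confidence = s_both / support(transactions, lhs)
    lift = confidence / support(transactions, rhs)
    return {"support": s_both, "confidence": confidence, "lift": lift}

# Made-up baskets; each transaction is a set of purchased items.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk"},
    {"milk", "bread", "eggs"},
]

print(rule_metrics(baskets, {"milk"}, {"bread"}))
```

A lift above 1 suggests the two sides of the rule occur together more often than they would if they were independent; a lift below 1 suggests the opposite.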
Study #2: Strategic Customer Segmentation for Retail Business
Industry: E-Commerce, Retail
Description: In this case study, we will consider the dataset from a UK-based online retail business for the last two years. The objective of this case study is to segment the customers in this data set.
For this exercise, we are going to use each customer’s recency, frequency, and monetary (RFM) values. From these three derived values, we will segment the entire customer base using RFM model-based clustering analysis and generate insights on the dataset provided.
Data-set: The data set comprises 0.5 million records and 8 variables. Each record is for one online order placed by a customer.
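The three RFM values can be derived from order records as sketched below: recency is the number of days since a customer's last order, frequency is the number of orders, and monetary is the total amount spent. The sketch is in Python rather than the course's R. The record layout `(customer_id, order_date, amount)`, the function name `rfm_table`, and the example orders are assumptions, since the data set's actual column names are not listed here.

```python
from datetime import date

def rfm_table(orders, as_of):
    """Build recency/frequency/monetary values per customer.
    orders: iterable of (customer_id, order_date, amount) tuples."""
    table = {}
    for customer, order_date, amount in orders:
        rec = table.setdefault(
            customer, {"last": order_date, "frequency": 0, "monetary": 0.0}
        )
        rec["last"] = max(rec["last"], order_date)  # most recent order so far
        rec["frequency"] += 1
        rec["monetary"] += amount
    return {
        customer: {
            "recency_days": (as_of - rec["last"]).days,
            "frequency": rec["frequency"],
            "monetary": rec["monetary"],
        }
        for customer, rec in table.items()
    }

# Made-up orders, one tuple per online order.
orders = [
    ("C1", date(2011, 11, 1), 50.0),
    ("C1", date(2011, 12, 1), 30.0),
    ("C2", date(2011, 11, 10), 200.0),
]

rfm = rfm_table(orders, as_of=date(2011, 12, 10))
for customer, values in rfm.items():
    print(customer, values)
```

The resulting three-column table is what a clustering step would then take as input to form the customer segments.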
Study #3: Pricing Analytics and Price Elasticity
Industry: Retail
Description: A retailer is planning to sell a new type of cheese in some of its stores. This is a pilot project for the retailer, and based on the data collected during this pilot phase, the retailer wants to understand a few things.
To promote sales of cheese, the retailer is planning two different types of in-store advertisement:
1) Cheese as a natural product
2) Cheese as a family caring product
Now the retailer wants to know:
1) Which in-store advertisement theme gives better sales of cheese in the store?
2) How do the sales of cheese react to a change in its price, i.e. price elasticity?
3) What is the impact of price changes of other products in the same store (e.g. ice cream and milk) on the sales of cheese, i.e. cross-price elasticity?
4) What should the price of cheese be to maximize sales, and what is the resulting sales forecast?
Data-set: The data set used in this case study has the following columns:
1) Price of Cheese
2) Sales of Cheese
3) Advertising method for cheese (either as a natural product or as a family product)
4) Price of Ice cream
5) Price of Milk
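One common way to estimate price elasticity from columns like these is a log-log regression, where the slope of log(sales) on log(price) is read as the elasticity. The sketch below assumes that approach, which is not necessarily the method used in the case study, and is written in Python rather than the course's R. The function name `log_log_slope` and the synthetic prices and sales are made up for the illustration.

```python
import math

def log_log_slope(x, y):
    """Ordinary least squares slope of log(y) on log(x).
    In a log-log model this slope is interpreted as the elasticity of y with respect to x."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    var = sum((a - mean_x) ** 2 for a in lx)
    return cov / var

# Synthetic data generated so that sales fall with the square of price,
# i.e. the true elasticity is -2 by construction.
cheese_prices = [2.0, 2.5, 3.0, 3.5, 4.0]
cheese_sales = [1000 * p ** -2 for p in cheese_prices]

elasticity = log_log_slope(cheese_prices, cheese_sales)
print("Estimated price elasticity:", round(elasticity, 3))
```

Passing the price of ice cream or milk as `x` instead would give a rough single-variable estimate of cross-price elasticity; a fuller analysis would fit all prices and the advertising theme in one regression.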
Study #4: Clustering Application using Shiny
Industry: Consumer Packaged Goods
Description: Shiny is a web application framework for R that turns your analyses into interactive web applications. The data set that we are using in this case study relates to the clients of a wholesale distributor. It comprises the annual spending in monetary units (m.u.) on diverse product categories. With this data, we want to create a web-based Shiny application that can segment the customers of the wholesale distributor based on the parameters passed through ui.R.
Data-set: The data set used in this case study has 440 rows of data and has the following attributes as columns:
1) Channel
2) Region
3) Fresh
4) Milk
5) Grocery
6) Frozen
7) Detergents_Paper
8) Delicatessen
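The segmentation behind such an application can be sketched with k-means clustering, where the number of clusters `k` stands in for a parameter chosen in the UI. This is an assumption: the case study does not state which clustering method or parameter it uses. The sketch is in Python rather than R and Shiny, uses only two spending columns, seeds the centroids with the first `k` rows for determinism, and runs on made-up numbers.

```python
def kmeans(points, k, iterations=20):
    """Plain k-means on tuples of numbers.
    Simplification: the first k points are used as the initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Made-up annual spending per client as (Fresh, Grocery) in monetary units.
spending = [(100, 900), (900, 100), (120, 880), (880, 130), (90, 950), (950, 90)]

labels, centroids = kmeans(spending, k=2)
print(labels)
print(centroids)
```

In practice the spending columns would usually be scaled before clustering, since categories with larger values would otherwise dominate the distance calculation.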