Today, I consolidated my understanding with a capstone project analysing Fandango's 2015 movie ratings against those of three other movie aggregator sites: Metacritic, IMDb, and Rotten Tomatoes. For this exploratory data analysis, the same set of movie titles is compared across all four sites.
The question is whether Fandango's movie ratings skew more positive than the others'. The suspicion arises from a conflict of interest: Fandango also sells movie tickets for a commission. The news outlet 538 picked this up, and the author used data analysis to support his findings. This exercise aims to confirm those findings.
Here are the steps, guided by Jose Portilla's tutorial:
- Read the two open-source data sets from 538's repository.
- Explore the DataFrame properties.
- Explore the relationship between the popularity of a film and its rating.
- Create a scatterplot showing the relationship between rating and votes.
- Calculate the correlation between columns.
- Plot the number of movies per year featured on Fandango. (This confirms why 2015 was chosen: it was the largest data set available at the time.)
- Plot a KDE to show the rating distribution, clipping it at a lower and upper bound so it stays within the 0 to 5 rating range.
- Repeat steps for the other three providers.
- Merge DataFrames.
- Compare their KDEs.
- Conclude that the KDE for Fandango is skewed towards higher ratings compared to the others.
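
A few of the steps above can be sketched in pandas. This is only a rough illustration, not the tutorial's actual code: the rows are made up, the column names (`FILM`, `STARS`, `RATING`, `VOTES`, `RottenTomatoes`, `Metacritic`, `IMDB`) are assumptions about the CSV layout, and rescaling every site to a 0 to 5 scale before comparing is my own assumption about how the KDEs are made comparable. The helper `plot_kdes` is a hypothetical name.

```python
import pandas as pd

# Made-up rows for illustration only; the real data comes from 538's CSV files.
# All column names here are assumptions about the layout of those files.
fandango = pd.DataFrame({
    "FILM": ["Film A (2015)", "Film B (2015)", "Film C (2014)", "Film D (2015)"],
    "STARS": [4.5, 4.0, 3.5, 5.0],      # stars displayed to users
    "RATING": [4.3, 3.9, 3.4, 4.8],     # underlying average rating
    "VOTES": [12000, 3500, 800, 15000],
})
other_sites = pd.DataFrame({
    "FILM": ["Film A (2015)", "Film B (2015)", "Film D (2015)"],
    "RottenTomatoes": [74, 50, 90],     # assumed 0-100 scale
    "Metacritic": [66, 48, 81],         # assumed 0-100 scale
    "IMDB": [7.8, 6.1, 8.4],            # assumed 0-10 scale
})

# Correlation between the numeric Fandango columns.
corr = fandango[["STARS", "RATING", "VOTES"]].corr()

# Pull the year out of the title, e.g. "Film A (2015)" -> "2015",
# then count how many films fall in each year.
fandango["YEAR"] = fandango["FILM"].str.extract(r"\((\d{4})\)", expand=False)
films_per_year = fandango["YEAR"].value_counts()

# An inner merge keeps only the films present in both tables,
# so every site is compared on the same titles.
merged = pd.merge(fandango, other_sites, on="FILM", how="inner")

# Rescale the other sites to 0-5 so they are comparable with Fandango.
merged["RT_Norm"] = merged["RottenTomatoes"] / 20
merged["Meta_Norm"] = merged["Metacritic"] / 20
merged["IMDB_Norm"] = merged["IMDB"] / 2


def plot_kdes(df):
    """Overlay one KDE per site, clipped to the 0-5 rating range."""
    # Imported lazily so the rest of the sketch runs without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    for col in ["STARS", "RT_Norm", "Meta_Norm", "IMDB_Norm"]:
        sns.kdeplot(data=df, x=col, clip=(0, 5), fill=True, label=col)
    plt.xlabel("Rating (0 to 5)")
    plt.legend()
    plt.show()
```

Calling `plot_kdes(merged)` draws the overlaid curves. On the real data, the Fandango curve sitting further to the right than the other three is what the last step describes.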