The Best Movie


Using data from Kaggle’s The Movies Dataset, let’s determine what makes the best movies. The dataset gives us ratings from The Movie Database (TMDb), where movies are rated out of 10. The top 10 movies by that rating are:

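| Title | TMDb rating | Votes |
| --- | --- | --- |
| The Godfather | 8.50 | 6,024 |
| The Shawshank Redemption | 8.50 | 8,358 |
| Spirited Away | 8.30 | 3,968 |
| The Dark Knight | 8.30 | 12,269 |
| The Godfather: Part II | 8.30 | 3,418 |
| Schindler's List | 8.30 | 4,436 |
| One Flew Over the Cuckoo's Nest | 8.30 | 3,001 |
| Fight Club | 8.30 | 9,678 |
| Life Is Beautiful | 8.30 | 3,643 |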

Other Ratings

But that’s not the whole story. The dataset also includes 26,024,289 individual user ratings. Since we’re not making individual movie recommendations (that’s another unit), let’s aggregate the scores into per-movie means, merge them into the movie database, and see how they compare to the TMDb ratings. After averaging, we get the following top 10.

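| Title | Mean user rating | Ratings |
| --- | --- | --- |
| Sleepless in Seattle | 4.34 | 57,070 |
| Once Were Warriors | 4.27 | 67,662 |
| Hard Target | 4.26 | 13,994 |
| License to Wed | 4.23 | 60,024 |
| The Talented Mr. Ripley | 4.18 | 33,987 |
| Galaxy Quest | 4.17 | 5,453 |
| Terminator 3: Rise of the Machines | 4.17 | 87,901 |
| Local Color | 4.17 | 25,245 |
| Hannibal Rising | 4.16 | 5,199 |
| Ice Age: The Meltdown | 4.15 | 3,628 |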
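The averaging and merging step can be sketched in plain Python. This is a minimal sketch on toy data, not the code behind the list above: the function names, the `id_map` argument, and the sample ids are all hypothetical. One thing worth verifying against the dataset's documentation: if the ratings file keys movies by a different id than the metadata file (our understanding, stated here as an assumption, is that `ratings.csv` uses a MovieLens `movieId` and that `links.csv` maps it to the TMDb id), the join has to go through that mapping, or ratings end up attached to the wrong titles.

```python
from collections import defaultdict


def mean_ratings(ratings):
    """Average (movie_id, rating) pairs per movie.

    Returns {movie_id: (mean_rating, number_of_ratings)}.
    """
    totals = defaultdict(lambda: [0.0, 0])
    for movie_id, rating in ratings:
        totals[movie_id][0] += rating
        totals[movie_id][1] += 1
    return {mid: (s / n, n) for mid, (s, n) in totals.items()}


def merge_titles(means, titles, id_map=None):
    """Attach titles to averaged ratings, highest mean first.

    id_map (hypothetical) translates rating-file ids into metadata ids;
    movies without a matching title are dropped, like an inner join.
    """
    rows = []
    for mid, (mean, count) in means.items():
        key = id_map.get(mid) if id_map is not None else mid
        if key in titles:
            rows.append((titles[key], round(mean, 2), count))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Toy data, invented for illustration only.
ratings = [(1, 5.0), (1, 4.0), (2, 3.0), (2, 2.0), (2, 4.0), (3, 4.5)]
titles = {10: "Movie A", 20: "Movie B"}
id_map = {1: 10, 2: 20}  # rating-file id -> metadata id

top = merge_titles(mean_ratings(ratings), titles, id_map)
for title, mean, count in top:
    print(f"{title}: {mean} ({count} ratings)")
```

Keeping the rating count next to the mean makes it easy to apply a minimum-votes cutoff later, so that a movie with a handful of perfect scores does not top the list.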


It’s a very different list, with zero overlap. If only there were a way to scale or combine them. But we’re not done yet. There’s another way to define the best movies, and it’s the one the studios care about: how much money they made. The dataset also provides revenue, so let’s see what that list looks like.

| Title | Revenue |
| --- | --- |
| The Lord of the Rings: The Return of the King | $1,118,888,979 |
| Pirates of the Caribbean: Dead Man's Chest | $1,065,659,812 |
| Pirates of the Caribbean: On Stranger Tides | $1,045,713,802 |
| The Dark Knight | $1,004,558,444 |
| Harry Potter and the Philosopher's Stone | $976,475,550 |
| Finding Nemo | $940,335,536 |
| Harry Potter and the Half-Blood Prince | $933,959,197 |
| The Lord of the Rings: The Two Towers | $926,287,400 |
| Star Wars: Episode I - The Phantom Menace | $924,317,558 |

Adjusted Revenue

Another new list. But that’s not quite right: the value of a dollar has changed over time, so 1 billion 1920 dollars isn’t the same as 1 billion 2020 dollars. Using a third dataset, provided by the federal government, we can adjust each movie’s revenue based on the year it was released.

| Title | Adjusted revenue |
| --- | --- |
| Star Wars | $3,028,727,803 |
| E.T. the Extra-Terrestrial | $1,945,321,985 |
| The Empire Strikes Back | $1,546,829,516 |
| Jurassic Park | $1,507,846,186 |
| The Lord of the Rings: The Return of the King | $1,439,899,368 |
| The Godfather | $1,387,267,307 |
| Return of the Jedi | $1,361,232,958 |
| Star Wars: Episode I - The Phantom Menace | $1,313,638,874 |
| Harry Potter and the Philosopher's Stone | $1,305,536,965 |
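The adjustment itself is a single ratio. Here is a minimal sketch, assuming the government dataset is a yearly price index such as the CPI (the post does not name it, nor the target year the revenues were converted to). The function name is hypothetical and the index values are made-up placeholders, not real CPI figures.

```python
def adjust_for_inflation(revenue, release_year, price_index, target_year):
    """Convert nominal revenue into target-year dollars.

    adjusted = revenue * index[target_year] / index[release_year]
    """
    return revenue * price_index[target_year] / price_index[release_year]


# Placeholder index values for illustration only, NOT real CPI data.
price_index = {1980: 50.0, 2000: 100.0, 2020: 150.0}

# A movie that made $100M in 1980, expressed in 2020 dollars.
adjusted = adjust_for_inflation(100_000_000, 1980, price_index, 2020)
print(f"${adjusted:,.0f}")
```

Because the index in this toy table triples between 1980 and 2020, the 1980 revenue triples too. That is the mechanism that pushes older releases such as Star Wars up the adjusted list.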

PCA Rating

So now we have 3 different metrics, with scales that range from 0-5 for one to billions for another. Which should we use to determine the best movie? With the magic of principal component analysis (PCA), we don’t have to decide! After waving the PCA wand to combine the ratings from the 2 different databases with the adjusted revenue, we get the following top 10 movies.

| Title | PCA rating |
| --- | --- |
| Star Wars | 8.86514 |
| E.T. the Extra-Terrestrial | 5.27108 |
| The Empire Strikes Back | 4.81926 |
| The Godfather | 4.75741 |
| The Lord of the Rings: The Return of the King | 4.51399 |
| Jurassic Park | 4.3515 |
| Return of the Jedi | 4.21927 |
| The Lord of the Rings: The Two Towers | 3.98178 |
| The Dark Knight | 3.87451 |

So the original Star Wars is the best movie.
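For readers curious what the PCA wand does, here is a minimal sketch under stated assumptions: each metric is standardized to zero mean and unit variance, and the PCA rating is the projection onto the first principal component. The post does not say how the scaling was done, so treat that as a guess. The sketch finds the component with power iteration in plain Python rather than a library call, and the function names and toy numbers are hypothetical.

```python
import math


def standardize(column):
    """Rescale a column to zero mean and unit sample variance."""
    mean = sum(column) / len(column)
    var = sum((x - mean) ** 2 for x in column) / (len(column) - 1)
    std = math.sqrt(var)
    return [(x - mean) / std for x in column]


def first_component(rows, iterations=500):
    """Leading eigenvector of the covariance matrix, via power iteration.

    `rows` must already be centered (standardized rows are).
    """
    n, d = len(rows), len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
           for i in range(d)]
    vec = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # The sign of a component is arbitrary; orient it so that higher
    # metrics give higher scores.
    if sum(vec) < 0:
        vec = [-x for x in vec]
    return vec


def pca_scores(columns):
    """Project standardized metrics onto the first principal component."""
    rows = list(zip(*(standardize(c) for c in columns)))
    weights = first_component(rows)
    scores = [sum(w * x for w, x in zip(weights, row)) for row in rows]
    return scores, weights


# Toy metrics for four hypothetical movies (not values from the dataset).
tmdb_rating = [8.5, 7.0, 6.0, 5.0]
user_rating = [4.3, 3.8, 3.1, 2.9]
adjusted_revenue = [9.0e8, 4.0e8, 1.5e8, 0.5e8]

scores, weights = pca_scores([tmdb_rating, user_rating, adjusted_revenue])
print(scores)
print(weights)
```

Standardizing first matters: without it, revenue in the billions would swamp ratings on a 0-5 or 0-10 scale, and the first component would be little more than revenue. Scores built this way are centered on zero, so a negative PCA rating simply means below average.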

Data Insights

Movies through the years

It’s been said that classic movies are better, and that modern movies are just terrible in comparison. We can graph time against the PCA rating to see if that’s true. From this graph, it becomes clear that movies hit a high point in the 70s and really took a nosedive in the 2000s.

If the null hypothesis were that movies from the 1970s are no better than movies from the 2000s, we would have to reject it, with a calculated p-value of $4.54 \times 10^{-5}$.
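The post does not say which test produced that p-value. One way to run such a comparison without distributional assumptions is a permutation test: shuffle the decade labels many times and count how often the shuffled difference in means is at least as large as the observed one. The sketch below is an illustration on made-up scores, not a reproduction of the figure above, and the function name is hypothetical.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided p-value for mean(group_a) > mean(group_b).

    Null hypothesis: the group labels are exchangeable, i.e. group_a
    is no better than group_b.
    """
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = sum(group_a) / n_a - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from an impossible 0.
    return (hits + 1) / (n_permutations + 1)


# Made-up PCA ratings for illustration only.
scores_1970s = [0.9, 1.1, 0.7, 1.3, 0.8, 1.0]
scores_2000s = [-0.2, 0.1, -0.4, 0.0, -0.1, 0.2]

p_value = permutation_p_value(scores_1970s, scores_2000s)
print(p_value)
```

A small p-value means the 1970s lead would rarely appear by chance if decade made no difference, which is the sense in which the null hypothesis is rejected.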


But what else might affect a movie’s rating? Are adventure movies better than westerns? Are romance movies better than comedies?

PCA rating of Genres

From the data, the clear losers are foreign films, while war movies are a safe bet.


How about budgets? Do bigger budgets translate to better movies?

| Budget category | Mean budget | Mean revenue | PCA |
| --- | --- | --- | --- |
| [0, 500,000) | $197,962 | $15,368,653 | -0.15 |
| [500,000, 40,000,000) | $18,500,755 | $61,334,171 | -0.06 |
| [40,000,000, 100,000,000) | $61,415,816 | $186,069,871 | 0.04 |
| [100,000,000, 1,000,000,000,000) | $141,292,135 | $436,756,768 | 0.70 |
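A table like this comes from binning each movie by budget and averaging within each bin. Below is a minimal sketch of that grouping, using the same half-open intervals as the table. The function names and the three sample movies are hypothetical, and this is not necessarily how the original table was produced.

```python
from bisect import bisect_right

# Bin edges taken from the table: [0, 500K), [500K, 40M), [40M, 100M), [100M, 1T)
EDGES = [0, 500_000, 40_000_000, 100_000_000, 1_000_000_000_000]


def budget_category(budget):
    """Return the half-open interval (low, high) with low <= budget < high."""
    i = bisect_right(EDGES, budget) - 1
    if i < 0 or i >= len(EDGES) - 1:
        raise ValueError(f"budget {budget} is outside the defined categories")
    return (EDGES[i], EDGES[i + 1])


def summarize(movies):
    """Mean budget, revenue, and score per budget category.

    `movies` is an iterable of (budget, revenue, score) tuples.
    """
    groups = {}
    for budget, revenue, score in movies:
        groups.setdefault(budget_category(budget), []).append(
            (budget, revenue, score))
    return {
        category: tuple(sum(col) / len(col) for col in zip(*rows))
        for category, rows in sorted(groups.items())
    }


# Invented sample rows: (budget, revenue, PCA score).
movies = [
    (100_000, 1_000_000, -0.2),
    (300_000, 3_000_000, 0.0),
    (50_000_000, 150_000_000, 0.1),
]

summary = summarize(movies)
for category, (mean_budget, mean_revenue, mean_score) in summary.items():
    print(category, mean_budget, mean_revenue, mean_score)
```

`bisect_right` places a budget that sits exactly on an edge, such as 500,000, in the upper bin, which matches the `[low, high)` notation used in the table.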

What else?

Clearly, there are other things that could affect the rating of a movie. Here are some other insights that might be gleaned from the data:

  • Who is the best director of all time?

  • Which movies had the greatest return on investment given their budget and their revenue?

  • Should directors stay in their lane? In other words, can directors whose body of work is mostly in one genre switch to a completely different genre and still make movies that are just as good? For example, a renowned horror director might decide to do a romance film. How do they do? How about the other way around?
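As a starting point for the return-on-investment question, here is a minimal sketch using one common definition, net return per dollar of budget. The function name and the sample figures are hypothetical, and box-office revenue is not the same as studio profit, so this is only a rough proxy.

```python
def return_on_investment(budget, revenue):
    """ROI as net return per dollar spent: (revenue - budget) / budget."""
    if budget <= 0:
        # Rows with a missing or zero budget would need filtering first.
        raise ValueError("budget must be positive")
    return (revenue - budget) / budget


# Hypothetical movie: $10M budget, $50M revenue.
roi = return_on_investment(10_000_000, 50_000_000)
print(roi)  # 4.0, i.e. four dollars earned beyond each dollar spent
```

Ranking by this ratio tends to favor very cheap movies, so it may be worth combining it with a minimum budget or the budget categories used earlier.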

