To put the data analytics skills I've acquired recently to use, I've tried to find interesting insights in movies released between 1916 and 2016, using Python. I downloaded a movie dataset and wrote Python code to explore the data, gaining insights into the movies, actors, directors, and collections. You can use the following links to go to a particular section, or scroll through the document for the complete analysis.
If you are short of time to go through the entire analysis, here are the important conclusions.
# Suppressing warnings
import warnings
warnings.filterwarnings('ignore')
# Importing the numpy and pandas packages
import numpy as np
import pandas as pd
import seaborn as sns
movies = pd.read_csv('./Movie+Assignment+Data.csv')
movies
Inspecting the dataframe's columns, shape, variable types, etc.
# Number of rows and columns in the dataset as a tuple : (rows, columns)
shape = movies.shape
print('There are {} rows and {} columns in the movies dataframe'.format(shape[0], shape[1]),'\n\n' )
# dataframe info: column names, non-null counts, and dtypes
# info() prints directly; wrapping it in print() would also print its None return value
movies.info()
The above output gives the names of the columns, the number of non-null values in each column, and the datatype of each.
# Summary Statistics of the data.
movies.describe()
# No of rows containing null values in each column.
print('Column Name \t: \tTotal Null Rows \n', movies.isnull().sum().sort_values(ascending=False))
# No of columns containing null values in each row.
print('Row Index : \tTotal Null Columns \n', movies.isnull().sum(axis=1).sort_values(ascending=False))
Note that some rows have more than half of the values missing.
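To see these rows explicitly, here is a small sketch I've added (the half-of-columns threshold simply mirrors the observation above):
# rows in which more than half of the column values are missing
half = movies.shape[1] / 2
mostly_null_rows = movies[movies.isnull().sum(axis=1) > half]
print(mostly_null_rows.shape[0], 'rows have more than half of their values missing')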
# (Total null rows per column / Total rows in the data frame) * 100 , rounded to 2 decimals
column_nulls = np.round(((movies.isnull().sum()/movies.shape[0])*100).sort_values(ascending=False),2)
print('Column Name : \t\tNull Columns (%) \n', column_nulls)
In this analysis, we will mostly be analyzing the movies with respect to ratings, gross collections, popularity, etc., so many of the columns in this dataframe are not required. We drop the following columns.
columns_to_drop = ['color','director_facebook_likes','actor_1_facebook_likes','actor_2_facebook_likes',
'actor_3_facebook_likes','actor_2_name','cast_total_facebook_likes','actor_3_name',
'duration','facenumber_in_poster','content_rating','country','movie_imdb_link',
'aspect_ratio','plot_keywords']
#dropping columns in place
movies.drop(columns=columns_to_drop, inplace=True)
movies.shape
Now, on inspection you might notice that some columns have a large percentage (greater than 5%) of null values. Dropping all the rows that have null values in such columns.
# columns with null (%) > 5
column_nulls = np.round((movies.isnull().sum()/movies.shape[0])*100,2)
high_null_columns = column_nulls[column_nulls > 5]
print(high_null_columns)
# dropping rows that have nulls in the high-null columns
movies.dropna(axis=0, subset=high_null_columns.index, inplace=True)
movies
You might notice that the `language` column has some NaN values. Here, on inspection, you will see that it is safe to replace all the missing values with 'English'.
# 12 NAs in language column
# Filling NAs with 'English'
movies.fillna({'language' : 'English'}, inplace=True)
movies
You might notice that two of the columns, viz. `num_critic_for_reviews` and `actor_1_name`, have small percentages of NaN values left. We can leave these columns as they are for now.
print('Column Name : \t\tTotal Null Rows \n', movies.isnull().sum(),'\n\n')
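If you preferred to impute these instead, here is a minimal sketch (my addition; the analysis below leaves them as-is, and both the median fill and the 'Unknown' placeholder are assumptions, not part of the original workflow):
# optional: impute the remaining NaNs instead of leaving them
movies['num_critic_for_reviews'] = movies['num_critic_for_reviews'].fillna(
    movies['num_critic_for_reviews'].median())  # median is robust to outliers
movies['actor_1_name'] = movies['actor_1_name'].fillna('Unknown')  # 'Unknown' is a hypothetical placeholder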
# Retained row % = (Current rows / Initial rows) * 100
#shape contains the shape of the original imported movies dataset
retained = (movies.shape[0]/shape[0])*100
print('Retained rows w.r.t. the original dataframe :', retained, '%\n\n')
# Converting the budget and gross columns to millions (not rounding off, to preserve accuracy)
movies['budget'] = movies['budget']/10**6
movies['gross'] = movies['gross']/10**6
movies.head(10)
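As a quick sanity check (a sketch I've added), the summary statistics of the two converted columns should now be on a scale of millions:
# sanity check: budget and gross should now read in millions
print(movies[['budget', 'gross']].describe())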
1. Creating a new column called `profit` which contains the difference of the two columns: `gross` and `budget`.
2. Sorting the dataframe using the `profit` column as reference.
3. Plotting `profit` (y-axis) vs `budget` (x-axis) and observing the outliers using the appropriate chart type.
4. Extracting the top ten most profitable movies in descending order and storing them in a new dataframe: `top10`.
# creating the profit column
movies['profit'] = movies['gross'] - movies['budget']
movies.head()
# sorting the dataframe
movies.sort_values(by='profit', inplace=True, ascending=False)
movies.head()
# profit vs budget plot
import matplotlib.pyplot as plt
movies.plot.scatter('budget', 'profit')
plt.title('Profit vs Budget')
plt.show()
From the above plot, we can see that some movies have incurred very large losses compared to the rest. These are outliers/exceptions.
# outliers: movies with profit below -$2000M and budget above $2000M (per the converted units)
movies.loc[(movies['profit'] < -2000) & (movies['budget'] > 2000), 'movie_title']
The above movies have incurred huge losses, far outside the range of a typical movie's earnings.
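To see the bulk of the data more clearly, one could re-plot with these extreme points excluded (a sketch I've added, reusing the same thresholds as above):
# re-plotting profit vs budget without the extreme outliers
typical = movies[(movies['profit'] >= -2000) & (movies['budget'] <= 2000)]
typical.plot.scatter('budget', 'profit')
plt.title('Profit vs Budget (outliers excluded)')
plt.show()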
#top 10 movies by profit
top10 = movies.iloc[:10]
top10
Notice that James Cameron has made two of the top 10 most profitable movies: 'Avatar' and 'Titanic'. He has made more profits than the most profitable Steven Spielberg and Christopher Nolan movies.
Out of the top 10 most profitable movies, you might have noticed a duplicate value. So it seems the dataframe has duplicate rows as well. We drop the duplicates from the dataframe and repeat the previous task. Note that the same `movie_title` can appear in different languages.
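Before dropping, one can inspect which titles occur more than once (a sketch I've added; note that `drop_duplicates` below removes only fully identical rows, so the same title in a different language survives):
# inspecting duplicated titles before dropping fully identical rows
dupes = movies[movies.duplicated(subset='movie_title', keep=False)]
dupes[['movie_title', 'language', 'title_year']].sort_values('movie_title').head()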
# dropping duplicates
movies.drop_duplicates(inplace=True)
# repeating the previous task
movies_by_profit = movies.sort_values(by='profit', ascending=False)
top10 = movies_by_profit.iloc[:10]
top10
1. Creating a new dataframe `IMDb_Top_250` and storing the top 250 movies with the highest IMDb rating (corresponding to the column: `imdb_score`). We consider only those movies with more than 25,000 votes. We then add a `Rank` column containing the values 1 to 250, indicating the ranks of the corresponding films.
2. We shall also extract the movies in the `IMDb_Top_250` dataframe which are not in the English language and store them in a new dataframe named `Top_Foreign_Lang_Film`.
# extracting the top 250 movies as per the IMDb score.
# selecting movies where the number of voted users is greater than 25000.
movies_by_imdb_score = movies[movies['num_voted_users'] > 25000].sort_values(by='imdb_score', ascending=False)
# selecting the first 250 movies; .copy() avoids a SettingWithCopyWarning when adding the Rank column below
IMDb_Top_250 = movies_by_imdb_score.iloc[:250].copy()
#adding a new column "Rank" which contains rank of the movie
IMDb_Top_250['Rank'] = np.arange(1,251)
IMDb_Top_250.tail()
# Out of the top 250 movies, selecting movies which are not in 'English' language
Top_Foreign_Lang_Film = IMDb_Top_250[IMDb_Top_250['language'] != 'English'].sort_values(by='Rank')
Top_Foreign_Lang_Film
Well, our very own Veer-Zaara has made the list. So has Bahubali!
#extracting the top 10 directors
#Grouping 'movies' dataframe by 'director_name'.
# Calculating mean of 'imdb_score' for each 'director_name'
mean_imdb_score = movies.groupby('director_name')['imdb_score'].mean()
#Creating a new dataframe directors_imdb_score
# mean imdb score is rounded to 1 decimal because the dataset provides scores to 1 decimal place
directors_mean_imdb_score = pd.DataFrame(np.round((mean_imdb_score),1))
# creating a new index and converting director_name into a column
directors_mean_imdb_score = directors_mean_imdb_score.reset_index()
# sorting in descending order of mean imdb scores
directors_mean_imdb_score.sort_values(by=['imdb_score','director_name'], ascending=[False,True],inplace=True)
# top 10 directors by mean imdb scores
top10director = directors_mean_imdb_score.iloc[:10]
top10director
No surprise that Damien Chazelle (director of Whiplash and La La Land) is in this list.
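One caveat: a raw mean favours directors with a single highly rated film. Here is a sketch (my addition) that additionally requires a minimum number of films per director; the threshold of 3 is an arbitrary assumption:
# requiring at least 3 films per director before ranking by mean imdb score
director_stats = movies.groupby('director_name')['imdb_score'].agg(['mean', 'count'])
experienced = director_stats[director_stats['count'] >= 3]
experienced.sort_values('mean', ascending=False).round(1).head(10)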
# splitting genre into genre_1 and genre_2
movies['genre_1'] = movies['genres'].apply(lambda x : x.split('|')[0])
def genre_2(x):
    # second genre if present, otherwise fall back to the first
    split = x.split('|')
    if len(split) > 1:
        return split[1]
    else:
        return split[0]
movies['genre_2'] = movies['genres'].apply(genre_2)
# sanity check: movies with a single genre, where genre_2 falls back to genre_1
movies[movies['genre_1'] == movies['genre_2']]
# grouping movies by genre_1 and genre_2
movies_by_segment = movies.groupby(['genre_1','genre_2'])
movies_by_segment=movies_by_segment['gross'].mean()
movies_by_segment.sort_values(ascending=False, inplace=True)
movies_by_segment
movies_by_segment.head()
Looks like the highest-grossing genre combination (by mean gross) is Family + Sci-Fi, and it earns more than twice as much as the next combination: Adventure + Sci-Fi.
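Another way to view the same numbers (a sketch I've added) is as a genre_1 x genre_2 pivot table of mean gross:
# mean gross (in $M) for each genre pair, laid out as a pivot table
genre_pivot = movies.pivot_table(index='genre_1', columns='genre_2', values='gross', aggfunc='mean')
genre_pivot.round(1)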
Meryl_Streep = movies[movies['actor_1_name'] == 'Meryl Streep']# Include all movies in which Meryl_Streep is the lead
Leo_Caprio = movies[movies['actor_1_name'] == 'Leonardo DiCaprio'] # Include all movies in which Leo_Caprio is the lead
Brad_Pitt = movies[movies['actor_1_name'] == 'Brad Pitt'] # Include all movies in which Brad_Pitt is the lead
# combining the three dataframes
Combined = pd.concat([Meryl_Streep,Leo_Caprio,Brad_Pitt])
Combined.head()
# grouping the combined dataframe
reviews_by_actor = Combined.groupby('actor_1_name')
# Finding the mean of critic reviews and audience reviews
# actors vs mean critic reviews in descending order
critic_reviews = reviews_by_actor['num_critic_for_reviews'].mean().sort_values(ascending=False)
# actors vs mean user reviews in descending order
user_reviews = reviews_by_actor['num_user_for_reviews'].mean().sort_values(ascending=False)
print(critic_reviews,'\n\n',user_reviews,'\n\n')
print('Actor with highest mean critic reviews : ',critic_reviews.index.to_list()[0] )
print('Actor with highest mean user reviews : ',user_reviews.index.to_list()[0] )
Leonardo has aced both lists. He is both the most liked and the most critically acclaimed actor!
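A quick visual comparison of the two metrics (a sketch I've added, reusing the groupby from above):
# side-by-side bars of mean critic and user reviews per lead actor
review_means = reviews_by_actor[['num_critic_for_reviews', 'num_user_for_reviews']].mean()
review_means.plot.bar()
plt.ylabel('Mean number of reviews')
plt.title('Mean Critic and User Reviews per Lead Actor')
plt.show()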
# calculating decade: flooring each title_year to the start of its decade
# (plain vectorized pandas arithmetic; np.vectorize is not needed here)
movies['decade'] = (movies['title_year'] // 10 * 10).astype(int)
# number of voters grouped by decade
votes_grouped_by_decade = movies.groupby('decade')['num_voted_users'].sum()
# creating the dataframe df_by_decade from the grouped series
df_by_decade = pd.DataFrame(votes_grouped_by_decade)
df_by_decade.reset_index(inplace=True)
df_by_decade
# Plotting number of voted users vs decade
sns.barplot(x='decade',y='num_voted_users',data = df_by_decade)
plt.xlabel('Decade')
plt.ylabel('No of votes')
plt.title('User Votes vs Movie Release Decade')
plt.show()
This plot shows the number of votes cast for movies of each decade. You can notice that there aren't many votes for older movies. This could be because most people are simply more familiar with recent movies. But it's interesting that movies made during 2000-2009 have drawn more votes than those released in or after 2010; note, though, that the dataset ends in 2016, so the 2010s decade is incomplete.
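Since later decades also contain more films, one could normalize by the number of movies per decade (a sketch I've added):
# mean votes per movie in each decade, adjusting for how many films each decade contains
mean_votes = movies.groupby('decade')['num_voted_users'].mean().reset_index()
sns.barplot(x='decade', y='num_voted_users', data=mean_votes)
plt.xlabel('Decade')
plt.ylabel('Mean votes per movie')
plt.title('Mean User Votes per Movie by Decade')
plt.show()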
Finally, this analysis has been an endeavour to apply the Python skills I've acquired to draw insights from data.