This is my 4th Tableau Dashboard Training Project. We will use this "The Movies Dataset".
The dashboard I made is here (Jump to my Tableau Public).
Contents:
1. What Data to Visualize
2. Dashboard Framework
3. Tableau's String Functions
4. How to prep data where a column has several labels: UNION them.
First of all, let's see what the data we will use.
We're going to use only movies_metadata.csv which contains 45,000 records of movies. It has columns such as Genres, name of production company, release date, language, budget, revenue and so on.
The scenario for this training is here: we are in a marketing department of a company. You are using movie marketing agency and you ask them to put our product in movies like the Ford Mustang in the Movie: John Wick, in order to increase your bland awareness.
Since your budget is limited you want to know what kind of movies can be the best for our sponsoring in terms of their trendiness and cost efficiency. You will make a report about what movie industry we should sponsor.
Then our dashboard framework is here.
Our "Task" is now divided into 2 components: Big goal and Approach. Our big goal is to create more impressions/bland awareness of our product. Also we care of our budget to make our profit more. To perform this big goal, we want to sponsor only some movie industries/productions that have more audiences but don't have much budget.
I'm not used to the industry but let's say that if the budget is high, a sponsoring fee will be also high because they need more budget. So in this case we care of budgets as well as revenue which could indicate the number of audiences.
Then we define our stakeholders are these 3 people. In the end you have to say where they should go.
The rest 2 components are about what to visualize. We'll identify what movie genres and what movie production are good. In order to measure their performance, we will see number of movies, revenues and budgets mainly.
This is what we will deal with.
(I had a conversation with my friend about the data. The data seems to be JSON file but the data is provided as CSV file. If you can convert it into JSON and import it into Tableau, it could be much easier.)
Then let's see the data. Honestly, when I opened the data, I was so confused.
For example, let's take a look at "genres". It looks like this.
[{'id': 16, 'name': 'Animation'}, {'id': 35, 'name': 'Comedy'}, {'id': 10751, 'name': 'Family'}]
The problem here is, one column includes more than 2 genres. If we want to calculate the number of videos of "Animation" movies, we need to include all records that have "Animation" in genre column. However, if we put the "genre" in the column in Tableau worksheets, I think we will not have what we want to have. We want to have a data-viz like the picture below.
The same thing is happening on some other columns so we need to do some data prep.
Let's deal with "genres" first. As you saw above, We have our genres after "'name': " and each information is separated with "{}". We want to split the genre with "}, {" and delete strings before the genres. The code for it is here.
genres - Split 1
REPLACE( SPLIT( TRIM( SPLIT( [genres], "}, {", 1 ) ) ,"'name': ",2) ,"}]","")
Let me explain about the code. I used Tableau's Custom Split first to separate genres with "}, {".
Then we apply SPLIT() function to delete strings before "'name': ", and we use REPLACE() to delete "}]".
We use the code for all genres we obtain after Tableau's Custom Split. The same thing can be done with other columns. For example, since we will consider the production companies, I used the code here:
production_companies - Split 1
SPLIT( SPLIT( TRIM( SPLIT( [production_companies], "}, {", 1 ) ) ,"'name': ",2) ,',',1)
And to make the situation simple while the dataset includes movies all over the world, I created two calculation fields to extract movies spoken in English and produced in America.
English Movie
CONTAINS([spoken_languages],'English')
American Movie
CONTAINS([production_countries],'United States of America')
Since we will also use movies that already released to use the revenue data, we set the Data Source Filters like this.
Then, we export the data to do another data prep.
I used KNIME for the prep because I had an encoding problem with R.
For those not used to KNIME, this is a opensource software for statistics and predictive modeling. It also allows us to manipulate data.
In order to assign some genres, I decided to UNION the data with every split genres.
Tu conduct UNION, we need to trim the data like this.
First we create the data that only has genres - split 1 and rename the column as Genre.
We do the same thing to genre 2 to genre 7 and then we UNION them.
The same process has been done on the production company column too.
The preped data is in my Dropbox here. You can download these 2 files.
Then we set our Tableau Data Source like this.
In the next post I'll explain how to create my dashboard.
Today I explained how to deal with data that has a column including more than 2 labels like "genres" column.
I had been dealing with how to prep it and my answer was to UNION the data. I may find a better way in the future and I'll share it.
I hope you enjoyed this post. See you soon.
Yoshi