Stuck at home and endlessly scrolling on my phone, I came across a Google card about visualization with Streamlit. Excited for something new, I clicked.
I decided to see how easy data visualization can be with Streamlit. I needed to process a dataset and display it, so pandas was my go-to library. But what kind of data would be cool to analyze?
My first idea was my Uber data, but it turned out it would take days for that data to be prepared and sent to me. I was too excited and in the zone to wait, so I looked up my Netflix viewing activity, and luckily I could download it.
My Netflix data had just two columns: Title and Date.
So I decided to look at how many Netflix titles I watch every month, with a drop-down select box listing the distinct years in my data.
I installed Streamlit using
pip install streamlit
Then create your .py file and start it with
streamlit run yourpyfile.py
This runs the application for you.
Now, what should my .py file have?
First, the imports of the pandas, streamlit, calendar and altair libraries.
Next, I read my data into a DataFrame using pandas read_csv and performed my count, so my output data had the columns Month, MonthID and count.
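The counting step can be sketched roughly as follows. This is a minimal sketch, assuming the export has the two columns Title and Date and that the dates are written day first; the sample rows and the helper name count_titles_per_month are made up for illustration and stand in for the real NetflixViewingHistory.csv.

```python
import calendar
import io

import pandas as pd

# Made-up sample rows standing in for the real Netflix export (assumption).
SAMPLE_CSV = """Title,Date
Show A: Episode 1,01/02/2020
Show A: Episode 2,15/02/2020
Movie B,03/03/2020
"""


def count_titles_per_month(csv_source):
    """Return one row per (Year, MonthID, Month) with the number of titles watched."""
    movies = pd.read_csv(csv_source)
    # Assumption: dates are written day first (DD/MM/YYYY).
    dates = pd.to_datetime(movies["Date"], dayfirst=True)
    movies["Year"] = dates.dt.year
    movies["MonthID"] = dates.dt.month
    # Translate the month number into a short name such as "Feb".
    movies["Month"] = movies["MonthID"].apply(lambda m: calendar.month_abbr[m])
    return (
        movies.groupby(["Year", "MonthID", "Month"])
        .size()
        .reset_index(name="count")
    )


counts = count_titles_per_month(io.StringIO(SAMPLE_CSV))
print(counts)
```

Keeping MonthID next to Month is what later lets the chart sort the month names in calendar order instead of alphabetically.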
My first attempt was st.line_chart(data). This plots a line chart using the DataFrame index as the x-axis and every other column as a variable, which is not what I was looking for.
My second attempt was st.line_chart(data) with MonthID as the index. That gave me the month IDs on the x-axis and the count on the y-axis, but what do 1, 2, 3 mean? There should be some labels to explain what 1..12 stand for.
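The reshaping behind that second attempt might look like the sketch below. The numbers are made up for illustration, and the Streamlit call is left as a comment so the snippet runs on its own with only pandas installed.

```python
import pandas as pd

# Made-up monthly counts in the Month / MonthID / count shape (assumption).
data = pd.DataFrame(
    {
        "Month": ["Jan", "Feb", "Mar"],
        "MonthID": [1, 2, 3],
        "count": [12, 30, 7],
    }
)

# Use MonthID as the index and keep only the numeric column to plot.
indexed = data.set_index("MonthID")[["count"]]
print(indexed)

# Inside a Streamlit app this frame would then be passed on as:
#   st.line_chart(indexed)
```

The x-axis then shows the bare numbers 1, 2, 3, which is exactly the labeling problem described above.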
This is where the Altair library comes into play. If you want to customize your chart and add axis labels and a title, you’re better off creating an Altair chart and displaying it in Streamlit using st.altair_chart (or using Matplotlib, Plotly, Bokeh, etc.), so there’s hardly any point in adding an argument to line_chart for each property you want to specify.
So I created a bar chart using Altair and displayed it in my Streamlit application.
Below is my code:
import calendar

import altair as alt
import pandas as pd
import streamlit as st

st.title('Monthly Netflix Views')

# Load the Netflix viewing history export (columns: Title, Date).
movies = pd.read_csv('E:/Learning WorkSpace/Pentaho Data Integration/Other Files/NetflixViewingHistory.csv')

# Parse the dates once, day first, so month and year come from the same parse.
dates = pd.DatetimeIndex(movies['Date'], dayfirst=True)
movies['MonthID'] = dates.month
movies['Month'] = movies['MonthID'].apply(lambda x: calendar.month_abbr[x])
movies['Year'] = dates.year

# Drop-down with the distinct years found in the data.
Year = movies.Year.unique()
option = st.selectbox('Select Year!', Year)
st.write('You selected:', option)

# Count titles per month and keep only the selected year.
CountMoviesPerMonth = movies.groupby(['Month', 'MonthID', 'Year']).size().reset_index(name='count')
filterCount = CountMoviesPerMonth[CountMoviesPerMonth.Year.eq(option)]
data = filterCount[['Month', 'MonthID', 'count']]
print(data)  # debug output, printed in the terminal that runs Streamlit

# Bar chart with month names on the x-axis, ordered by MonthID.
linec = alt.Chart(data).mark_bar().encode(
    x=alt.X('Month', title='Month', sort=alt.SortField('MonthID')),
    y=alt.Y('count', title='# of Titles per Month'),
    tooltip=['Month', 'count']
)
st.altair_chart(linec, use_container_width=True)
I created a simple filter from the distinct years in my data. The selected year is then applied to the data before it is visualized.
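Taken on its own, the year filter can be sketched like this. The counts are made up, and filter_by_year is a hypothetical helper name used only for illustration; in the app the chosen year would come from st.selectbox rather than being hard-coded.

```python
import pandas as pd

# Made-up per-month counts across two years (assumption).
counts = pd.DataFrame(
    {
        "Year": [2019, 2019, 2020],
        "MonthID": [10, 11, 1],
        "Month": ["Oct", "Nov", "Jan"],
        "count": [5, 3, 2],
    }
)


def filter_by_year(frame, year):
    """Keep only the rows of one year and the columns the chart needs."""
    selected = frame[frame["Year"].eq(year)]
    return selected[["Month", "MonthID", "count"]]


# Distinct years, sorted so the drop-down options appear in order.
years = sorted(counts["Year"].unique().tolist())
print(years)

# Stand-in for the value the select box would return.
print(filter_by_year(counts, 2019))
```

Sorting the years is a small addition of mine; unique() on its own returns them in the order they first appear in the file.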
Now let’s see how it renders with 2020 selected
and with 2019
It turns out that in October I watched 251 titles (movies and series episodes), and in 2020 I can see a decline in titles streamed.
Do comment with a series to watch so I can send those numbers up, as these are rookie numbers.
Thank You
Sohail Izebhijie
Helical IT Solutions Pvt Ltd
Best Open Source Business Intelligence Software Helical Insight Here
A Business Intelligence Framework