The dataset contains customer-level information for a span of four consecutive months - June, July, August and September. The months are encoded as 6, 7, 8 and 9, respectively.
The business objective is to predict the churn in the last (i.e. the ninth) month using the data (features) from the first three months.
This is a classification problem: we need to predict whether a customer is about to churn or not. We built a baseline Logistic Regression model, followed by Logistic Regression with PCA, PCA + Random Forest and PCA + XGBoost.
Model 1 : Logistic Regression with RFE & Manual Elimination ( Interpretable Model )
Most important predictors of Churn , in order of importance and their coefficients are as follows :
PCA : 95% of the variance in the train set is explained by the first 16 principal components, and 100% of the variance is explained by the first 45 principal components.
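The component counts quoted above come from reading off the cumulative explained-variance ratio. The following is a minimal sketch of that calculation using only NumPy, not the notebook's own PCA code; the helper name `components_for_variance` and the synthetic matrix `X_demo` are hypothetical and introduced purely for illustration.

```python
import numpy as np

def components_for_variance(X, threshold=0.95):
    """Return (k, cumulative), where k is the smallest number of principal
    components whose cumulative explained-variance ratio reaches `threshold`."""
    # PCA is defined on mean-centred data.
    Xc = X - X.mean(axis=0)
    # Squared singular values of the centred matrix are proportional
    # to the variance captured by each principal component.
    singular_values = np.linalg.svd(Xc, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained / explained.sum())
    # First position where the cumulative ratio reaches the threshold.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    # Guard against floating-point round-off when threshold is 1.0.
    k = min(k, len(cumulative))
    return k, cumulative

# Synthetic data (hypothetical): three features with very different variances.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3)) * np.array([10.0, 1.0, 0.1])
k_demo, cumulative_demo = components_for_variance(X_demo, threshold=0.95)
print(k_demo, np.round(cumulative_demo, 3))
```

On the real training matrix, the same calculation with thresholds of 0.95 and 1.0 would be expected to give counts like the 16 and 45 components reported above, provided the features were scaled the same way as in the notebook.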
Model 2 : PCA + Logistic Regression
Train Performance :
Accuracy : 0.627
Sensitivity / True Positive Rate / Recall : 0.918
Specificity / True Negative Rate : 0.599
Precision / Positive Predictive Value : 0.179
F1-score : 0.3
Test Performance :
Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
Model 3 : PCA + Random Forest Classifier
Train Performance :
Accuracy : 0.882
Sensitivity / True Positive Rate / Recall : 0.816
Specificity / True Negative Rate : 0.888
Precision / Positive Predictive Value : 0.408
F1-score : 0.544
Test Performance :
Accuracy : 0.86
Sensitivity / True Positive Rate / Recall : 0.80
Specificity / True Negative Rate : 0.78
Precision / Positive Predictive Value : 0.37
F1-score : 0.51
Model 4 : PCA + XGBoost
Train Performance :
Accuracy : 0.873
Sensitivity / True Positive Rate / Recall : 0.887
Specificity / True Negative Rate : 0.872
Precision / Positive Predictive Value : 0.396
F1-score : 0.548
Test Performance :
Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
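All of the metrics reported above can be derived from the four confusion-matrix counts. The sketch below is an illustration, not the notebook's evaluation code: the function name `classification_metrics` and the counts are hypothetical. The counts assume 1,000 customers of whom 86 churn, with every customer predicted as a churner. This reproduces the pattern of accuracy 0.086, sensitivity 1.0 and specificity 0.0, so that pattern in a test report may be worth checking against the predicted class distribution.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute the summary metrics used above from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    # Sensitivity (recall): share of actual churners that were caught.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: share of actual non-churners correctly left alone.
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    # Precision: share of predicted churners that really churn.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        'accuracy': round(accuracy, 3),
        'sensitivity': round(sensitivity, 3),
        'specificity': round(specificity, 3),
        'precision': round(precision, 3),
        'f1': round(f1, 3),
    }

# Hypothetical counts: 1,000 customers, 86 churners, everyone predicted as churn.
all_churn = classification_metrics(tp=86, fp=914, tn=0, fn=0)
print(all_churn)
```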
The strongest indicators of churn are as follows:
Customers who churn show lower average monthly local incoming calls from fixed lines in the action period, by 1.27 standard deviations compared to users who don't churn, when all other factors are held constant. This is the strongest indicator of churn.
Customers who churn make fewer recharges in the action period, by 1.20 standard deviations, when all other factors are held constant. This is the second strongest indicator of churn.
Further, customers who churn show recharge that is 0.6 standard deviations higher than that of non-churn customers. Coupled with the factors above, this is a good indicator of churn.
Customers who churn are more likely to be users of 'monthly 2g package-0 / monthly 3g package-0' in the action period (approximately 0.3 standard deviations higher than other packages), when all other factors are held constant.
Based on the above indicators, the recommendations to the telecom company are:
Concentrate on users whose incoming calls from fixed lines are 1.27 standard deviations below average. They are the most likely to churn.
Concentrate on users who recharge fewer times (about 1.2 standard deviations below average) in the 8th month. They are the second most likely to churn.
Models with high sensitivity are the best for predicting churn. Use the PCA + Logistic Regression model to predict churn. It has an ROC score of 0.87 and a test sensitivity of 100%.
# Importing Necessary Libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
# Setting max display columns and rows.
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
# Reading Dataset into a DataFrame.
data=pd.read_csv('telecom_churn_data.csv')
data.head()
mobile_number | circle_id | loc_og_t2o_mou | std_og_t2o_mou | loc_ic_t2o_mou | last_date_of_month_6 | last_date_of_month_7 | last_date_of_month_8 | last_date_of_month_9 | arpu_6 | arpu_7 | arpu_8 | arpu_9 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | onnet_mou_9 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | offnet_mou_9 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_ic_mou_9 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | roam_og_mou_9 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2t_mou_9 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2m_mou_9 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2f_mou_9 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_t2c_mou_9 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | loc_og_mou_9 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2t_mou_9 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2m_mou_9 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_t2f_mou_9 | std_og_t2c_mou_6 | std_og_t2c_mou_7 | std_og_t2c_mou_8 | std_og_t2c_mou_9 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | std_og_mou_9 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | isd_og_mou_9 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | spl_og_mou_9 | og_others_6 | og_others_7 | og_others_8 | og_others_9 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 | total_og_mou_9 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2t_mou_9 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2m_mou_9 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_t2f_mou_9 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | loc_ic_mou_9 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2t_mou_9 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2m_mou_9 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_t2f_mou_9 | std_ic_t2o_mou_6 | std_ic_t2o_mou_7 | std_ic_t2o_mou_8 | 
std_ic_t2o_mou_9 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | std_ic_mou_9 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 | total_ic_mou_9 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | spl_ic_mou_9 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | isd_ic_mou_9 | ic_others_6 | ic_others_7 | ic_others_8 | ic_others_9 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | total_rech_num_9 | total_rech_amt_6 | total_rech_amt_7 | total_rech_amt_8 | total_rech_amt_9 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | max_rech_amt_9 | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | last_day_rch_amt_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 | total_rech_data_6 | total_rech_data_7 | total_rech_data_8 | total_rech_data_9 | max_rech_data_6 | max_rech_data_7 | max_rech_data_8 | max_rech_data_9 | count_rech_2g_6 | count_rech_2g_7 | count_rech_2g_8 | count_rech_2g_9 | count_rech_3g_6 | count_rech_3g_7 | count_rech_3g_8 | count_rech_3g_9 | av_rech_amt_data_6 | av_rech_amt_data_7 | av_rech_amt_data_8 | av_rech_amt_data_9 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 | vol_2g_mb_9 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 | vol_3g_mb_9 | arpu_3g_6 | arpu_3g_7 | arpu_3g_8 | arpu_3g_9 | arpu_2g_6 | arpu_2g_7 | arpu_2g_8 | arpu_2g_9 | night_pck_user_6 | night_pck_user_7 | night_pck_user_8 | night_pck_user_9 | monthly_2g_6 | monthly_2g_7 | monthly_2g_8 | monthly_2g_9 | sachet_2g_6 | sachet_2g_7 | sachet_2g_8 | sachet_2g_9 | monthly_3g_6 | monthly_3g_7 | monthly_3g_8 | monthly_3g_9 | sachet_3g_6 | sachet_3g_7 | sachet_3g_8 | sachet_3g_9 | fb_user_6 | fb_user_7 | fb_user_8 | fb_user_9 | aon | aug_vbc_3g | jul_vbc_3g | jun_vbc_3g | sep_vbc_3g | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 7000842753 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 197.385 | 214.816 | 213.803 | 21.100 | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | 0.00 | 0.00 | 0.00 | 0.00 | NaN | NaN | 0.16 | NaN | NaN | NaN | 4.13 | NaN | NaN | NaN | 1.15 | NaN | NaN | NaN | 5.44 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.00 | NaN | 0.00 | 0.00 | 5.44 | 0.00 | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | NaN | 4 | 3 | 2 | 6 | 362 | 252 | 252 | 0 | 252 | 252 | 252 | 0 | 6/21/2014 | 7/16/2014 | 8/8/2014 | 9/28/2014 | 252 | 252 | 252 | 0 | 6/21/2014 | 7/16/2014 | 8/8/2014 | NaN | 1.0 | 1.0 | 1.0 | NaN | 252.0 | 252.0 | 252.0 | NaN | 0.0 | 0.0 | 0.0 | NaN | 1.0 | 1.0 | 1.0 | NaN | 252.0 | 252.0 | 252.0 | NaN | 30.13 | 1.32 | 5.75 | 0.0 | 83.57 | 150.76 | 109.61 | 0.00 | 212.17 | 212.17 | 212.17 | NaN | 212.17 | 212.17 | 212.17 | NaN | 0.0 | 0.0 | 0.0 | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1.0 | 1.0 | 1.0 | NaN | 968 | 30.4 | 0.0 | 101.20 | 3.58 |
1 | 7001865778 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 34.047 | 355.074 | 268.321 | 86.285 | 24.11 | 78.68 | 7.68 | 18.34 | 15.74 | 99.84 | 304.76 | 53.76 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 23.88 | 74.56 | 7.68 | 18.34 | 11.51 | 75.94 | 291.86 | 53.76 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 2.91 | 0.00 | 0.00 | 35.39 | 150.51 | 299.54 | 72.11 | 0.23 | 4.11 | 0.00 | 0.00 | 0.00 | 0.46 | 0.13 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.23 | 4.58 | 0.13 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 4.68 | 23.43 | 12.76 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 40.31 | 178.53 | 312.44 | 72.11 | 1.61 | 29.91 | 29.23 | 116.09 | 17.48 | 65.38 | 375.58 | 56.93 | 0.00 | 8.93 | 3.61 | 0.00 | 19.09 | 104.23 | 408.43 | 173.03 | 0.00 | 0.00 | 2.35 | 0.00 | 5.90 | 0.00 | 12.49 | 15.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 5.90 | 0.00 | 14.84 | 15.01 | 26.83 | 104.23 | 423.28 | 188.04 | 0.00 | 0.0 | 0.0 | 0.00 | 1.83 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 4 | 9 | 11 | 5 | 74 | 384 | 283 | 121 | 44 | 154 | 65 | 50 | 6/29/2014 | 7/31/2014 | 8/28/2014 | 9/30/2014 | 44 | 23 | 30 | 0 | NaN | 7/25/2014 | 8/10/2014 | NaN | NaN | 1.0 | 2.0 | NaN | NaN | 154.0 | 25.0 | NaN | NaN | 1.0 | 2.0 | NaN | NaN | 0.0 | 0.0 | NaN | NaN | 154.0 | 50.0 | NaN | 0.00 | 108.07 | 365.47 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | NaN | 0.00 | 0.00 | NaN | NaN | 28.61 | 7.60 | NaN | NaN | 0.0 | 0.0 | NaN | 0 | 1 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | 1.0 | 1.0 | NaN | 1006 | 0.0 | 0.0 | 0.00 | 0.00 |
2 | 7001625959 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 167.690 | 189.058 | 210.226 | 290.714 | 11.54 | 55.24 | 37.26 | 74.81 | 143.33 | 220.59 | 208.36 | 118.91 | 0.0 | 0.00 | 0.00 | 38.49 | 0.0 | 0.00 | 0.00 | 70.94 | 7.19 | 28.74 | 13.58 | 14.39 | 29.34 | 16.86 | 38.46 | 28.16 | 24.11 | 21.79 | 15.61 | 22.24 | 0.0 | 135.54 | 45.76 | 0.48 | 60.66 | 67.41 | 67.66 | 64.81 | 4.34 | 26.49 | 22.58 | 8.76 | 41.81 | 67.41 | 75.53 | 9.28 | 1.48 | 14.76 | 22.83 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 47.64 | 108.68 | 120.94 | 18.04 | 0.0 | 0.0 | 0.0 | 0.0 | 46.56 | 236.84 | 96.84 | 42.08 | 0.45 | 0.0 | 0.0 | 0.0 | 155.33 | 412.94 | 285.46 | 124.94 | 115.69 | 71.11 | 67.46 | 148.23 | 14.38 | 15.44 | 38.89 | 38.98 | 99.48 | 122.29 | 49.63 | 158.19 | 229.56 | 208.86 | 155.99 | 345.41 | 72.41 | 71.29 | 28.69 | 49.44 | 45.18 | 177.01 | 167.09 | 118.18 | 21.73 | 58.34 | 43.23 | 3.86 | 0.0 | 0.0 | 0.0 | 0.0 | 139.33 | 306.66 | 239.03 | 171.49 | 370.04 | 519.53 | 395.03 | 517.74 | 0.21 | 0.0 | 0.0 | 0.45 | 0.00 | 0.85 | 0.0 | 0.01 | 0.93 | 3.14 | 0.0 | 0.36 | 5 | 4 | 2 | 7 | 168 | 315 | 116 | 358 | 86 | 200 | 86 | 100 | 6/17/2014 | 7/24/2014 | 8/14/2014 | 9/29/2014 | 0 | 200 | 86 | 0 | NaN | NaN | NaN | 9/17/2014 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 46.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 46.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 8.42 | NaN | NaN | NaN | 2.84 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | 1.0 | 1103 | 0.0 | 0.0 | 4.17 | 0.00 |
3 | 7001204172 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 221.338 | 251.102 | 508.054 | 389.500 | 99.91 | 54.39 | 310.98 | 241.71 | 123.31 | 109.01 | 71.68 | 113.54 | 0.0 | 54.86 | 44.38 | 0.00 | 0.0 | 28.09 | 39.04 | 0.00 | 73.68 | 34.81 | 10.61 | 15.49 | 107.43 | 83.21 | 22.46 | 65.46 | 1.91 | 0.65 | 4.91 | 2.06 | 0.0 | 0.00 | 0.00 | 0.00 | 183.03 | 118.68 | 37.99 | 83.03 | 26.23 | 14.89 | 289.58 | 226.21 | 2.99 | 1.73 | 6.53 | 9.99 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 29.23 | 16.63 | 296.11 | 236.21 | 0.0 | 0.0 | 0.0 | 0.0 | 10.96 | 0.00 | 18.09 | 43.29 | 0.00 | 0.0 | 0.0 | 0.0 | 223.23 | 135.31 | 352.21 | 362.54 | 62.08 | 19.98 | 8.04 | 41.73 | 113.96 | 64.51 | 20.28 | 52.86 | 57.43 | 27.09 | 19.84 | 65.59 | 233.48 | 111.59 | 48.18 | 160.19 | 43.48 | 66.44 | 0.00 | 129.84 | 1.33 | 38.56 | 4.94 | 13.98 | 1.18 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 45.99 | 105.01 | 4.94 | 143.83 | 280.08 | 216.61 | 53.13 | 305.38 | 0.59 | 0.0 | 0.0 | 0.55 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.80 | 10 | 11 | 18 | 14 | 230 | 310 | 601 | 410 | 60 | 50 | 50 | 50 | 6/28/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 30 | 50 | 50 | 30 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | NaN | NaN | NaN | NaN | 2491 | 0.0 | 0.0 | 0.00 | 0.00 |
4 | 7000142493 | 109 | 0.0 | 0.0 | 0.0 | 6/30/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | 261.636 | 309.876 | 238.174 | 163.426 | 50.31 | 149.44 | 83.89 | 58.78 | 76.96 | 91.88 | 124.26 | 45.81 | 0.0 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 50.31 | 149.44 | 83.89 | 58.78 | 67.64 | 91.88 | 124.26 | 37.89 | 0.00 | 0.00 | 0.00 | 1.93 | 0.0 | 0.00 | 0.00 | 0.00 | 117.96 | 241.33 | 208.16 | 98.61 | 0.00 | 0.00 | 0.00 | 0.00 | 9.31 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 9.31 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 5.98 | 0.00 | 0.0 | 0.0 | 0.0 | 127.28 | 241.33 | 208.16 | 104.59 | 105.68 | 88.49 | 233.81 | 154.56 | 106.84 | 109.54 | 104.13 | 48.24 | 1.50 | 0.00 | 0.00 | 0.00 | 214.03 | 198.04 | 337.94 | 202.81 | 0.00 | 0.00 | 0.86 | 2.31 | 1.93 | 0.25 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 1.93 | 0.25 | 0.86 | 2.31 | 216.44 | 198.29 | 338.81 | 205.31 | 0.00 | 0.0 | 0.0 | 0.18 | 0.00 | 0.00 | 0.0 | 0.00 | 0.48 | 0.00 | 0.0 | 0.00 | 5 | 6 | 3 | 4 | 196 | 350 | 287 | 200 | 56 | 110 | 110 | 50 | 6/26/2014 | 7/28/2014 | 8/9/2014 | 9/28/2014 | 50 | 110 | 110 | 50 | 6/4/2014 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 56.0 | NaN | NaN | NaN | 1.0 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 56.0 | NaN | NaN | NaN | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | NaN | NaN | NaN | 0.00 | NaN | NaN | NaN | 0.0 | NaN | NaN | NaN | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.0 | NaN | NaN | NaN | 1526 | 0.0 | 0.0 | 0.00 | 0.00 |
# Checking information about data.
# data.info() prints its summary directly, so it is not wrapped in print().
data.info()
def metadata_matrix(data):
    # Summarise datatype, null counts and cardinality for every column.
    return pd.DataFrame({
        'Datatype': data.dtypes.astype(str),
        'Non_Null_Count': data.count(axis=0).astype(int),
        'Null_Count': data.isnull().sum().astype(int),
        'Null_Percentage': round(data.isnull().sum() / len(data) * 100, 2),
        'Unique_Values_Count': data.nunique().astype(int)
    }).sort_values(by='Null_Percentage', ascending=False)
metadata_matrix(data)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 99999 entries, 0 to 99998
Columns: 226 entries, mobile_number to sep_vbc_3g
dtypes: float64(179), int64(35), object(12)
memory usage: 172.4+ MB
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
arpu_3g_6 | float64 | 25153 | 74846 | 74.85 | 7418 |
night_pck_user_6 | float64 | 25153 | 74846 | 74.85 | 2 |
total_rech_data_6 | float64 | 25153 | 74846 | 74.85 | 37 |
arpu_2g_6 | float64 | 25153 | 74846 | 74.85 | 6990 |
max_rech_data_6 | float64 | 25153 | 74846 | 74.85 | 48 |
fb_user_6 | float64 | 25153 | 74846 | 74.85 | 2 |
av_rech_amt_data_6 | float64 | 25153 | 74846 | 74.85 | 887 |
date_of_last_rech_data_6 | object | 25153 | 74846 | 74.85 | 30 |
count_rech_2g_6 | float64 | 25153 | 74846 | 74.85 | 31 |
count_rech_3g_6 | float64 | 25153 | 74846 | 74.85 | 25 |
date_of_last_rech_data_7 | object | 25571 | 74428 | 74.43 | 31 |
total_rech_data_7 | float64 | 25571 | 74428 | 74.43 | 42 |
fb_user_7 | float64 | 25571 | 74428 | 74.43 | 2 |
max_rech_data_7 | float64 | 25571 | 74428 | 74.43 | 48 |
night_pck_user_7 | float64 | 25571 | 74428 | 74.43 | 2 |
count_rech_2g_7 | float64 | 25571 | 74428 | 74.43 | 36 |
av_rech_amt_data_7 | float64 | 25571 | 74428 | 74.43 | 961 |
arpu_2g_7 | float64 | 25571 | 74428 | 74.43 | 6586 |
count_rech_3g_7 | float64 | 25571 | 74428 | 74.43 | 28 |
arpu_3g_7 | float64 | 25571 | 74428 | 74.43 | 7246 |
total_rech_data_9 | float64 | 25922 | 74077 | 74.08 | 37 |
count_rech_3g_9 | float64 | 25922 | 74077 | 74.08 | 27 |
fb_user_9 | float64 | 25922 | 74077 | 74.08 | 2 |
max_rech_data_9 | float64 | 25922 | 74077 | 74.08 | 50 |
arpu_3g_9 | float64 | 25922 | 74077 | 74.08 | 8063 |
date_of_last_rech_data_9 | object | 25922 | 74077 | 74.08 | 30 |
night_pck_user_9 | float64 | 25922 | 74077 | 74.08 | 2 |
arpu_2g_9 | float64 | 25922 | 74077 | 74.08 | 6795 |
count_rech_2g_9 | float64 | 25922 | 74077 | 74.08 | 32 |
av_rech_amt_data_9 | float64 | 25922 | 74077 | 74.08 | 945 |
total_rech_data_8 | float64 | 26339 | 73660 | 73.66 | 46 |
arpu_3g_8 | float64 | 26339 | 73660 | 73.66 | 7787 |
fb_user_8 | float64 | 26339 | 73660 | 73.66 | 2 |
night_pck_user_8 | float64 | 26339 | 73660 | 73.66 | 2 |
av_rech_amt_data_8 | float64 | 26339 | 73660 | 73.66 | 973 |
max_rech_data_8 | float64 | 26339 | 73660 | 73.66 | 50 |
count_rech_3g_8 | float64 | 26339 | 73660 | 73.66 | 29 |
arpu_2g_8 | float64 | 26339 | 73660 | 73.66 | 6652 |
count_rech_2g_8 | float64 | 26339 | 73660 | 73.66 | 34 |
date_of_last_rech_data_8 | object | 26339 | 73660 | 73.66 | 31 |
ic_others_9 | float64 | 92254 | 7745 | 7.75 | 1923 |
std_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 26553 |
std_og_t2c_mou_9 | float64 | 92254 | 7745 | 7.75 | 1 |
isd_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 5557 |
std_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 11266 |
isd_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 1255 |
spl_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 4095 |
spl_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 384 |
og_others_9 | float64 | 92254 | 7745 | 7.75 | 235 |
loc_ic_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 12993 |
std_ic_t2o_mou_9 | float64 | 92254 | 7745 | 7.75 | 1 |
loc_ic_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 21484 |
std_ic_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 3090 |
loc_ic_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 7091 |
loc_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 27697 |
std_ic_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 8933 |
std_og_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 2295 |
std_og_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 17934 |
std_ic_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 6157 |
loc_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 25376 |
roam_og_mou_9 | float64 | 92254 | 7745 | 7.75 | 5882 |
loc_og_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 20141 |
loc_og_t2f_mou_9 | float64 | 92254 | 7745 | 7.75 | 3758 |
roam_ic_mou_9 | float64 | 92254 | 7745 | 7.75 | 4827 |
offnet_mou_9 | float64 | 92254 | 7745 | 7.75 | 30077 |
loc_og_t2c_mou_9 | float64 | 92254 | 7745 | 7.75 | 2332 |
loc_og_t2t_mou_9 | float64 | 92254 | 7745 | 7.75 | 12949 |
std_og_t2m_mou_9 | float64 | 92254 | 7745 | 7.75 | 19052 |
onnet_mou_9 | float64 | 92254 | 7745 | 7.75 | 23565 |
onnet_mou_8 | float64 | 94621 | 5378 | 5.38 | 24089 |
std_ic_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 6352 |
std_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 11662 |
loc_ic_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 13346 |
roam_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 6504 |
std_ic_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 9304 |
loc_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 28200 |
std_ic_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 3051 |
roam_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 5315 |
std_ic_t2o_mou_8 | float64 | 94621 | 5378 | 5.38 | 1 |
loc_og_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 13336 |
loc_ic_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 7097 |
offnet_mou_8 | float64 | 94621 | 5378 | 5.38 | 30908 |
loc_ic_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 21886 |
loc_og_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 20544 |
isd_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 1276 |
ic_others_8 | float64 | 94621 | 5378 | 5.38 | 1896 |
og_others_8 | float64 | 94621 | 5378 | 5.38 | 216 |
spl_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 102 |
loc_og_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 3807 |
std_og_t2m_mou_8 | float64 | 94621 | 5378 | 5.38 | 19786 |
spl_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 4390 |
std_og_t2c_mou_8 | float64 | 94621 | 5378 | 5.38 | 1 |
isd_ic_mou_8 | float64 | 94621 | 5378 | 5.38 | 5844 |
loc_og_t2c_mou_8 | float64 | 94621 | 5378 | 5.38 | 2516 |
std_og_t2f_mou_8 | float64 | 94621 | 5378 | 5.38 | 2333 |
std_og_t2t_mou_8 | float64 | 94621 | 5378 | 5.38 | 18291 |
loc_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 25990 |
std_og_mou_8 | float64 | 94621 | 5378 | 5.38 | 27491 |
date_of_last_rech_9 | object | 95239 | 4760 | 4.76 | 30 |
std_ic_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 3125 |
ic_others_6 | float64 | 96062 | 3937 | 3.94 | 1817 |
isd_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 5521 |
std_ic_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 9308 |
std_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 11646 |
spl_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 84 |
std_ic_t2o_mou_6 | float64 | 96062 | 3937 | 3.94 | 1 |
loc_ic_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 7250 |
loc_ic_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 13540 |
std_og_t2c_mou_6 | float64 | 96062 | 3937 | 3.94 | 1 |
std_og_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 2450 |
std_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 27502 |
std_og_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 19734 |
isd_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 1381 |
std_og_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 18244 |
spl_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 3965 |
loc_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 26372 |
og_others_6 | float64 | 96062 | 3937 | 3.94 | 1018 |
loc_og_t2c_mou_6 | float64 | 96062 | 3937 | 3.94 | 2235 |
loc_og_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 20905 |
loc_og_t2f_mou_6 | float64 | 96062 | 3937 | 3.94 | 3860 |
loc_og_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 13539 |
roam_og_mou_6 | float64 | 96062 | 3937 | 3.94 | 8038 |
std_ic_t2t_mou_6 | float64 | 96062 | 3937 | 3.94 | 6279 |
onnet_mou_6 | float64 | 96062 | 3937 | 3.94 | 24313 |
loc_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 28569 |
offnet_mou_6 | float64 | 96062 | 3937 | 3.94 | 31140 |
roam_ic_mou_6 | float64 | 96062 | 3937 | 3.94 | 6512 |
loc_ic_t2m_mou_6 | float64 | 96062 | 3937 | 3.94 | 22065 |
loc_og_t2c_mou_7 | float64 | 96140 | 3859 | 3.86 | 2426 |
roam_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 5230 |
loc_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 26091 |
loc_og_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 13411 |
offnet_mou_7 | float64 | 96140 | 3859 | 3.86 | 31023 |
loc_og_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 3863 |
std_og_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 18567 |
std_ic_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 6481 |
onnet_mou_7 | float64 | 96140 | 3859 | 3.86 | 24336 |
std_og_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 20018 |
loc_og_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 20637 |
std_og_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 2391 |
roam_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 6639 |
std_og_t2c_mou_7 | float64 | 96140 | 3859 | 3.86 | 1 |
std_ic_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 9464 |
isd_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 1380 |
ic_others_7 | float64 | 96140 | 3859 | 3.86 | 2002 |
loc_ic_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 7395 |
loc_ic_t2m_mou_7 | float64 | 96140 | 3859 | 3.86 | 21918 |
std_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 11889 |
loc_ic_t2t_mou_7 | float64 | 96140 | 3859 | 3.86 | 13511 |
std_ic_t2f_mou_7 | float64 | 96140 | 3859 | 3.86 | 3209 |
loc_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 28390 |
spl_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 107 |
og_others_7 | float64 | 96140 | 3859 | 3.86 | 187 |
spl_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 4396 |
isd_ic_mou_7 | float64 | 96140 | 3859 | 3.86 | 5789 |
std_ic_t2o_mou_7 | float64 | 96140 | 3859 | 3.86 | 1 |
std_og_mou_7 | float64 | 96140 | 3859 | 3.86 | 27951 |
date_of_last_rech_8 | object | 96377 | 3622 | 3.62 | 31 |
date_of_last_rech_7 | object | 98232 | 1767 | 1.77 | 31 |
last_date_of_month_9 | object | 98340 | 1659 | 1.66 | 1 |
date_of_last_rech_6 | object | 98392 | 1607 | 1.61 | 30 |
last_date_of_month_8 | object | 98899 | 1100 | 1.10 | 1 |
loc_ic_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
std_og_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
loc_og_t2o_mou | float64 | 98981 | 1018 | 1.02 | 1 |
last_date_of_month_7 | object | 99398 | 601 | 0.60 | 1 |
sachet_3g_8 | int64 | 99999 | 0 | 0.00 | 29 |
jul_vbc_3g | float64 | 99999 | 0 | 0.00 | 14162 |
aug_vbc_3g | float64 | 99999 | 0 | 0.00 | 14676 |
aon | int64 | 99999 | 0 | 0.00 | 3489 |
jun_vbc_3g | float64 | 99999 | 0 | 0.00 | 13312 |
monthly_2g_9 | int64 | 99999 | 0 | 0.00 | 5 |
sachet_3g_6 | int64 | 99999 | 0 | 0.00 | 25 |
vol_3g_mb_9 | float64 | 99999 | 0 | 0.00 | 14472 |
sachet_3g_7 | int64 | 99999 | 0 | 0.00 | 27 |
monthly_2g_8 | int64 | 99999 | 0 | 0.00 | 6 |
monthly_3g_9 | int64 | 99999 | 0 | 0.00 | 11 |
monthly_3g_8 | int64 | 99999 | 0 | 0.00 | 12 |
sachet_3g_9 | int64 | 99999 | 0 | 0.00 | 27 |
monthly_3g_7 | int64 | 99999 | 0 | 0.00 | 15 |
monthly_3g_6 | int64 | 99999 | 0 | 0.00 | 12 |
sachet_2g_9 | int64 | 99999 | 0 | 0.00 | 32 |
sachet_2g_8 | int64 | 99999 | 0 | 0.00 | 34 |
sachet_2g_7 | int64 | 99999 | 0 | 0.00 | 35 |
sachet_2g_6 | int64 | 99999 | 0 | 0.00 | 32 |
monthly_2g_7 | int64 | 99999 | 0 | 0.00 | 6 |
monthly_2g_6 | int64 | 99999 | 0 | 0.00 | 5 |
mobile_number | int64 | 99999 | 0 | 0.00 | 99999 |
vol_3g_mb_8 | float64 | 99999 | 0 | 0.00 | 14960 |
total_og_mou_9 | float64 | 99999 | 0 | 0.00 | 39160 |
total_rech_num_7 | int64 | 99999 | 0 | 0.00 | 101 |
total_rech_num_6 | int64 | 99999 | 0 | 0.00 | 102 |
total_ic_mou_9 | float64 | 99999 | 0 | 0.00 | 31260 |
total_ic_mou_8 | float64 | 99999 | 0 | 0.00 | 32128 |
total_ic_mou_7 | float64 | 99999 | 0 | 0.00 | 32242 |
total_ic_mou_6 | float64 | 99999 | 0 | 0.00 | 32247 |
circle_id | int64 | 99999 | 0 | 0.00 | 1 |
total_og_mou_8 | float64 | 99999 | 0 | 0.00 | 40074 |
vol_3g_mb_7 | float64 | 99999 | 0 | 0.00 | 14519 |
total_og_mou_7 | float64 | 99999 | 0 | 0.00 | 40477 |
total_og_mou_6 | float64 | 99999 | 0 | 0.00 | 40327 |
arpu_9 | float64 | 99999 | 0 | 0.00 | 79937 |
arpu_8 | float64 | 99999 | 0 | 0.00 | 83615 |
arpu_7 | float64 | 99999 | 0 | 0.00 | 85308 |
arpu_6 | float64 | 99999 | 0 | 0.00 | 85681 |
last_date_of_month_6 | object | 99999 | 0 | 0.00 | 1 |
total_rech_num_8 | int64 | 99999 | 0 | 0.00 | 96 |
total_rech_num_9 | int64 | 99999 | 0 | 0.00 | 97 |
total_rech_amt_6 | int64 | 99999 | 0 | 0.00 | 2305 |
total_rech_amt_7 | int64 | 99999 | 0 | 0.00 | 2329 |
vol_3g_mb_6 | float64 | 99999 | 0 | 0.00 | 13773 |
vol_2g_mb_9 | float64 | 99999 | 0 | 0.00 | 13919 |
vol_2g_mb_8 | float64 | 99999 | 0 | 0.00 | 14994 |
vol_2g_mb_7 | float64 | 99999 | 0 | 0.00 | 15114 |
vol_2g_mb_6 | float64 | 99999 | 0 | 0.00 | 15201 |
last_day_rch_amt_9 | int64 | 99999 | 0 | 0.00 | 185 |
last_day_rch_amt_8 | int64 | 99999 | 0 | 0.00 | 199 |
last_day_rch_amt_7 | int64 | 99999 | 0 | 0.00 | 173 |
last_day_rch_amt_6 | int64 | 99999 | 0 | 0.00 | 186 |
max_rech_amt_9 | int64 | 99999 | 0 | 0.00 | 201 |
max_rech_amt_8 | int64 | 99999 | 0 | 0.00 | 213 |
max_rech_amt_7 | int64 | 99999 | 0 | 0.00 | 183 |
max_rech_amt_6 | int64 | 99999 | 0 | 0.00 | 202 |
total_rech_amt_9 | int64 | 99999 | 0 | 0.00 | 2304 |
total_rech_amt_8 | int64 | 99999 | 0 | 0.00 | 2347 |
sep_vbc_3g | float64 | 99999 | 0 | 0.00 | 3720 |
# Checking for duplicate records: the number of distinct mobile numbers should equal the number of rows.
data['mobile_number'].nunique()
99999
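A duplicate check can also be made explicit with pandas' `duplicated()` and `is_unique`. The sketch below uses a small toy frame rather than the real dataset; the repeated mobile number is deliberately made up to show what a positive result looks like.

```python
import pandas as pd

# Toy frame (hypothetical): the second mobile number is deliberately repeated.
ids = pd.DataFrame({'mobile_number': [7000842753, 7001865778, 7001865778]})

# duplicated() flags every repeat after the first occurrence.
n_duplicates = int(ids['mobile_number'].duplicated().sum())
# is_unique is True only when no value occurs more than once.
is_unique = bool(ids['mobile_number'].is_unique)
print(n_duplicates, is_unique)
```

On the real data, `n_duplicates` equal to 0 (equivalently, `is_unique` being `True`) would confirm that `mobile_number` can safely serve as the index.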
# mobile_number is a unique identifier
# Setting mobile_number as the index
data = data.set_index('mobile_number')
# Renaming columns
data = data.rename({'jun_vbc_3g' : 'vbc_3g_6', 'jul_vbc_3g' : 'vbc_3g_7', 'aug_vbc_3g' : 'vbc_3g_8', 'sep_vbc_3g' : 'vbc_3g_9'}, axis=1)
# Converting columns into appropriate data types and extracting single-value columns.
# Numeric columns with <= 29 unique values are treated as categorical variables.
# The threshold of 29 was chosen by inspecting the metadata_matrix output above.
columns = data.columns
change_to_cat = []
single_value_col = []
for column in columns:
    unique_value_count = data[column].nunique()
    # Columns with a single distinct value carry no information.
    if unique_value_count == 1:
        single_value_col.append(column)
    # Low-cardinality numeric columns are candidates for the category dtype.
    if unique_value_count <= 29 and unique_value_count != 0 and data[column].dtype in ['int', 'float']:
        change_to_cat.append(column)
print(' Columns to change to categorical data type : \n', pd.DataFrame(change_to_cat), '\n')
 Columns to change to categorical data type : 
     0
 0  circle_id
 1  loc_og_t2o_mou
 2  std_og_t2o_mou
 3  loc_ic_t2o_mou
 4  std_og_t2c_mou_6
 5  std_og_t2c_mou_7
 6  std_og_t2c_mou_8
 7  std_og_t2c_mou_9
 8  std_ic_t2o_mou_6
 9  std_ic_t2o_mou_7
10  std_ic_t2o_mou_8
11  std_ic_t2o_mou_9
12  count_rech_3g_6
13  count_rech_3g_7
14  count_rech_3g_8
15  count_rech_3g_9
16  night_pck_user_6
17  night_pck_user_7
18  night_pck_user_8
19  night_pck_user_9
20  monthly_2g_6
21  monthly_2g_7
22  monthly_2g_8
23  monthly_2g_9
24  monthly_3g_6
25  monthly_3g_7
26  monthly_3g_8
27  monthly_3g_9
28  sachet_3g_6
29  sachet_3g_7
30  sachet_3g_8
31  sachet_3g_9
32  fb_user_6
33  fb_user_7
34  fb_user_8
35  fb_user_9
# Converting all the above columns having <=29 unique values into categorical data type.
data[change_to_cat]=data[change_to_cat].astype('category')
# Converting *sachet* variables to categorical data type
sachet_columns = data.filter(regex='.*sachet.*', axis=1).columns.values
data[sachet_columns] = data[sachet_columns].astype('category')
# Changing datatype of date variables to datetime.
import re
columns = data.columns
col_with_date = []
for column in columns:
    # Date columns all start with the prefix "date".
    x = re.findall("^date", column)
    if x:
        col_with_date.append(column)
data[col_with_date].dtypes
date_of_last_rech_6         object
date_of_last_rech_7         object
date_of_last_rech_8         object
date_of_last_rech_9         object
date_of_last_rech_data_6    object
date_of_last_rech_data_7    object
date_of_last_rech_data_8    object
date_of_last_rech_data_9    object
dtype: object
# Checking the date format
data[col_with_date].head()
mobile_number | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 |
---|---|---|---|---|---|---|---|---|
7000842753 | 6/21/2014 | 7/16/2014 | 8/8/2014 | 9/28/2014 | 6/21/2014 | 7/16/2014 | 8/8/2014 | NaN |
7001865778 | 6/29/2014 | 7/31/2014 | 8/28/2014 | 9/30/2014 | NaN | 7/25/2014 | 8/10/2014 | NaN |
7001625959 | 6/17/2014 | 7/24/2014 | 8/14/2014 | 9/29/2014 | NaN | NaN | NaN | 9/17/2014 |
7001204172 | 6/28/2014 | 7/31/2014 | 8/31/2014 | 9/30/2014 | NaN | NaN | NaN | NaN |
7000142493 | 6/26/2014 | 7/28/2014 | 8/9/2014 | 9/28/2014 | 6/4/2014 | NaN | NaN | NaN |
for col in col_with_date:
    data[col]=pd.to_datetime(data[col], format="%m/%d/%Y")
data[col_with_date].head()
mobile_number | date_of_last_rech_6 | date_of_last_rech_7 | date_of_last_rech_8 | date_of_last_rech_9 | date_of_last_rech_data_6 | date_of_last_rech_data_7 | date_of_last_rech_data_8 | date_of_last_rech_data_9 |
---|---|---|---|---|---|---|---|---|
7000842753 | 2014-06-21 | 2014-07-16 | 2014-08-08 | 2014-09-28 | 2014-06-21 | 2014-07-16 | 2014-08-08 | NaT |
7001865778 | 2014-06-29 | 2014-07-31 | 2014-08-28 | 2014-09-30 | NaT | 2014-07-25 | 2014-08-10 | NaT |
7001625959 | 2014-06-17 | 2014-07-24 | 2014-08-14 | 2014-09-29 | NaT | NaT | NaT | 2014-09-17 |
7001204172 | 2014-06-28 | 2014-07-31 | 2014-08-31 | 2014-09-30 | NaT | NaT | NaT | NaT |
7000142493 | 2014-06-26 | 2014-07-28 | 2014-08-09 | 2014-09-28 | 2014-06-04 | NaT | NaT | NaT |
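As a compact illustration of the date conversion above, here is a small sketch on made-up rows in the same m/d/Y layout. The `toy` frame is an assumption for demonstration; `DataFrame.filter(regex=...)` is used as an equivalent of the `re.findall` loop.

```python
import pandas as pd

# Made-up rows that mimic the string dates shown in the first table.
toy = pd.DataFrame({
    "date_of_last_rech_6": ["6/21/2014", "6/29/2014", None],
    "date_of_last_rech_data_6": ["6/21/2014", None, None],
    "arpu_6": [197.4, 34.0, 167.7],
})

# Column names that start with "date", like the re.findall("^date", ...) loop.
date_cols = toy.filter(regex=r"^date").columns

# An explicit format avoids day/month ambiguity; missing entries become NaT.
for col in date_cols:
    toy[col] = pd.to_datetime(toy[col], format="%m/%d/%Y")

print(toy.dtypes)
```

Passing the format explicitly is also what makes a string such as 8/9/2014 parse as 9 August rather than 8 September.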
# Deriving the average recharge amount over June and July.
data['Average_rech_amt_6n7']=(data['total_rech_amt_6']+data['total_rech_amt_7'])/2
# Filtering HIGH-VALUE CUSTOMERS: those with Average_rech_amt_6n7 >= its 70th percentile.
# .copy() keeps later in-place edits on this subset from triggering chained-assignment warnings.
data=data[(data['Average_rech_amt_6n7']>= data['Average_rech_amt_6n7'].quantile(0.7))].copy()
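To make the 70th-percentile cut concrete, the following sketch applies the same filter to ten made-up customers. The recharge amounts are invented for illustration only; the column names follow the notebook.

```python
import pandas as pd

# Ten made-up customers with identical June and July recharge amounts.
amounts = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
toy = pd.DataFrame({"total_rech_amt_6": amounts, "total_rech_amt_7": amounts})

toy["Average_rech_amt_6n7"] = (toy["total_rech_amt_6"] + toy["total_rech_amt_7"]) / 2

# quantile(0.7) interpolates linearly between neighbouring values by default.
cutoff = toy["Average_rech_amt_6n7"].quantile(0.7)

# Keep customers at or above the cutoff; copy so the subset is independent.
high_value = toy[toy["Average_rech_amt_6n7"] >= cutoff].copy()

print(cutoff, len(high_value))
```

With interpolation the cutoff falls between the 7th and 8th values, so only the top three customers pass the `>=` test in this toy case.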
#Checking for missing values.
missing_values = metadata_matrix(data)[['Datatype', 'Null_Percentage']].sort_values(by='Null_Percentage', ascending=False)
missing_values
Column | Datatype | Null_Percentage |
---|---|---|
av_rech_amt_data_6 | float64 | 62.02 |
count_rech_2g_6 | float64 | 62.02 |
arpu_2g_6 | float64 | 62.02 |
max_rech_data_6 | float64 | 62.02 |
night_pck_user_6 | category | 62.02 |
date_of_last_rech_data_6 | datetime64[ns] | 62.02 |
total_rech_data_6 | float64 | 62.02 |
arpu_3g_6 | float64 | 62.02 |
fb_user_6 | category | 62.02 |
count_rech_3g_6 | category | 62.02 |
av_rech_amt_data_9 | float64 | 61.81 |
count_rech_2g_9 | float64 | 61.81 |
night_pck_user_9 | category | 61.81 |
arpu_3g_9 | float64 | 61.81 |
arpu_2g_9 | float64 | 61.81 |
fb_user_9 | category | 61.81 |
date_of_last_rech_data_9 | datetime64[ns] | 61.81 |
total_rech_data_9 | float64 | 61.81 |
count_rech_3g_9 | category | 61.81 |
max_rech_data_9 | float64 | 61.81 |
count_rech_2g_7 | float64 | 61.14 |
count_rech_3g_7 | category | 61.14 |
arpu_2g_7 | float64 | 61.14 |
arpu_3g_7 | float64 | 61.14 |
av_rech_amt_data_7 | float64 | 61.14 |
max_rech_data_7 | float64 | 61.14 |
fb_user_7 | category | 61.14 |
total_rech_data_7 | float64 | 61.14 |
date_of_last_rech_data_7 | datetime64[ns] | 61.14 |
night_pck_user_7 | category | 61.14 |
av_rech_amt_data_8 | float64 | 60.83 |
count_rech_3g_8 | category | 60.83 |
total_rech_data_8 | float64 | 60.83 |
arpu_3g_8 | float64 | 60.83 |
max_rech_data_8 | float64 | 60.83 |
date_of_last_rech_data_8 | datetime64[ns] | 60.83 |
arpu_2g_8 | float64 | 60.83 |
fb_user_8 | category | 60.83 |
night_pck_user_8 | category | 60.83 |
count_rech_2g_8 | float64 | 60.83 |
loc_og_t2t_mou_9 | float64 | 5.68 |
ic_others_9 | float64 | 5.68 |
isd_ic_mou_9 | float64 | 5.68 |
og_others_9 | float64 | 5.68 |
loc_og_t2f_mou_9 | float64 | 5.68 |
roam_ic_mou_9 | float64 | 5.68 |
loc_og_mou_9 | float64 | 5.68 |
std_og_t2f_mou_9 | float64 | 5.68 |
loc_og_t2m_mou_9 | float64 | 5.68 |
std_og_t2m_mou_9 | float64 | 5.68 |
loc_og_t2c_mou_9 | float64 | 5.68 |
std_og_t2t_mou_9 | float64 | 5.68 |
std_ic_t2o_mou_9 | category | 5.68 |
std_ic_mou_9 | float64 | 5.68 |
spl_ic_mou_9 | float64 | 5.68 |
std_ic_t2f_mou_9 | float64 | 5.68 |
roam_og_mou_9 | float64 | 5.68 |
std_ic_t2m_mou_9 | float64 | 5.68 |
offnet_mou_9 | float64 | 5.68 |
std_og_mou_9 | float64 | 5.68 |
spl_og_mou_9 | float64 | 5.68 |
loc_ic_t2t_mou_9 | float64 | 5.68 |
onnet_mou_9 | float64 | 5.68 |
loc_ic_t2m_mou_9 | float64 | 5.68 |
loc_ic_t2f_mou_9 | float64 | 5.68 |
std_og_t2c_mou_9 | category | 5.68 |
loc_ic_mou_9 | float64 | 5.68 |
std_ic_t2t_mou_9 | float64 | 5.68 |
isd_og_mou_9 | float64 | 5.68 |
std_og_t2t_mou_8 | float64 | 3.13 |
std_og_t2c_mou_8 | category | 3.13 |
std_og_t2f_mou_8 | float64 | 3.13 |
std_og_mou_8 | float64 | 3.13 |
roam_og_mou_8 | float64 | 3.13 |
isd_og_mou_8 | float64 | 3.13 |
loc_og_t2t_mou_8 | float64 | 3.13 |
spl_ic_mou_8 | float64 | 3.13 |
std_og_t2m_mou_8 | float64 | 3.13 |
ic_others_8 | float64 | 3.13 |
offnet_mou_8 | float64 | 3.13 |
og_others_8 | float64 | 3.13 |
isd_ic_mou_8 | float64 | 3.13 |
roam_ic_mou_8 | float64 | 3.13 |
spl_og_mou_8 | float64 | 3.13 |
loc_og_t2f_mou_8 | float64 | 3.13 |
std_ic_t2m_mou_8 | float64 | 3.13 |
std_ic_t2f_mou_8 | float64 | 3.13 |
std_ic_t2t_mou_8 | float64 | 3.13 |
loc_og_t2c_mou_8 | float64 | 3.13 |
loc_ic_mou_8 | float64 | 3.13 |
onnet_mou_8 | float64 | 3.13 |
loc_og_t2m_mou_8 | float64 | 3.13 |
loc_ic_t2f_mou_8 | float64 | 3.13 |
std_ic_t2o_mou_8 | category | 3.13 |
loc_og_mou_8 | float64 | 3.13 |
loc_ic_t2m_mou_8 | float64 | 3.13 |
std_ic_mou_8 | float64 | 3.13 |
loc_ic_t2t_mou_8 | float64 | 3.13 |
date_of_last_rech_9 | datetime64[ns] | 2.89 |
date_of_last_rech_8 | datetime64[ns] | 1.98 |
last_date_of_month_9 | object | 1.20 |
loc_og_mou_6 | float64 | 1.05 |
std_ic_t2m_mou_6 | float64 | 1.05 |
roam_og_mou_6 | float64 | 1.05 |
std_ic_t2t_mou_6 | float64 | 1.05 |
loc_ic_mou_6 | float64 | 1.05 |
roam_ic_mou_6 | float64 | 1.05 |
loc_ic_t2f_mou_6 | float64 | 1.05 |
loc_ic_t2m_mou_6 | float64 | 1.05 |
std_og_t2t_mou_6 | float64 | 1.05 |
onnet_mou_6 | float64 | 1.05 |
loc_ic_t2t_mou_6 | float64 | 1.05 |
offnet_mou_6 | float64 | 1.05 |
og_others_6 | float64 | 1.05 |
loc_og_t2t_mou_6 | float64 | 1.05 |
isd_og_mou_6 | float64 | 1.05 |
std_og_t2m_mou_6 | float64 | 1.05 |
loc_og_t2f_mou_6 | float64 | 1.05 |
spl_ic_mou_6 | float64 | 1.05 |
std_ic_mou_6 | float64 | 1.05 |
isd_ic_mou_6 | float64 | 1.05 |
loc_og_t2m_mou_6 | float64 | 1.05 |
std_ic_t2o_mou_6 | category | 1.05 |
spl_og_mou_6 | float64 | 1.05 |
ic_others_6 | float64 | 1.05 |
std_ic_t2f_mou_6 | float64 | 1.05 |
loc_og_t2c_mou_6 | float64 | 1.05 |
std_og_mou_6 | float64 | 1.05 |
std_og_t2f_mou_6 | float64 | 1.05 |
std_og_t2c_mou_6 | category | 1.05 |
roam_ic_mou_7 | float64 | 1.01 |
loc_og_t2c_mou_7 | float64 | 1.01 |
loc_og_t2f_mou_7 | float64 | 1.01 |
loc_og_t2m_mou_7 | float64 | 1.01 |
loc_og_t2t_mou_7 | float64 | 1.01 |
roam_og_mou_7 | float64 | 1.01 |
std_ic_t2t_mou_7 | float64 | 1.01 |
offnet_mou_7 | float64 | 1.01 |
onnet_mou_7 | float64 | 1.01 |
std_ic_t2f_mou_7 | float64 | 1.01 |
std_ic_mou_7 | float64 | 1.01 |
loc_ic_t2f_mou_7 | float64 | 1.01 |
std_ic_t2m_mou_7 | float64 | 1.01 |
loc_og_mou_7 | float64 | 1.01 |
loc_ic_t2t_mou_7 | float64 | 1.01 |
std_og_t2t_mou_7 | float64 | 1.01 |
std_og_t2c_mou_7 | category | 1.01 |
std_og_mou_7 | float64 | 1.01 |
isd_og_mou_7 | float64 | 1.01 |
spl_og_mou_7 | float64 | 1.01 |
og_others_7 | float64 | 1.01 |
spl_ic_mou_7 | float64 | 1.01 |
loc_ic_t2m_mou_7 | float64 | 1.01 |
loc_ic_mou_7 | float64 | 1.01 |
ic_others_7 | float64 | 1.01 |
std_og_t2m_mou_7 | float64 | 1.01 |
isd_ic_mou_7 | float64 | 1.01 |
std_ic_t2o_mou_7 | category | 1.01 |
std_og_t2f_mou_7 | float64 | 1.01 |
last_date_of_month_8 | object | 0.52 |
loc_og_t2o_mou | category | 0.38 |
loc_ic_t2o_mou | category | 0.38 |
date_of_last_rech_7 | datetime64[ns] | 0.38 |
std_og_t2o_mou | category | 0.38 |
date_of_last_rech_6 | datetime64[ns] | 0.21 |
last_date_of_month_7 | object | 0.10 |
vol_3g_mb_6 | float64 | 0.00 |
arpu_6 | float64 | 0.00 |
total_rech_amt_8 | int64 | 0.00 |
total_rech_amt_7 | int64 | 0.00 |
total_rech_amt_6 | int64 | 0.00 |
total_rech_num_9 | int64 | 0.00 |
last_date_of_month_6 | object | 0.00 |
vol_3g_mb_8 | float64 | 0.00 |
arpu_7 | float64 | 0.00 |
arpu_8 | float64 | 0.00 |
arpu_9 | float64 | 0.00 |
total_og_mou_6 | float64 | 0.00 |
total_og_mou_7 | float64 | 0.00 |
vol_3g_mb_7 | float64 | 0.00 |
max_rech_amt_9 | int64 | 0.00 |
vol_2g_mb_9 | float64 | 0.00 |
vol_2g_mb_8 | float64 | 0.00 |
vol_2g_mb_7 | float64 | 0.00 |
vol_2g_mb_6 | float64 | 0.00 |
last_day_rch_amt_9 | int64 | 0.00 |
last_day_rch_amt_8 | int64 | 0.00 |
last_day_rch_amt_7 | int64 | 0.00 |
last_day_rch_amt_6 | int64 | 0.00 |
max_rech_amt_8 | int64 | 0.00 |
max_rech_amt_7 | int64 | 0.00 |
max_rech_amt_6 | int64 | 0.00 |
total_rech_amt_9 | int64 | 0.00 |
total_ic_mou_6 | float64 | 0.00 |
total_og_mou_8 | float64 | 0.00 |
vbc_3g_8 | float64 | 0.00 |
total_ic_mou_7 | float64 | 0.00 |
total_ic_mou_8 | float64 | 0.00 |
sachet_3g_9 | category | 0.00 |
sachet_3g_7 | category | 0.00 |
vbc_3g_9 | float64 | 0.00 |
vbc_3g_6 | float64 | 0.00 |
vbc_3g_7 | float64 | 0.00 |
aon | int64 | 0.00 |
sachet_3g_6 | category | 0.00 |
monthly_3g_8 | category | 0.00 |
monthly_3g_9 | category | 0.00 |
sachet_3g_8 | category | 0.00 |
monthly_3g_7 | category | 0.00 |
sachet_2g_9 | category | 0.00 |
sachet_2g_8 | category | 0.00 |
sachet_2g_7 | category | 0.00 |
sachet_2g_6 | category | 0.00 |
monthly_2g_9 | category | 0.00 |
monthly_2g_8 | category | 0.00 |
monthly_2g_7 | category | 0.00 |
monthly_2g_6 | category | 0.00 |
monthly_3g_6 | category | 0.00 |
circle_id | category | 0.00 |
vol_3g_mb_9 | float64 | 0.00 |
total_og_mou_9 | float64 | 0.00 |
total_rech_num_8 | int64 | 0.00 |
total_rech_num_7 | int64 | 0.00 |
total_rech_num_6 | int64 | 0.00 |
total_ic_mou_9 | float64 | 0.00 |
Average_rech_amt_6n7 | float64 | 0.00 |
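The table above comes from the notebook's `metadata_matrix` helper, which is defined outside this section. For readers who want to reproduce a similar summary, here is a sketch of a hypothetical stand-in called `null_summary`. It is an assumption about what such a helper might compute, not the notebook's actual implementation.

```python
import pandas as pd

def null_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, null count and null percentage, worst first."""
    summary = pd.DataFrame({
        "Datatype": df.dtypes.astype(str),
        "Null_Count": df.isnull().sum(),
        "Null_Percentage": (df.isnull().mean() * 100).round(2),
    })
    return summary.sort_values(by="Null_Percentage", ascending=False)

# Made-up frame with 75%, 25% and 0% missing values.
toy = pd.DataFrame({
    "a": [None, None, None, 1.0],
    "b": [1.0, None, 2.0, 3.0],
    "c": ["x", "y", "z", "w"],
})

summary = null_summary(toy)
print(summary)
```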
# Columns with a high share of missing values (> 50%)
metadata = metadata_matrix(data)
condition = metadata['Null_Percentage'] > 50
high_missing_values = metadata[condition]
high_missing_values
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
av_rech_amt_data_6 | float64 | 11397 | 18614 | 62.02 | 794 |
count_rech_3g_6 | category | 11397 | 18614 | 62.02 | 25 |
count_rech_2g_6 | float64 | 11397 | 18614 | 62.02 | 30 |
arpu_2g_6 | float64 | 11397 | 18614 | 62.02 | 4503 |
max_rech_data_6 | float64 | 11397 | 18614 | 62.02 | 43 |
night_pck_user_6 | category | 11397 | 18614 | 62.02 | 2 |
date_of_last_rech_data_6 | datetime64[ns] | 11397 | 18614 | 62.02 | 30 |
total_rech_data_6 | float64 | 11397 | 18614 | 62.02 | 36 |
arpu_3g_6 | float64 | 11397 | 18614 | 62.02 | 4875 |
fb_user_6 | category | 11397 | 18614 | 62.02 | 2 |
max_rech_data_9 | float64 | 11461 | 18550 | 61.81 | 48 |
count_rech_3g_9 | category | 11461 | 18550 | 61.81 | 27 |
fb_user_9 | category | 11461 | 18550 | 61.81 | 2 |
total_rech_data_9 | float64 | 11461 | 18550 | 61.81 | 35 |
date_of_last_rech_data_9 | datetime64[ns] | 11461 | 18550 | 61.81 | 30 |
av_rech_amt_data_9 | float64 | 11461 | 18550 | 61.81 | 812 |
arpu_2g_9 | float64 | 11461 | 18550 | 61.81 | 3846 |
arpu_3g_9 | float64 | 11461 | 18550 | 61.81 | 4800 |
night_pck_user_9 | category | 11461 | 18550 | 61.81 | 2 |
count_rech_2g_9 | float64 | 11461 | 18550 | 61.81 | 29 |
fb_user_7 | category | 11662 | 18349 | 61.14 | 2 |
date_of_last_rech_data_7 | datetime64[ns] | 11662 | 18349 | 61.14 | 31 |
total_rech_data_7 | float64 | 11662 | 18349 | 61.14 | 40 |
night_pck_user_7 | category | 11662 | 18349 | 61.14 | 2 |
max_rech_data_7 | float64 | 11662 | 18349 | 61.14 | 46 |
count_rech_2g_7 | float64 | 11662 | 18349 | 61.14 | 35 |
arpu_3g_7 | float64 | 11662 | 18349 | 61.14 | 4860 |
av_rech_amt_data_7 | float64 | 11662 | 18349 | 61.14 | 863 |
arpu_2g_7 | float64 | 11662 | 18349 | 61.14 | 4219 |
count_rech_3g_7 | category | 11662 | 18349 | 61.14 | 28 |
night_pck_user_8 | category | 11754 | 18257 | 60.83 | 2 |
fb_user_8 | category | 11754 | 18257 | 60.83 | 2 |
arpu_2g_8 | float64 | 11754 | 18257 | 60.83 | 3854 |
count_rech_2g_8 | float64 | 11754 | 18257 | 60.83 | 33 |
date_of_last_rech_data_8 | datetime64[ns] | 11754 | 18257 | 60.83 | 31 |
av_rech_amt_data_8 | float64 | 11754 | 18257 | 60.83 | 837 |
arpu_3g_8 | float64 | 11754 | 18257 | 60.83 | 4769 |
total_rech_data_8 | float64 | 11754 | 18257 | 60.83 | 45 |
count_rech_3g_8 | category | 11754 | 18257 | 60.83 | 29 |
max_rech_data_8 | float64 | 11754 | 18257 | 60.83 | 47 |
# Dropping above columns with high missing values
high_missing_value_columns = high_missing_values.index
data.drop(columns=high_missing_value_columns, inplace=True)
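The same threshold-based drop can be expressed without the metadata table. The sketch below uses made-up data and a hypothetical `THRESHOLD` constant, and returns a new frame instead of dropping in place.

```python
import pandas as pd

THRESHOLD = 0.5  # drop columns with more than 50% missing values

# Made-up frame: 75%, 50% and 0% missing.
toy = pd.DataFrame({
    "mostly_missing": [None, None, None, 1.0],
    "half": [1.0, None, None, 2.0],
    "full": [1, 2, 3, 4],
})

# isnull().mean() gives the fraction of missing values per column.
null_fraction = toy.isnull().mean()
to_drop = null_fraction[null_fraction > THRESHOLD].index.tolist()

reduced = toy.drop(columns=to_drop)

print(to_drop)
print(reduced.columns.tolist())
```

The strict `>` comparison mirrors the notebook's condition, so a column that is exactly 50% missing is kept.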
# Looking at remaining columns with missing values
metadata_matrix(data)
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
std_ic_t2o_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
spl_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 2966 |
isd_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 908 |
roam_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3370 |
std_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 15900 |
roam_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 4004 |
std_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1971 |
std_og_t2c_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
loc_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 10360 |
std_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1595 |
std_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 7745 |
loc_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15585 |
std_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 12445 |
loc_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 3111 |
std_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 11141 |
loc_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 18018 |
loc_og_t2c_mou_9 | float64 | 28307 | 1704 | 5.68 | 1576 |
offnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 20452 |
loc_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 18207 |
spl_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 287 |
std_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 6168 |
loc_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 4611 |
ic_others_9 | float64 | 28307 | 1704 | 5.68 | 1284 |
loc_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15194 |
loc_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 9407 |
std_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 4280 |
isd_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3329 |
og_others_9 | float64 | 28307 | 1704 | 5.68 | 132 |
onnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 16674 |
std_og_mou_8 | float64 | 29073 | 938 | 3.13 | 16864 |
std_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 13326 |
og_others_8 | float64 | 29073 | 938 | 3.13 | 133 |
loc_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 4705 |
std_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 11781 |
loc_og_mou_8 | float64 | 29073 | 938 | 3.13 | 18885 |
std_ic_t2o_mou_8 | category | 29073 | 938 | 3.13 | 1 |
loc_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 15598 |
std_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 6420 |
std_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 4486 |
std_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1627 |
std_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1941 |
spl_og_mou_8 | float64 | 29073 | 938 | 3.13 | 3238 |
loc_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 9671 |
std_og_t2c_mou_8 | category | 29073 | 938 | 3.13 | 1 |
isd_og_mou_8 | float64 | 29073 | 938 | 3.13 | 940 |
loc_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 18573 |
roam_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3655 |
isd_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3493 |
onnet_mou_8 | float64 | 29073 | 938 | 3.13 | 17604 |
loc_og_t2c_mou_8 | float64 | 29073 | 938 | 3.13 | 1730 |
spl_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 85 |
loc_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 3124 |
std_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 8033 |
roam_og_mou_8 | float64 | 29073 | 938 | 3.13 | 4382 |
ic_others_8 | float64 | 29073 | 938 | 3.13 | 1259 |
loc_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 16165 |
loc_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 10772 |
offnet_mou_8 | float64 | 29073 | 938 | 3.13 | 21513 |
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
last_date_of_month_9 | object | 29651 | 360 | 1.20 | 1 |
std_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 8391 |
offnet_mou_6 | float64 | 29695 | 316 | 1.05 | 22454 |
std_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 2033 |
isd_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 3429 |
ic_others_6 | float64 | 29695 | 316 | 1.05 | 1227 |
onnet_mou_6 | float64 | 29695 | 316 | 1.05 | 18813 |
std_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 6680 |
loc_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 9872 |
loc_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16015 |
loc_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 4817 |
loc_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 19133 |
std_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 4608 |
og_others_6 | float64 | 29695 | 316 | 1.05 | 862 |
spl_og_mou_6 | float64 | 29695 | 316 | 1.05 | 3053 |
roam_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 4338 |
spl_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 78 |
std_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 12777 |
loc_og_t2c_mou_6 | float64 | 29695 | 316 | 1.05 | 1658 |
std_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 14518 |
loc_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 3252 |
std_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 1773 |
loc_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16747 |
std_ic_t2o_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_t2c_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_mou_6 | float64 | 29695 | 316 | 1.05 | 18325 |
loc_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 11151 |
isd_og_mou_6 | float64 | 29695 | 316 | 1.05 | 1113 |
roam_og_mou_6 | float64 | 29695 | 316 | 1.05 | 5174 |
loc_og_mou_6 | float64 | 29695 | 316 | 1.05 | 19691 |
isd_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3639 |
std_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 2075 |
std_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 6747 |
std_ic_t2o_mou_7 | category | 29708 | 303 | 1.01 | 1 |
ic_others_7 | float64 | 29708 | 303 | 1.01 | 1371 |
spl_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 93 |
std_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 4706 |
std_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 8543 |
loc_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 4897 |
og_others_7 | float64 | 29708 | 303 | 1.01 | 123 |
loc_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 19030 |
std_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 1714 |
onnet_mou_7 | float64 | 29708 | 303 | 1.01 | 18938 |
roam_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3649 |
roam_og_mou_7 | float64 | 29708 | 303 | 1.01 | 4431 |
loc_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 11154 |
loc_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16872 |
loc_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 3267 |
loc_og_t2c_mou_7 | float64 | 29708 | 303 | 1.01 | 1750 |
loc_og_mou_7 | float64 | 29708 | 303 | 1.01 | 19880 |
std_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 12983 |
std_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 14589 |
offnet_mou_7 | float64 | 29708 | 303 | 1.01 | 22650 |
std_og_t2c_mou_7 | category | 29708 | 303 | 1.01 | 1 |
loc_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 9961 |
isd_og_mou_7 | float64 | 29708 | 303 | 1.01 | 1125 |
spl_og_mou_7 | float64 | 29708 | 303 | 1.01 | 3399 |
std_og_mou_7 | float64 | 29708 | 303 | 1.01 | 18445 |
loc_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16068 |
last_date_of_month_8 | object | 29854 | 157 | 0.52 | 1 |
std_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
loc_ic_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
loc_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
last_date_of_month_7 | object | 29980 | 31 | 0.10 | 1 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
aon | int64 | 30011 | 0 | 0.00 | 3321 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
circle_id | category | 30011 | 0 | 0.00 | 1 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
# Month 6
sixth_month_columns = []
for column in data.columns:
    x = re.search("6$", column)
    if x:
        sixth_month_columns.append(column)
# missing_values.loc[sixth_month_columns].sort_values(by='Null_Percentage', ascending=False)
metadata = metadata_matrix(data)
condition = metadata.index.isin(sixth_month_columns)
sixth_month_metadata = metadata[condition]
sixth_month_metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
std_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 8391 |
offnet_mou_6 | float64 | 29695 | 316 | 1.05 | 22454 |
std_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 2033 |
isd_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 3429 |
ic_others_6 | float64 | 29695 | 316 | 1.05 | 1227 |
onnet_mou_6 | float64 | 29695 | 316 | 1.05 | 18813 |
std_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 6680 |
loc_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 9872 |
loc_ic_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16015 |
loc_ic_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 4817 |
loc_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 19133 |
std_ic_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 4608 |
og_others_6 | float64 | 29695 | 316 | 1.05 | 862 |
spl_og_mou_6 | float64 | 29695 | 316 | 1.05 | 3053 |
roam_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 4338 |
spl_ic_mou_6 | float64 | 29695 | 316 | 1.05 | 78 |
std_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 12777 |
loc_og_t2c_mou_6 | float64 | 29695 | 316 | 1.05 | 1658 |
std_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 14518 |
loc_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 3252 |
std_og_t2f_mou_6 | float64 | 29695 | 316 | 1.05 | 1773 |
loc_og_t2m_mou_6 | float64 | 29695 | 316 | 1.05 | 16747 |
std_ic_t2o_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_t2c_mou_6 | category | 29695 | 316 | 1.05 | 1 |
std_og_mou_6 | float64 | 29695 | 316 | 1.05 | 18325 |
loc_og_t2t_mou_6 | float64 | 29695 | 316 | 1.05 | 11151 |
isd_og_mou_6 | float64 | 29695 | 316 | 1.05 | 1113 |
roam_og_mou_6 | float64 | 29695 | 316 | 1.05 | 5174 |
loc_og_mou_6 | float64 | 29695 | 316 | 1.05 | 19691 |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
# Columns whose missing values are meaningful in the 6th month (the usage columns that share the 1.05% null rate)
sixth_month_meaningful_missing_condition = sixth_month_metadata['Null_Percentage'] == 1.05
sixth_month_meaningful_missing_cols = sixth_month_metadata[sixth_month_meaningful_missing_condition].index.values
sixth_month_meaningful_missing_cols
array(['std_ic_mou_6', 'offnet_mou_6', 'std_ic_t2f_mou_6', 'isd_ic_mou_6', 'ic_others_6', 'onnet_mou_6', 'std_ic_t2m_mou_6', 'loc_ic_t2t_mou_6', 'loc_ic_t2m_mou_6', 'loc_ic_t2f_mou_6', 'loc_ic_mou_6', 'std_ic_t2t_mou_6', 'og_others_6', 'spl_og_mou_6', 'roam_ic_mou_6', 'spl_ic_mou_6', 'std_og_t2t_mou_6', 'loc_og_t2c_mou_6', 'std_og_t2m_mou_6', 'loc_og_t2f_mou_6', 'std_og_t2f_mou_6', 'loc_og_t2m_mou_6', 'std_ic_t2o_mou_6', 'std_og_t2c_mou_6', 'std_og_mou_6', 'loc_og_t2t_mou_6', 'isd_og_mou_6', 'roam_og_mou_6', 'loc_og_mou_6'], dtype=object)
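Selecting columns with `== 1.05` relies on an exact match against a rounded float. One possible alternative, sketched here, is to match on the integer `Null_Count` of a reference column. The small `meta` frame reuses a few rows of the table above, and the choice of `onnet_mou_6` as the reference is an assumption made for illustration.

```python
import pandas as pd

# A few rows copied from the 6th-month metadata table above.
meta = pd.DataFrame(
    {
        "Null_Count": [316, 316, 62, 0],
        "Null_Percentage": [1.05, 1.05, 0.21, 0.00],
    },
    index=["onnet_mou_6", "offnet_mou_6", "date_of_last_rech_6", "arpu_6"],
)

# Integer null counts compare exactly, unlike rounded percentages.
reference = meta.loc["onnet_mou_6", "Null_Count"]
same_pattern = meta[meta["Null_Count"] == reference].index.tolist()

print(same_pattern)
```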
# Looking at all sixth month columns where rows of *_mou are null
condition = data[sixth_month_meaningful_missing_cols].isnull()
# data.loc[condition, sixth_month_columns]
# Rows that are null in all of the above columns
missing_rows = pd.Series([True]*data.shape[0], index = data.index)
for column in sixth_month_meaningful_missing_cols :
    missing_rows = missing_rows & data[column].isnull()
print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_6'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_6'].unique()[0])
Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
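The check above prints only the first unique total, so it does not by itself show that every affected customer has a zero total. A minimal sketch of a stricter version on made-up data: build the all-null row mask with `isnull().all(axis=1)`, confirm that every masked row has a zero total, and only then fill with 0. The `toy` frame is an assumption for illustration.

```python
import pandas as pd

# Made-up usage data: rows 1 and 3 have no *_mou values at all,
# row 2 is missing only one of them.
toy = pd.DataFrame({
    "onnet_mou_6": [10.0, None, 5.0, None],
    "offnet_mou_6": [20.0, None, None, None],
    "total_og_mou_6": [30.0, 0.0, 5.0, 0.0],
})
mou_cols = ["onnet_mou_6", "offnet_mou_6"]

# True only where every usage column is null in that row.
all_missing = toy[mou_cols].isnull().all(axis=1)

# Zero-filling is justified only if all such rows report a zero total.
totals_are_zero = bool((toy.loc[all_missing, "total_og_mou_6"] == 0).all())

if totals_are_zero:
    toy.loc[all_missing, mou_cols] = 0.0

print(toy)
```

Rows that are only partly missing are left untouched by this mask, which keeps them visible for a separate decision.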
# Imputation
data[sixth_month_meaningful_missing_cols] = data[sixth_month_meaningful_missing_cols].fillna(0)
metadata = metadata_matrix(data)
# Remaining Missing Values
metadata.iloc[metadata.index.isin(sixth_month_columns)]
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16747 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.00 | 1113 |
std_og_mou_6 | float64 | 30011 | 0 | 0.00 | 18325 |
std_og_t2c_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 1773 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 14518 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 12777 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.00 | 19691 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.00 | 1658 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 3252 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 11151 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.00 | 5174 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 4338 |
offnet_mou_6 | float64 | 30011 | 0 | 0.00 | 22454 |
onnet_mou_6 | float64 | 30011 | 0 | 0.00 | 18813 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.00 | 3053 |
og_others_6 | float64 | 30011 | 0 | 0.00 | 862 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
ic_others_6 | float64 | 30011 | 0 | 0.00 | 1227 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 3429 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 78 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 8391 |
std_ic_t2o_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 2033 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 6680 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 4608 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 19133 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 4817 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16015 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 9872 |
# Looking at 'recharge' related 6th month columns for customers with missing 'date_of_last_rech_6'
condition = data['date_of_last_rech_6'].isnull()
data[condition].filter(regex='.*rech.*6$', axis=1).head()
mobile_number | total_rech_num_6 | total_rech_amt_6 | max_rech_amt_6 | date_of_last_rech_6 |
---|---|---|---|---|
7001588448 | 0 | 0 | 0 | NaT |
7001223277 | 0 | 0 | 0 | NaT |
7000721536 | 0 | 0 | 0 | NaT |
7001490351 | 0 | 0 | 0 | NaT |
7000665415 | 0 | 0 | 0 | NaT |
data[condition].filter(regex='.*rech.*6$', axis=1).nunique()
total_rech_num_6 1
total_rech_amt_6 1
max_rech_amt_6 1
date_of_last_rech_6 0
dtype: int64
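The output above suggests that a missing `date_of_last_rech_6` simply means the customer did not recharge in June. That implication can be tested directly, as in this sketch on made-up rows (the `toy` frame and variable names are illustrative assumptions).

```python
import pandas as pd

# Made-up recharge data: two customers with no recharge and no date.
toy = pd.DataFrame({
    "total_rech_num_6": [3, 0, 5, 0],
    "total_rech_amt_6": [300, 0, 550, 0],
    "date_of_last_rech_6": pd.to_datetime(
        ["2014-06-21", None, "2014-06-17", None]
    ),
})

no_date = toy["date_of_last_rech_6"].isnull()
no_recharge = (toy["total_rech_num_6"] == 0) & (toy["total_rech_amt_6"] == 0)

# Every row without a date should also show zero recharges.
missing_date_means_no_recharge = bool(no_recharge[no_date].all())

print(missing_date_means_no_recharge)
```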
# Check for missing values in 6th month variables
metadata = metadata_matrix(data)
metadata[metadata.index.isin(sixth_month_columns)]
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16747 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.00 | 1113 |
std_og_mou_6 | float64 | 30011 | 0 | 0.00 | 18325 |
std_og_t2c_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 1773 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 14518 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 12777 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.00 | 19691 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.00 | 1658 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 3252 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 11151 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.00 | 5174 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 4338 |
offnet_mou_6 | float64 | 30011 | 0 | 0.00 | 22454 |
onnet_mou_6 | float64 | 30011 | 0 | 0.00 | 18813 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.00 | 3053 |
og_others_6 | float64 | 30011 | 0 | 0.00 | 862 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
ic_others_6 | float64 | 30011 | 0 | 0.00 | 1227 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 3429 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 78 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 8391 |
std_ic_t2o_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 2033 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 6680 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 4608 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 19133 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 4817 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16015 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 9872 |
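The `metadata_matrix` helper used throughout is defined outside this section, so its exact implementation is not shown here. As a rough illustration of how a table with these five columns could be produced, the sketch below builds one on a toy frame. The name `metadata_matrix_sketch`, the toy data, the two-decimal rounding and the sort order are all assumptions, not the notebook's actual code.

```python
import pandas as pd

def metadata_matrix_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical stand-in for the notebook's metadata_matrix helper."""
    summary = pd.DataFrame({
        'Datatype': df.dtypes.astype(str),
        'Non_Null_Count': df.count(),
        'Null_Count': df.isnull().sum(),
        # Share of missing rows per column, as a percentage (rounding is assumed)
        'Null_Percentage': (100 * df.isnull().mean()).round(2),
        # nunique() ignores missing values by default
        'Unique_Values_Count': df.nunique(),
    })
    # Assumed ordering: most incomplete columns first
    return summary.sort_values('Null_Count', ascending=False)

# Toy data for illustration only
toy = pd.DataFrame({
    'arpu_6': [10.5, 20.0, 30.25, 40.0],
    'onnet_mou_6': [1.0, None, 3.0, None],
})
toy_metadata = metadata_matrix_sketch(toy)
print(toy_metadata)
```

Filtering the result with `toy_metadata[toy_metadata.index.isin(some_columns)]` mirrors the month-wise lookups used in this section.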
# Month : 7
seventh_month_columns = data.filter(regex='7$', axis=1).columns
seventh_month_columns
Index(['last_date_of_month_7', 'arpu_7', 'onnet_mou_7', 'offnet_mou_7', 'roam_ic_mou_7', 'roam_og_mou_7', 'loc_og_t2t_mou_7', 'loc_og_t2m_mou_7', 'loc_og_t2f_mou_7', 'loc_og_t2c_mou_7', 'loc_og_mou_7', 'std_og_t2t_mou_7', 'std_og_t2m_mou_7', 'std_og_t2f_mou_7', 'std_og_t2c_mou_7', 'std_og_mou_7', 'isd_og_mou_7', 'spl_og_mou_7', 'og_others_7', 'total_og_mou_7', 'loc_ic_t2t_mou_7', 'loc_ic_t2m_mou_7', 'loc_ic_t2f_mou_7', 'loc_ic_mou_7', 'std_ic_t2t_mou_7', 'std_ic_t2m_mou_7', 'std_ic_t2f_mou_7', 'std_ic_t2o_mou_7', 'std_ic_mou_7', 'total_ic_mou_7', 'spl_ic_mou_7', 'isd_ic_mou_7', 'ic_others_7', 'total_rech_num_7', 'total_rech_amt_7', 'max_rech_amt_7', 'date_of_last_rech_7', 'last_day_rch_amt_7', 'vol_2g_mb_7', 'vol_3g_mb_7', 'monthly_2g_7', 'sachet_2g_7', 'monthly_3g_7', 'sachet_3g_7', 'vbc_3g_7', 'Average_rech_amt_6n7'], dtype='object')
seventh_month_metadata = metadata[metadata.index.isin(seventh_month_columns)]
seventh_month_metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
loc_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 9961 |
og_others_7 | float64 | 29708 | 303 | 1.01 | 123 |
loc_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 4897 |
loc_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16068 |
loc_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 19030 |
std_ic_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 4706 |
std_ic_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 2075 |
std_ic_t2o_mou_7 | category | 29708 | 303 | 1.01 | 1 |
std_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 8543 |
spl_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 93 |
isd_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3639 |
ic_others_7 | float64 | 29708 | 303 | 1.01 | 1371 |
std_ic_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 6747 |
isd_og_mou_7 | float64 | 29708 | 303 | 1.01 | 1125 |
spl_og_mou_7 | float64 | 29708 | 303 | 1.01 | 3399 |
std_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 1714 |
onnet_mou_7 | float64 | 29708 | 303 | 1.01 | 18938 |
offnet_mou_7 | float64 | 29708 | 303 | 1.01 | 22650 |
roam_ic_mou_7 | float64 | 29708 | 303 | 1.01 | 3649 |
roam_og_mou_7 | float64 | 29708 | 303 | 1.01 | 4431 |
loc_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 11154 |
loc_og_t2f_mou_7 | float64 | 29708 | 303 | 1.01 | 3267 |
loc_og_t2c_mou_7 | float64 | 29708 | 303 | 1.01 | 1750 |
loc_og_mou_7 | float64 | 29708 | 303 | 1.01 | 19880 |
std_og_t2t_mou_7 | float64 | 29708 | 303 | 1.01 | 12983 |
std_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 14589 |
loc_og_t2m_mou_7 | float64 | 29708 | 303 | 1.01 | 16872 |
std_og_t2c_mou_7 | category | 29708 | 303 | 1.01 | 1 |
std_og_mou_7 | float64 | 29708 | 303 | 1.01 | 18445 |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
last_date_of_month_7 | object | 29980 | 31 | 0.10 | 1 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
# Columns whose missing values are meaningful in the 7th month (all share the same 1.01% null rate)
seventh_month_meaningful_missing_condition = seventh_month_metadata['Null_Percentage'] == 1.01
seventh_month_meaningful_missing_cols = seventh_month_metadata[seventh_month_meaningful_missing_condition].index.values
seventh_month_meaningful_missing_cols
array(['loc_ic_t2t_mou_7', 'og_others_7', 'loc_ic_t2f_mou_7', 'loc_ic_t2m_mou_7', 'loc_ic_mou_7', 'std_ic_t2t_mou_7', 'std_ic_t2f_mou_7', 'std_ic_t2o_mou_7', 'std_ic_mou_7', 'spl_ic_mou_7', 'isd_ic_mou_7', 'ic_others_7', 'std_ic_t2m_mou_7', 'isd_og_mou_7', 'spl_og_mou_7', 'std_og_t2f_mou_7', 'onnet_mou_7', 'offnet_mou_7', 'roam_ic_mou_7', 'roam_og_mou_7', 'loc_og_t2t_mou_7', 'loc_og_t2f_mou_7', 'loc_og_t2c_mou_7', 'loc_og_mou_7', 'std_og_t2t_mou_7', 'std_og_t2m_mou_7', 'loc_og_t2m_mou_7', 'std_og_t2c_mou_7', 'std_og_mou_7'], dtype=object)
# Looking at all 7th month columns where rows of *_mou are null
condition = data[seventh_month_meaningful_missing_cols].isnull()
# Rows that are null in all of the above columns
missing_rows = pd.Series([True]*data.shape[0], index = data.index)
for column in seventh_month_meaningful_missing_cols :
    missing_rows = missing_rows & data[column].isnull()
print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_7'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_7'].unique()[0])
Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
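The row-wise loop that builds `missing_rows` can also be written as one vectorised call, and inspecting the full array of unique totals (rather than only its first element) makes the zero-usage claim easier to verify. This is a minimal sketch on a made-up toy frame; the values and the two-column `mou_cols` list are illustrative assumptions.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'onnet_mou_7': [12.0, None, None, 5.0],
    'offnet_mou_7': [3.5, None, 7.0, None],
    'total_og_mou_7': [15.5, 0.0, 7.0, 5.0],
})
mou_cols = ['onnet_mou_7', 'offnet_mou_7']

# True only where every listed column is null in that row
missing_rows = toy[mou_cols].isnull().all(axis=1)

# All distinct totals for those rows; a single 0.0 supports imputing with zero
totals_for_missing = toy.loc[missing_rows, 'total_og_mou_7'].unique()
print(missing_rows.sum(), totals_for_missing)
```

If `totals_for_missing` ever held more than one value, filling the `*_mou` columns with 0 would no longer be justified for every affected customer.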
# Imputation
data[seventh_month_meaningful_missing_cols] = data[seventh_month_meaningful_missing_cols].fillna(0)
metadata = metadata_matrix(data)
# Remaining Missing Values
metadata.iloc[metadata.index.isin(seventh_month_columns)]
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
last_date_of_month_7 | object | 29980 | 31 | 0.10 | 1 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
ic_others_7 | float64 | 30011 | 0 | 0.00 | 1371 |
isd_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3639 |
spl_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 93 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
loc_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 4897 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
loc_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 11154 |
std_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 14589 |
std_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 12983 |
loc_og_mou_7 | float64 | 30011 | 0 | 0.00 | 19880 |
loc_og_t2c_mou_7 | float64 | 30011 | 0 | 0.00 | 1750 |
loc_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 3267 |
loc_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16872 |
roam_og_mou_7 | float64 | 30011 | 0 | 0.00 | 4431 |
roam_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3649 |
offnet_mou_7 | float64 | 30011 | 0 | 0.00 | 22650 |
onnet_mou_7 | float64 | 30011 | 0 | 0.00 | 18938 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
std_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 1714 |
std_og_t2c_mou_7 | category | 30011 | 0 | 0.00 | 1 |
loc_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16068 |
std_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 8543 |
std_ic_t2o_mou_7 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 2075 |
std_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 6747 |
std_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 4706 |
loc_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 19030 |
loc_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 9961 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
og_others_7 | float64 | 30011 | 0 | 0.00 | 123 |
spl_og_mou_7 | float64 | 30011 | 0 | 0.00 | 3399 |
isd_og_mou_7 | float64 | 30011 | 0 | 0.00 | 1125 |
std_og_mou_7 | float64 | 30011 | 0 | 0.00 | 18445 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
# Looking at 'recharge' related 7th month columns for customers with missing 'date_of_last_rech_7'
condition = data['date_of_last_rech_7'].isnull()
data[condition].filter(regex='.*rech.*7$', axis=1).head()
mobile_number | total_rech_num_7 | total_rech_amt_7 | max_rech_amt_7 | date_of_last_rech_7 | Average_rech_amt_6n7 |
---|---|---|---|---|---|
7000369789 | 0 | 0 | 0 | NaT | 393.0 |
7001967148 | 0 | 0 | 0 | NaT | 500.5 |
7000066601 | 0 | 0 | 0 | NaT | 490.0 |
7001189556 | 0 | 0 | 0 | NaT | 523.5 |
7002024450 | 0 | 0 | 0 | NaT | 493.0 |
data[condition].filter(regex='.*rech.*7$', axis=1).nunique()
total_rech_num_7         1
total_rech_amt_7         1
max_rech_amt_7           1
date_of_last_rech_7      0
Average_rech_amt_6n7    90
dtype: int64
# Month : 8
eighth_month_columns = data.filter(regex="8$", axis=1).columns
metadata = metadata_matrix(data)
condition = metadata.index.isin(eighth_month_columns)
eighth_month_metadata = metadata[condition]
eighth_month_metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
std_og_t2c_mou_8 | category | 29073 | 938 | 3.13 | 1 |
std_og_mou_8 | float64 | 29073 | 938 | 3.13 | 16864 |
isd_og_mou_8 | float64 | 29073 | 938 | 3.13 | 940 |
loc_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 18573 |
std_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 13326 |
loc_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 15598 |
loc_og_mou_8 | float64 | 29073 | 938 | 3.13 | 18885 |
std_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 11781 |
std_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1627 |
loc_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 4705 |
loc_og_t2c_mou_8 | float64 | 29073 | 938 | 3.13 | 1730 |
ic_others_8 | float64 | 29073 | 938 | 3.13 | 1259 |
loc_og_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 16165 |
spl_og_mou_8 | float64 | 29073 | 938 | 3.13 | 3238 |
roam_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3655 |
std_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 8033 |
spl_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 85 |
std_ic_t2o_mou_8 | category | 29073 | 938 | 3.13 | 1 |
onnet_mou_8 | float64 | 29073 | 938 | 3.13 | 17604 |
loc_og_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 3124 |
offnet_mou_8 | float64 | 29073 | 938 | 3.13 | 21513 |
std_ic_t2f_mou_8 | float64 | 29073 | 938 | 3.13 | 1941 |
og_others_8 | float64 | 29073 | 938 | 3.13 | 133 |
loc_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 9671 |
std_ic_t2m_mou_8 | float64 | 29073 | 938 | 3.13 | 6420 |
std_ic_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 4486 |
roam_og_mou_8 | float64 | 29073 | 938 | 3.13 | 4382 |
isd_ic_mou_8 | float64 | 29073 | 938 | 3.13 | 3493 |
loc_og_t2t_mou_8 | float64 | 29073 | 938 | 3.13 | 10772 |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
last_date_of_month_8 | object | 29854 | 157 | 0.52 | 1 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
# Columns whose missing values are meaningful in the 8th month (all share the same 3.13% null rate)
eighth_month_meaningful_missing_condition = eighth_month_metadata['Null_Percentage'] == 3.13
eighth_month_meaningful_missing_cols = eighth_month_metadata[eighth_month_meaningful_missing_condition].index.values
eighth_month_meaningful_missing_cols
array(['std_og_t2c_mou_8', 'std_og_mou_8', 'isd_og_mou_8', 'loc_ic_mou_8', 'std_og_t2m_mou_8', 'loc_ic_t2m_mou_8', 'loc_og_mou_8', 'std_og_t2t_mou_8', 'std_og_t2f_mou_8', 'loc_ic_t2f_mou_8', 'loc_og_t2c_mou_8', 'ic_others_8', 'loc_og_t2m_mou_8', 'spl_og_mou_8', 'roam_ic_mou_8', 'std_ic_mou_8', 'spl_ic_mou_8', 'std_ic_t2o_mou_8', 'onnet_mou_8', 'loc_og_t2f_mou_8', 'offnet_mou_8', 'std_ic_t2f_mou_8', 'og_others_8', 'loc_ic_t2t_mou_8', 'std_ic_t2m_mou_8', 'std_ic_t2t_mou_8', 'roam_og_mou_8', 'isd_ic_mou_8', 'loc_og_t2t_mou_8'], dtype=object)
# Looking at all 8th month columns where rows of *_mou are null
condition = data[eighth_month_meaningful_missing_cols].isnull()
# Rows that are null in all of the above columns
missing_rows = pd.Series([True]*data.shape[0], index = data.index)
for column in eighth_month_meaningful_missing_cols :
    missing_rows = missing_rows & data[column].isnull()
print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_8'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_8'].unique()[0])
Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
# Imputation
data[eighth_month_meaningful_missing_cols] = data[eighth_month_meaningful_missing_cols].fillna(0)
metadata = metadata_matrix(data)
# Remaining Missing Values
metadata.iloc[metadata.index.isin(eighth_month_columns)]
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
last_date_of_month_8 | object | 29854 | 157 | 0.52 | 1 |
spl_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 85 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
std_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1941 |
ic_others_8 | float64 | 30011 | 0 | 0.00 | 1259 |
std_ic_t2o_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 8033 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
isd_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3493 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
std_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 6420 |
loc_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 16165 |
loc_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 3124 |
loc_og_t2c_mou_8 | float64 | 30011 | 0 | 0.00 | 1730 |
loc_og_mou_8 | float64 | 30011 | 0 | 0.00 | 18885 |
std_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 11781 |
loc_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 10772 |
onnet_mou_8 | float64 | 30011 | 0 | 0.00 | 17604 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
roam_og_mou_8 | float64 | 30011 | 0 | 0.00 | 4382 |
offnet_mou_8 | float64 | 30011 | 0 | 0.00 | 21513 |
roam_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3655 |
std_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 13326 |
loc_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 9671 |
loc_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 15598 |
loc_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 4705 |
loc_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 18573 |
std_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 4486 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
og_others_8 | float64 | 30011 | 0 | 0.00 | 133 |
std_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1627 |
std_og_t2c_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_og_mou_8 | float64 | 30011 | 0 | 0.00 | 16864 |
isd_og_mou_8 | float64 | 30011 | 0 | 0.00 | 940 |
spl_og_mou_8 | float64 | 30011 | 0 | 0.00 | 3238 |
# Looking at 'recharge' related 8th month columns for customers with missing 'date_of_last_rech_8'
condition = data['date_of_last_rech_8'].isnull()
data[condition].filter(regex='.*rech.*8$', axis=1).head()
mobile_number | total_rech_num_8 | total_rech_amt_8 | max_rech_amt_8 | date_of_last_rech_8 |
---|---|---|---|---|
7000340381 | 0 | 0 | 0 | NaT |
7000608224 | 0 | 0 | 0 | NaT |
7000369789 | 0 | 0 | 0 | NaT |
7000248548 | 0 | 0 | 0 | NaT |
7001967063 | 0 | 0 | 0 | NaT |
data[condition].filter(regex='.*rech.*8$', axis=1).nunique()
total_rech_num_8       1
total_rech_amt_8       1
max_rech_amt_8         1
date_of_last_rech_8    0
dtype: int64
# Month : 9
ninth_month_columns = data.filter(regex="9$", axis=1).columns
metadata = metadata_matrix(data)
condition = metadata.index.isin(ninth_month_columns)
ninth_month_metadata = metadata[condition]
ninth_month_metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
std_og_t2c_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
spl_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 287 |
loc_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15585 |
og_others_9 | float64 | 28307 | 1704 | 5.68 | 132 |
loc_og_t2c_mou_9 | float64 | 28307 | 1704 | 5.68 | 1576 |
isd_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3329 |
loc_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 10360 |
spl_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 2966 |
loc_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 9407 |
loc_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 18207 |
roam_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 4004 |
std_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 7745 |
loc_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 15194 |
roam_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 3370 |
std_og_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 11141 |
offnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 20452 |
loc_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 4611 |
std_ic_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1971 |
isd_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 908 |
std_og_mou_9 | float64 | 28307 | 1704 | 5.68 | 15900 |
std_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 1595 |
ic_others_9 | float64 | 28307 | 1704 | 5.68 | 1284 |
std_ic_t2t_mou_9 | float64 | 28307 | 1704 | 5.68 | 4280 |
std_ic_t2o_mou_9 | category | 28307 | 1704 | 5.68 | 1 |
loc_og_t2f_mou_9 | float64 | 28307 | 1704 | 5.68 | 3111 |
std_og_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 12445 |
loc_ic_mou_9 | float64 | 28307 | 1704 | 5.68 | 18018 |
std_ic_t2m_mou_9 | float64 | 28307 | 1704 | 5.68 | 6168 |
onnet_mou_9 | float64 | 28307 | 1704 | 5.68 | 16674 |
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
last_date_of_month_9 | object | 29651 | 360 | 1.20 | 1 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
# Columns whose missing values are meaningful in the 9th month (all share the same 5.68% null rate)
ninth_month_meaningful_missing_condition = ninth_month_metadata['Null_Percentage'] == 5.68
ninth_month_meaningful_missing_cols = ninth_month_metadata[ninth_month_meaningful_missing_condition].index.values
ninth_month_meaningful_missing_cols
array(['std_og_t2c_mou_9', 'spl_ic_mou_9', 'loc_og_t2m_mou_9', 'og_others_9', 'loc_og_t2c_mou_9', 'isd_ic_mou_9', 'loc_og_t2t_mou_9', 'spl_og_mou_9', 'loc_ic_t2t_mou_9', 'loc_og_mou_9', 'roam_og_mou_9', 'std_ic_mou_9', 'loc_ic_t2m_mou_9', 'roam_ic_mou_9', 'std_og_t2t_mou_9', 'offnet_mou_9', 'loc_ic_t2f_mou_9', 'std_ic_t2f_mou_9', 'isd_og_mou_9', 'std_og_mou_9', 'std_og_t2f_mou_9', 'ic_others_9', 'std_ic_t2t_mou_9', 'std_ic_t2o_mou_9', 'loc_og_t2f_mou_9', 'std_og_t2m_mou_9', 'loc_ic_mou_9', 'std_ic_t2m_mou_9', 'onnet_mou_9'], dtype=object)
# Looking at all 9th month columns where rows of *_mou are null
condition = data[ninth_month_meaningful_missing_cols].isnull()
# Rows that are null in all of the above columns
missing_rows = pd.Series([True]*data.shape[0], index = data.index)
for column in ninth_month_meaningful_missing_cols :
    missing_rows = missing_rows & data[column].isnull()
print('Total outgoing mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_og_mou_9'].unique()[0])
print('Total incoming mou for each customer with missing *_mou data is ', data.loc[missing_rows,'total_ic_mou_9'].unique()[0])
Total outgoing mou for each customer with missing *_mou data is 0.0
Total incoming mou for each customer with missing *_mou data is 0.0
# Imputation
data[ninth_month_meaningful_missing_cols] = data[ninth_month_meaningful_missing_cols].fillna(0)
metadata = metadata_matrix(data)
# Remaining Missing Values
metadata.iloc[metadata.index.isin(ninth_month_columns)]
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
last_date_of_month_9 | object | 29651 | 360 | 1.20 | 1 |
spl_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 287 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
std_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 7745 |
isd_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3329 |
ic_others_9 | float64 | 30011 | 0 | 0.00 | 1284 |
loc_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 18018 |
std_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 4280 |
std_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 6168 |
std_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1971 |
std_ic_t2o_mou_9 | category | 30011 | 0 | 0.00 | 1 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
loc_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 4611 |
loc_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 10360 |
loc_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15585 |
loc_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 3111 |
loc_og_t2c_mou_9 | float64 | 30011 | 0 | 0.00 | 1576 |
loc_og_mou_9 | float64 | 30011 | 0 | 0.00 | 18207 |
roam_og_mou_9 | float64 | 30011 | 0 | 0.00 | 4004 |
onnet_mou_9 | float64 | 30011 | 0 | 0.00 | 16674 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
offnet_mou_9 | float64 | 30011 | 0 | 0.00 | 20452 |
roam_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3370 |
std_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 11141 |
spl_og_mou_9 | float64 | 30011 | 0 | 0.00 | 2966 |
og_others_9 | float64 | 30011 | 0 | 0.00 | 132 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
loc_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 9407 |
loc_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15194 |
isd_og_mou_9 | float64 | 30011 | 0 | 0.00 | 908 |
std_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 12445 |
std_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1595 |
std_og_t2c_mou_9 | category | 30011 | 0 | 0.00 | 1 |
std_og_mou_9 | float64 | 30011 | 0 | 0.00 | 15900 |
# Looking at 'recharge' related 9th month columns for customers with missing 'date_of_last_rech_9'
condition = data['date_of_last_rech_9'].isnull()
data[condition].filter(regex='.*rech.*9$', axis=1).head()
mobile_number | total_rech_num_9 | total_rech_amt_9 | max_rech_amt_9 | date_of_last_rech_9 |
---|---|---|---|---|
7000340381 | 0 | 0 | 0 | NaT |
7000854899 | 0 | 0 | 0 | NaT |
7000369789 | 0 | 0 | 0 | NaT |
7001967063 | 0 | 0 | 0 | NaT |
7000066601 | 0 | 0 | 0 | NaT |
data[condition].filter(regex='.*rech.*9$', axis=1).nunique()
total_rech_num_9       1
total_rech_amt_9       1
max_rech_amt_9         1
date_of_last_rech_9    0
dtype: int64
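The same recharge check is repeated by hand for each month: wherever the last recharge date is missing, the recharge count and amount should be zero. A small helper can run that check for any month suffix. The sketch below is illustrative only; the function name `no_recharge_when_date_missing` and the toy values are assumptions, and only two months are included to keep the example short.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'total_rech_num_6': [3, 0, 2],
    'total_rech_amt_6': [300, 0, 150],
    'date_of_last_rech_6': pd.to_datetime(['2014-06-21', None, '2014-06-29']),
    'total_rech_num_7': [0, 0, 4],
    'total_rech_amt_7': [0, 0, 410],
    'date_of_last_rech_7': pd.to_datetime([None, None, '2014-07-30']),
})

def no_recharge_when_date_missing(df, month):
    """Return True if every row lacking a last-recharge date shows zero recharges."""
    no_date = df[f'date_of_last_rech_{month}'].isnull()
    counts = df.loc[no_date, [f'total_rech_num_{month}', f'total_rech_amt_{month}']]
    return bool((counts == 0).all().all())

check = {month: no_recharge_when_date_missing(toy, month) for month in (6, 7)}
print(check)
```

On the real data the comprehension would iterate over `(6, 7, 8, 9)`; a `False` for any month would mean the missing date cannot simply be read as "no recharge made".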
# Imputing "last_date_of_month_*"
print('Missing Value Percentage in last_date_of_month columns : \n', 100*data.filter(regex='last_date_of_month_.*', axis=1).isnull().sum() / data.shape[0], '\n')
print('The unique values in last_date_of_month_6 : ' , data['last_date_of_month_6'].unique())
print('The unique values in last_date_of_month_7 : ' , data['last_date_of_month_7'].unique())
print('The unique values in last_date_of_month_8 : ' , data['last_date_of_month_8'].unique())
print('The unique values in last_date_of_month_9 : ' , data['last_date_of_month_9'].unique())
Missing Value Percentage in last_date_of_month columns :
last_date_of_month_6    0.000000
last_date_of_month_7    0.103295
last_date_of_month_8    0.523142
last_date_of_month_9    1.199560
dtype: float64

The unique values in last_date_of_month_6 : ['6/30/2014']
The unique values in last_date_of_month_7 : ['7/31/2014' nan]
The unique values in last_date_of_month_8 : ['8/31/2014' nan]
The unique values in last_date_of_month_9 : ['9/30/2014' nan]
# Imputing last_date_of_month_* values
data['last_date_of_month_7'] = data['last_date_of_month_7'].fillna(data['last_date_of_month_7'].mode()[0])
data['last_date_of_month_8'] = data['last_date_of_month_8'].fillna(data['last_date_of_month_8'].mode()[0])
data['last_date_of_month_9'] = data['last_date_of_month_9'].fillna(data['last_date_of_month_9'].mode()[0])
data['last_date_of_month_7'].unique()
array(['7/31/2014'], dtype=object)
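Because each `last_date_of_month_*` column holds exactly one distinct date, filling with the mode is the same as filling with that constant. The three near-identical lines can be folded into a loop that also guards against a column unexpectedly holding several dates. The sketch below uses a toy frame; the guard and the final conversion to `datetime64` are assumptions added for illustration and are not steps taken in the notebook.

```python
import pandas as pd

# Toy data for illustration only
toy = pd.DataFrame({
    'last_date_of_month_7': ['7/31/2014', None, '7/31/2014'],
    'last_date_of_month_8': ['8/31/2014', '8/31/2014', None],
})

for column in toy.filter(regex='^last_date_of_month_').columns:
    observed = toy[column].dropna().unique()
    # Only impute when the column holds exactly one distinct date
    if len(observed) == 1:
        toy[column] = toy[column].fillna(observed[0])
    # Optional: parse the strings so they can take part in date arithmetic
    toy[column] = pd.to_datetime(toy[column], format='%m/%d/%Y')

print(toy.dtypes)
```

Parsing to datetimes is only useful if these columns are later combined with the `date_of_last_rech_*` columns; as constants they carry no predictive information on their own.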
metadata = metadata_matrix(data)
metadata
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
date_of_last_rech_9 | datetime64[ns] | 29145 | 866 | 2.89 | 30 |
date_of_last_rech_8 | datetime64[ns] | 29417 | 594 | 1.98 | 31 |
loc_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_7 | datetime64[ns] | 29897 | 114 | 0.38 | 31 |
std_og_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
loc_ic_t2o_mou | category | 29897 | 114 | 0.38 | 1 |
date_of_last_rech_6 | datetime64[ns] | 29949 | 62 | 0.21 | 30 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 3429 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 20602 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 20711 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 20096 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 19437 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 78 |
spl_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 93 |
spl_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 85 |
spl_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 287 |
total_rech_num_6 | int64 | 30011 | 0 | 0.00 | 102 |
ic_others_9 | float64 | 30011 | 0 | 0.00 | 1284 |
std_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 8033 |
isd_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3639 |
isd_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3493 |
isd_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3329 |
ic_others_6 | float64 | 30011 | 0 | 0.00 | 1227 |
ic_others_7 | float64 | 30011 | 0 | 0.00 | 1371 |
ic_others_8 | float64 | 30011 | 0 | 0.00 | 1259 |
std_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 7745 |
std_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 8543 |
total_rech_num_8 | int64 | 30011 | 0 | 0.00 | 96 |
std_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 6747 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 19133 |
loc_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 19030 |
loc_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 18573 |
loc_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 18018 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 4608 |
std_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 4706 |
std_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 4486 |
std_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 4280 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 6680 |
std_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 6420 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 8391 |
std_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 6168 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 2033 |
std_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 2075 |
std_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1941 |
std_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1971 |
std_ic_t2o_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2o_mou_7 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2o_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_ic_t2o_mou_9 | category | 30011 | 0 | 0.00 | 1 |
total_rech_num_7 | int64 | 30011 | 0 | 0.00 | 101 |
circle_id | category | 30011 | 0 | 0.00 | 1 |
total_rech_num_9 | int64 | 30011 | 0 | 0.00 | 96 |
monthly_3g_9 | category | 30011 | 0 | 0.00 | 11 |
monthly_2g_9 | category | 30011 | 0 | 0.00 | 5 |
sachet_2g_6 | category | 30011 | 0 | 0.00 | 30 |
sachet_2g_7 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_8 | category | 30011 | 0 | 0.00 | 34 |
sachet_2g_9 | category | 30011 | 0 | 0.00 | 29 |
monthly_3g_6 | category | 30011 | 0 | 0.00 | 12 |
monthly_3g_7 | category | 30011 | 0 | 0.00 | 15 |
monthly_3g_8 | category | 30011 | 0 | 0.00 | 12 |
sachet_3g_6 | category | 30011 | 0 | 0.00 | 25 |
monthly_2g_7 | category | 30011 | 0 | 0.00 | 6 |
sachet_3g_7 | category | 30011 | 0 | 0.00 | 27 |
sachet_3g_8 | category | 30011 | 0 | 0.00 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.00 | 27 |
aon | int64 | 30011 | 0 | 0.00 | 3321 |
vbc_3g_8 | float64 | 30011 | 0 | 0.00 | 7291 |
vbc_3g_7 | float64 | 30011 | 0 | 0.00 | 7318 |
vbc_3g_6 | float64 | 30011 | 0 | 0.00 | 6864 |
vbc_3g_9 | float64 | 30011 | 0 | 0.00 | 2171 |
monthly_2g_8 | category | 30011 | 0 | 0.00 | 6 |
monthly_2g_6 | category | 30011 | 0 | 0.00 | 5 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 2241 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.00 | 149 |
loc_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 4611 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 2299 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 2248 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.00 | 170 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 151 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.00 | 182 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.00 | 186 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.00 | 158 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.00 | 179 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.00 | 7016 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.00 | 170 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.00 | 7809 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.00 | 7813 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.00 | 7310 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.00 | 6984 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.00 | 7043 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.00 | 7440 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.00 | 7151 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.00 | 2265 |
loc_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 4897 |
loc_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 4705 |
roam_og_mou_7 | float64 | 30011 | 0 | 0.00 | 4431 |
roam_og_mou_9 | float64 | 30011 | 0 | 0.00 | 4004 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 11151 |
loc_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 11154 |
loc_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 10772 |
loc_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 10360 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16747 |
loc_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16872 |
loc_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 16165 |
loc_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15585 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 3252 |
loc_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 3267 |
loc_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 3124 |
loc_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 3111 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.00 | 1658 |
loc_og_t2c_mou_7 | float64 | 30011 | 0 | 0.00 | 1750 |
loc_og_t2c_mou_8 | float64 | 30011 | 0 | 0.00 | 1730 |
loc_og_t2c_mou_9 | float64 | 30011 | 0 | 0.00 | 1576 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.00 | 19691 |
loc_og_mou_7 | float64 | 30011 | 0 | 0.00 | 19880 |
roam_og_mou_8 | float64 | 30011 | 0 | 0.00 | 4382 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.00 | 5174 |
loc_og_mou_9 | float64 | 30011 | 0 | 0.00 | 18207 |
roam_ic_mou_9 | float64 | 30011 | 0 | 0.00 | 3370 |
last_date_of_month_6 | object | 30011 | 0 | 0.00 | 1 |
last_date_of_month_7 | object | 30011 | 0 | 0.00 | 1 |
last_date_of_month_8 | object | 30011 | 0 | 0.00 | 1 |
last_date_of_month_9 | object | 30011 | 0 | 0.00 | 1 |
arpu_6 | float64 | 30011 | 0 | 0.00 | 29261 |
arpu_7 | float64 | 30011 | 0 | 0.00 | 29260 |
arpu_8 | float64 | 30011 | 0 | 0.00 | 28405 |
arpu_9 | float64 | 30011 | 0 | 0.00 | 27327 |
onnet_mou_6 | float64 | 30011 | 0 | 0.00 | 18813 |
onnet_mou_7 | float64 | 30011 | 0 | 0.00 | 18938 |
onnet_mou_8 | float64 | 30011 | 0 | 0.00 | 17604 |
onnet_mou_9 | float64 | 30011 | 0 | 0.00 | 16674 |
offnet_mou_6 | float64 | 30011 | 0 | 0.00 | 22454 |
offnet_mou_7 | float64 | 30011 | 0 | 0.00 | 22650 |
offnet_mou_8 | float64 | 30011 | 0 | 0.00 | 21513 |
offnet_mou_9 | float64 | 30011 | 0 | 0.00 | 20452 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.00 | 4338 |
roam_ic_mou_7 | float64 | 30011 | 0 | 0.00 | 3649 |
roam_ic_mou_8 | float64 | 30011 | 0 | 0.00 | 3655 |
loc_og_mou_8 | float64 | 30011 | 0 | 0.00 | 18885 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 12777 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 4817 |
isd_og_mou_9 | float64 | 30011 | 0 | 0.00 | 908 |
spl_og_mou_7 | float64 | 30011 | 0 | 0.00 | 3399 |
spl_og_mou_8 | float64 | 30011 | 0 | 0.00 | 3238 |
spl_og_mou_9 | float64 | 30011 | 0 | 0.00 | 2966 |
og_others_6 | float64 | 30011 | 0 | 0.00 | 862 |
og_others_7 | float64 | 30011 | 0 | 0.00 | 123 |
og_others_8 | float64 | 30011 | 0 | 0.00 | 133 |
og_others_9 | float64 | 30011 | 0 | 0.00 | 132 |
total_og_mou_6 | float64 | 30011 | 0 | 0.00 | 24607 |
total_og_mou_7 | float64 | 30011 | 0 | 0.00 | 24913 |
total_og_mou_8 | float64 | 30011 | 0 | 0.00 | 23644 |
total_og_mou_9 | float64 | 30011 | 0 | 0.00 | 22615 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.00 | 9872 |
loc_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 9961 |
loc_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 9671 |
loc_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 9407 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 16015 |
loc_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 16068 |
loc_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 15598 |
loc_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 15194 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.00 | 3053 |
isd_og_mou_8 | float64 | 30011 | 0 | 0.00 | 940 |
std_og_t2t_mou_7 | float64 | 30011 | 0 | 0.00 | 12983 |
isd_og_mou_7 | float64 | 30011 | 0 | 0.00 | 1125 |
std_og_t2t_mou_8 | float64 | 30011 | 0 | 0.00 | 11781 |
std_og_t2t_mou_9 | float64 | 30011 | 0 | 0.00 | 11141 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.00 | 14518 |
std_og_t2m_mou_7 | float64 | 30011 | 0 | 0.00 | 14589 |
std_og_t2m_mou_8 | float64 | 30011 | 0 | 0.00 | 13326 |
std_og_t2m_mou_9 | float64 | 30011 | 0 | 0.00 | 12445 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.00 | 1773 |
std_og_t2f_mou_7 | float64 | 30011 | 0 | 0.00 | 1714 |
std_og_t2f_mou_8 | float64 | 30011 | 0 | 0.00 | 1627 |
std_og_t2f_mou_9 | float64 | 30011 | 0 | 0.00 | 1595 |
std_og_t2c_mou_6 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2c_mou_7 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2c_mou_8 | category | 30011 | 0 | 0.00 | 1 |
std_og_t2c_mou_9 | category | 30011 | 0 | 0.00 | 1 |
std_og_mou_6 | float64 | 30011 | 0 | 0.00 | 18325 |
std_og_mou_7 | float64 | 30011 | 0 | 0.00 | 18445 |
std_og_mou_8 | float64 | 30011 | 0 | 0.00 | 16864 |
std_og_mou_9 | float64 | 30011 | 0 | 0.00 | 15900 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.00 | 1113 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.00 | 3025 |
print(data[data['date_of_last_rech_6'].isnull()][['date_of_last_rech_6','total_rech_amt_6','total_rech_num_6']].nunique())
print(data[data['date_of_last_rech_7'].isnull()][['date_of_last_rech_7','total_rech_amt_7','total_rech_num_7']].nunique())
print(data[data['date_of_last_rech_8'].isnull()][['date_of_last_rech_8','total_rech_amt_8','total_rech_num_8']].nunique())
print(data[data['date_of_last_rech_9'].isnull()][['date_of_last_rech_9','total_rech_amt_9','total_rech_num_9']].nunique())
date_of_last_rech_6    0
total_rech_amt_6       1
total_rech_num_6       1
dtype: int64
date_of_last_rech_7    0
total_rech_amt_7       1
total_rech_num_7       1
dtype: int64
date_of_last_rech_8    0
total_rech_amt_8       1
total_rech_num_8       1
dtype: int64
date_of_last_rech_9    0
total_rech_amt_9       1
total_rech_num_9       1
dtype: int64
print("\n",data[data['date_of_last_rech_6'].isnull()][['total_rech_amt_6','total_rech_num_6']].head())
print("\n",data[data['date_of_last_rech_7'].isnull()][['total_rech_amt_7','total_rech_num_7']].head())
print("\n",data[data['date_of_last_rech_8'].isnull()][['total_rech_amt_8','total_rech_num_8']].head())
print("\n",data[data['date_of_last_rech_9'].isnull()][['total_rech_amt_9','total_rech_num_9']].head())
               total_rech_amt_6  total_rech_num_6
mobile_number
7001588448                    0                 0
7001223277                    0                 0
7000721536                    0                 0
7001490351                    0                 0
7000665415                    0                 0

               total_rech_amt_7  total_rech_num_7
mobile_number
7000369789                    0                 0
7001967148                    0                 0
7000066601                    0                 0
7001189556                    0                 0
7002024450                    0                 0

               total_rech_amt_8  total_rech_num_8
mobile_number
7000340381                    0                 0
7000608224                    0                 0
7000369789                    0                 0
7000248548                    0                 0
7001967063                    0                 0

               total_rech_amt_9  total_rech_num_9
mobile_number
7000340381                    0                 0
7000854899                    0                 0
7000369789                    0                 0
7001967063                    0                 0
7000066601                    0                 0
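The outputs above suggest that a missing last-recharge date simply means the customer did not recharge in that month. That check can be written as one reusable assertion. The following is a minimal sketch on a made-up toy frame; the helper name `missing_date_means_no_recharge` and the sample values are assumptions for illustration and are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Toy frame: column names mirror the dataset, values are invented for illustration.
toy = pd.DataFrame({
    'date_of_last_rech_6': ['6/27/2014', np.nan, '6/30/2014', np.nan],
    'total_rech_amt_6': [250, 0, 110, 0],
    'total_rech_num_6': [3, 0, 1, 0],
})

def missing_date_means_no_recharge(df, month):
    """Hypothetical helper: True if every row with a missing
    last-recharge date also has zero recharge amount and count."""
    missing = df[f'date_of_last_rech_{month}'].isnull()
    cols = [f'total_rech_amt_{month}', f'total_rech_num_{month}']
    return bool((df.loc[missing, cols] == 0).all().all())

result = missing_date_means_no_recharge(toy, 6)
print(result)
```

If the check holds for every month, the missing dates carry no extra information beyond the recharge amount and count columns.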
metadata = metadata_matrix(data)
singular_value_cols = metadata[metadata['Unique_Values_Count'] == 1].index.values
# Dropping single-valued columns: a constant column carries no information for modelling
data.drop(columns=singular_value_cols, inplace=True)
# Dropping date columns
# since they are not usage-related columns and can't be used for modelling
date_columns = data.filter(regex='^date.*').columns
data.drop(columns=date_columns, inplace=True)
metadata_matrix(data)
Column | Datatype | Non_Null_Count | Null_Count | Null_Percentage | Unique_Values_Count |
---|---|---|---|---|---|
arpu_6 | float64 | 30011 | 0 | 0.0 | 29261 |
total_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 20602 |
total_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 20096 |
total_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 19437 |
spl_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 78 |
spl_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 93 |
spl_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 85 |
spl_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 287 |
isd_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 3429 |
isd_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 3639 |
isd_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 3493 |
isd_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 3329 |
ic_others_6 | float64 | 30011 | 0 | 0.0 | 1227 |
ic_others_7 | float64 | 30011 | 0 | 0.0 | 1371 |
ic_others_8 | float64 | 30011 | 0 | 0.0 | 1259 |
ic_others_9 | float64 | 30011 | 0 | 0.0 | 1284 |
total_rech_num_6 | int64 | 30011 | 0 | 0.0 | 102 |
total_rech_num_7 | int64 | 30011 | 0 | 0.0 | 101 |
total_rech_num_8 | int64 | 30011 | 0 | 0.0 | 96 |
total_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 20711 |
std_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 7745 |
total_rech_amt_6 | int64 | 30011 | 0 | 0.0 | 2241 |
std_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 8033 |
loc_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 19030 |
loc_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 18573 |
loc_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 18018 |
std_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 4608 |
std_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 4706 |
std_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 4486 |
std_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 4280 |
std_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 6680 |
std_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 6747 |
std_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 6420 |
std_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 6168 |
std_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 2033 |
std_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 2075 |
std_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 1941 |
std_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 1971 |
std_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 8391 |
std_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 8543 |
total_rech_num_9 | int64 | 30011 | 0 | 0.0 | 96 |
total_rech_amt_7 | int64 | 30011 | 0 | 0.0 | 2265 |
arpu_7 | float64 | 30011 | 0 | 0.0 | 29260 |
monthly_2g_8 | category | 30011 | 0 | 0.0 | 6 |
sachet_2g_6 | category | 30011 | 0 | 0.0 | 30 |
sachet_2g_7 | category | 30011 | 0 | 0.0 | 34 |
sachet_2g_8 | category | 30011 | 0 | 0.0 | 34 |
sachet_2g_9 | category | 30011 | 0 | 0.0 | 29 |
monthly_3g_6 | category | 30011 | 0 | 0.0 | 12 |
monthly_3g_7 | category | 30011 | 0 | 0.0 | 15 |
monthly_3g_8 | category | 30011 | 0 | 0.0 | 12 |
monthly_3g_9 | category | 30011 | 0 | 0.0 | 11 |
sachet_3g_6 | category | 30011 | 0 | 0.0 | 25 |
sachet_3g_7 | category | 30011 | 0 | 0.0 | 27 |
sachet_3g_8 | category | 30011 | 0 | 0.0 | 29 |
sachet_3g_9 | category | 30011 | 0 | 0.0 | 27 |
aon | int64 | 30011 | 0 | 0.0 | 3321 |
vbc_3g_8 | float64 | 30011 | 0 | 0.0 | 7291 |
vbc_3g_7 | float64 | 30011 | 0 | 0.0 | 7318 |
vbc_3g_6 | float64 | 30011 | 0 | 0.0 | 6864 |
vbc_3g_9 | float64 | 30011 | 0 | 0.0 | 2171 |
monthly_2g_9 | category | 30011 | 0 | 0.0 | 5 |
monthly_2g_7 | category | 30011 | 0 | 0.0 | 6 |
total_rech_amt_8 | int64 | 30011 | 0 | 0.0 | 2299 |
monthly_2g_6 | category | 30011 | 0 | 0.0 | 5 |
total_rech_amt_9 | int64 | 30011 | 0 | 0.0 | 2248 |
max_rech_amt_6 | int64 | 30011 | 0 | 0.0 | 170 |
max_rech_amt_7 | int64 | 30011 | 0 | 0.0 | 151 |
max_rech_amt_8 | int64 | 30011 | 0 | 0.0 | 182 |
max_rech_amt_9 | int64 | 30011 | 0 | 0.0 | 186 |
last_day_rch_amt_6 | int64 | 30011 | 0 | 0.0 | 158 |
last_day_rch_amt_7 | int64 | 30011 | 0 | 0.0 | 149 |
last_day_rch_amt_8 | int64 | 30011 | 0 | 0.0 | 179 |
last_day_rch_amt_9 | int64 | 30011 | 0 | 0.0 | 170 |
vol_2g_mb_6 | float64 | 30011 | 0 | 0.0 | 7809 |
vol_2g_mb_7 | float64 | 30011 | 0 | 0.0 | 7813 |
vol_2g_mb_8 | float64 | 30011 | 0 | 0.0 | 7310 |
vol_2g_mb_9 | float64 | 30011 | 0 | 0.0 | 6984 |
vol_3g_mb_6 | float64 | 30011 | 0 | 0.0 | 7043 |
vol_3g_mb_7 | float64 | 30011 | 0 | 0.0 | 7440 |
vol_3g_mb_8 | float64 | 30011 | 0 | 0.0 | 7151 |
vol_3g_mb_9 | float64 | 30011 | 0 | 0.0 | 7016 |
loc_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 19133 |
loc_ic_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 4611 |
loc_ic_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 4705 |
loc_og_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 11154 |
loc_og_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 10360 |
loc_og_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 16747 |
loc_og_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 16872 |
loc_og_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 16165 |
loc_og_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 15585 |
loc_og_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 3252 |
loc_og_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 3267 |
loc_og_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 3124 |
loc_og_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 3111 |
loc_og_t2c_mou_6 | float64 | 30011 | 0 | 0.0 | 1658 |
loc_og_t2c_mou_7 | float64 | 30011 | 0 | 0.0 | 1750 |
loc_og_t2c_mou_8 | float64 | 30011 | 0 | 0.0 | 1730 |
loc_og_t2c_mou_9 | float64 | 30011 | 0 | 0.0 | 1576 |
loc_og_mou_6 | float64 | 30011 | 0 | 0.0 | 19691 |
loc_og_mou_7 | float64 | 30011 | 0 | 0.0 | 19880 |
loc_og_mou_8 | float64 | 30011 | 0 | 0.0 | 18885 |
loc_og_mou_9 | float64 | 30011 | 0 | 0.0 | 18207 |
loc_og_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 10772 |
loc_og_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 11151 |
loc_ic_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 4897 |
roam_og_mou_9 | float64 | 30011 | 0 | 0.0 | 4004 |
arpu_8 | float64 | 30011 | 0 | 0.0 | 28405 |
arpu_9 | float64 | 30011 | 0 | 0.0 | 27327 |
onnet_mou_6 | float64 | 30011 | 0 | 0.0 | 18813 |
onnet_mou_7 | float64 | 30011 | 0 | 0.0 | 18938 |
onnet_mou_8 | float64 | 30011 | 0 | 0.0 | 17604 |
onnet_mou_9 | float64 | 30011 | 0 | 0.0 | 16674 |
offnet_mou_6 | float64 | 30011 | 0 | 0.0 | 22454 |
offnet_mou_7 | float64 | 30011 | 0 | 0.0 | 22650 |
offnet_mou_8 | float64 | 30011 | 0 | 0.0 | 21513 |
offnet_mou_9 | float64 | 30011 | 0 | 0.0 | 20452 |
roam_ic_mou_6 | float64 | 30011 | 0 | 0.0 | 4338 |
roam_ic_mou_7 | float64 | 30011 | 0 | 0.0 | 3649 |
roam_ic_mou_8 | float64 | 30011 | 0 | 0.0 | 3655 |
roam_ic_mou_9 | float64 | 30011 | 0 | 0.0 | 3370 |
roam_og_mou_6 | float64 | 30011 | 0 | 0.0 | 5174 |
roam_og_mou_7 | float64 | 30011 | 0 | 0.0 | 4431 |
roam_og_mou_8 | float64 | 30011 | 0 | 0.0 | 4382 |
std_og_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 12777 |
std_og_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 12983 |
std_og_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 11781 |
std_og_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 11141 |
og_others_6 | float64 | 30011 | 0 | 0.0 | 862 |
og_others_7 | float64 | 30011 | 0 | 0.0 | 123 |
og_others_8 | float64 | 30011 | 0 | 0.0 | 133 |
og_others_9 | float64 | 30011 | 0 | 0.0 | 132 |
total_og_mou_6 | float64 | 30011 | 0 | 0.0 | 24607 |
total_og_mou_7 | float64 | 30011 | 0 | 0.0 | 24913 |
total_og_mou_8 | float64 | 30011 | 0 | 0.0 | 23644 |
total_og_mou_9 | float64 | 30011 | 0 | 0.0 | 22615 |
loc_ic_t2t_mou_6 | float64 | 30011 | 0 | 0.0 | 9872 |
loc_ic_t2t_mou_7 | float64 | 30011 | 0 | 0.0 | 9961 |
loc_ic_t2t_mou_8 | float64 | 30011 | 0 | 0.0 | 9671 |
loc_ic_t2t_mou_9 | float64 | 30011 | 0 | 0.0 | 9407 |
loc_ic_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 16015 |
loc_ic_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 16068 |
loc_ic_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 15598 |
loc_ic_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 15194 |
loc_ic_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 4817 |
spl_og_mou_9 | float64 | 30011 | 0 | 0.0 | 2966 |
spl_og_mou_8 | float64 | 30011 | 0 | 0.0 | 3238 |
spl_og_mou_7 | float64 | 30011 | 0 | 0.0 | 3399 |
std_og_t2f_mou_9 | float64 | 30011 | 0 | 0.0 | 1595 |
std_og_t2m_mou_6 | float64 | 30011 | 0 | 0.0 | 14518 |
std_og_t2m_mou_7 | float64 | 30011 | 0 | 0.0 | 14589 |
std_og_t2m_mou_8 | float64 | 30011 | 0 | 0.0 | 13326 |
std_og_t2m_mou_9 | float64 | 30011 | 0 | 0.0 | 12445 |
std_og_t2f_mou_6 | float64 | 30011 | 0 | 0.0 | 1773 |
std_og_t2f_mou_7 | float64 | 30011 | 0 | 0.0 | 1714 |
std_og_t2f_mou_8 | float64 | 30011 | 0 | 0.0 | 1627 |
std_og_mou_6 | float64 | 30011 | 0 | 0.0 | 18325 |
spl_og_mou_6 | float64 | 30011 | 0 | 0.0 | 3053 |
std_og_mou_7 | float64 | 30011 | 0 | 0.0 | 18445 |
std_og_mou_8 | float64 | 30011 | 0 | 0.0 | 16864 |
std_og_mou_9 | float64 | 30011 | 0 | 0.0 | 15900 |
isd_og_mou_6 | float64 | 30011 | 0 | 0.0 | 1113 |
isd_og_mou_7 | float64 | 30011 | 0 | 0.0 | 1125 |
isd_og_mou_8 | float64 | 30011 | 0 | 0.0 | 940 |
isd_og_mou_9 | float64 | 30011 | 0 | 0.0 | 908 |
Average_rech_amt_6n7 | float64 | 30011 | 0 | 0.0 | 3025 |
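The `metadata_matrix` helper is defined earlier in the notebook and is not shown in this part. The following is a hedged sketch of how such a per-column summary could be built and then used to drop constant and date columns. The function name `metadata_sketch` and the toy data are assumptions, and the real helper may differ in detail.

```python
import numpy as np
import pandas as pd

def metadata_sketch(df):
    """Hypothetical stand-in for metadata_matrix: one summary row per column."""
    return pd.DataFrame({
        'Datatype': df.dtypes.astype(str),
        'Non_Null_Count': df.count(),
        'Null_Count': df.isnull().sum(),
        'Null_Percentage': (df.isnull().mean() * 100).round(2),
        'Unique_Values_Count': df.nunique(),  # NaN is not counted as a value
    })

# Toy frame: names mirror the dataset, values are invented.
toy = pd.DataFrame({
    'arpu_6': [10.5, 20.0, 30.25],
    'last_date_of_month_6': ['6/30/2014'] * 3,  # constant column
    'date_of_last_rech_6': ['6/27/2014', np.nan, '6/30/2014'],
})

meta = metadata_sketch(toy)
# Columns with a single distinct value
singular = meta[meta['Unique_Values_Count'] == 1].index.tolist()
# Columns whose name starts with 'date'
date_cols = toy.filter(regex='^date').columns.tolist()
reduced = toy.drop(columns=singular + date_cols)
print(meta)
print(list(reduced.columns))
```

Note that the anchor `^date` matches `date_of_last_rech_6` but not `last_date_of_month_6`; the latter is removed only because it is constant.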
data['Churn'] = 0
churned_customers = data.query('total_og_mou_9 == 0 & total_ic_mou_9 == 0 & vol_2g_mb_9 == 0 & vol_3g_mb_9 == 0').index
data.loc[churned_customers,'Churn']=1
data['Churn'] = data['Churn'].astype('category')
# Churn proportions
data['Churn'].value_counts(normalize=True).to_frame()
Churn | |
---|---|
0 | 0.913598 |
1 | 0.086402 |
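The tagging rule above marks a customer as churned only when all four month-9 usage columns are zero. The same rule can be expressed as a single boolean mask, as in the sketch below. The toy mobile numbers and usage values are invented for illustration.

```python
import pandas as pd

# Toy month-9 usage data; mobile numbers and values are made up.
toy = pd.DataFrame(
    {
        'total_og_mou_9': [0.0, 12.5, 0.0, 0.0],
        'total_ic_mou_9': [0.0, 3.0, 0.0, 7.1],
        'vol_2g_mb_9': [0.0, 0.0, 0.0, 0.0],
        'vol_3g_mb_9': [0.0, 0.0, 45.0, 0.0],
    },
    index=pd.Index([7000000001, 7000000002, 7000000003, 7000000004],
                   name='mobile_number'),
)

usage_cols = ['total_og_mou_9', 'total_ic_mou_9', 'vol_2g_mb_9', 'vol_3g_mb_9']
# Churn = 1 only when every usage column is exactly zero in month 9
toy['Churn'] = (toy[usage_cols] == 0).all(axis=1).astype(int)
churn_rate = toy['Churn'].mean()
print(toy['Churn'].tolist(), churn_rate)
```

Only the first toy customer has no calls and no data usage, so only that row is tagged. In the real data the same rule gives a churn rate of about 8.6%, which is why the classes are heavily imbalanced.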
churn_phase_columns = data.filter(regex='9$').columns
data.drop(columns=churn_phase_columns, inplace=True)
print('Retained Columns')
data.columns.to_frame(index=False)
Retained Columns
0 | |
---|---|
0 | arpu_6 |
1 | arpu_7 |
2 | arpu_8 |
3 | onnet_mou_6 |
4 | onnet_mou_7 |
5 | onnet_mou_8 |
6 | offnet_mou_6 |
7 | offnet_mou_7 |
8 | offnet_mou_8 |
9 | roam_ic_mou_6 |
10 | roam_ic_mou_7 |
11 | roam_ic_mou_8 |
12 | roam_og_mou_6 |
13 | roam_og_mou_7 |
14 | roam_og_mou_8 |
15 | loc_og_t2t_mou_6 |
16 | loc_og_t2t_mou_7 |
17 | loc_og_t2t_mou_8 |
18 | loc_og_t2m_mou_6 |
19 | loc_og_t2m_mou_7 |
20 | loc_og_t2m_mou_8 |
21 | loc_og_t2f_mou_6 |
22 | loc_og_t2f_mou_7 |
23 | loc_og_t2f_mou_8 |
24 | loc_og_t2c_mou_6 |
25 | loc_og_t2c_mou_7 |
26 | loc_og_t2c_mou_8 |
27 | loc_og_mou_6 |
28 | loc_og_mou_7 |
29 | loc_og_mou_8 |
30 | std_og_t2t_mou_6 |
31 | std_og_t2t_mou_7 |
32 | std_og_t2t_mou_8 |
33 | std_og_t2m_mou_6 |
34 | std_og_t2m_mou_7 |
35 | std_og_t2m_mou_8 |
36 | std_og_t2f_mou_6 |
37 | std_og_t2f_mou_7 |
38 | std_og_t2f_mou_8 |
39 | std_og_mou_6 |
40 | std_og_mou_7 |
41 | std_og_mou_8 |
42 | isd_og_mou_6 |
43 | isd_og_mou_7 |
44 | isd_og_mou_8 |
45 | spl_og_mou_6 |
46 | spl_og_mou_7 |
47 | spl_og_mou_8 |
48 | og_others_6 |
49 | og_others_7 |
50 | og_others_8 |
51 | total_og_mou_6 |
52 | total_og_mou_7 |
53 | total_og_mou_8 |
54 | loc_ic_t2t_mou_6 |
55 | loc_ic_t2t_mou_7 |
56 | loc_ic_t2t_mou_8 |
57 | loc_ic_t2m_mou_6 |
58 | loc_ic_t2m_mou_7 |
59 | loc_ic_t2m_mou_8 |
60 | loc_ic_t2f_mou_6 |
61 | loc_ic_t2f_mou_7 |
62 | loc_ic_t2f_mou_8 |
63 | loc_ic_mou_6 |
64 | loc_ic_mou_7 |
65 | loc_ic_mou_8 |
66 | std_ic_t2t_mou_6 |
67 | std_ic_t2t_mou_7 |
68 | std_ic_t2t_mou_8 |
69 | std_ic_t2m_mou_6 |
70 | std_ic_t2m_mou_7 |
71 | std_ic_t2m_mou_8 |
72 | std_ic_t2f_mou_6 |
73 | std_ic_t2f_mou_7 |
74 | std_ic_t2f_mou_8 |
75 | std_ic_mou_6 |
76 | std_ic_mou_7 |
77 | std_ic_mou_8 |
78 | total_ic_mou_6 |
79 | total_ic_mou_7 |
80 | total_ic_mou_8 |
81 | spl_ic_mou_6 |
82 | spl_ic_mou_7 |
83 | spl_ic_mou_8 |
84 | isd_ic_mou_6 |
85 | isd_ic_mou_7 |
86 | isd_ic_mou_8 |
87 | ic_others_6 |
88 | ic_others_7 |
89 | ic_others_8 |
90 | total_rech_num_6 |
91 | total_rech_num_7 |
92 | total_rech_num_8 |
93 | total_rech_amt_6 |
94 | total_rech_amt_7 |
95 | total_rech_amt_8 |
96 | max_rech_amt_6 |
97 | max_rech_amt_7 |
98 | max_rech_amt_8 |
99 | last_day_rch_amt_6 |
100 | last_day_rch_amt_7 |
101 | last_day_rch_amt_8 |
102 | vol_2g_mb_6 |
103 | vol_2g_mb_7 |
104 | vol_2g_mb_8 |
105 | vol_3g_mb_6 |
106 | vol_3g_mb_7 |
107 | vol_3g_mb_8 |
108 | monthly_2g_6 |
109 | monthly_2g_7 |
110 | monthly_2g_8 |
111 | sachet_2g_6 |
112 | sachet_2g_7 |
113 | sachet_2g_8 |
114 | monthly_3g_6 |
115 | monthly_3g_7 |
116 | monthly_3g_8 |
117 | sachet_3g_6 |
118 | sachet_3g_7 |
119 | sachet_3g_8 |
120 | aon |
121 | vbc_3g_8 |
122 | vbc_3g_7 |
123 | vbc_3g_6 |
124 | Average_rech_amt_6n7 |
125 | Churn |
print('Retained number of rows:', data.shape[0])
print('Retained number of columns:', data.shape[1])
Retained number of rows: 30011
Retained number of columns: 126
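The churn-phase columns were selected above with the pattern `9$`, which matches any column name ending in the digit 9. That is sufficient for this dataset, but a stricter `_9$` pattern would guard against accidentally dropping an unrelated column. The sketch below illustrates the difference. The column `score_v19` is hypothetical and does not exist in the dataset.

```python
import pandas as pd

# Empty toy frame: only the column names matter here.
# 'score_v19' is a hypothetical column used to show the difference.
toy = pd.DataFrame(columns=['arpu_8', 'arpu_9', 'vol_3g_mb_9', 'aon', 'score_v19'])

loose = toy.filter(regex='9$').columns.tolist()    # any name ending in 9
strict = toy.filter(regex='_9$').columns.tolist()  # only the month-9 suffix

print(loose)
print(strict)
```

Dropping the month-9 columns after tagging is what prevents target leakage, since the churn label is derived from them.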
data.describe()
arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | total_rech_amt_6 | total_rech_amt_7 | total_rech_amt_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 | aon | vbc_3g_8 | vbc_3g_7 | vbc_3g_6 | Average_rech_amt_6n7 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.00000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.00000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.00000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 | 30011.000000 |
mean | 587.284404 | 589.135427 | 534.857433 | 296.034461 | 304.343206 | 267.600412 | 417.933372 | 423.924375 | 375.021691 | 17.412764 | 13.522114 | 13.25627 | 29.321648 | 22.036003 | 21.469272 | 94.680696 | 95.729729 | 87.139995 | 181.279583 | 181.271524 | 167.591199 | 6.97933 | 7.097268 | 6.494314 | 1.567160 | 1.862229 | 1.712739 | 282.948414 | 284.107492 | 261.233938 | 189.753131 | 199.877508 | 172.196408 | 203.097767 | 213.411914 | 179.568790 | 2.010766 | 2.034241 | 1.789728 | 394.865994 | 415.327988 | 353.558826 | 2.264425 | 2.207400 | 2.029314 | 5.916364 | 7.425487 | 6.885193 | 0.692507 | 0.047600 | 0.059131 | 686.697541 | 709.124730 | 623.774684 | 68.749054 | 70.311351 | 65.936968 | 159.613810 | 160.813032 | 153.628517 | 15.595629 | 16.510023 | 14.706512 | 243.968340 | 247.644401 | 234.281577 | 16.229350 | 16.893723 | 15.051559 | 32.015163 | 33.477150 | 30.434765 | 2.874506 | 2.992948 | 2.680925 | 51.122992 | 53.36786 | 48.170990 | 307.512073 | 314.875472 | 295.426531 | 0.066731 | 0.018066 | 0.027660 | 11.156530 | 12.360190 | 11.700835 | 1.188803 | 1.476889 | 1.237756 | 12.121322 | 11.913465 | 10.225317 | 697.365833 | 695.962880 | 613.638799 | 171.414048 | 175.661058 | 162.869348 | 104.485655 | 105.287128 | 95.653294 | 78.859009 | 78.171382 | 69.209105 | 258.392681 | 278.093737 | 269.864111 | 1264.064776 | 129.439626 | 135.127102 | 121.360548 | 696.664356 |
std | 442.722413 | 462.897814 | 492.259586 | 460.775592 | 481.780488 | 466.560947 | 470.588583 | 486.525332 | 477.489377 | 79.152657 | 76.303736 | 74.55207 | 118.570414 | 97.925249 | 106.244774 | 236.849265 | 248.132623 | 234.721938 | 250.132066 | 240.722132 | 234.862468 | 22.66552 | 22.588864 | 20.220028 | 6.889317 | 9.255645 | 7.397562 | 379.985249 | 375.837282 | 366.539171 | 409.716719 | 428.119476 | 410.033964 | 413.489240 | 437.941904 | 416.752834 | 12.457422 | 13.350441 | 11.700376 | 606.508681 | 637.446710 | 616.219690 | 45.918087 | 45.619381 | 44.794926 | 18.621373 | 23.065743 | 22.893414 | 2.281325 | 2.741786 | 3.320320 | 660.356820 | 685.071178 | 685.983313 | 158.647160 | 167.315954 | 155.702334 | 222.001036 | 219.432004 | 217.026349 | 45.827009 | 49.478371 | 43.714061 | 312.805586 | 315.468343 | 307.043800 | 78.862358 | 84.691403 | 72.433104 | 101.084965 | 105.806605 | 105.308898 | 19.928472 | 20.511317 | 20.269535 | 140.504104 | 149.17944 | 140.965196 | 361.159561 | 369.654489 | 360.343153 | 0.194273 | 0.181944 | 0.116574 | 67.258387 | 76.992293 | 74.928607 | 13.987003 | 15.406483 | 12.889879 | 9.543550 | 9.605532 | 9.478572 | 539.325984 | 562.143146 | 601.821630 | 174.703215 | 181.545389 | 172.605809 | 142.767207 | 141.148386 | 145.260363 | 277.445058 | 280.331857 | 268.494284 | 866.195376 | 855.682340 | 859.299266 | 975.263117 | 390.478591 | 408.024394 | 389.726031 | 488.782088 |
min | -2258.709000 | -2014.045000 | -945.808000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 180.000000 | 0.000000 | 0.000000 | 0.000000 | 368.500000 |
25% | 364.161000 | 365.004500 | 289.609500 | 41.110000 | 40.950000 | 27.010000 | 137.335000 | 135.680000 | 95.695000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 8.320000 | 9.130000 | 5.790000 | 30.290000 | 33.580000 | 22.420000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 51.010000 | 56.710000 | 38.270000 | 0.000000 | 0.000000 | 0.000000 | 1.600000 | 1.330000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 5.950000 | 5.555000 | 1.780000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 266.170000 | 275.045000 | 188.790000 | 8.290000 | 9.460000 | 6.810000 | 33.460000 | 38.130000 | 29.660000 | 0.000000 | 0.000000 | 0.000000 | 56.700000 | 63.535000 | 49.985000 | 0.000000 | 0.000000 | 0.000000 | 0.450000 | 0.480000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 2.630000 | 2.78000 | 1.430000 | 89.975000 | 98.820000 | 78.930000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 6.000000 | 6.000000 | 4.000000 | 432.000000 | 426.500000 | 309.000000 | 110.000000 | 110.000000 | 67.000000 | 30.000000 | 27.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 480.000000 | 0.000000 | 0.000000 | 0.000000 | 450.000000 |
50% | 495.682000 | 493.561000 | 452.091000 | 125.830000 | 125.460000 | 99.440000 | 282.190000 | 281.940000 | 240.940000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 32.590000 | 33.160000 | 28.640000 | 101.240000 | 104.340000 | 89.810000 | 0.33000 | 0.400000 | 0.160000 | 0.000000 | 0.000000 | 0.000000 | 166.310000 | 170.440000 | 148.280000 | 12.830000 | 13.350000 | 5.930000 | 37.730000 | 37.530000 | 23.660000 | 0.000000 | 0.000000 | 0.000000 | 126.010000 | 131.730000 | 72.890000 | 0.000000 | 0.000000 | 0.000000 | 0.210000 | 0.780000 | 0.490000 | 0.000000 | 0.000000 | 0.000000 | 510.230000 | 525.580000 | 435.330000 | 29.130000 | 30.130000 | 26.840000 | 93.940000 | 96.830000 | 89.810000 | 1.960000 | 2.210000 | 1.850000 | 151.060000 | 154.830000 | 142.840000 | 1.050000 | 1.200000 | 0.560000 | 7.080000 | 7.460000 | 5.710000 | 0.000000 | 0.000000 | 0.000000 | 15.030000 | 16.11000 | 12.560000 | 205.240000 | 211.190000 | 193.440000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 9.000000 | 9.000000 | 8.000000 | 584.000000 | 581.000000 | 520.000000 | 120.000000 | 128.000000 | 130.000000 | 110.000000 | 98.000000 | 50.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 914.000000 | 0.000000 | 0.000000 | 0.000000 | 568.500000 |
75% | 703.922000 | 700.788000 | 671.150000 | 353.310000 | 359.925000 | 297.735000 | 523.125000 | 532.695000 | 482.610000 | 0.000000 | 0.000000 | 0.00000 | 0.000000 | 0.000000 | 0.000000 | 91.460000 | 91.480000 | 84.670000 | 240.165000 | 239.485000 | 223.590000 | 5.09000 | 5.260000 | 4.680000 | 0.000000 | 0.100000 | 0.050000 | 374.475000 | 375.780000 | 348.310000 | 178.085000 | 191.380000 | 132.820000 | 211.210000 | 223.010000 | 164.725000 | 0.000000 | 0.000000 | 0.000000 | 573.090000 | 615.150000 | 481.030000 | 0.000000 | 0.000000 | 0.000000 | 5.160000 | 7.110000 | 6.380000 | 0.000000 | 0.000000 | 0.000000 | 899.505000 | 931.050000 | 833.100000 | 73.640000 | 74.680000 | 70.330000 | 202.830000 | 203.485000 | 196.975000 | 12.440000 | 13.035000 | 11.605000 | 315.500000 | 316.780000 | 302.110000 | 10.280000 | 10.980000 | 8.860000 | 27.540000 | 29.235000 | 25.330000 | 0.180000 | 0.260000 | 0.130000 | 47.540000 | 50.36000 | 43.410000 | 393.680000 | 396.820000 | 380.410000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.000000 | 0.060000 | 0.000000 | 0.060000 | 15.000000 | 15.000000 | 13.000000 | 837.000000 | 835.000000 | 790.000000 | 200.000000 | 200.000000 | 198.000000 | 120.000000 | 130.000000 | 130.000000 | 14.450000 | 14.960000 | 9.620000 | 0.000000 | 2.080000 | 0.000000 | 1924.000000 | 1.600000 | 1.990000 | 0.000000 | 795.500000 |
max | 27731.088000 | 35145.834000 | 33543.624000 | 7376.710000 | 8157.780000 | 10752.560000 | 8362.360000 | 9667.130000 | 14007.340000 | 2613.310000 | 3813.290000 | 4169.81000 | 3775.110000 | 2812.040000 | 5337.040000 | 6431.330000 | 7400.660000 | 10752.560000 | 4729.740000 | 4557.140000 | 4961.330000 | 1466.03000 | 1196.430000 | 928.490000 | 342.860000 | 569.710000 | 351.830000 | 10643.380000 | 7674.780000 | 11039.910000 | 7366.580000 | 8133.660000 | 8014.430000 | 8314.760000 | 9284.740000 | 13950.040000 | 628.560000 | 544.630000 | 516.910000 | 8432.990000 | 10936.730000 | 13980.060000 | 5900.660000 | 5490.280000 | 5681.540000 | 1023.210000 | 1265.790000 | 1390.880000 | 100.610000 | 370.130000 | 394.930000 | 10674.030000 | 11365.310000 | 14043.060000 | 6351.440000 | 5709.590000 | 4003.210000 | 4693.860000 | 4388.730000 | 5738.460000 | 1678.410000 | 1983.010000 | 1588.530000 | 6496.110000 | 6466.740000 | 5748.810000 | 5459.560000 | 5800.930000 | 4309.290000 | 4630.230000 | 3470.380000 | 5645.860000 | 1351.110000 | 1136.080000 | 1394.890000 | 5459.630000 | 6745.76000 | 5957.140000 | 6798.640000 | 7279.080000 | 5990.710000 | 19.760000 | 21.330000 | 6.230000 | 3965.690000 | 4747.910000 | 4100.380000 | 1344.140000 | 1495.940000 | 1209.860000 | 307.000000 | 138.000000 | 196.000000 | 35190.000000 | 40335.000000 | 45320.000000 | 4010.000000 | 4010.000000 | 4449.000000 | 4010.000000 | 4010.000000 | 4449.000000 | 10285.900000 | 7873.550000 | 11117.610000 | 45735.400000 | 28144.120000 | 30036.060000 | 4321.000000 | 12916.220000 | 9165.600000 | 11166.210000 | 37762.500000 |
categorical_columns = data.dtypes[data.dtypes == 'category'].index.values
print('Mode : ')
data[categorical_columns].mode().T
Mode :
| | 0 |
---|---|
monthly_2g_6 | 0 |
monthly_2g_7 | 0 |
monthly_2g_8 | 0 |
sachet_2g_6 | 0 |
sachet_2g_7 | 0 |
sachet_2g_8 | 0 |
monthly_3g_6 | 0 |
monthly_3g_7 | 0 |
monthly_3g_8 | 0 |
sachet_3g_6 | 0 |
sachet_3g_7 | 0 |
sachet_3g_8 | 0 |
Churn | 0 |
churned_customers = data[data['Churn'] == 1]
non_churned_customers = data[data['Churn'] == 0]
plt.figure(figsize=(12,8))
sns.violinplot(x='aon', y='Churn', data=data)
plt.title('Age on Network vs Churn')
plt.show()
# function for numerical variable univariate analysis
from tabulate import tabulate
def num_univariate_analysis(column_names, scale='linear'):
    # violin plot of each of the first three columns vs target
    fig = plt.figure(figsize=(16,8))
    ax1 = fig.add_subplot(1,3,1)
    sns.violinplot(x='Churn', y=column_names[0], data=data, ax=ax1)
    title = column_names[0] + ' vs Churn'
    ax1.set(title=title)
    if scale == 'log':
        # set the scale on the axes object rather than on the implicit current axes
        ax1.set_yscale('log')
        ax1.set(ylabel=column_names[0] + ' (Log Scale)')
    ax2 = fig.add_subplot(1,3,2)
    sns.violinplot(x='Churn', y=column_names[1], data=data, ax=ax2)
    title = column_names[1] + ' vs Churn'
    ax2.set(title=title)
    if scale == 'log':
        ax2.set_yscale('log')
        ax2.set(ylabel=column_names[1] + ' (Log Scale)')
    ax3 = fig.add_subplot(1,3,3)
    sns.violinplot(x='Churn', y=column_names[2], data=data, ax=ax3)
    title = column_names[2] + ' vs Churn'
    ax3.set(title=title)
    if scale == 'log':
        ax3.set_yscale('log')
        ax3.set(ylabel=column_names[2] + ' (Log Scale)')
    # summary statistics per churn group
    print('Customers who churned (Churn : 1)')
    print(churned_customers[column_names].describe())
    print('\nCustomers who did not churn (Churn : 0)')
    print(non_churned_customers[column_names].describe(),'\n')
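The two separate `describe()` printouts can also be condensed into a single table with `groupby`. The following is only a sketch on hypothetical toy data: the `toy` frame and its values are made up for illustration and are not taken from the telecom dataset.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `data` frame.
toy = pd.DataFrame({
    "arpu_6": [600.0, 700.0, 500.0, 550.0],
    "arpu_8": [100.0, 300.0, 520.0, 560.0],
    "Churn":  [1, 1, 0, 0],
})

# One row per churn group, one (column, statistic) pair per output column.
summary = toy.groupby("Churn")[["arpu_6", "arpu_8"]].agg(["mean", "median"])
print(summary)
```

Placing both groups in one frame makes the month 6 to month 8 drop for churners easier to read side by side.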
# function for categorical variable univariate analysis
!pip install sidetable
import sidetable
def cat_univariate_analysis(column_names, figsize=(16,4)):
    # column vs target count plot
    fig = plt.figure(figsize=figsize)
    ax1 = fig.add_subplot(1,3,1)
    sns.countplot(x=column_names[0], hue='Churn', data=data, ax=ax1)
    title = column_names[0] + ' vs No of Churned Customers'
    ax1.set(title=title)
    ax1.legend(loc='upper right')
    ax2 = fig.add_subplot(1,3,2)
    sns.countplot(x=column_names[1], hue='Churn', data=data, ax=ax2)
    title = column_names[1] + ' vs No of Churned Customers'
    ax2.set(title=title)
    ax2.legend(loc='upper right')
    ax3 = fig.add_subplot(1,3,3)
    sns.countplot(x=column_names[2], hue='Churn', data=data, ax=ax3)
    title = column_names[2] + ' vs No of Churned Customers'
    ax3.set(title=title)
    ax3.legend(loc='upper right')
    # Percentages
    print('Customers who churned (Churn : 1)')
    print(tabulate(pd.DataFrame(churned_customers.stb.freq([column_names[0]])), headers='keys', tablefmt='psql'),'\n')
    print(tabulate(pd.DataFrame(churned_customers.stb.freq([column_names[1]])), headers='keys', tablefmt='psql'),'\n')
    print(tabulate(pd.DataFrame(churned_customers.stb.freq([column_names[2]])), headers='keys', tablefmt='psql'),'\n')
    print('\nCustomers who did not churn (Churn : 0)')
    print(tabulate(pd.DataFrame(non_churned_customers.stb.freq([column_names[0]])), headers='keys', tablefmt='psql'),'\n')
    print(tabulate(pd.DataFrame(non_churned_customers.stb.freq([column_names[1]])), headers='keys', tablefmt='psql'),'\n')
    print(tabulate(pd.DataFrame(non_churned_customers.stb.freq([column_names[2]])), headers='keys', tablefmt='psql'),'\n')
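The per-group percentages printed by `cat_univariate_analysis` can be cross-checked without `sidetable` by using `pd.crosstab`. This is a minimal sketch, assuming a frame with one categorical column and a binary `Churn` column; the `toy` frame below is hypothetical and does not reproduce the real counts.

```python
import pandas as pd

# Hypothetical toy data standing in for the notebook's `data` frame.
toy = pd.DataFrame({
    "monthly_2g_6": [0, 0, 1, 0, 1, 2, 0, 0],
    "Churn":        [1, 1, 1, 0, 0, 0, 0, 0],
})

# Column-normalised crosstab: within each Churn group the shares sum to 100.
pct = pd.crosstab(toy["monthly_2g_6"], toy["Churn"], normalize="columns") * 100
print(pct.round(2))
```

Each column of `pct` corresponds to the `percent` column of one frequency table, so both churn groups can be compared in a single view.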
columns = ['arpu_6','arpu_7','arpu_8']
num_univariate_analysis(columns,'log')
Customers who churned (Churn : 1)
| | arpu_6 | arpu_7 | arpu_8 |
---|---|---|---|
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 678.716970 | 550.511946 | 243.063343 |
std | 551.792864 | 517.241221 | 378.843531 |
min | -209.465000 | -158.963000 | -37.887000 |
25% | 396.507000 | 289.641000 | 0.000000 |
50% | 573.396000 | 464.674000 | 101.894000 |
75% | 819.460000 | 691.588000 | 351.028000 |
max | 11505.508000 | 13224.119000 | 5228.826000 |
Customers who did not churn (Churn : 0)
| | arpu_6 | arpu_7 | arpu_8 |
---|---|---|---|
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 578.637360 | 592.788162 | 562.453248 |
std | 429.988265 | 457.265996 | 492.802655 |
min | -2258.709000 | -2014.045000 | -945.808000 |
25% | 362.218000 | 369.610500 | 319.118500 |
50% | 489.324000 | 496.182500 | 471.024000 |
75% | 690.891750 | 701.418000 | 690.921000 |
max | 27731.088000 | 35145.834000 | 33543.624000 |
columns = ['total_og_mou_6', 'total_og_mou_7', 'total_og_mou_8']
num_univariate_analysis(columns)
Customers who churned (Churn : 1)
| | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 |
---|---|---|---|
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 867.961342 | 677.868909 | 225.083741 |
std | 852.697688 | 786.961399 | 471.672718 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 277.880000 | 110.090000 | 0.000000 |
50% | 658.360000 | 466.910000 | 0.000000 |
75% | 1209.040000 | 926.760000 | 255.810000 |
max | 8488.360000 | 8285.640000 | 5206.210000 |
Customers who did not churn (Churn : 0)
| | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 |
---|---|---|---|
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 669.554896 | 712.080684 | 661.480046 |
std | 636.531612 | 674.580516 | 691.079113 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 265.682500 | 284.500000 | 227.970000 |
50% | 500.410000 | 529.935000 | 470.475000 |
75% | 872.070000 | 931.197500 | 866.045000 |
max | 10674.030000 | 11365.310000 | 14043.060000 |
columns = ['total_ic_mou_6', 'total_ic_mou_7', 'total_ic_mou_8']
num_univariate_analysis(columns)
Customers who churned (Churn : 1)
| | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 |
---|---|---|---|
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 241.954404 | 193.341076 | 68.807042 |
std | 360.836586 | 318.183813 | 154.450340 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 49.460000 | 27.890000 | 0.000000 |
50% | 137.330000 | 99.980000 | 0.000000 |
75% | 289.510000 | 235.740000 | 70.290000 |
max | 6633.180000 | 5137.560000 | 1859.280000 |
Customers who did not churn (Churn : 0)
| | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 |
---|---|---|---|
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 313.712052 | 326.369333 | 316.858595 |
std | 360.580253 | 372.112086 | 366.818717 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 94.460000 | 107.802500 | 98.265000 |
50% | 212.160000 | 222.290000 | 212.360000 |
75% | 401.602500 | 410.182500 | 402.270000 |
max | 6798.640000 | 7279.080000 | 5990.710000 |
columns = ['vol_2g_mb_6', 'vol_2g_mb_7', 'vol_2g_mb_8']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
| | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 |
---|---|---|---|
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 60.775588 | 49.054393 | 15.283185 |
std | 243.084276 | 219.485813 | 120.975111 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 0.000000 | 0.000000 |
max | 4017.160000 | 3430.730000 | 3349.190000 |
Customers who did not churn (Churn : 0)
| | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 |
---|---|---|---|
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 80.569210 | 80.925060 | 74.309036 |
std | 280.420463 | 285.265125 | 277.889339 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 16.937500 | 18.267500 | 14.245000 |
max | 10285.900000 | 7873.550000 | 11117.610000 |
columns = ['vol_3g_mb_6', 'vol_3g_mb_7', 'vol_3g_mb_8']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
| | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 |
---|---|---|---|
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 188.395461 | 157.714254 | 56.776880 |
std | 715.327843 | 690.773561 | 446.532769 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 0.000000 | 0.000000 |
max | 9400.120000 | 15115.510000 | 13440.720000 |
Customers who did not churn (Churn : 0)
| | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 |
---|---|---|---|
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 265.012522 | 289.478375 | 290.016390 |
std | 878.846885 | 868.808831 | 885.821105 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 35.855000 | 27.120000 |
max | 45735.400000 | 28144.120000 | 30036.060000 |
columns = ['monthly_2g_6', 'monthly_2g_7', 'monthly_2g_8']
cat_univariate_analysis(columns)
Customers who churned (Churn : 1)
| | monthly_2g_6 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 2454 | 94.6394 | 2454 | 94.6394 |
1 | 1 | 126 | 4.85924 | 2580 | 99.4987 |
2 | 2 | 11 | 0.424219 | 2591 | 99.9229 |
3 | 4 | 2 | 0.0771307 | 2593 | 100 |

| | monthly_2g_7 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 2477 | 95.5264 | 2477 | 95.5264 |
1 | 1 | 104 | 4.0108 | 2581 | 99.5372 |
2 | 2 | 12 | 0.462784 | 2593 | 100 |

| | monthly_2g_8 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 2555 | 98.5345 | 2555 | 98.5345 |
1 | 1 | 37 | 1.42692 | 2592 | 99.9614 |
2 | 2 | 1 | 0.0385654 | 2593 | 100 |

Customers who did not churn (Churn : 0)
| | monthly_2g_6 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 24228 | 88.3653 | 24228 | 88.3653 |
1 | 1 | 2825 | 10.3035 | 27053 | 98.6688 |
2 | 2 | 334 | 1.21818 | 27387 | 99.8869 |
3 | 3 | 26 | 0.0948282 | 27413 | 99.9818 |
4 | 4 | 5 | 0.0182362 | 27418 | 100 |

| | monthly_2g_7 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 24079 | 87.8219 | 24079 | 87.8219 |
1 | 1 | 2909 | 10.6098 | 26988 | 98.4317 |
2 | 2 | 394 | 1.43701 | 27382 | 99.8687 |
3 | 3 | 29 | 0.10577 | 27411 | 99.9745 |
4 | 4 | 5 | 0.0182362 | 27416 | 99.9927 |
5 | 5 | 2 | 0.00729448 | 27418 | 100 |

| | monthly_2g_8 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 24383 | 88.9306 | 24383 | 88.9306 |
1 | 1 | 2724 | 9.93508 | 27107 | 98.8657 |
2 | 2 | 282 | 1.02852 | 27389 | 99.8942 |
3 | 3 | 22 | 0.0802393 | 27411 | 99.9745 |
4 | 4 | 5 | 0.0182362 | 27416 | 99.9927 |
5 | 5 | 2 | 0.00729448 | 27418 | 100 |
columns = ['monthly_3g_6', 'monthly_3g_7', 'monthly_3g_8']
cat_univariate_analysis(columns)
Customers who churned (Churn : 1)
| | monthly_3g_6 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 2352 | 90.7057 | 2352 | 90.7057 |
1 | 1 | 170 | 6.55611 | 2522 | 97.2619 |
2 | 2 | 49 | 1.8897 | 2571 | 99.1516 |
3 | 3 | 13 | 0.50135 | 2584 | 99.6529 |
4 | 5 | 4 | 0.154261 | 2588 | 99.8072 |
5 | 4 | 4 | 0.154261 | 2592 | 99.9614 |
6 | 6 | 1 | 0.0385654 | 2593 | 100 |

| | monthly_3g_7 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 2399 | 92.5183 | 2399 | 92.5183 |
1 | 1 | 136 | 5.24489 | 2535 | 97.7632 |
2 | 2 | 48 | 1.85114 | 2583 | 99.6143 |
3 | 3 | 9 | 0.347088 | 2592 | 99.9614 |
4 | 5 | 1 | 0.0385654 | 2593 | 100 |

| | monthly_3g_8 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 2524 | 97.339 | 2524 | 97.339 |
1 | 1 | 56 | 2.15966 | 2580 | 99.4987 |
2 | 2 | 8 | 0.308523 | 2588 | 99.8072 |
3 | 3 | 4 | 0.154261 | 2592 | 99.9614 |
4 | 4 | 1 | 0.0385654 | 2593 | 100 |

Customers who did not churn (Churn : 0)
| | monthly_3g_6 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 24080 | 87.8255 | 24080 | 87.8255 |
1 | 1 | 2371 | 8.6476 | 26451 | 96.4731 |
2 | 2 | 648 | 2.36341 | 27099 | 98.8365 |
3 | 3 | 194 | 0.707564 | 27293 | 99.5441 |
4 | 4 | 70 | 0.255307 | 27363 | 99.7994 |
5 | 5 | 28 | 0.102123 | 27391 | 99.9015 |
6 | 6 | 10 | 0.0364724 | 27401 | 99.938 |
7 | 7 | 9 | 0.0328252 | 27410 | 99.9708 |
8 | 8 | 3 | 0.0109417 | 27413 | 99.9818 |
9 | 11 | 2 | 0.00729448 | 27415 | 99.9891 |
10 | 9 | 2 | 0.00729448 | 27417 | 99.9964 |
11 | 14 | 1 | 0.00364724 | 27418 | 100 |

| | monthly_3g_7 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 23962 | 87.3951 | 23962 | 87.3951 |
1 | 1 | 2330 | 8.49807 | 26292 | 95.8932 |
2 | 2 | 774 | 2.82296 | 27066 | 98.7162 |
3 | 3 | 198 | 0.722153 | 27264 | 99.4383 |
4 | 4 | 68 | 0.248012 | 27332 | 99.6863 |
5 | 5 | 38 | 0.138595 | 27370 | 99.8249 |
6 | 6 | 23 | 0.0838865 | 27393 | 99.9088 |
7 | 7 | 10 | 0.0364724 | 27403 | 99.9453 |
8 | 8 | 5 | 0.0182362 | 27408 | 99.9635 |
9 | 9 | 4 | 0.014589 | 27412 | 99.9781 |
10 | 11 | 2 | 0.00729448 | 27414 | 99.9854 |
11 | 16 | 1 | 0.00364724 | 27415 | 99.9891 |
12 | 14 | 1 | 0.00364724 | 27416 | 99.9927 |
13 | 12 | 1 | 0.00364724 | 27417 | 99.9964 |
14 | 10 | 1 | 0.00364724 | 27418 | 100 |

| | monthly_3g_8 | count | percent | cumulative_count | cumulative_percent |
---|---|---|---|---|---|
0 | 0 | 24002 | 87.541 | 24002 | 87.541 |
1 | 1 | 2347 | 8.56007 | 26349 | 96.1011 |
2 | 2 | 728 | 2.65519 | 27077 | 98.7563 |
3 | 3 | 193 | 0.703917 | 27270 | 99.4602 |
4 | 4 | 86 | 0.313663 | 27356 | 99.7739 |
5 | 5 | 30 | 0.109417 | 27386 | 99.8833 |
6 | 6 | 14 | 0.0510613 | 27400 | 99.9343 |
7 | 7 | 9 | 0.0328252 | 27409 | 99.9672 |
8 | 9 | 3 | 0.0109417 | 27412 | 99.9781 |
9 | 8 | 3 | 0.0109417 | 27415 | 99.9891 |
10 | 10 | 2 | 0.00729448 | 27417 | 99.9964 |
11 | 16 | 1 | 0.00364724 | 27418 | 100 |
columns = ['sachet_3g_6', 'sachet_3g_7','sachet_3g_8']
print(data[columns].dtypes)
cat_univariate_analysis(columns)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-43696c0d750b> in <module>
      1 columns = ['sachet_3g_6', 'sachet_3g_7','sachet_3g_8']
----> 2 print(data[columns].dtypes)
      3 cat_univariate_analysis(columns)
NameError: name 'data' is not defined
columns = [ 'vbc_3g_6', 'vbc_3g_7','vbc_3g_8']
num_univariate_analysis(columns, 'log')
Customers who churned (Churn : 1)
| | vbc_3g_6 | vbc_3g_7 | vbc_3g_8 |
---|---|---|---|
count | 2593.000000 | 2593.000000 | 2593.000000 |
mean | 81.564601 | 71.143880 | 32.610659 |
std | 320.898511 | 284.882601 | 197.998246 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 0.000000 | 0.000000 |
max | 6931.810000 | 4908.270000 | 5738.740000 |
Customers who did not churn (Churn : 0)
| | vbc_3g_6 | vbc_3g_7 | vbc_3g_8 |
---|---|---|---|
count | 27418.000000 | 27418.000000 | 27418.000000 |
mean | 125.124167 | 141.178182 | 138.597023 |
std | 395.413666 | 417.292310 | 402.761779 |
min | 0.000000 | 0.000000 | 0.000000 |
25% | 0.000000 | 0.000000 | 0.000000 |
50% | 0.000000 | 0.000000 | 0.000000 |
75% | 0.000000 | 9.940000 | 17.675000 |
max | 11166.210000 | 9165.600000 | 12916.220000 |
data.head()
arpu_6 | arpu_7 | arpu_8 | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | total_og_mou_6 | total_og_mou_7 | total_og_mou_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | total_ic_mou_6 | total_ic_mou_7 | total_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | total_rech_amt_6 | total_rech_amt_7 | total_rech_amt_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | vol_2g_mb_6 | vol_2g_mb_7 | vol_2g_mb_8 | vol_3g_mb_6 | vol_3g_mb_7 | vol_3g_mb_8 | monthly_2g_6 | monthly_2g_7 | monthly_2g_8 | sachet_2g_6 | sachet_2g_7 | sachet_2g_8 | monthly_3g_6 | monthly_3g_7 | monthly_3g_8 | sachet_3g_6 | sachet_3g_7 | sachet_3g_8 | aon | vbc_3g_8 | 
vbc_3g_7 | vbc_3g_6 | Average_rech_amt_6n7 | Churn | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mobile_number | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7000701601 | 1069.180 | 1349.850 | 3171.480 | 57.84 | 54.68 | 52.29 | 453.43 | 567.16 | 325.91 | 16.23 | 33.49 | 31.64 | 23.74 | 12.59 | 38.06 | 51.39 | 31.38 | 40.28 | 308.63 | 447.38 | 162.28 | 62.13 | 55.14 | 53.23 | 0.0 | 0.0 | 0.00 | 422.16 | 533.91 | 255.79 | 4.30 | 23.29 | 12.01 | 49.89 | 31.76 | 49.14 | 6.66 | 20.08 | 16.68 | 60.86 | 75.14 | 77.84 | 0.0 | 0.18 | 10.01 | 4.50 | 0.00 | 6.50 | 0.00 | 0.0 | 0.0 | 487.53 | 609.24 | 350.16 | 58.14 | 32.26 | 27.31 | 217.56 | 221.49 | 121.19 | 152.16 | 101.46 | 39.53 | 427.88 | 355.23 | 188.04 | 36.89 | 11.83 | 30.39 | 91.44 | 126.99 | 141.33 | 52.19 | 34.24 | 22.21 | 180.54 | 173.08 | 193.94 | 626.46 | 558.04 | 428.74 | 0.21 | 0.0 | 0.0 | 2.06 | 14.53 | 31.59 | 15.74 | 15.19 | 15.14 | 5 | 5 | 7 | 1580 | 790 | 3638 | 1580 | 790 | 1580 | 0 | 0 | 779 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 802 | 57.74 | 19.38 | 18.74 | 1185.0 | 1 |
7001524846 | 378.721 | 492.223 | 137.362 | 413.69 | 351.03 | 35.08 | 94.66 | 80.63 | 136.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 297.13 | 217.59 | 12.49 | 80.96 | 70.58 | 50.54 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 7.15 | 378.09 | 288.18 | 63.04 | 116.56 | 133.43 | 22.58 | 13.69 | 10.04 | 75.69 | 0.00 | 0.00 | 0.00 | 130.26 | 143.48 | 98.28 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 10.23 | 0.00 | 0.0 | 0.0 | 508.36 | 431.66 | 171.56 | 23.84 | 9.84 | 0.31 | 57.58 | 13.98 | 15.48 | 0.00 | 0.00 | 0.00 | 81.43 | 23.83 | 15.79 | 0.00 | 0.58 | 0.10 | 22.43 | 4.08 | 0.65 | 0.00 | 0.00 | 0.00 | 22.43 | 4.66 | 0.75 | 103.86 | 28.49 | 16.54 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19 | 21 | 14 | 437 | 601 | 120 | 90 | 154 | 30 | 50 | 0 | 10 | 0.0 | 356.0 | 0.03 | 0.0 | 750.95 | 11.94 | 0 | 1 | 0 | 0 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 315 | 21.03 | 910.65 | 122.16 | 519.0 | 0 |
7002191713 | 492.846 | 205.671 | 593.260 | 501.76 | 108.39 | 534.24 | 413.31 | 119.28 | 482.46 | 23.53 | 144.24 | 72.11 | 7.98 | 35.26 | 1.44 | 49.63 | 6.19 | 36.01 | 151.13 | 47.28 | 294.46 | 4.54 | 0.00 | 23.51 | 0.0 | 0.0 | 0.49 | 205.31 | 53.48 | 353.99 | 446.41 | 85.98 | 498.23 | 255.36 | 52.94 | 156.94 | 0.00 | 0.00 | 0.00 | 701.78 | 138.93 | 655.18 | 0.0 | 0.00 | 1.29 | 0.00 | 0.00 | 4.78 | 0.00 | 0.0 | 0.0 | 907.09 | 192.41 | 1015.26 | 67.88 | 7.58 | 52.58 | 142.88 | 18.53 | 195.18 | 4.81 | 0.00 | 7.49 | 215.58 | 26.11 | 255.26 | 115.68 | 38.29 | 154.58 | 308.13 | 29.79 | 317.91 | 0.00 | 0.00 | 1.91 | 423.81 | 68.09 | 474.41 | 968.61 | 172.58 | 1144.53 | 0.45 | 0.0 | 0.0 | 245.28 | 62.11 | 393.39 | 83.48 | 16.24 | 21.44 | 6 | 4 | 11 | 507 | 253 | 717 | 110 | 110 | 130 | 110 | 50 | 0 | 0.0 | 0.0 | 0.02 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 2607 | 0.00 | 0.00 | 0.00 | 380.0 | 0 |
7000875565 | 430.975 | 299.869 | 187.894 | 50.51 | 74.01 | 70.61 | 296.29 | 229.74 | 162.76 | 0.00 | 2.83 | 0.00 | 0.00 | 17.74 | 0.00 | 42.61 | 65.16 | 67.38 | 273.29 | 145.99 | 128.28 | 0.00 | 4.48 | 10.26 | 0.0 | 0.0 | 0.00 | 315.91 | 215.64 | 205.93 | 7.89 | 2.58 | 3.23 | 22.99 | 64.51 | 18.29 | 0.00 | 0.00 | 0.00 | 30.89 | 67.09 | 21.53 | 0.0 | 0.00 | 0.00 | 0.00 | 3.26 | 5.91 | 0.00 | 0.0 | 0.0 | 346.81 | 286.01 | 233.38 | 41.33 | 71.44 | 28.89 | 226.81 | 149.69 | 150.16 | 8.71 | 8.68 | 32.71 | 276.86 | 229.83 | 211.78 | 68.79 | 78.64 | 6.33 | 18.68 | 73.08 | 73.93 | 0.51 | 0.00 | 2.18 | 87.99 | 151.73 | 82.44 | 364.86 | 381.56 | 294.46 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.23 | 0.00 | 0.00 | 0.00 | 10 | 6 | 2 | 570 | 348 | 160 | 110 | 110 | 130 | 100 | 100 | 130 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 511 | 0.00 | 2.45 | 21.89 | 459.0 | 0 |
7000187447 | 690.008 | 18.980 | 25.499 | 1185.91 | 9.28 | 7.79 | 61.64 | 0.00 | 5.54 | 0.00 | 4.76 | 4.81 | 0.00 | 8.46 | 13.34 | 38.99 | 0.00 | 0.00 | 58.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 97.54 | 0.00 | 0.00 | 1146.91 | 0.81 | 0.00 | 1.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1148.46 | 0.81 | 0.00 | 0.0 | 0.00 | 0.00 | 2.58 | 0.00 | 0.00 | 0.93 | 0.0 | 0.0 | 1249.53 | 0.81 | 0.00 | 34.54 | 0.00 | 0.00 | 47.41 | 2.31 | 0.00 | 0.00 | 0.00 | 0.00 | 81.96 | 2.31 | 0.00 | 8.63 | 0.00 | 0.00 | 1.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.91 | 0.00 | 0.00 | 91.88 | 2.31 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19 | 2 | 4 | 816 | 0 | 30 | 110 | 0 | 30 | 30 | 0 | 0 | 0.0 | 0.0 | 0.00 | 0.0 | 0.00 | 0.00 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 667 | 0.00 | 0.00 | 0.00 | 408.0 | 0 |
sns.scatterplot(x=data['total_og_mou_6'],y=data['total_og_mou_8'],hue=data['Churn'])
sns.scatterplot(x=data['aon'],y=data['total_og_mou_8'],hue=data['Churn'])
sns.scatterplot(x=data['aon'],y=data['total_ic_mou_8'],hue=data['Churn'])
sns.scatterplot(x=data['max_rech_amt_6'],y=data['max_rech_amt_8'],hue=data['Churn'])
# function to correlate variables
def correlation(dataframe):
    # every column except the target; a list (not a set) keeps the order deterministic
    columns_for_analysis = [col for col in dataframe.columns if col != 'Churn']
    # correlate numeric columns only (category columns are skipped)
    cor0 = dataframe[columns_for_analysis].select_dtypes(include='number').corr()
    # long format: one row per (VAR1, VAR2) combination
    cor0 = cor0.unstack().reset_index()
    cor0.columns = ['VAR1', 'VAR2', 'CORR']
    cor0 = cor0.dropna(subset=['CORR'])
    # absolute values, rounded to two decimals
    cor0['CORR'] = cor0['CORR'].round(2).abs()
    # drop self-correlations
    cor0 = cor0[cor0['VAR1'] != cor0['VAR2']].copy()
    # removing duplicate correlations: (A, B) and (B, A) share one sorted key
    cor0['pair'] = cor0[['VAR1', 'VAR2']].apply(lambda x: '{}-{}'.format(*sorted(x)), axis=1)
    cor0 = cor0.drop_duplicates(subset=['pair'], keep='first')
    return cor0[['VAR1', 'VAR2', 'CORR']].sort_values(by=['CORR'], ascending=False)
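The duplicate-pair removal in `correlation` can also be done by masking the correlation matrix to its strict upper triangle before reshaping, so each unordered pair appears exactly once. The following is a minimal sketch under that assumption; `unique_pair_correlations` and the `toy` frame are hypothetical names and data introduced only for illustration.

```python
import numpy as np
import pandas as pd

def unique_pair_correlations(frame: pd.DataFrame) -> pd.DataFrame:
    """Return absolute correlations with one row per unordered column pair."""
    corr = frame.select_dtypes(include="number").corr().abs()
    # True strictly above the diagonal: drops self-pairs and mirrored pairs.
    upper = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    # Masked-out cells become NaN and are removed by dropna().
    pairs = corr.where(upper).unstack().dropna().reset_index()
    pairs.columns = ["VAR1", "VAR2", "CORR"]
    return pairs.sort_values("CORR", ascending=False).reset_index(drop=True)

# Hypothetical toy data, not the telecom dataset.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0],
    "b": [2.0, 4.0, 6.0, 8.0],
    "c": [4.0, 1.0, 3.0, 2.0],
})
toy_pairs = unique_pair_correlations(toy)
print(toy_pairs)
```

With three numeric columns the result has three rows, one per pair, which avoids building and de-duplicating string keys.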
# Correlations for Churn : 0 - non churn customers
# Absolute values are reported
pd.set_option('display.precision', 2)
cor_0 = correlation(non_churned_customers)
# filtering for absolute correlations > 0.4
condition = cor_0['CORR'] > 0.4
cor_0 = cor_0[condition]
cor_0.style.background_gradient(cmap='GnBu').hide_index()
VAR1 | VAR2 | CORR |
---|---|---|
isd_og_mou_8 | isd_og_mou_7 | 0.96 |
isd_og_mou_8 | isd_og_mou_6 | 0.95 |
arpu_8 | total_rech_amt_8 | 0.95 |
isd_og_mou_7 | isd_og_mou_6 | 0.95 |
arpu_6 | total_rech_amt_6 | 0.94 |
total_rech_amt_7 | arpu_7 | 0.94 |
Average_rech_amt_6n7 | arpu_7 | 0.91 |
total_rech_amt_7 | Average_rech_amt_6n7 | 0.91 |
total_ic_mou_6 | loc_ic_mou_6 | 0.90 |
Average_rech_amt_6n7 | total_rech_amt_6 | 0.90 |
arpu_6 | Average_rech_amt_6n7 | 0.89 |
loc_ic_mou_8 | total_ic_mou_8 | 0.89 |
loc_ic_mou_7 | total_ic_mou_7 | 0.88 |
loc_ic_mou_7 | loc_ic_mou_8 | 0.85 |
std_og_t2t_mou_8 | onnet_mou_8 | 0.85 |
loc_ic_mou_8 | loc_ic_t2m_mou_8 | 0.85 |
loc_ic_t2m_mou_6 | loc_ic_mou_6 | 0.85 |
std_og_t2m_mou_8 | offnet_mou_8 | 0.85 |
std_og_t2t_mou_7 | onnet_mou_7 | 0.84 |
std_og_mou_8 | total_og_mou_8 | 0.84 |
loc_og_mou_8 | loc_og_mou_7 | 0.84 |
std_ic_mou_8 | std_ic_t2m_mou_8 | 0.84 |
std_og_t2t_mou_6 | onnet_mou_6 | 0.84 |
offnet_mou_7 | std_og_t2m_mou_7 | 0.84 |
total_og_mou_7 | std_og_mou_7 | 0.83 |
loc_ic_mou_7 | loc_ic_mou_6 | 0.83 |
total_ic_mou_7 | total_ic_mou_8 | 0.83 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_7 | 0.83 |
loc_ic_mou_7 | loc_ic_t2m_mou_7 | 0.83 |
loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | 0.82 |
loc_og_t2f_mou_8 | loc_og_t2f_mou_7 | 0.82 |
loc_og_t2m_mou_8 | loc_og_t2m_mou_7 | 0.82 |
onnet_mou_7 | onnet_mou_8 | 0.82 |
std_ic_t2m_mou_6 | std_ic_mou_6 | 0.82 |
std_og_t2t_mou_7 | std_og_t2t_mou_8 | 0.82 |
std_ic_t2m_mou_7 | std_ic_mou_7 | 0.81 |
loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | 0.81 |
std_og_mou_8 | std_og_mou_7 | 0.81 |
offnet_mou_6 | std_og_t2m_mou_6 | 0.81 |
total_ic_mou_6 | total_ic_mou_7 | 0.81 |
loc_ic_t2t_mou_8 | loc_ic_t2t_mou_7 | 0.81 |
total_og_mou_6 | std_og_mou_6 | 0.80 |
loc_og_mou_6 | loc_og_mou_7 | 0.80 |
loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | 0.80 |
loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | 0.80 |
std_og_t2m_mou_8 | std_og_t2m_mou_7 | 0.79 |
loc_og_t2f_mou_7 | loc_og_t2f_mou_6 | 0.79 |
loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | 0.79 |
loc_og_mou_6 | loc_og_t2m_mou_6 | 0.79 |
total_rech_num_8 | total_rech_num_7 | 0.78 |
loc_og_t2m_mou_7 | loc_og_t2m_mou_6 | 0.78 |
arpu_8 | Average_rech_amt_6n7 | 0.78 |
offnet_mou_7 | offnet_mou_8 | 0.78 |
loc_og_t2t_mou_8 | loc_og_mou_8 | 0.77 |
arpu_7 | total_rech_amt_8 | 0.77 |
arpu_8 | arpu_7 | 0.77 |
arpu_8 | total_rech_amt_7 | 0.77 |
loc_og_t2m_mou_8 | loc_og_mou_8 | 0.77 |
total_og_mou_7 | total_og_mou_8 | 0.77 |
std_og_t2f_mou_7 | std_og_t2f_mou_8 | 0.77 |
loc_og_mou_7 | loc_og_t2t_mou_7 | 0.77 |
total_ic_mou_8 | loc_ic_t2m_mou_8 | 0.76 |
std_ic_mou_8 | std_ic_mou_7 | 0.76 |
loc_ic_t2m_mou_6 | total_ic_mou_6 | 0.76 |
isd_ic_mou_7 | isd_ic_mou_6 | 0.75 |
isd_ic_mou_8 | isd_ic_mou_7 | 0.75 |
loc_og_mou_6 | loc_og_t2t_mou_6 | 0.75 |
std_ic_mou_6 | std_ic_mou_7 | 0.75 |
loc_ic_mou_7 | total_ic_mou_8 | 0.75 |
loc_og_t2m_mou_7 | loc_og_mou_7 | 0.75 |
loc_ic_mou_8 | loc_ic_mou_6 | 0.75 |
total_ic_mou_7 | loc_ic_mou_8 | 0.75 |
Average_rech_amt_6n7 | total_rech_amt_8 | 0.75 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | 0.75 |
vol_3g_mb_8 | vol_3g_mb_7 | 0.75 |
std_og_t2m_mou_6 | std_og_t2m_mou_7 | 0.75 |
std_og_mou_8 | std_og_t2m_mou_8 | 0.75 |
std_og_t2m_mou_6 | std_og_mou_6 | 0.74 |
std_og_mou_8 | std_og_t2t_mou_8 | 0.74 |
std_ic_t2f_mou_7 | std_ic_t2f_mou_6 | 0.74 |
std_og_t2m_mou_7 | std_og_mou_7 | 0.74 |
loc_ic_mou_7 | total_ic_mou_6 | 0.74 |
loc_ic_t2m_mou_7 | total_ic_mou_7 | 0.74 |
std_og_mou_6 | std_og_t2t_mou_6 | 0.74 |
std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | 0.74 |
std_ic_t2t_mou_8 | std_ic_t2t_mou_7 | 0.73 |
loc_og_t2f_mou_8 | loc_og_t2f_mou_6 | 0.73 |
std_og_t2t_mou_7 | std_og_t2t_mou_6 | 0.73 |
std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | 0.73 |
onnet_mou_7 | onnet_mou_6 | 0.73 |
total_rech_amt_7 | total_rech_amt_8 | 0.73 |
loc_og_mou_6 | loc_og_mou_8 | 0.73 |
total_ic_mou_6 | total_ic_mou_8 | 0.73 |
std_og_t2t_mou_7 | std_og_mou_7 | 0.73 |
std_og_mou_6 | std_og_mou_7 | 0.73 |
total_ic_mou_7 | loc_ic_mou_6 | 0.72 |
std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | 0.72 |
offnet_mou_6 | offnet_mou_7 | 0.72 |
loc_ic_t2m_mou_6 | loc_ic_t2m_mou_8 | 0.72 |
loc_ic_t2m_mou_7 | loc_ic_mou_8 | 0.72 |
total_og_mou_8 | offnet_mou_8 | 0.72 |
std_ic_t2m_mou_7 | std_ic_t2m_mou_6 | 0.72 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_6 | 0.72 |
total_og_mou_8 | onnet_mou_8 | 0.71 |
vbc_3g_8 | vbc_3g_7 | 0.71 |
vol_3g_mb_6 | vol_3g_mb_7 | 0.71 |
std_og_t2f_mou_6 | std_og_t2f_mou_7 | 0.71 |
total_og_mou_7 | onnet_mou_7 | 0.71 |
ic_others_8 | ic_others_7 | 0.71 |
total_og_mou_6 | offnet_mou_6 | 0.70 |
total_rech_amt_6 | arpu_7 | 0.70 |
total_og_mou_7 | offnet_mou_7 | 0.70 |
std_ic_t2t_mou_7 | std_ic_mou_7 | 0.70 |
arpu_6 | arpu_7 | 0.70 |
total_og_mou_6 | onnet_mou_6 | 0.70 |
loc_ic_t2t_mou_6 | loc_ic_t2t_mou_8 | 0.70 |
loc_ic_mou_7 | loc_ic_t2m_mou_8 | 0.70 |
last_day_rch_amt_8 | max_rech_amt_8 | 0.69 |
std_og_t2t_mou_7 | onnet_mou_8 | 0.69 |
vbc_3g_7 | vbc_3g_6 | 0.69 |
loc_og_t2m_mou_8 | loc_og_t2m_mou_6 | 0.69 |
loc_ic_t2m_mou_7 | loc_ic_mou_6 | 0.69 |
total_rech_num_6 | total_rech_num_7 | 0.69 |
vol_2g_mb_7 | vol_2g_mb_8 | 0.69 |
std_og_t2t_mou_8 | onnet_mou_7 | 0.69 |
loc_ic_mou_7 | loc_ic_t2m_mou_6 | 0.68 |
arpu_6 | total_rech_amt_7 | 0.68 |
loc_ic_mou_7 | loc_ic_t2t_mou_7 | 0.68 |
total_ic_mou_6 | loc_ic_mou_8 | 0.68 |
std_ic_t2f_mou_8 | std_ic_t2f_mou_6 | 0.67 |
ic_others_7 | ic_others_6 | 0.67 |
loc_ic_t2t_mou_6 | loc_ic_mou_6 | 0.67 |
loc_ic_t2t_mou_8 | loc_ic_mou_8 | 0.67 |
vol_2g_mb_7 | vol_2g_mb_6 | 0.67 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_8 | 0.67 |
total_og_mou_6 | total_og_mou_7 | 0.67 |
vol_3g_mb_8 | vol_3g_mb_6 | 0.67 |
std_ic_t2t_mou_6 | std_ic_mou_6 | 0.67 |
total_ic_mou_8 | loc_ic_mou_6 | 0.66 |
std_ic_t2t_mou_8 | std_ic_mou_8 | 0.66 |
total_og_mou_7 | std_og_mou_8 | 0.66 |
std_og_t2m_mou_8 | offnet_mou_7 | 0.66 |
std_ic_mou_8 | std_ic_mou_6 | 0.66 |
vbc_3g_7 | vol_3g_mb_7 | 0.65 |
max_rech_amt_6 | last_day_rch_amt_6 | 0.65 |
std_og_t2m_mou_7 | offnet_mou_8 | 0.65 |
std_og_t2f_mou_6 | std_og_t2f_mou_8 | 0.65 |
loc_og_t2t_mou_8 | loc_og_mou_7 | 0.65 |
arpu_6 | total_rech_amt_8 | 0.64 |
arpu_6 | arpu_8 | 0.64 |
total_og_mou_8 | std_og_mou_7 | 0.64 |
std_og_t2m_mou_8 | total_og_mou_8 | 0.64 |
loc_ic_t2m_mou_6 | loc_ic_mou_8 | 0.64 |
roam_og_mou_6 | roam_ic_mou_6 | 0.64 |
std_ic_t2m_mou_7 | std_ic_mou_8 | 0.64 |
loc_og_mou_8 | loc_og_t2t_mou_7 | 0.64 |
loc_ic_t2m_mou_7 | total_ic_mou_8 | 0.64 |
total_rech_amt_7 | total_rech_amt_6 | 0.64 |
total_ic_mou_7 | loc_ic_t2m_mou_8 | 0.63 |
vol_3g_mb_6 | vbc_3g_6 | 0.63 |
vbc_3g_8 | vol_3g_mb_8 | 0.63 |
total_ic_mou_6 | loc_ic_t2m_mou_7 | 0.63 |
roam_ic_mou_7 | roam_og_mou_7 | 0.63 |
onnet_mou_8 | onnet_mou_6 | 0.63 |
loc_og_t2m_mou_8 | loc_og_mou_7 | 0.63 |
arpu_8 | total_rech_amt_6 | 0.63 |
std_og_t2t_mou_8 | std_og_t2t_mou_6 | 0.63 |
loc_ic_t2m_mou_8 | loc_ic_mou_6 | 0.63 |
loc_og_t2t_mou_6 | loc_og_mou_7 | 0.63 |
total_og_mou_7 | std_og_t2m_mou_7 | 0.63 |
ic_others_8 | ic_others_6 | 0.63 |
std_ic_t2m_mou_7 | std_ic_mou_6 | 0.63 |
loc_og_mou_8 | loc_og_t2m_mou_7 | 0.63 |
std_ic_t2m_mou_6 | std_ic_t2m_mou_8 | 0.63 |
total_rech_amt_6 | total_rech_amt_8 | 0.63 |
isd_ic_mou_8 | isd_ic_mou_6 | 0.62 |
std_og_t2t_mou_8 | std_og_mou_7 | 0.62 |
total_og_mou_8 | std_og_t2t_mou_8 | 0.61 |
loc_og_t2m_mou_6 | loc_og_mou_7 | 0.61 |
onnet_mou_7 | std_og_t2t_mou_6 | 0.61 |
vbc_3g_8 | vbc_3g_6 | 0.61 |
loc_og_mou_6 | loc_og_t2m_mou_7 | 0.61 |
std_og_mou_8 | onnet_mou_8 | 0.61 |
std_ic_t2m_mou_8 | std_ic_mou_7 | 0.61 |
last_day_rch_amt_7 | max_rech_amt_7 | 0.61 |
std_og_t2m_mou_6 | offnet_mou_7 | 0.61 |
std_og_mou_8 | std_og_mou_6 | 0.61 |
max_rech_amt_6 | max_rech_amt_8 | 0.60 |
std_og_mou_8 | std_og_t2m_mou_7 | 0.60 |
total_rech_num_8 | total_rech_num_6 | 0.60 |
total_ic_mou_7 | loc_ic_t2t_mou_7 | 0.60 |
loc_ic_t2m_mou_6 | total_ic_mou_7 | 0.60 |
roam_og_mou_8 | roam_ic_mou_8 | 0.60 |
std_og_t2t_mou_7 | onnet_mou_6 | 0.60 |
total_og_mou_7 | std_og_t2t_mou_7 | 0.60 |
std_og_mou_8 | std_og_t2t_mou_7 | 0.60 |
loc_og_mou_6 | loc_og_t2t_mou_7 | 0.60 |
total_og_mou_6 | std_og_t2m_mou_6 | 0.60 |
std_og_mou_8 | offnet_mou_8 | 0.60 |
std_og_t2m_mou_8 | std_og_t2m_mou_6 | 0.60 |
std_og_mou_6 | onnet_mou_6 | 0.59 |
loc_ic_t2t_mou_8 | total_ic_mou_8 | 0.59 |
onnet_mou_7 | std_og_mou_7 | 0.59 |
total_ic_mou_6 | loc_ic_t2t_mou_6 | 0.59 |
std_og_t2m_mou_8 | std_og_mou_7 | 0.59 |
offnet_mou_6 | offnet_mou_8 | 0.59 |
std_ic_t2t_mou_6 | std_ic_t2t_mou_8 | 0.59 |
loc_ic_mou_7 | loc_ic_t2t_mou_8 | 0.59 |
total_og_mou_7 | onnet_mou_8 | 0.58 |
std_ic_t2m_mou_6 | std_ic_mou_7 | 0.58 |
total_og_mou_6 | std_og_t2t_mou_6 | 0.58 |
offnet_mou_6 | std_og_t2m_mou_7 | 0.58 |
roam_og_mou_8 | roam_og_mou_7 | 0.58 |
total_ic_mou_6 | loc_ic_t2m_mou_8 | 0.58 |
spl_og_mou_7 | spl_og_mou_8 | 0.57 |
total_og_mou_7 | std_og_mou_6 | 0.57 |
offnet_mou_7 | std_og_mou_7 | 0.57 |
loc_ic_mou_7 | loc_ic_t2t_mou_6 | 0.57 |
loc_og_t2t_mou_8 | loc_og_mou_6 | 0.57 |
std_og_t2m_mou_6 | std_og_mou_7 | 0.56 |
max_rech_amt_7 | max_rech_amt_8 | 0.56 |
loc_ic_t2m_mou_6 | total_ic_mou_8 | 0.56 |
spl_og_mou_7 | spl_og_mou_6 | 0.56 |
roam_ic_mou_7 | roam_ic_mou_8 | 0.56 |
std_ic_t2m_mou_8 | std_ic_mou_6 | 0.56 |
loc_og_mou_8 | loc_og_t2t_mou_6 | 0.56 |
loc_ic_mou_8 | loc_ic_t2t_mou_7 | 0.56 |
loc_og_mou_8 | loc_og_t2m_mou_6 | 0.56 |
total_og_mou_8 | onnet_mou_7 | 0.56 |
total_og_mou_6 | total_og_mou_8 | 0.55 |
loc_og_t2m_mou_8 | loc_og_mou_6 | 0.55 |
std_ic_t2t_mou_6 | std_ic_mou_7 | 0.55 |
loc_ic_t2t_mou_7 | loc_ic_mou_6 | 0.55 |
total_og_mou_7 | offnet_mou_8 | 0.54 |
std_og_t2t_mou_7 | std_og_mou_6 | 0.54 |
std_ic_mou_8 | std_ic_t2m_mou_6 | 0.54 |
offnet_mou_6 | std_og_mou_6 | 0.54 |
std_og_mou_6 | std_og_t2m_mou_7 | 0.54 |
total_og_mou_8 | offnet_mou_7 | 0.54 |
spl_og_mou_7 | loc_og_t2c_mou_7 | 0.53 |
std_og_t2t_mou_6 | std_og_mou_7 | 0.53 |
total_og_mou_6 | std_og_mou_7 | 0.53 |
std_og_t2t_mou_6 | onnet_mou_8 | 0.53 |
std_ic_t2t_mou_8 | std_ic_mou_7 | 0.53 |
loc_og_t2c_mou_8 | loc_og_t2c_mou_7 | 0.53 |
Average_rech_amt_6n7 | isd_og_mou_7 | 0.53 |
std_og_t2t_mou_8 | onnet_mou_6 | 0.52 |
vol_2g_mb_8 | vol_2g_mb_6 | 0.52 |
isd_og_mou_7 | arpu_7 | 0.52 |
loc_ic_t2t_mou_8 | loc_ic_mou_6 | 0.51 |
arpu_8 | total_og_mou_8 | 0.51 |
total_ic_mou_7 | loc_ic_t2t_mou_8 | 0.51 |
vol_3g_mb_7 | vbc_3g_6 | 0.51 |
vbc_3g_8 | vol_3g_mb_7 | 0.51 |
total_og_mou_6 | arpu_6 | 0.51 |
total_rech_amt_7 | isd_og_mou_7 | 0.50 |
roam_og_mou_7 | roam_og_mou_6 | 0.50 |
onnet_mou_8 | std_og_mou_7 | 0.50 |
Average_rech_amt_6n7 | isd_og_mou_6 | 0.50 |
loc_ic_t2t_mou_6 | total_ic_mou_7 | 0.50 |
loc_ic_t2t_mou_6 | loc_ic_mou_8 | 0.50 |
std_ic_t2t_mou_7 | std_ic_mou_6 | 0.50 |
max_rech_amt_6 | max_rech_amt_7 | 0.50 |
isd_og_mou_8 | Average_rech_amt_6n7 | 0.50 |
std_ic_mou_8 | std_ic_t2t_mou_7 | 0.50 |
loc_ic_t2m_mou_6 | loc_og_t2m_mou_6 | 0.50 |
total_og_mou_6 | total_rech_amt_6 | 0.49 |
loc_ic_t2t_mou_7 | total_ic_mou_8 | 0.49 |
total_og_mou_7 | std_og_t2m_mou_8 | 0.49 |
isd_og_mou_8 | total_rech_amt_8 | 0.49 |
total_og_mou_7 | std_og_t2t_mou_8 | 0.49 |
total_og_mou_8 | total_rech_amt_8 | 0.49 |
loc_og_t2m_mou_7 | loc_ic_t2m_mou_7 | 0.49 |
total_ic_mou_6 | loc_ic_t2t_mou_7 | 0.49 |
vbc_3g_7 | vol_3g_mb_8 | 0.49 |
std_og_mou_8 | onnet_mou_7 | 0.49 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_8 | 0.49 |
total_og_mou_7 | onnet_mou_6 | 0.48 |
total_og_mou_7 | offnet_mou_6 | 0.48 |
std_og_t2m_mou_6 | offnet_mou_8 | 0.48 |
total_og_mou_7 | arpu_7 | 0.48 |
max_rech_amt_8 | total_rech_amt_8 | 0.48 |
isd_og_mou_8 | arpu_7 | 0.48 |
total_og_mou_8 | std_og_t2m_mou_7 | 0.48 |
arpu_6 | isd_og_mou_6 | 0.48 |
total_og_mou_6 | onnet_mou_7 | 0.48 |
isd_og_mou_7 | total_rech_amt_8 | 0.48 |
isd_og_mou_8 | arpu_8 | 0.48 |
loc_og_t2c_mou_6 | spl_og_mou_6 | 0.48 |
offnet_mou_6 | std_og_t2m_mou_8 | 0.47 |
vol_3g_mb_8 | vbc_3g_6 | 0.47 |
total_rech_amt_6 | isd_og_mou_6 | 0.47 |
onnet_mou_7 | loc_og_t2t_mou_7 | 0.47 |
std_og_t2t_mou_8 | std_og_mou_6 | 0.47 |
arpu_6 | isd_og_mou_7 | 0.47 |
total_og_mou_6 | offnet_mou_7 | 0.47 |
offnet_mou_6 | loc_og_t2m_mou_6 | 0.47 |
std_og_t2t_mou_7 | total_og_mou_8 | 0.47 |
loc_og_t2t_mou_6 | onnet_mou_6 | 0.47 |
arpu_8 | offnet_mou_8 | 0.47 |
roam_ic_mou_7 | roam_ic_mou_6 | 0.46 |
arpu_7 | isd_og_mou_6 | 0.46 |
offnet_mou_8 | total_rech_amt_8 | 0.46 |
total_og_mou_7 | total_rech_amt_7 | 0.46 |
spl_og_mou_8 | loc_og_t2c_mou_8 | 0.46 |
isd_og_mou_8 | arpu_6 | 0.46 |
isd_og_mou_7 | total_rech_amt_6 | 0.46 |
std_og_mou_8 | std_og_t2m_mou_6 | 0.46 |
arpu_8 | isd_og_mou_7 | 0.46 |
std_ic_mou_8 | total_ic_mou_8 | 0.46 |
total_rech_amt_7 | max_rech_amt_7 | 0.46 |
vbc_3g_7 | vol_3g_mb_6 | 0.46 |
total_og_mou_8 | std_og_mou_6 | 0.46 |
loc_og_t2t_mou_8 | onnet_mou_8 | 0.46 |
arpu_6 | offnet_mou_6 | 0.46 |
isd_og_mou_8 | total_rech_amt_7 | 0.46 |
offnet_mou_6 | total_rech_amt_6 | 0.45 |
total_ic_mou_6 | loc_ic_t2t_mou_8 | 0.45 |
total_ic_mou_7 | std_ic_mou_7 | 0.45 |
std_og_mou_8 | offnet_mou_7 | 0.45 |
isd_og_mou_8 | total_rech_amt_6 | 0.45 |
total_og_mou_7 | std_og_t2m_mou_6 | 0.45 |
std_og_mou_8 | std_og_t2t_mou_6 | 0.45 |
loc_og_mou_6 | loc_ic_mou_6 | 0.45 |
total_ic_mou_6 | std_ic_mou_6 | 0.44 |
loc_og_t2m_mou_8 | loc_ic_mou_8 | 0.44 |
loc_og_t2m_mou_8 | offnet_mou_8 | 0.44 |
total_og_mou_6 | loc_og_mou_6 | 0.44 |
total_og_mou_6 | std_og_mou_8 | 0.44 |
isd_og_mou_6 | total_rech_amt_8 | 0.44 |
std_og_mou_7 | offnet_mou_8 | 0.44 |
std_og_t2m_mou_8 | std_og_mou_6 | 0.44 |
loc_og_t2m_mou_6 | loc_ic_mou_6 | 0.44 |
std_ic_t2t_mou_6 | std_ic_mou_8 | 0.44 |
loc_og_mou_8 | loc_ic_mou_8 | 0.44 |
offnet_mou_7 | arpu_7 | 0.44 |
arpu_8 | isd_og_mou_6 | 0.43 |
loc_og_t2m_mou_7 | loc_ic_t2m_mou_8 | 0.43 |
max_rech_amt_6 | total_rech_amt_6 | 0.43 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_7 | 0.43 |
total_rech_amt_7 | isd_og_mou_6 | 0.43 |
vbc_3g_8 | vol_3g_mb_6 | 0.43 |
total_rech_amt_7 | offnet_mou_7 | 0.43 |
loc_ic_t2t_mou_6 | total_ic_mou_8 | 0.43 |
onnet_mou_7 | std_og_mou_6 | 0.43 |
loc_og_mou_8 | total_og_mou_8 | 0.42 |
loc_ic_t2m_mou_7 | loc_og_t2m_mou_6 | 0.42 |
total_og_mou_6 | Average_rech_amt_6n7 | 0.42 |
loc_ic_mou_7 | loc_og_mou_7 | 0.42 |
loc_ic_mou_7 | loc_og_t2m_mou_7 | 0.42 |
total_og_mou_7 | Average_rech_amt_6n7 | 0.42 |
last_day_rch_amt_6 | max_rech_amt_8 | 0.42 |
std_ic_t2t_mou_8 | std_ic_mou_6 | 0.42 |
loc_ic_t2m_mou_6 | loc_og_t2m_mou_7 | 0.42 |
loc_og_t2m_mou_7 | offnet_mou_7 | 0.42 |
total_og_mou_6 | onnet_mou_8 | 0.42 |
spl_og_mou_8 | spl_og_mou_6 | 0.41 |
last_day_rch_amt_8 | max_rech_amt_7 | 0.41 |
offnet_mou_6 | Average_rech_amt_6n7 | 0.41 |
max_rech_amt_6 | last_day_rch_amt_8 | 0.41 |
loc_ic_t2m_mou_6 | loc_og_mou_6 | 0.41 |
total_og_mou_8 | onnet_mou_6 | 0.41 |
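# Correlations for Churn : 1 - churned customers
# Absolute values are reported
# Full option key; the bare 'precision' pattern can match more than one option in newer pandas
pd.set_option('display.precision', 2)
cor_1 = correlation(churned_customers)
# filtering for correlations above 0.40
condition = cor_1['CORR'] > 0.4
cor_1 = cor_1[condition]
# Note: Styler.hide_index() was removed in pandas 2.0; use .hide(axis='index') there
cor_1.style.background_gradient(cmap='GnBu').hide_index()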
VAR1 | VAR2 | CORR |
---|---|---|
og_others_7 | og_others_8 | 1.00 |
arpu_8 | total_rech_amt_8 | 0.96 |
arpu_6 | total_rech_amt_6 | 0.95 |
std_og_mou_8 | total_og_mou_8 | 0.95 |
total_rech_amt_7 | arpu_7 | 0.95 |
std_og_t2t_mou_7 | onnet_mou_7 | 0.95 |
total_og_mou_7 | std_og_mou_7 | 0.94 |
og_others_8 | loc_og_t2f_mou_6 | 0.93 |
std_og_t2t_mou_8 | onnet_mou_8 | 0.93 |
loc_og_t2f_mou_7 | loc_og_t2f_mou_6 | 0.93 |
og_others_7 | loc_og_t2f_mou_6 | 0.93 |
total_og_mou_6 | std_og_mou_6 | 0.92 |
offnet_mou_6 | std_og_t2m_mou_6 | 0.92 |
offnet_mou_7 | std_og_t2m_mou_7 | 0.92 |
std_og_t2t_mou_6 | onnet_mou_6 | 0.92 |
std_ic_mou_8 | std_ic_t2m_mou_8 | 0.92 |
loc_og_t2f_mou_7 | og_others_8 | 0.91 |
loc_og_t2f_mou_7 | og_others_7 | 0.91 |
loc_ic_mou_8 | loc_ic_t2m_mou_8 | 0.90 |
loc_ic_t2m_mou_6 | loc_ic_mou_6 | 0.90 |
loc_ic_mou_8 | total_ic_mou_8 | 0.89 |
loc_og_t2m_mou_8 | loc_og_mou_8 | 0.88 |
total_ic_mou_6 | loc_ic_mou_6 | 0.87 |
std_og_t2m_mou_8 | offnet_mou_8 | 0.87 |
loc_ic_mou_7 | total_ic_mou_7 | 0.86 |
loc_ic_mou_7 | loc_ic_t2m_mou_7 | 0.84 |
loc_og_t2m_mou_7 | loc_og_mou_7 | 0.84 |
std_ic_t2m_mou_7 | std_ic_mou_7 | 0.82 |
total_ic_mou_8 | loc_ic_t2m_mou_8 | 0.81 |
std_og_mou_8 | std_og_t2t_mou_8 | 0.79 |
std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | 0.78 |
Average_rech_amt_6n7 | arpu_7 | 0.77 |
loc_og_mou_6 | loc_og_t2m_mou_6 | 0.77 |
loc_ic_t2m_mou_6 | total_ic_mou_6 | 0.77 |
std_ic_t2m_mou_6 | std_ic_mou_6 | 0.77 |
total_rech_amt_7 | Average_rech_amt_6n7 | 0.76 |
Average_rech_amt_6n7 | total_rech_amt_6 | 0.76 |
loc_og_mou_6 | loc_og_t2t_mou_6 | 0.75 |
total_og_mou_8 | std_og_t2t_mou_8 | 0.75 |
std_og_t2m_mou_7 | std_og_mou_7 | 0.74 |
std_og_mou_8 | onnet_mou_8 | 0.74 |
total_og_mou_8 | onnet_mou_8 | 0.74 |
arpu_6 | Average_rech_amt_6n7 | 0.73 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_7 | 0.73 |
loc_ic_t2t_mou_8 | loc_ic_mou_8 | 0.73 |
loc_ic_t2t_mou_6 | loc_ic_mou_6 | 0.72 |
max_rech_amt_6 | last_day_rch_amt_6 | 0.72 |
std_og_t2m_mou_6 | std_og_mou_6 | 0.72 |
std_ic_t2t_mou_6 | std_ic_mou_6 | 0.72 |
roam_ic_mou_7 | roam_ic_mou_8 | 0.72 |
total_og_mou_7 | offnet_mou_7 | 0.72 |
loc_ic_t2m_mou_7 | total_ic_mou_7 | 0.72 |
std_og_mou_8 | std_og_t2m_mou_8 | 0.71 |
total_og_mou_8 | offnet_mou_8 | 0.70 |
last_day_rch_amt_8 | max_rech_amt_8 | 0.70 |
loc_og_mou_7 | loc_og_t2t_mou_7 | 0.69 |
std_og_t2t_mou_7 | std_og_mou_7 | 0.69 |
total_og_mou_7 | std_og_t2m_mou_7 | 0.69 |
loc_ic_mou_7 | loc_ic_t2t_mou_7 | 0.69 |
loc_og_t2t_mou_8 | loc_og_mou_8 | 0.68 |
total_og_mou_6 | offnet_mou_6 | 0.68 |
std_og_mou_6 | std_og_t2t_mou_6 | 0.68 |
std_og_t2m_mou_8 | total_og_mou_8 | 0.68 |
max_rech_amt_8 | total_rech_amt_8 | 0.68 |
spl_og_mou_7 | loc_og_t2c_mou_7 | 0.68 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | 0.67 |
vol_3g_mb_8 | vol_3g_mb_7 | 0.67 |
std_og_t2t_mou_7 | std_og_t2t_mou_6 | 0.67 |
total_og_mou_6 | std_og_t2m_mou_6 | 0.67 |
offnet_mou_7 | std_og_mou_7 | 0.66 |
total_og_mou_7 | onnet_mou_7 | 0.66 |
onnet_mou_7 | std_og_mou_7 | 0.65 |
loc_og_mou_8 | loc_ic_mou_8 | 0.65 |
std_ic_t2t_mou_7 | std_ic_mou_7 | 0.65 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_8 | 0.65 |
roam_og_mou_8 | roam_og_mou_7 | 0.65 |
total_og_mou_6 | onnet_mou_6 | 0.65 |
total_og_mou_7 | std_og_t2t_mou_7 | 0.64 |
loc_ic_t2t_mou_8 | total_ic_mou_8 | 0.64 |
onnet_mou_7 | onnet_mou_6 | 0.64 |
loc_og_mou_8 | loc_ic_t2m_mou_8 | 0.64 |
onnet_mou_7 | std_og_t2t_mou_6 | 0.63 |
loc_og_mou_8 | loc_og_mou_7 | 0.63 |
total_ic_mou_6 | loc_ic_t2t_mou_6 | 0.63 |
offnet_mou_6 | std_og_mou_6 | 0.63 |
roam_og_mou_6 | roam_ic_mou_6 | 0.63 |
std_og_mou_8 | offnet_mou_8 | 0.63 |
roam_ic_mou_7 | roam_ic_mou_6 | 0.62 |
arpu_8 | max_rech_amt_8 | 0.62 |
std_og_t2m_mou_6 | std_og_t2m_mou_7 | 0.62 |
total_og_mou_6 | std_og_t2t_mou_6 | 0.62 |
vol_3g_mb_6 | vbc_3g_6 | 0.62 |
onnet_mou_7 | onnet_mou_8 | 0.62 |
loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | 0.62 |
loc_og_mou_8 | total_ic_mou_8 | 0.61 |
vbc_3g_8 | vbc_3g_7 | 0.61 |
loc_og_t2m_mou_7 | loc_og_t2m_mou_6 | 0.61 |
std_og_t2t_mou_7 | onnet_mou_6 | 0.61 |
roam_og_mou_7 | roam_og_mou_6 | 0.61 |
std_og_t2t_mou_7 | std_og_t2t_mou_8 | 0.61 |
std_og_mou_6 | onnet_mou_6 | 0.61 |
std_og_t2t_mou_8 | onnet_mou_7 | 0.60 |
std_ic_mou_6 | std_ic_mou_7 | 0.60 |
loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | 0.60 |
arpu_8 | total_og_mou_8 | 0.60 |
std_og_t2f_mou_7 | std_og_t2f_mou_8 | 0.60 |
isd_og_mou_8 | isd_og_mou_7 | 0.60 |
loc_og_mou_6 | loc_og_mou_7 | 0.59 |
loc_og_t2m_mou_8 | loc_og_t2m_mou_7 | 0.59 |
last_day_rch_amt_7 | max_rech_amt_7 | 0.59 |
arpu_8 | offnet_mou_8 | 0.59 |
std_og_mou_8 | std_og_mou_7 | 0.58 |
total_og_mou_8 | total_rech_amt_8 | 0.58 |
loc_og_t2m_mou_7 | loc_ic_t2m_mou_7 | 0.58 |
loc_og_t2m_mou_8 | loc_ic_mou_8 | 0.58 |
loc_ic_mou_7 | loc_ic_mou_8 | 0.58 |
std_og_t2m_mou_8 | std_og_t2m_mou_7 | 0.58 |
total_ic_mou_7 | loc_ic_t2t_mou_7 | 0.58 |
offnet_mou_8 | total_rech_amt_8 | 0.58 |
std_og_mou_6 | std_og_mou_7 | 0.58 |
spl_og_mou_8 | loc_og_t2c_mou_8 | 0.57 |
loc_ic_mou_7 | loc_ic_mou_6 | 0.57 |
isd_ic_mou_7 | isd_ic_mou_6 | 0.57 |
offnet_mou_6 | offnet_mou_7 | 0.57 |
offnet_mou_7 | offnet_mou_8 | 0.57 |
vol_3g_mb_6 | vol_3g_mb_7 | 0.57 |
isd_og_mou_7 | arpu_7 | 0.57 |
loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | 0.57 |
total_rech_num_8 | total_og_mou_8 | 0.57 |
loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | 0.56 |
std_og_t2t_mou_7 | onnet_mou_8 | 0.56 |
total_og_mou_7 | total_og_mou_8 | 0.56 |
vbc_3g_7 | vol_3g_mb_7 | 0.56 |
total_rech_amt_7 | isd_og_mou_7 | 0.56 |
std_ic_mou_8 | total_ic_mou_8 | 0.56 |
loc_ic_t2t_mou_8 | loc_ic_t2t_mou_7 | 0.56 |
loc_og_t2c_mou_6 | spl_og_mou_6 | 0.56 |
std_og_t2m_mou_6 | offnet_mou_7 | 0.55 |
ic_others_7 | ic_others_6 | 0.55 |
total_og_mou_7 | std_og_mou_8 | 0.55 |
total_ic_mou_6 | total_ic_mou_7 | 0.55 |
total_rech_num_8 | total_rech_amt_8 | 0.55 |
arpu_8 | total_rech_num_8 | 0.54 |
loc_ic_t2m_mou_7 | loc_ic_mou_6 | 0.54 |
std_ic_t2t_mou_8 | std_ic_mou_8 | 0.54 |
loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | 0.54 |
std_og_t2m_mou_8 | offnet_mou_7 | 0.54 |
total_rech_num_8 | total_rech_num_7 | 0.54 |
total_ic_mou_7 | std_ic_mou_7 | 0.54 |
std_ic_t2t_mou_7 | std_ic_mou_6 | 0.54 |
loc_ic_mou_7 | loc_ic_t2m_mou_6 | 0.54 |
offnet_mou_6 | std_og_t2m_mou_7 | 0.54 |
std_og_mou_8 | total_rech_num_8 | 0.54 |
loc_og_t2m_mou_8 | total_ic_mou_8 | 0.54 |
total_ic_mou_7 | total_ic_mou_8 | 0.54 |
std_ic_t2t_mou_8 | std_ic_t2t_mou_7 | 0.54 |
total_ic_mou_6 | std_ic_mou_6 | 0.53 |
vol_2g_mb_7 | vol_2g_mb_6 | 0.53 |
vbc_3g_7 | vbc_3g_6 | 0.53 |
arpu_6 | isd_og_mou_6 | 0.53 |
total_og_mou_8 | std_og_mou_7 | 0.52 |
std_ic_mou_8 | std_ic_mou_7 | 0.52 |
total_rech_amt_6 | isd_og_mou_6 | 0.52 |
loc_og_mou_8 | loc_og_t2m_mou_7 | 0.52 |
arpu_8 | std_og_mou_8 | 0.51 |
loc_og_t2m_mou_8 | loc_og_mou_7 | 0.51 |
loc_ic_t2m_mou_7 | loc_og_mou_7 | 0.51 |
loc_og_t2m_mou_6 | loc_og_mou_7 | 0.51 |
arpu_8 | total_rech_amt_7 | 0.51 |
total_og_mou_7 | arpu_7 | 0.51 |
total_og_mou_6 | total_og_mou_7 | 0.51 |
roam_ic_mou_7 | roam_og_mou_7 | 0.51 |
loc_ic_t2m_mou_7 | loc_ic_mou_8 | 0.51 |
total_og_mou_7 | std_og_mou_6 | 0.51 |
loc_ic_t2m_mou_6 | loc_og_t2m_mou_6 | 0.50 |
std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | 0.50 |
std_ic_t2m_mou_7 | std_ic_t2m_mou_6 | 0.50 |
total_og_mou_6 | std_og_mou_7 | 0.50 |
total_rech_amt_7 | max_rech_amt_7 | 0.50 |
std_og_t2m_mou_7 | offnet_mou_8 | 0.50 |
arpu_8 | onnet_mou_8 | 0.50 |
onnet_mou_8 | total_rech_amt_8 | 0.50 |
loc_ic_mou_7 | loc_og_mou_7 | 0.50 |
total_ic_mou_7 | loc_ic_mou_8 | 0.50 |
std_og_mou_8 | total_rech_amt_8 | 0.50 |
loc_ic_mou_7 | loc_ic_t2m_mou_8 | 0.50 |
last_day_rch_amt_8 | total_rech_amt_8 | 0.50 |
std_og_t2f_mou_7 | loc_og_t2f_mou_7 | 0.50 |
loc_og_t2t_mou_8 | loc_og_mou_7 | 0.50 |
arpu_8 | arpu_7 | 0.50 |
loc_ic_mou_7 | total_ic_mou_6 | 0.49 |
std_ic_t2t_mou_6 | std_ic_t2t_mou_8 | 0.49 |
loc_ic_mou_7 | loc_og_t2m_mou_7 | 0.49 |
loc_ic_mou_7 | total_ic_mou_8 | 0.49 |
vol_2g_mb_7 | vol_2g_mb_8 | 0.49 |
vbc_3g_8 | vol_3g_mb_8 | 0.49 |
loc_ic_mou_8 | loc_ic_t2f_mou_8 | 0.49 |
arpu_7 | total_rech_amt_8 | 0.48 |
std_og_t2f_mou_7 | og_others_8 | 0.48 |
loc_og_t2t_mou_8 | loc_ic_t2t_mou_8 | 0.48 |
vol_3g_mb_8 | vol_3g_mb_6 | 0.48 |
std_ic_t2t_mou_6 | std_ic_mou_7 | 0.48 |
isd_og_mou_7 | isd_ic_mou_7 | 0.48 |
isd_ic_mou_8 | isd_ic_mou_7 | 0.48 |
std_ic_t2m_mou_8 | total_ic_mou_8 | 0.48 |
total_og_mou_7 | total_rech_amt_7 | 0.48 |
std_og_t2f_mou_7 | og_others_7 | 0.48 |
std_og_t2f_mou_7 | loc_og_t2f_mou_6 | 0.48 |
loc_og_mou_8 | total_og_mou_8 | 0.47 |
total_rech_num_8 | onnet_mou_8 | 0.47 |
total_ic_mou_6 | loc_ic_t2m_mou_7 | 0.47 |
loc_ic_mou_7 | loc_ic_t2t_mou_8 | 0.47 |
total_og_mou_6 | arpu_6 | 0.47 |
std_og_mou_8 | std_og_t2t_mou_7 | 0.46 |
Average_rech_amt_6n7 | isd_og_mou_7 | 0.46 |
spl_og_mou_7 | spl_og_mou_8 | 0.46 |
loc_ic_t2f_mou_6 | loc_ic_t2f_mou_8 | 0.46 |
roam_og_mou_8 | last_day_rch_amt_8 | 0.46 |
total_og_mou_7 | total_rech_num_7 | 0.46 |
total_og_mou_6 | total_rech_amt_6 | 0.46 |
total_ic_mou_7 | loc_ic_mou_6 | 0.46 |
std_ic_t2m_mou_7 | std_ic_mou_8 | 0.46 |
loc_og_mou_8 | loc_og_t2t_mou_7 | 0.46 |
max_rech_amt_6 | total_rech_amt_6 | 0.46 |
arpu_8 | roam_og_mou_8 | 0.45 |
loc_og_t2m_mou_6 | loc_ic_mou_6 | 0.45 |
total_rech_num_8 | std_og_t2t_mou_8 | 0.45 |
loc_og_mou_6 | loc_og_t2t_mou_7 | 0.45 |
std_ic_t2m_mou_8 | std_ic_mou_7 | 0.45 |
std_ic_t2m_mou_7 | total_ic_mou_7 | 0.45 |
total_rech_num_6 | total_rech_num_7 | 0.45 |
offnet_mou_7 | arpu_7 | 0.45 |
loc_og_mou_6 | loc_ic_mou_6 | 0.45 |
std_ic_t2f_mou_7 | std_ic_t2f_mou_6 | 0.45 |
max_rech_amt_7 | max_rech_amt_8 | 0.45 |
total_rech_amt_7 | total_rech_amt_8 | 0.45 |
loc_og_mou_6 | loc_og_t2m_mou_7 | 0.45 |
std_og_t2m_mou_8 | std_og_mou_7 | 0.44 |
arpu_8 | Average_rech_amt_6n7 | 0.44 |
total_ic_mou_7 | loc_og_mou_7 | 0.44 |
std_og_mou_8 | onnet_mou_7 | 0.44 |
loc_og_t2c_mou_8 | loc_og_t2c_mou_7 | 0.44 |
roam_og_mou_8 | roam_ic_mou_8 | 0.44 |
loc_ic_t2m_mou_6 | total_ic_mou_7 | 0.44 |
roam_og_mou_8 | total_rech_amt_8 | 0.44 |
arpu_8 | last_day_rch_amt_8 | 0.44 |
ic_others_6 | isd_ic_mou_6 | 0.44 |
loc_og_t2m_mou_8 | offnet_mou_8 | 0.44 |
loc_ic_t2m_mou_7 | total_ic_mou_8 | 0.43 |
std_og_t2t_mou_8 | std_og_mou_7 | 0.43 |
loc_og_mou_8 | offnet_mou_8 | 0.43 |
total_og_mou_6 | total_rech_num_6 | 0.43 |
total_og_mou_8 | onnet_mou_7 | 0.43 |
std_ic_t2m_mou_6 | total_ic_mou_6 | 0.43 |
total_rech_num_8 | offnet_mou_8 | 0.43 |
spl_og_mou_7 | spl_og_mou_6 | 0.43 |
total_ic_mou_8 | loc_ic_t2f_mou_8 | 0.43 |
std_ic_t2f_mou_7 | std_og_t2f_mou_7 | 0.43 |
loc_og_t2t_mou_8 | loc_ic_mou_8 | 0.42 |
loc_ic_t2m_mou_6 | loc_og_mou_6 | 0.42 |
loc_ic_mou_7 | loc_ic_t2f_mou_7 | 0.42 |
total_ic_mou_7 | loc_ic_t2m_mou_8 | 0.42 |
max_rech_amt_6 | max_rech_amt_8 | 0.42 |
loc_og_mou_8 | loc_ic_t2t_mou_8 | 0.42 |
std_og_t2t_mou_7 | std_og_mou_6 | 0.42 |
std_ic_t2m_mou_6 | std_ic_mou_7 | 0.42 |
arpu_8 | total_ic_mou_8 | 0.42 |
loc_og_t2m_mou_8 | loc_ic_t2m_mou_7 | 0.42 |
last_day_rch_amt_7 | max_rech_amt_8 | 0.42 |
arpu_6 | offnet_mou_6 | 0.42 |
total_rech_amt_7 | offnet_mou_7 | 0.42 |
loc_og_t2m_mou_7 | total_ic_mou_7 | 0.42 |
total_og_mou_7 | std_og_t2m_mou_8 | 0.42 |
Average_rech_amt_6n7 | total_rech_amt_8 | 0.42 |
std_og_mou_7 | arpu_7 | 0.41 |
std_og_mou_6 | std_og_t2m_mou_7 | 0.41 |
total_rech_num_7 | std_og_mou_7 | 0.41 |
total_og_mou_7 | offnet_mou_8 | 0.41 |
spl_ic_mou_8 | spl_ic_mou_6 | 0.41 |
std_og_t2t_mou_6 | std_og_mou_7 | 0.41 |
offnet_mou_6 | total_rech_amt_6 | 0.41 |
loc_og_t2t_mou_8 | loc_og_t2t_mou_6 | 0.41 |
std_og_t2t_mou_7 | total_og_mou_8 | 0.41 |
total_og_mou_8 | total_ic_mou_8 | 0.41 |
std_og_t2m_mou_6 | std_og_mou_7 | 0.41 |
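The `correlation` helper used above is defined elsewhere in the notebook and is not shown in this section. As a rough illustration of how a VAR1 / VAR2 / CORR table like the one above can be produced, here is a minimal sketch. The function name `correlation_pairs`, its `threshold` parameter, and the toy data are assumptions made for this example, not the notebook's actual implementation.

```python
import numpy as np
import pandas as pd


def correlation_pairs(df: pd.DataFrame, threshold: float = 0.4) -> pd.DataFrame:
    """Hypothetical stand-in for the notebook's `correlation` helper plus the filter."""
    corr = df.corr().abs()  # absolute Pearson correlations
    # Keep only the strict upper triangle so each pair is listed once
    # and self-correlations (the diagonal) are excluded.
    mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
    pairs = corr.where(mask).stack().reset_index()
    pairs.columns = ["VAR1", "VAR2", "CORR"]
    # Rows masked out above (NaN, if still present) fail this comparison and are dropped.
    pairs = pairs[pairs["CORR"] > threshold]
    return pairs.sort_values("CORR", ascending=False).reset_index(drop=True)


# Toy data: 'b' is an exact multiple of 'a'; 'c' is only weakly related to both.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
pairs = correlation_pairs(toy)
print(pairs)
```

Listing only the upper triangle avoids reporting each pair twice, which is presumably why every pair appears once in the tables above.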
# Derived variables to measure change in usage
# Usage
data['delta_vol_2g'] = data['vol_2g_mb_8'] - data['vol_2g_mb_6'].add(data['vol_2g_mb_7']).div(2)
data['delta_vol_3g'] = data['vol_3g_mb_8'] - data['vol_3g_mb_6'].add(data['vol_3g_mb_7']).div(2)
data['delta_total_og_mou'] = data['total_og_mou_8'] - data['total_og_mou_6'].add(data['total_og_mou_7']).div(2)
data['delta_total_ic_mou'] = data['total_ic_mou_8'] - data['total_ic_mou_6'].add(data['total_ic_mou_7']).div(2)
data['delta_vbc_3g'] = data['vbc_3g_8'] - data['vbc_3g_6'].add(data['vbc_3g_7']).div(2)
# Revenue
data['delta_arpu'] = data['arpu_8'] - data['arpu_6'].add(data['arpu_7']).div(2)
data['delta_total_rech_amt'] = data['total_rech_amt_8'] - data['total_rech_amt_6'].add(data['total_rech_amt_7']).div(2)
# Removing variables used for derivation:
data.drop(columns=[
    'vol_2g_mb_8', 'vol_2g_mb_6', 'vol_2g_mb_7',
    'vol_3g_mb_8', 'vol_3g_mb_6', 'vol_3g_mb_7',
    'total_og_mou_8', 'total_og_mou_6', 'total_og_mou_7',
    'total_ic_mou_8', 'total_ic_mou_6', 'total_ic_mou_7',
    'vbc_3g_8', 'vbc_3g_6', 'vbc_3g_7',
    'arpu_8', 'arpu_6', 'arpu_7',
    'total_rech_amt_8', 'total_rech_amt_6', 'total_rech_amt_7'
], inplace=True)
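Each derived variable above follows the same pattern: the month 8 value minus the average of months 6 and 7, after which the three source columns are dropped. The sketch below expresses that pattern as a loop over a name mapping. The helper name `add_delta_features`, the `mapping` argument, and the toy values are assumptions for illustration only.

```python
import pandas as pd


def add_delta_features(df: pd.DataFrame, mapping: dict) -> pd.DataFrame:
    """Hypothetical helper: mapping is {new_column_name: base_column_name}."""
    out = df.copy()
    for new_col, base in mapping.items():
        m6, m7, m8 = f"{base}_6", f"{base}_7", f"{base}_8"
        # Month 8 value minus the mean of months 6 and 7 (NaN propagates, as in the cell above).
        out[new_col] = out[m8] - (out[m6] + out[m7]) / 2
        # Drop the columns used for the derivation.
        out = out.drop(columns=[m6, m7, m8])
    return out


# Toy data with two of the column families from the notebook.
toy = pd.DataFrame({
    "arpu_6": [100.0, 200.0], "arpu_7": [300.0, 200.0], "arpu_8": [150.0, 250.0],
    "vol_2g_mb_6": [10.0, 0.0], "vol_2g_mb_7": [30.0, 0.0], "vol_2g_mb_8": [20.0, 5.0],
})
result = add_delta_features(toy, {"delta_arpu": "arpu", "delta_vol_2g": "vol_2g_mb"})
print(result)
```

A negative delta means the customer's month 8 value fell below their June and July average, so a strongly negative `delta_arpu` marks a customer whose revenue dropped in August.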
# Looking at quantiles from 0.90 to 1.
data.quantile(np.arange(0.9,1.01,0.01)).style.bar()
quantile | onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.9 | 794.98 | 824.38 | 723.61 | 915.58 | 935.69 | 853.79 | 32.73 | 18.36 | 18.68 | 64.48 | 41.20 | 37.11 | 207.93 | 207.84 | 196.91 | 435.16 | 437.49 | 416.66 | 18.38 | 18.66 | 16.96 | 4.04 | 4.84 | 4.45 | 661.74 | 657.38 | 633.34 | 630.53 | 663.79 | 567.34 | 604.41 | 645.88 | 531.26 | 2.20 | 2.18 | 1.73 | 1140.93 | 1177.18 | 1057.29 | 0.00 | 0.00 | 0.00 | 15.93 | 19.51 | 18.04 | 2.26 | 0.00 | 0.00 | 154.88 | 156.61 | 148.14 | 368.54 | 364.54 | 360.54 | 39.23 | 41.04 | 37.19 | 559.28 | 558.99 | 549.79 | 34.73 | 36.01 | 32.14 | 73.38 | 75.28 | 68.58 | 4.36 | 4.58 | 3.94 | 115.91 | 118.66 | 108.38 | 0.28 | 0.00 | 0.00 | 15.01 | 18.30 | 15.33 | 1.16 | 1.59 | 1.23 | 23.00 | 23.00 | 21.00 | 297.00 | 300.00 | 252.00 | 250.00 | 250.00 | 225.00 | 2846.00 | 1118.00 | 29.84 | 170.07 | 345.07 | 147.30 | 69.83 | 257.31 | 319.00 |
0.91 | 848.97 | 878.35 | 783.49 | 966.74 | 984.02 | 899.29 | 39.69 | 23.28 | 23.39 | 78.43 | 50.01 | 46.44 | 225.96 | 224.87 | 213.83 | 461.10 | 461.81 | 441.84 | 20.28 | 20.68 | 18.84 | 4.68 | 5.51 | 5.11 | 703.11 | 692.67 | 669.63 | 686.26 | 722.84 | 622.13 | 658.47 | 695.77 | 583.42 | 2.91 | 2.80 | 2.28 | 1195.61 | 1244.40 | 1125.28 | 0.00 | 0.00 | 0.00 | 17.54 | 21.28 | 19.69 | 2.54 | 0.00 | 0.00 | 165.79 | 168.03 | 159.84 | 390.64 | 387.11 | 382.20 | 43.59 | 45.39 | 41.21 | 593.13 | 589.65 | 580.54 | 38.21 | 39.91 | 35.93 | 80.41 | 81.93 | 75.54 | 5.21 | 5.49 | 4.71 | 125.98 | 129.29 | 118.24 | 0.30 | 0.00 | 0.00 | 18.34 | 21.84 | 18.83 | 1.44 | 1.94 | 1.51 | 24.00 | 24.00 | 22.00 | 325.00 | 330.00 | 289.00 | 250.00 | 250.00 | 250.00 | 2910.10 | 1156.00 | 39.88 | 227.15 | 377.46 | 161.80 | 95.33 | 278.90 | 345.50 |
0.92 | 909.05 | 941.99 | 848.96 | 1031.39 | 1038.09 | 953.35 | 48.71 | 29.68 | 29.64 | 93.60 | 60.97 | 57.59 | 247.94 | 244.78 | 232.33 | 490.63 | 488.04 | 468.83 | 22.56 | 23.14 | 20.93 | 5.45 | 6.26 | 5.86 | 742.96 | 735.69 | 711.57 | 750.31 | 786.39 | 680.10 | 713.49 | 760.98 | 640.57 | 3.74 | 3.71 | 3.01 | 1268.83 | 1315.08 | 1201.29 | 0.13 | 0.05 | 0.00 | 19.26 | 23.39 | 21.78 | 2.86 | 0.00 | 0.00 | 180.18 | 181.49 | 173.59 | 415.89 | 412.03 | 405.97 | 48.65 | 50.66 | 46.19 | 629.64 | 624.36 | 614.45 | 42.73 | 44.58 | 39.99 | 88.27 | 90.41 | 83.44 | 6.33 | 6.61 | 5.75 | 138.32 | 142.16 | 130.55 | 0.33 | 0.00 | 0.03 | 22.58 | 26.94 | 23.58 | 1.78 | 2.38 | 1.86 | 25.00 | 25.00 | 23.00 | 350.00 | 350.00 | 330.00 | 250.00 | 250.00 | 250.00 | 2981.20 | 1202.00 | 53.66 | 289.30 | 419.97 | 177.35 | 127.50 | 303.51 | 375.00 |
0.93 | 990.48 | 1016.15 | 920.96 | 1094.77 | 1103.93 | 1017.35 | 60.42 | 37.28 | 37.90 | 111.15 | 75.00 | 72.45 | 275.51 | 271.70 | 254.64 | 523.56 | 519.80 | 500.38 | 25.25 | 26.00 | 23.51 | 6.34 | 7.15 | 6.79 | 794.01 | 786.73 | 759.45 | 812.08 | 856.34 | 753.44 | 777.69 | 828.18 | 706.23 | 4.89 | 4.70 | 4.00 | 1358.41 | 1404.59 | 1283.20 | 0.33 | 0.25 | 0.00 | 21.33 | 25.84 | 24.07 | 3.23 | 0.00 | 0.00 | 195.66 | 196.99 | 188.25 | 444.94 | 439.30 | 434.35 | 54.59 | 57.50 | 51.92 | 671.69 | 667.07 | 654.41 | 48.03 | 50.82 | 45.64 | 98.00 | 99.68 | 93.47 | 7.84 | 8.08 | 7.08 | 153.30 | 158.86 | 146.87 | 0.36 | 0.00 | 0.11 | 28.18 | 33.33 | 29.95 | 2.20 | 2.93 | 2.38 | 27.00 | 27.00 | 25.00 | 350.00 | 398.00 | 350.00 | 252.00 | 252.00 | 250.00 | 3055.30 | 1257.00 | 71.44 | 361.57 | 466.79 | 195.08 | 166.36 | 331.83 | 408.50 |
0.9400000000000001 | 1066.85 | 1097.12 | 1007.56 | 1168.09 | 1186.36 | 1096.62 | 73.96 | 49.44 | 48.29 | 137.40 | 94.73 | 90.27 | 307.64 | 305.87 | 282.50 | 563.70 | 559.41 | 537.10 | 29.13 | 29.75 | 26.89 | 7.36 | 8.45 | 7.84 | 855.97 | 849.42 | 817.15 | 888.23 | 933.58 | 842.48 | 856.37 | 907.95 | 786.16 | 6.23 | 6.11 | 5.26 | 1456.41 | 1503.09 | 1385.91 | 0.63 | 0.51 | 0.23 | 23.92 | 28.49 | 26.83 | 3.64 | 0.00 | 0.00 | 216.28 | 214.47 | 207.73 | 478.56 | 478.91 | 472.15 | 62.96 | 65.76 | 58.84 | 717.00 | 711.74 | 704.89 | 55.49 | 58.04 | 52.58 | 110.61 | 113.20 | 105.54 | 9.66 | 9.89 | 8.77 | 174.34 | 179.84 | 165.92 | 0.40 | 0.00 | 0.21 | 35.80 | 40.97 | 36.57 | 2.81 | 3.73 | 2.98 | 28.00 | 28.00 | 26.00 | 400.00 | 455.00 | 398.00 | 252.00 | 252.00 | 252.00 | 3107.00 | 1317.70 | 97.46 | 463.36 | 524.65 | 218.73 | 217.42 | 366.26 | 447.20 |
0.9500000000000001 | 1153.97 | 1208.17 | 1115.66 | 1271.47 | 1286.28 | 1188.46 | 94.59 | 63.34 | 62.80 | 168.46 | 119.34 | 114.80 | 348.62 | 346.90 | 324.14 | 614.99 | 608.01 | 585.06 | 33.59 | 34.09 | 31.31 | 8.69 | 9.95 | 9.33 | 935.51 | 920.12 | 883.25 | 986.24 | 1029.29 | 936.49 | 960.80 | 1004.26 | 886.56 | 8.16 | 7.92 | 7.18 | 1558.50 | 1624.81 | 1518.82 | 1.10 | 1.01 | 0.55 | 26.81 | 32.15 | 30.23 | 4.14 | 0.00 | 0.00 | 243.94 | 238.62 | 232.50 | 520.55 | 518.65 | 516.67 | 72.61 | 76.05 | 67.56 | 773.27 | 781.18 | 767.31 | 64.94 | 66.55 | 61.56 | 126.66 | 130.41 | 121.88 | 12.24 | 12.31 | 10.98 | 200.64 | 205.16 | 191.95 | 0.43 | 0.06 | 0.25 | 46.45 | 51.98 | 46.48 | 3.63 | 4.83 | 3.93 | 30.00 | 30.00 | 28.00 | 500.00 | 500.00 | 455.00 | 252.00 | 274.00 | 252.00 | 3179.00 | 1406.00 | 129.68 | 562.66 | 604.55 | 245.97 | 284.40 | 404.58 | 499.00 |
0.9600000000000001 | 1282.78 | 1344.04 | 1256.34 | 1406.07 | 1407.78 | 1305.32 | 120.08 | 83.43 | 82.12 | 211.03 | 153.97 | 145.54 | 411.69 | 412.46 | 380.74 | 674.30 | 670.99 | 646.48 | 39.84 | 40.05 | 37.61 | 10.61 | 11.86 | 11.45 | 1025.57 | 1016.38 | 975.80 | 1099.72 | 1146.44 | 1066.04 | 1101.07 | 1136.32 | 998.28 | 11.43 | 10.87 | 9.59 | 1707.60 | 1766.85 | 1672.44 | 2.20 | 2.28 | 1.09 | 31.43 | 37.11 | 34.33 | 4.78 | 0.00 | 0.00 | 276.09 | 270.15 | 265.31 | 578.33 | 574.86 | 573.49 | 85.30 | 89.25 | 77.93 | 847.56 | 854.66 | 848.82 | 77.15 | 81.35 | 74.18 | 151.66 | 153.16 | 144.74 | 15.67 | 15.88 | 14.48 | 237.74 | 241.25 | 224.12 | 0.46 | 0.13 | 0.25 | 60.59 | 67.46 | 61.77 | 4.98 | 6.51 | 5.31 | 32.00 | 32.00 | 31.00 | 505.00 | 550.00 | 500.00 | 330.00 | 339.00 | 300.00 | 3264.00 | 1508.50 | 185.21 | 705.70 | 704.27 | 282.57 | 356.70 | 458.40 | 555.30 |
0.9700000000000001 | 1444.23 | 1497.25 | 1441.53 | 1578.82 | 1585.02 | 1481.57 | 155.13 | 117.54 | 112.07 | 270.52 | 203.66 | 188.86 | 508.01 | 500.20 | 458.64 | 758.99 | 749.79 | 734.08 | 49.38 | 49.09 | 46.36 | 13.04 | 14.68 | 14.14 | 1163.16 | 1143.62 | 1101.16 | 1243.36 | 1308.19 | 1235.72 | 1262.71 | 1308.35 | 1162.84 | 16.44 | 15.26 | 13.64 | 1904.26 | 1950.83 | 1877.19 | 5.01 | 5.36 | 2.73 | 37.80 | 44.44 | 40.55 | 5.56 | 0.00 | 0.00 | 320.64 | 322.92 | 308.49 | 655.65 | 646.22 | 642.58 | 103.65 | 109.78 | 96.64 | 959.39 | 962.29 | 941.13 | 97.62 | 101.69 | 94.54 | 184.59 | 187.18 | 176.96 | 20.81 | 21.78 | 19.62 | 290.90 | 295.51 | 279.29 | 0.53 | 0.20 | 0.36 | 82.75 | 91.07 | 85.44 | 7.03 | 9.05 | 7.58 | 35.00 | 36.00 | 34.00 | 550.00 | 550.00 | 550.00 | 398.00 | 398.00 | 379.00 | 3424.70 | 1633.85 | 262.23 | 895.25 | 843.24 | 334.26 | 461.60 | 529.37 | 644.00 |
0.9800000000000001 | 1694.68 | 1772.62 | 1700.24 | 1837.93 | 1838.39 | 1739.01 | 221.26 | 166.28 | 165.81 | 363.12 | 282.49 | 266.53 | 668.59 | 660.28 | 596.48 | 885.83 | 868.35 | 853.83 | 62.69 | 63.06 | 60.11 | 17.15 | 19.23 | 18.67 | 1372.78 | 1338.79 | 1306.65 | 1458.71 | 1520.56 | 1463.19 | 1518.64 | 1558.41 | 1413.14 | 24.58 | 23.23 | 21.17 | 2174.34 | 2312.91 | 2165.26 | 12.99 | 13.21 | 7.75 | 48.05 | 56.14 | 51.15 | 6.84 | 0.00 | 0.00 | 414.27 | 407.34 | 392.53 | 775.61 | 765.13 | 748.24 | 132.06 | 143.85 | 125.21 | 1136.29 | 1136.25 | 1114.04 | 132.11 | 138.47 | 131.28 | 245.19 | 253.11 | 234.79 | 30.02 | 31.45 | 28.10 | 367.71 | 388.61 | 362.84 | 0.58 | 0.30 | 0.50 | 129.13 | 135.43 | 127.01 | 10.84 | 13.99 | 11.54 | 40.00 | 40.00 | 39.00 | 655.00 | 750.00 | 619.00 | 500.00 | 500.00 | 500.00 | 3632.00 | 1834.70 | 392.11 | 1207.69 | 1051.14 | 431.92 | 621.75 | 649.14 | 779.30 |
0.9900000000000001 | 2166.37 | 2220.37 | 2188.50 | 2326.29 | 2410.10 | 2211.64 | 349.35 | 292.54 | 288.49 | 543.71 | 448.13 | 432.74 | 1076.24 | 1059.88 | 956.50 | 1147.05 | 1112.66 | 1092.59 | 90.88 | 91.06 | 86.68 | 24.86 | 28.24 | 28.87 | 1806.94 | 1761.43 | 1689.07 | 1885.20 | 1919.19 | 1938.13 | 1955.61 | 2112.66 | 1905.81 | 44.39 | 43.89 | 38.88 | 2744.49 | 2874.65 | 2800.87 | 41.25 | 40.43 | 31.24 | 71.36 | 79.87 | 74.11 | 9.31 | 0.00 | 0.00 | 625.35 | 648.79 | 621.67 | 1026.44 | 1009.29 | 976.09 | 197.17 | 205.25 | 185.62 | 1484.99 | 1515.87 | 1459.55 | 215.64 | 231.15 | 215.20 | 393.73 | 408.58 | 372.61 | 53.39 | 56.59 | 49.41 | 577.89 | 616.89 | 563.89 | 0.68 | 0.51 | 0.61 | 239.60 | 240.13 | 249.89 | 20.71 | 25.26 | 21.53 | 48.00 | 48.00 | 46.00 | 1000.00 | 1000.00 | 951.00 | 655.00 | 655.00 | 619.00 | 3651.00 | 2216.30 | 654.31 | 1878.12 | 1465.10 | 619.69 | 929.64 | 864.34 | 1036.40 |
1.0 | 7376.71 | 8157.78 | 10752.56 | 8362.36 | 9667.13 | 14007.34 | 2613.31 | 3813.29 | 4169.81 | 3775.11 | 2812.04 | 5337.04 | 6431.33 | 7400.66 | 10752.56 | 4729.74 | 4557.14 | 4961.33 | 1466.03 | 1196.43 | 928.49 | 342.86 | 569.71 | 351.83 | 10643.38 | 7674.78 | 11039.91 | 7366.58 | 8133.66 | 8014.43 | 8314.76 | 9284.74 | 13950.04 | 628.56 | 544.63 | 516.91 | 8432.99 | 10936.73 | 13980.06 | 5900.66 | 5490.28 | 5681.54 | 1023.21 | 1265.79 | 1390.88 | 100.61 | 370.13 | 394.93 | 6351.44 | 5709.59 | 4003.21 | 4693.86 | 4388.73 | 5738.46 | 1678.41 | 1983.01 | 1588.53 | 6496.11 | 6466.74 | 5748.81 | 5459.56 | 5800.93 | 4309.29 | 4630.23 | 3470.38 | 5645.86 | 1351.11 | 1136.08 | 1394.89 | 5459.63 | 6745.76 | 5957.14 | 19.76 | 21.33 | 6.23 | 3965.69 | 4747.91 | 4100.38 | 1344.14 | 1495.94 | 1209.86 | 307.00 | 138.00 | 196.00 | 4010.00 | 4010.00 | 4449.00 | 4010.00 | 4010.00 | 4449.00 | 4321.00 | 37762.50 | 8062.30 | 15646.39 | 12768.70 | 4862.62 | 8254.62 | 12808.62 | 14344.50 |
# Looking at percentage change in quantiles from 0.90 to 1.
data.quantile(np.arange(0.9,1.01,0.01)).pct_change().mul(100).style.bar()
onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0.9 | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan | nan |
0.91 | 6.79 | 6.55 | 8.27 | 5.59 | 5.17 | 5.33 | 21.27 | 26.80 | 25.22 | 21.64 | 21.39 | 25.13 | 8.67 | 8.20 | 8.59 | 5.96 | 5.56 | 6.04 | 10.34 | 10.83 | 11.08 | 15.84 | 13.88 | 14.88 | 6.25 | 5.37 | 5.73 | 8.84 | 8.90 | 9.66 | 8.94 | 7.72 | 9.82 | 32.27 | 28.44 | 31.79 | 4.79 | 5.71 | 6.43 | nan | nan | nan | 10.11 | 9.09 | 9.16 | 12.39 | nan | nan | 7.05 | 7.29 | 7.90 | 6.00 | 6.19 | 6.01 | 11.11 | 10.60 | 10.81 | 6.05 | 5.48 | 5.59 | 10.03 | 10.84 | 11.79 | 9.58 | 8.84 | 10.15 | 19.50 | 19.89 | 19.54 | 8.69 | 8.96 | 9.10 | 7.14 | nan | nan | 22.19 | 19.35 | 22.84 | 24.14 | 22.01 | 22.76 | 4.35 | 4.35 | 4.76 | 9.43 | 10.00 | 14.68 | 0.00 | 0.00 | 11.11 | 2.25 | 3.40 | 33.68 | 33.56 | 9.39 | 9.84 | 36.51 | 8.39 | 8.31 |
0.92 | 7.08 | 7.25 | 8.36 | 6.69 | 5.49 | 6.01 | 22.72 | 27.49 | 26.73 | 19.34 | 21.90 | 24.03 | 9.73 | 8.85 | 8.65 | 6.41 | 5.68 | 6.11 | 11.24 | 11.91 | 11.09 | 16.45 | 13.57 | 14.71 | 5.67 | 6.21 | 6.26 | 9.33 | 8.79 | 9.32 | 8.36 | 9.37 | 9.79 | 28.52 | 32.50 | 32.02 | 6.12 | 5.68 | 6.76 | inf | inf | nan | 9.81 | 9.92 | 10.60 | 12.60 | nan | nan | 8.68 | 8.01 | 8.60 | 6.46 | 6.44 | 6.22 | 11.60 | 11.61 | 12.08 | 6.15 | 5.89 | 5.84 | 11.83 | 11.70 | 11.30 | 9.77 | 10.35 | 10.46 | 21.50 | 20.38 | 22.12 | 9.79 | 9.95 | 10.41 | 10.00 | nan | inf | 23.12 | 23.35 | 25.20 | 23.61 | 22.68 | 23.18 | 4.17 | 4.17 | 4.55 | 7.69 | 6.06 | 14.19 | 0.00 | 0.00 | 0.00 | 2.44 | 3.98 | 34.55 | 27.36 | 11.26 | 9.61 | 33.76 | 8.82 | 8.54 |
0.93 | 8.96 | 7.87 | 8.48 | 6.14 | 6.34 | 6.71 | 24.03 | 25.62 | 27.86 | 18.75 | 23.02 | 25.80 | 11.12 | 11.00 | 9.60 | 6.71 | 6.51 | 6.73 | 11.91 | 12.32 | 12.33 | 16.33 | 14.27 | 15.79 | 6.87 | 6.94 | 6.73 | 8.23 | 8.89 | 10.78 | 9.00 | 8.83 | 10.25 | 30.75 | 26.77 | 32.76 | 7.06 | 6.81 | 6.82 | 153.85 | 400.00 | nan | 10.76 | 10.46 | 10.50 | 12.94 | nan | nan | 8.59 | 8.54 | 8.44 | 6.99 | 6.62 | 6.99 | 12.21 | 13.49 | 12.40 | 6.68 | 6.84 | 6.50 | 12.40 | 13.98 | 14.13 | 11.02 | 10.25 | 12.01 | 23.85 | 22.24 | 23.09 | 10.83 | 11.75 | 12.50 | 9.09 | nan | 266.67 | 24.79 | 23.73 | 27.02 | 23.60 | 23.24 | 27.96 | 8.00 | 8.00 | 8.70 | 0.00 | 13.71 | 6.06 | 0.80 | 0.80 | 0.00 | 2.49 | 4.58 | 33.13 | 24.98 | 11.15 | 10.00 | 30.47 | 9.33 | 8.93 |
0.9400000000000001 | 7.71 | 7.97 | 9.40 | 6.70 | 7.47 | 7.79 | 22.41 | 32.61 | 27.41 | 23.62 | 26.30 | 24.59 | 11.66 | 12.58 | 10.94 | 7.67 | 7.62 | 7.34 | 15.38 | 14.43 | 14.39 | 16.09 | 18.10 | 15.52 | 7.80 | 7.97 | 7.60 | 9.38 | 9.02 | 11.82 | 10.12 | 9.63 | 11.32 | 27.40 | 29.92 | 31.63 | 7.21 | 7.01 | 8.00 | 90.91 | 104.00 | inf | 12.12 | 10.26 | 11.49 | 12.69 | nan | nan | 10.54 | 8.87 | 10.35 | 7.56 | 9.02 | 8.70 | 15.33 | 14.36 | 13.33 | 6.75 | 6.70 | 7.71 | 15.53 | 14.22 | 15.21 | 12.87 | 13.57 | 12.92 | 23.21 | 22.45 | 23.84 | 13.72 | 13.21 | 12.97 | 11.11 | nan | 90.91 | 27.04 | 22.91 | 22.11 | 27.73 | 27.17 | 25.21 | 3.70 | 3.70 | 4.00 | 14.29 | 14.32 | 13.71 | 0.00 | 0.00 | 0.80 | 1.69 | 4.83 | 36.42 | 28.15 | 12.40 | 12.12 | 30.69 | 10.38 | 9.47 |
0.9500000000000001 | 8.17 | 10.12 | 10.73 | 8.85 | 8.42 | 8.38 | 27.89 | 28.10 | 30.03 | 22.61 | 25.97 | 27.18 | 13.32 | 13.41 | 14.74 | 9.10 | 8.69 | 8.93 | 15.33 | 14.58 | 16.43 | 18.07 | 17.72 | 18.94 | 9.29 | 8.32 | 8.09 | 11.03 | 10.25 | 11.16 | 12.19 | 10.61 | 12.77 | 30.98 | 29.62 | 36.50 | 7.01 | 8.10 | 9.59 | 74.60 | 98.04 | 139.13 | 12.09 | 12.85 | 12.67 | 13.74 | nan | nan | 12.79 | 11.26 | 11.92 | 8.77 | 8.30 | 9.43 | 15.33 | 15.65 | 14.82 | 7.85 | 9.76 | 8.85 | 17.04 | 14.66 | 17.08 | 14.51 | 15.21 | 15.48 | 26.71 | 24.42 | 25.23 | 15.09 | 14.08 | 15.69 | 7.50 | inf | 19.05 | 29.73 | 26.89 | 27.12 | 29.18 | 29.49 | 31.88 | 7.14 | 7.14 | 7.69 | 25.00 | 9.89 | 14.32 | 0.00 | 8.73 | 0.00 | 2.32 | 6.70 | 33.06 | 21.43 | 15.23 | 12.45 | 30.81 | 10.46 | 11.58 |
0.9600000000000001 | 11.16 | 11.25 | 12.61 | 10.59 | 9.45 | 9.83 | 26.95 | 31.73 | 30.77 | 25.27 | 29.02 | 26.77 | 18.09 | 18.90 | 17.46 | 9.64 | 10.36 | 10.50 | 18.58 | 17.51 | 20.13 | 22.09 | 19.26 | 22.68 | 9.63 | 10.46 | 10.48 | 11.51 | 11.38 | 13.83 | 14.60 | 13.15 | 12.60 | 40.07 | 37.27 | 33.57 | 9.57 | 8.74 | 10.11 | 99.64 | 125.74 | 98.55 | 17.23 | 15.43 | 13.57 | 15.46 | nan | nan | 13.18 | 13.22 | 14.11 | 11.10 | 10.84 | 11.00 | 17.48 | 17.37 | 15.35 | 9.61 | 9.41 | 10.62 | 18.80 | 22.23 | 20.50 | 19.74 | 17.44 | 18.76 | 28.04 | 28.98 | 31.88 | 18.49 | 17.59 | 16.76 | 6.98 | 116.67 | 0.00 | 30.45 | 29.77 | 32.87 | 37.19 | 34.78 | 35.01 | 6.67 | 6.67 | 10.71 | 1.00 | 10.00 | 9.89 | 30.95 | 23.72 | 19.05 | 2.67 | 7.29 | 42.82 | 25.42 | 16.49 | 14.88 | 25.42 | 13.30 | 11.28 |
0.9700000000000001 | 12.59 | 11.40 | 14.74 | 12.29 | 12.59 | 13.50 | 29.19 | 40.89 | 36.48 | 28.19 | 32.27 | 29.77 | 23.40 | 21.27 | 20.46 | 12.56 | 11.74 | 13.55 | 23.95 | 22.57 | 23.25 | 22.90 | 23.79 | 23.54 | 13.42 | 12.52 | 12.85 | 13.06 | 14.11 | 15.92 | 14.68 | 15.14 | 16.48 | 43.87 | 40.36 | 42.23 | 11.52 | 10.41 | 12.24 | 128.14 | 135.09 | 150.00 | 20.27 | 19.74 | 18.11 | 16.32 | nan | nan | 16.14 | 19.53 | 16.28 | 13.37 | 12.41 | 12.05 | 21.51 | 22.99 | 24.01 | 13.19 | 12.59 | 10.87 | 26.52 | 25.01 | 27.45 | 21.71 | 22.21 | 22.26 | 32.77 | 37.18 | 35.52 | 22.36 | 22.49 | 24.61 | 15.22 | 53.85 | 44.00 | 36.58 | 35.00 | 38.32 | 41.18 | 39.03 | 42.86 | 9.38 | 12.50 | 9.68 | 8.91 | 0.00 | 10.00 | 20.61 | 17.40 | 26.33 | 4.92 | 8.31 | 41.59 | 26.86 | 19.73 | 18.29 | 29.41 | 15.48 | 15.97 |
0.9800000000000001 | 17.34 | 18.39 | 17.95 | 16.41 | 15.99 | 17.38 | 42.63 | 41.47 | 47.95 | 34.23 | 38.71 | 41.12 | 31.61 | 32.00 | 30.06 | 16.71 | 15.81 | 16.31 | 26.97 | 28.45 | 29.66 | 31.50 | 30.95 | 32.04 | 18.02 | 17.07 | 18.66 | 17.32 | 16.23 | 18.41 | 20.27 | 19.11 | 21.53 | 49.46 | 52.20 | 55.19 | 14.18 | 18.56 | 15.35 | 159.20 | 146.46 | 183.88 | 27.11 | 26.34 | 26.15 | 23.02 | nan | nan | 29.20 | 26.14 | 27.24 | 18.30 | 18.40 | 16.44 | 27.41 | 31.04 | 29.57 | 18.44 | 18.08 | 18.37 | 35.34 | 36.17 | 38.87 | 32.83 | 35.22 | 32.68 | 44.29 | 44.39 | 43.18 | 26.40 | 31.51 | 29.92 | 9.43 | 50.00 | 38.89 | 56.05 | 48.71 | 48.66 | 54.17 | 54.57 | 52.27 | 14.29 | 11.11 | 14.71 | 19.09 | 36.36 | 12.55 | 25.63 | 25.63 | 31.93 | 6.05 | 12.29 | 49.53 | 34.90 | 24.65 | 29.22 | 34.70 | 22.63 | 21.01 |
0.9900000000000001 | 27.83 | 25.26 | 28.72 | 26.57 | 31.10 | 27.18 | 57.89 | 75.93 | 73.99 | 49.73 | 58.63 | 62.36 | 60.97 | 60.52 | 60.36 | 29.49 | 28.14 | 27.96 | 44.96 | 44.40 | 44.20 | 44.96 | 46.86 | 54.64 | 31.63 | 31.57 | 29.27 | 29.24 | 26.22 | 32.46 | 28.77 | 35.57 | 34.86 | 80.59 | 88.95 | 83.68 | 26.22 | 24.29 | 29.35 | 217.65 | 206.02 | 303.10 | 48.50 | 42.27 | 44.88 | 36.08 | nan | nan | 50.95 | 59.27 | 58.38 | 32.34 | 31.91 | 30.45 | 49.30 | 42.69 | 48.24 | 30.69 | 33.41 | 31.01 | 63.22 | 66.93 | 63.92 | 60.58 | 61.42 | 58.70 | 77.82 | 79.94 | 75.85 | 57.16 | 58.74 | 55.41 | 17.24 | 70.00 | 22.00 | 85.55 | 77.30 | 96.75 | 91.03 | 80.54 | 86.54 | 20.00 | 20.00 | 17.95 | 52.67 | 33.33 | 53.63 | 31.00 | 31.00 | 23.80 | 0.52 | 20.80 | 66.87 | 55.51 | 39.38 | 43.47 | 49.52 | 33.15 | 32.99 |
1.0 | 240.51 | 267.41 | 391.32 | 259.47 | 301.11 | 533.35 | 648.04 | 1203.51 | 1345.42 | 594.33 | 527.51 | 1133.30 | 497.57 | 598.26 | 1024.15 | 312.34 | 309.57 | 354.09 | 1513.24 | 1213.96 | 971.17 | 1279.33 | 1917.74 | 1118.63 | 489.03 | 335.71 | 553.61 | 290.76 | 323.81 | 313.51 | 325.17 | 339.48 | 631.98 | 1316.15 | 1141.04 | 1229.43 | 207.27 | 280.45 | 399.13 | 14204.63 | 13481.40 | 18086.75 | 1333.97 | 1484.83 | 1776.73 | 980.90 | inf | inf | 915.66 | 780.03 | 543.95 | 357.30 | 334.83 | 487.90 | 751.25 | 866.13 | 755.80 | 337.45 | 326.60 | 293.87 | 2431.76 | 2409.61 | 1902.47 | 1076.01 | 749.39 | 1415.23 | 2430.88 | 1907.56 | 2723.15 | 844.75 | 993.52 | 956.44 | 2805.88 | 4082.35 | 921.31 | 1555.13 | 1877.27 | 1540.89 | 6390.92 | 5822.64 | 5519.41 | 539.58 | 187.50 | 326.09 | 301.00 | 301.00 | 367.82 | 512.21 | 512.21 | 618.74 | 18.35 | 1603.85 | 1132.18 | 733.09 | 771.52 | 684.69 | 787.93 | 1381.89 | 1284.07 |
# Columns with outliers
pct_change_99_1 = data.quantile(np.arange(0.9,1.01,0.01)).pct_change().mul(100).iloc[-1]
outlier_condition = pct_change_99_1 > 100
columns_with_outliers = pct_change_99_1[outlier_condition].index.values
print('Columns with outliers :\n', columns_with_outliers)
Columns with outliers : ['onnet_mou_6' 'onnet_mou_7' 'onnet_mou_8' 'offnet_mou_6' 'offnet_mou_7' 'offnet_mou_8' 'roam_ic_mou_6' 'roam_ic_mou_7' 'roam_ic_mou_8' 'roam_og_mou_6' 'roam_og_mou_7' 'roam_og_mou_8' 'loc_og_t2t_mou_6' 'loc_og_t2t_mou_7' 'loc_og_t2t_mou_8' 'loc_og_t2m_mou_6' 'loc_og_t2m_mou_7' 'loc_og_t2m_mou_8' 'loc_og_t2f_mou_6' 'loc_og_t2f_mou_7' 'loc_og_t2f_mou_8' 'loc_og_t2c_mou_6' 'loc_og_t2c_mou_7' 'loc_og_t2c_mou_8' 'loc_og_mou_6' 'loc_og_mou_7' 'loc_og_mou_8' 'std_og_t2t_mou_6' 'std_og_t2t_mou_7' 'std_og_t2t_mou_8' 'std_og_t2m_mou_6' 'std_og_t2m_mou_7' 'std_og_t2m_mou_8' 'std_og_t2f_mou_6' 'std_og_t2f_mou_7' 'std_og_t2f_mou_8' 'std_og_mou_6' 'std_og_mou_7' 'std_og_mou_8' 'isd_og_mou_6' 'isd_og_mou_7' 'isd_og_mou_8' 'spl_og_mou_6' 'spl_og_mou_7' 'spl_og_mou_8' 'og_others_6' 'og_others_7' 'og_others_8' 'loc_ic_t2t_mou_6' 'loc_ic_t2t_mou_7' 'loc_ic_t2t_mou_8' 'loc_ic_t2m_mou_6' 'loc_ic_t2m_mou_7' 'loc_ic_t2m_mou_8' 'loc_ic_t2f_mou_6' 'loc_ic_t2f_mou_7' 'loc_ic_t2f_mou_8' 'loc_ic_mou_6' 'loc_ic_mou_7' 'loc_ic_mou_8' 'std_ic_t2t_mou_6' 'std_ic_t2t_mou_7' 'std_ic_t2t_mou_8' 'std_ic_t2m_mou_6' 'std_ic_t2m_mou_7' 'std_ic_t2m_mou_8' 'std_ic_t2f_mou_6' 'std_ic_t2f_mou_7' 'std_ic_t2f_mou_8' 'std_ic_mou_6' 'std_ic_mou_7' 'std_ic_mou_8' 'spl_ic_mou_6' 'spl_ic_mou_7' 'spl_ic_mou_8' 'isd_ic_mou_6' 'isd_ic_mou_7' 'isd_ic_mou_8' 'ic_others_6' 'ic_others_7' 'ic_others_8' 'total_rech_num_6' 'total_rech_num_7' 'total_rech_num_8' 'max_rech_amt_6' 'max_rech_amt_7' 'max_rech_amt_8' 'last_day_rch_amt_6' 'last_day_rch_amt_7' 'last_day_rch_amt_8' 'Average_rech_amt_6n7' 'delta_vol_2g' 'delta_vol_3g' 'delta_total_og_mou' 'delta_total_ic_mou' 'delta_vbc_3g' 'delta_arpu' 'delta_total_rech_amt']
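The flagging rule above (a jump of more than 100% between the 99th percentile and the maximum) can be illustrated on a small toy frame. This is only a sketch: the `toy` data and the column names `smooth` and `spiky` are made up for illustration, and `np.linspace` is used instead of `np.arange` so that the last quantile is exactly 1.0.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 'spiky' contains one extreme value, 'smooth' does not.
toy = pd.DataFrame({
    "smooth": np.arange(1.0, 101.0),
    "spiky": np.append(np.arange(1.0, 100.0), 5000.0),
})

# Percentage change between consecutive quantiles from 0.90 to 1.00.
quantiles = np.linspace(0.90, 1.00, 11)
quantile_jump = toy.quantile(quantiles).pct_change().mul(100)

# The last row is the jump from the 99th percentile to the maximum.
jump_99_to_max = quantile_jump.iloc[-1]
flagged = jump_99_to_max[jump_99_to_max > 100].index.tolist()
print(flagged)
```

Only the column whose maximum lies far above its 99th percentile is flagged; a column that grows steadily through the top decile is left alone.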
# capping outliers to 99th percentile values
outlier_rows = []
for col in columns_with_outliers:
    outlier_threshold = data[col].quantile(0.99)
    condition = data[col] > outlier_threshold
    outlier_rows.append({'Column': col, 'Outlier Threshold': outlier_threshold, 'Outliers replaced': int(condition.sum())})
    data.loc[condition, col] = outlier_threshold
# DataFrame.append was removed in pandas 2.0, so the summary is built from a list of rows
outlier_treatment = pd.DataFrame(outlier_rows, columns=['Column', 'Outlier Threshold', 'Outliers replaced'])
outlier_treatment
Column | Outlier Threshold | Outliers replaced | |
---|---|---|---|
0 | onnet_mou_6 | 2166.37 | 301 |
1 | onnet_mou_7 | 2220.37 | 301 |
2 | onnet_mou_8 | 2188.50 | 301 |
3 | offnet_mou_6 | 2326.29 | 301 |
4 | offnet_mou_7 | 2410.10 | 301 |
5 | offnet_mou_8 | 2211.64 | 301 |
6 | roam_ic_mou_6 | 349.35 | 301 |
7 | roam_ic_mou_7 | 292.54 | 301 |
8 | roam_ic_mou_8 | 288.49 | 301 |
9 | roam_og_mou_6 | 543.71 | 301 |
10 | roam_og_mou_7 | 448.13 | 301 |
11 | roam_og_mou_8 | 432.74 | 301 |
12 | loc_og_t2t_mou_6 | 1076.24 | 301 |
13 | loc_og_t2t_mou_7 | 1059.88 | 301 |
14 | loc_og_t2t_mou_8 | 956.50 | 301 |
15 | loc_og_t2m_mou_6 | 1147.05 | 301 |
16 | loc_og_t2m_mou_7 | 1112.66 | 301 |
17 | loc_og_t2m_mou_8 | 1092.59 | 301 |
18 | loc_og_t2f_mou_6 | 90.88 | 301 |
19 | loc_og_t2f_mou_7 | 91.06 | 301 |
20 | loc_og_t2f_mou_8 | 86.68 | 300 |
21 | loc_og_t2c_mou_6 | 24.86 | 301 |
22 | loc_og_t2c_mou_7 | 28.24 | 301 |
23 | loc_og_t2c_mou_8 | 28.87 | 301 |
24 | loc_og_mou_6 | 1806.94 | 301 |
25 | loc_og_mou_7 | 1761.43 | 301 |
26 | loc_og_mou_8 | 1689.07 | 301 |
27 | std_og_t2t_mou_6 | 1885.20 | 301 |
28 | std_og_t2t_mou_7 | 1919.19 | 301 |
29 | std_og_t2t_mou_8 | 1938.13 | 301 |
30 | std_og_t2m_mou_6 | 1955.61 | 301 |
31 | std_og_t2m_mou_7 | 2112.66 | 301 |
32 | std_og_t2m_mou_8 | 1905.81 | 301 |
33 | std_og_t2f_mou_6 | 44.39 | 301 |
34 | std_og_t2f_mou_7 | 43.89 | 301 |
35 | std_og_t2f_mou_8 | 38.88 | 301 |
36 | std_og_mou_6 | 2744.49 | 301 |
37 | std_og_mou_7 | 2874.65 | 301 |
38 | std_og_mou_8 | 2800.87 | 301 |
39 | isd_og_mou_6 | 41.25 | 301 |
40 | isd_og_mou_7 | 40.43 | 301 |
41 | isd_og_mou_8 | 31.24 | 300 |
42 | spl_og_mou_6 | 71.36 | 301 |
43 | spl_og_mou_7 | 79.87 | 301 |
44 | spl_og_mou_8 | 74.11 | 301 |
45 | og_others_6 | 9.31 | 301 |
46 | og_others_7 | 0.00 | 164 |
47 | og_others_8 | 0.00 | 180 |
48 | loc_ic_t2t_mou_6 | 625.35 | 301 |
49 | loc_ic_t2t_mou_7 | 648.79 | 301 |
50 | loc_ic_t2t_mou_8 | 621.67 | 301 |
51 | loc_ic_t2m_mou_6 | 1026.44 | 301 |
52 | loc_ic_t2m_mou_7 | 1009.29 | 301 |
53 | loc_ic_t2m_mou_8 | 976.09 | 301 |
54 | loc_ic_t2f_mou_6 | 197.17 | 301 |
55 | loc_ic_t2f_mou_7 | 205.25 | 301 |
56 | loc_ic_t2f_mou_8 | 185.62 | 301 |
57 | loc_ic_mou_6 | 1484.99 | 301 |
58 | loc_ic_mou_7 | 1515.87 | 301 |
59 | loc_ic_mou_8 | 1459.55 | 301 |
60 | std_ic_t2t_mou_6 | 215.64 | 301 |
61 | std_ic_t2t_mou_7 | 231.15 | 301 |
62 | std_ic_t2t_mou_8 | 215.20 | 301 |
63 | std_ic_t2m_mou_6 | 393.73 | 301 |
64 | std_ic_t2m_mou_7 | 408.58 | 301 |
65 | std_ic_t2m_mou_8 | 372.61 | 301 |
66 | std_ic_t2f_mou_6 | 53.39 | 301 |
67 | std_ic_t2f_mou_7 | 56.59 | 300 |
68 | std_ic_t2f_mou_8 | 49.41 | 301 |
69 | std_ic_mou_6 | 577.89 | 301 |
70 | std_ic_mou_7 | 616.89 | 301 |
71 | std_ic_mou_8 | 563.89 | 301 |
72 | spl_ic_mou_6 | 0.68 | 278 |
73 | spl_ic_mou_7 | 0.51 | 295 |
74 | spl_ic_mou_8 | 0.61 | 293 |
75 | isd_ic_mou_6 | 239.60 | 301 |
76 | isd_ic_mou_7 | 240.13 | 301 |
77 | isd_ic_mou_8 | 249.89 | 301 |
78 | ic_others_6 | 20.71 | 301 |
79 | ic_others_7 | 25.26 | 301 |
80 | ic_others_8 | 21.53 | 300 |
81 | total_rech_num_6 | 48.00 | 283 |
82 | total_rech_num_7 | 48.00 | 283 |
83 | total_rech_num_8 | 46.00 | 287 |
84 | max_rech_amt_6 | 1000.00 | 169 |
85 | max_rech_amt_7 | 1000.00 | 204 |
86 | max_rech_amt_8 | 951.00 | 289 |
87 | last_day_rch_amt_6 | 655.00 | 284 |
88 | last_day_rch_amt_7 | 655.00 | 300 |
89 | last_day_rch_amt_8 | 619.00 | 283 |
90 | Average_rech_amt_6n7 | 2216.30 | 301 |
91 | delta_vol_2g | 654.31 | 301 |
92 | delta_vol_3g | 1878.12 | 301 |
93 | delta_total_og_mou | 1465.10 | 301 |
94 | delta_total_ic_mou | 619.69 | 301 |
95 | delta_vbc_3g | 929.64 | 301 |
96 | delta_arpu | 864.34 | 301 |
97 | delta_total_rech_amt | 1036.40 | 301 |
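The same capping can also be written without an explicit loop. The sketch below is an alternative, not the code used in this notebook; the `toy` frame and the `cols` list are hypothetical, and it relies on `DataFrame.clip` accepting a per-column Series of upper bounds when `axis=1`.

```python
import pandas as pd

# Hypothetical toy data: only column 'a' is treated for outliers.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 1000.0],
    "b": [10.0, 20.0, 30.0, 40.0, 50.0],
})
cols = ["a"]

# One 99th-percentile threshold per treated column.
caps = toy[cols].quantile(0.99)

# Number of values above the threshold in each treated column.
n_replaced = (toy[cols] > caps).sum()

# Cap every treated column at its own threshold in one call.
toy[cols] = toy[cols].clip(upper=caps, axis=1)

print(caps)
print(n_replaced)
```

Values at or below the threshold are left unchanged, so the result matches the loop version, which only overwrites rows where the value exceeds the threshold.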
categorical = data.dtypes == 'category'
categorical_vars = data.columns[categorical].to_list()
ind_categorical_vars = set(categorical_vars) - {'Churn'} #independent categorical variables
ind_categorical_vars
{'monthly_2g_6', 'monthly_2g_7', 'monthly_2g_8', 'monthly_3g_6', 'monthly_3g_7', 'monthly_3g_8', 'sachet_2g_6', 'sachet_2g_7', 'sachet_2g_8', 'sachet_3g_6', 'sachet_3g_7', 'sachet_3g_8'}
# Finding & grouping categories that contribute at most 1% of rows in each column into "Others"
for col in ind_categorical_vars:
    category_counts = 100 * data[col].value_counts(normalize=True)
    print('\n', tabulate(pd.DataFrame(category_counts), headers='keys', tablefmt='psql'), '\n')
    low_count_categories = category_counts[category_counts <= 1].index.to_list()
    print(f"Replaced {low_count_categories} in {col} with category : Others")
    # assign back instead of calling replace(..., inplace=True) on the column selection
    data[col] = data[col].replace(low_count_categories, 'Others')
+----+---------------+
| | sachet_3g_6 |
|----+---------------|
| 0 | 93.4091 |
| 1 | 4.35507 |
| 2 | 1.04295 |
| 3 | 0.396521 |
| 4 | 0.219919 |
| 5 | 0.123288 |
| 6 | 0.089967 |
| 7 | 0.0866349 |
| 8 | 0.0499817 |
| 9 | 0.0499817 |
| 10 | 0.0366532 |
| 11 | 0.0266569 |
| 15 | 0.0166606 |
| 12 | 0.0133284 |
| 19 | 0.0133284 |
| 13 | 0.00999633 |
| 14 | 0.00999633 |
| 18 | 0.00999633 |
| 23 | 0.00999633 |
| 16 | 0.00666422 |
| 22 | 0.00666422 |
| 29 | 0.00666422 |
| 28 | 0.00333211 |
| 17 | 0.00333211 |
| 21 | 0.00333211 |
+----+---------------+
Replaced [3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 12, 19, 13, 14, 18, 23, 16, 22, 29, 28, 17, 21] in sachet_3g_6 with category : Others

+----+----------------+
| | monthly_2g_7 |
|----+----------------|
| 0 | 88.4876 |
| 1 | 10.0397 |
| 2 | 1.35284 |
| 3 | 0.0966312 |
| 4 | 0.0166606 |
| 5 | 0.00666422 |
+----+----------------+
Replaced [3, 4, 5] in monthly_2g_7 with category : Others

+----+----------------+
| | monthly_2g_8 |
|----+----------------|
| 0 | 89.7604 |
| 1 | 9.19996 |
| 2 | 0.942988 |
| 3 | 0.0733065 |
| 4 | 0.0166606 |
| 5 | 0.00666422 |
+----+----------------+
Replaced [2, 3, 4, 5] in monthly_2g_8 with category : Others

+----+---------------+
| | sachet_3g_8 |
|----+---------------|
| 0 | 94.2388 |
| 1 | 3.52537 |
| 2 | 0.839692 |
| 3 | 0.429842 |
| 4 | 0.243244 |
| 5 | 0.219919 |
| 6 | 0.0866349 |
| 7 | 0.0766386 |
| 8 | 0.0733065 |
| 9 | 0.0399853 |
| 12 | 0.0366532 |
| 13 | 0.0333211 |
| 10 | 0.0333211 |
| 11 | 0.0199927 |
| 14 | 0.0199927 |
| 15 | 0.0166606 |
| 16 | 0.00999633 |
| 17 | 0.00666422 |
| 18 | 0.00666422 |
| 20 | 0.00666422 |
| 21 | 0.00666422 |
| 23 | 0.00666422 |
| 38 | 0.00333211 |
| 19 | 0.00333211 |
| 25 | 0.00333211 |
| 27 | 0.00333211 |
| 29 | 0.00333211 |
| 30 | 0.00333211 |
| 41 | 0.00333211 |
+----+---------------+
Replaced [2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15, 16, 17, 18, 20, 21, 23, 38, 19, 25, 27, 29, 30, 41] in sachet_3g_8 with category : Others

+----+----------------+
| | monthly_3g_7 |
|----+----------------|
| 0 | 87.8378 |
| 1 | 8.21699 |
| 2 | 2.739 |
| 3 | 0.689747 |
| 4 | 0.226584 |
| 5 | 0.129952 |
| 6 | 0.0766386 |
| 7 | 0.0333211 |
| 8 | 0.0166606 |
| 9 | 0.0133284 |
| 11 | 0.00666422 |
| 16 | 0.00333211 |
| 14 | 0.00333211 |
| 12 | 0.00333211 |
| 10 | 0.00333211 |
+----+----------------+
Replaced [3, 4, 5, 6, 7, 8, 9, 11, 16, 14, 12, 10] in monthly_3g_7 with category : Others

+----+---------------+
| | sachet_2g_6 |
|----+---------------|
| 0 | 82.5631 |
| 1 | 7.87378 |
| 2 | 3.3621 |
| 3 | 2.0126 |
| 4 | 1.32951 |
| 5 | 0.703076 |
| 6 | 0.509813 |
| 7 | 0.356536 |
| 8 | 0.286562 |
| 9 | 0.239912 |
| 10 | 0.17327 |
| 12 | 0.146613 |
| 11 | 0.0999633 |
| 13 | 0.0566459 |
| 14 | 0.0533138 |
| 15 | 0.0433175 |
| 17 | 0.0366532 |
| 18 | 0.029989 |
| 19 | 0.029989 |
| 16 | 0.0233248 |
| 22 | 0.0133284 |
| 20 | 0.00999633 |
| 21 | 0.00999633 |
| 24 | 0.00999633 |
| 25 | 0.00999633 |
| 39 | 0.00333211 |
| 27 | 0.00333211 |
| 30 | 0.00333211 |
| 32 | 0.00333211 |
| 34 | 0.00333211 |
| 28 | 0 |
| 42 | 0 |
+----+---------------+
Replaced [5, 6, 7, 8, 9, 10, 12, 11, 13, 14, 15, 17, 18, 19, 16, 22, 20, 21, 24, 25, 39, 27, 30, 32, 34, 28, 42] in sachet_2g_6 with category : Others

+----+----------------+
| | monthly_2g_6 |
|----+----------------|
| 0 | 88.9074 |
| 1 | 9.83306 |
| 2 | 1.14958 |
| 3 | 0.0866349 |
| 4 | 0.0233248 |
+----+----------------+
Replaced [3, 4] in monthly_2g_6 with category : Others

+----+---------------+
| | sachet_2g_7 |
|----+---------------|
| 0 | 81.8033 |
| 1 | 7.24068 |
| 2 | 3.34877 |
| 3 | 1.96595 |
| 4 | 1.50945 |
| 5 | 1.20622 |
| 6 | 0.843024 |
| 7 | 0.543134 |
| 8 | 0.403185 |
| 10 | 0.239912 |
| 9 | 0.219919 |
| 11 | 0.159941 |
| 12 | 0.0966312 |
| 14 | 0.0799707 |
| 13 | 0.0666422 |
| 15 | 0.0499817 |
| 16 | 0.0366532 |
| 18 | 0.0333211 |
| 17 | 0.029989 |
| 20 | 0.0266569 |
| 19 | 0.0233248 |
| 21 | 0.00999633 |
| 26 | 0.00999633 |
| 27 | 0.00999633 |
| 22 | 0.00666422 |
| 23 | 0.00666422 |
| 30 | 0.00666422 |
| 42 | 0.00333211 |
| 24 | 0.00333211 |
| 25 | 0.00333211 |
| 29 | 0.00333211 |
| 32 | 0.00333211 |
| 35 | 0.00333211 |
| 48 | 0.00333211 |
| 28 | 0 |
+----+---------------+
Replaced [6, 7, 8, 10, 9, 11, 12, 14, 13, 15, 16, 18, 17, 20, 19, 21, 26, 27, 22, 23, 30, 42, 24, 25, 29, 32, 35, 48, 28] in sachet_2g_7 with category : Others

+----+---------------+
| | sachet_3g_7 |
|----+---------------|
| 0 | 93.4757 |
| 1 | 4.10849 |
| 2 | 1.03962 |
| 3 | 0.383193 |
| 4 | 0.239912 |
| 5 | 0.219919 |
| 6 | 0.139949 |
| 7 | 0.059978 |
| 9 | 0.0533138 |
| 8 | 0.0466496 |
| 11 | 0.0433175 |
| 10 | 0.0333211 |
| 12 | 0.0333211 |
| 15 | 0.0166606 |
| 14 | 0.0166606 |
| 13 | 0.0133284 |
| 18 | 0.0133284 |
| 19 | 0.00999633 |
| 20 | 0.00999633 |
| 22 | 0.00999633 |
| 17 | 0.00666422 |
| 21 | 0.00666422 |
| 24 | 0.00666422 |
| 33 | 0.00333211 |
| 16 | 0.00333211 |
| 31 | 0.00333211 |
| 35 | 0.00333211 |
+----+---------------+
Replaced [3, 4, 5, 6, 7, 9, 8, 11, 10, 12, 15, 14, 13, 18, 19, 20, 22, 17, 21, 24, 33, 16, 31, 35] in sachet_3g_7 with category : Others

+----+----------------+
| | monthly_3g_8 |
|----+----------------|
| 0 | 88.3876 |
| 1 | 8.00706 |
| 2 | 2.45243 |
| 3 | 0.656426 |
| 4 | 0.289894 |
| 5 | 0.0999633 |
| 6 | 0.0466496 |
| 7 | 0.029989 |
| 9 | 0.00999633 |
| 8 | 0.00999633 |
| 10 | 0.00666422 |
| 16 | 0.00333211 |
+----+----------------+
Replaced [3, 4, 5, 6, 7, 9, 8, 10, 16] in monthly_3g_8 with category : Others

+----+----------------+
| | monthly_3g_6 |
|----+----------------|
| 0 | 88.0744 |
| 1 | 8.4669 |
| 2 | 2.32248 |
| 3 | 0.689747 |
| 4 | 0.246576 |
| 5 | 0.106628 |
| 6 | 0.0366532 |
| 7 | 0.029989 |
| 8 | 0.00999633 |
| 11 | 0.00666422 |
| 9 | 0.00666422 |
| 14 | 0.00333211 |
+----+----------------+
Replaced [3, 4, 5, 6, 7, 8, 11, 9, 14] in monthly_3g_6 with category : Others

+----+---------------+
| | sachet_2g_8 |
|----+---------------|
| 0 | 79.7274 |
| 1 | 8.87008 |
| 2 | 3.25881 |
| 3 | 2.19253 |
| 4 | 1.81267 |
| 5 | 1.44947 |
| 6 | 0.88301 |
| 7 | 0.459831 |
| 8 | 0.313218 |
| 9 | 0.249908 |
| 10 | 0.169938 |
| 11 | 0.123288 |
| 12 | 0.113292 |
| 14 | 0.0766386 |
| 15 | 0.0566459 |
| 13 | 0.0499817 |
| 16 | 0.0433175 |
| 18 | 0.0266569 |
| 17 | 0.0233248 |
| 19 | 0.0233248 |
| 20 | 0.0133284 |
| 34 | 0.00666422 |
| 29 | 0.00666422 |
| 27 | 0.00666422 |
| 24 | 0.00666422 |
| 22 | 0.00666422 |
| 21 | 0.00666422 |
| 23 | 0.00333211 |
| 25 | 0.00333211 |
| 26 | 0.00333211 |
| 31 | 0.00333211 |
| 32 | 0.00333211 |
| 33 | 0.00333211 |
| 44 | 0.00333211 |
+----+---------------+
Replaced [6, 7, 8, 9, 10, 11, 12, 14, 15, 13, 16, 18, 17, 19, 20, 34, 29, 27, 24, 22, 21, 23, 25, 26, 31, 32, 33, 44] in sachet_2g_8 with category : Others
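The grouping step above can be checked on a small example. The sketch below uses a made-up Series `s` (200 rows, four categories) and folds every category with a share of at most 1% into "Others". It uses `where`/`isin` on an object-typed copy rather than `replace`, which is an assumption made for the illustration and not the method used in this notebook.

```python
import pandas as pd

# Hypothetical toy column: categories 2 and 3 each cover 0.5% of rows.
s = pd.Series([0] * 180 + [1] * 18 + [2] + [3], dtype="category")

# Share of each category in percent.
share = 100 * s.value_counts(normalize=True)

# Categories contributing at most 1% of rows.
rare = share[share <= 1].index.tolist()

# Keep frequent categories, replace the rare ones with the label "Others".
grouped = s.astype(object).where(~s.isin(rare), "Others")

print(sorted(rare))
print(grouped.value_counts())
```

After grouping, the column has three levels (0, 1 and Others), which keeps the later dummy encoding small.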
cat_cols = list(ind_categorical_vars)  # recent pandas versions do not accept a set as a column indexer
dummy_vars = pd.get_dummies(data[cat_cols], drop_first=False, prefix=cat_cols, prefix_sep='_')
dummy_vars.head()
sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | sachet_3g_6_Others | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | monthly_2g_7_Others | monthly_2g_8_0 | monthly_2g_8_1 | monthly_2g_8_Others | sachet_3g_8_0 | sachet_3g_8_1 | sachet_3g_8_Others | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 | monthly_3g_7_Others | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | sachet_2g_6_Others | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | monthly_2g_6_Others | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | sachet_2g_7_Others | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | sachet_3g_7_Others | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_8_Others | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | monthly_3g_6_Others | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | sachet_2g_8_Others | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mobile_number | ||||||||||||||||||||||||||||||||||||||||||||||||||||||
7000701601 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
7001524846 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
7002191713 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
7000875565 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
7000187447 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
reference_cols = dummy_vars.filter(regex='.*Others$').columns.to_list() # Using category 'Others' in each column as reference.
dummy_vars.drop(columns=reference_cols, inplace=True)
reference_cols
['sachet_3g_6_Others', 'monthly_2g_7_Others', 'monthly_2g_8_Others', 'sachet_3g_8_Others', 'monthly_3g_7_Others', 'sachet_2g_6_Others', 'monthly_2g_6_Others', 'sachet_2g_7_Others', 'sachet_3g_7_Others', 'monthly_3g_8_Others', 'monthly_3g_6_Others', 'sachet_2g_8_Others']
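Dropping the `Others` dummy of each variable makes `Others` the reference level: a row that belonged to `Others` is encoded as all zeros across the remaining dummies of that variable. A minimal sketch on a hypothetical single-column frame (`toy`, `plan`) is shown below; `dtype=int` is passed only so the output prints as 0/1.

```python
import pandas as pd

# Hypothetical toy column with an already grouped "Others" level.
toy = pd.DataFrame({"plan": ["0", "1", "Others", "0"]})
cat_cols = ["plan"]

# One dummy column per level, named <column>_<level>.
dummies = pd.get_dummies(toy[cat_cols], prefix=cat_cols, prefix_sep="_", dtype=int)

# Use the "Others" level as the reference by dropping its dummy column.
reference = dummies.filter(regex="_Others$").columns.tolist()
dummies = dummies.drop(columns=reference)

print(reference)
print(dummies)
```

Dropping one level per variable also avoids perfectly collinear dummy columns, which matters for the logistic regression models.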
# concatenating dummy variables with original 'data'
data.drop(columns=ind_categorical_vars, inplace=True) # dropping original categorical columns
data = pd.concat([data, dummy_vars], axis=1)
data.head()
onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | Churn | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | monthly_2g_8_0 | monthly_2g_8_1 | sachet_3g_8_0 | sachet_3g_8_1 | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
mobile_number | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
7000701601 | 57.84 | 54.68 | 52.29 | 453.43 | 567.16 | 325.91 | 16.23 | 33.49 | 31.64 | 23.74 | 12.59 | 38.06 | 51.39 | 31.38 | 40.28 | 308.63 | 447.38 | 162.28 | 62.13 | 55.14 | 53.23 | 0.0 | 0.0 | 0.00 | 422.16 | 533.91 | 255.79 | 4.30 | 23.29 | 12.01 | 49.89 | 31.76 | 49.14 | 6.66 | 20.08 | 16.68 | 60.86 | 75.14 | 77.84 | 0.0 | 0.18 | 10.01 | 4.50 | 0.00 | 6.50 | 0.00 | 0.0 | 0.0 | 58.14 | 32.26 | 27.31 | 217.56 | 221.49 | 121.19 | 152.16 | 101.46 | 39.53 | 427.88 | 355.23 | 188.04 | 36.89 | 11.83 | 30.39 | 91.44 | 126.99 | 141.33 | 52.19 | 34.24 | 22.21 | 180.54 | 173.08 | 193.94 | 0.21 | 0.0 | 0.0 | 2.06 | 14.53 | 31.59 | 15.74 | 15.19 | 15.14 | 5.0 | 5.0 | 7.0 | 1000.0 | 790.0 | 951.0 | 0.0 | 0.0 | 619.0 | 802 | 1185.0 | 1 | 0.00 | 0.00 | -198.22 | -163.51 | 38.68 | 864.34 | 1036.4 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
7001524846 | 413.69 | 351.03 | 35.08 | 94.66 | 80.63 | 136.48 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 297.13 | 217.59 | 12.49 | 80.96 | 70.58 | 50.54 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 7.15 | 378.09 | 288.18 | 63.04 | 116.56 | 133.43 | 22.58 | 13.69 | 10.04 | 75.69 | 0.00 | 0.00 | 0.00 | 130.26 | 143.48 | 98.28 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 10.23 | 0.00 | 0.0 | 0.0 | 23.84 | 9.84 | 0.31 | 57.58 | 13.98 | 15.48 | 0.00 | 0.00 | 0.00 | 81.43 | 23.83 | 15.79 | 0.00 | 0.58 | 0.10 | 22.43 | 4.08 | 0.65 | 0.00 | 0.00 | 0.00 | 22.43 | 4.66 | 0.75 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.0 | 21.0 | 14.0 | 90.0 | 154.0 | 30.0 | 50.0 | 0.0 | 10.0 | 315 | 519.0 | 0 | -177.97 | -363.54 | -298.45 | -49.63 | -495.38 | -298.11 | -399.0 | 1 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
7002191713 | 501.76 | 108.39 | 534.24 | 413.31 | 119.28 | 482.46 | 23.53 | 144.24 | 72.11 | 7.98 | 35.26 | 1.44 | 49.63 | 6.19 | 36.01 | 151.13 | 47.28 | 294.46 | 4.54 | 0.00 | 23.51 | 0.0 | 0.0 | 0.49 | 205.31 | 53.48 | 353.99 | 446.41 | 85.98 | 498.23 | 255.36 | 52.94 | 156.94 | 0.00 | 0.00 | 0.00 | 701.78 | 138.93 | 655.18 | 0.0 | 0.00 | 1.29 | 0.00 | 0.00 | 4.78 | 0.00 | 0.0 | 0.0 | 67.88 | 7.58 | 52.58 | 142.88 | 18.53 | 195.18 | 4.81 | 0.00 | 7.49 | 215.58 | 26.11 | 255.26 | 115.68 | 38.29 | 154.58 | 308.13 | 29.79 | 317.91 | 0.00 | 0.00 | 1.91 | 423.81 | 68.09 | 474.41 | 0.45 | 0.0 | 0.0 | 239.60 | 62.11 | 249.89 | 20.71 | 16.24 | 21.44 | 6.0 | 4.0 | 11.0 | 110.0 | 110.0 | 130.0 | 110.0 | 50.0 | 0.0 | 2607 | 380.0 | 0 | 0.02 | 0.00 | 465.51 | 573.93 | 0.00 | 244.00 | 337.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
7000875565 | 50.51 | 74.01 | 70.61 | 296.29 | 229.74 | 162.76 | 0.00 | 2.83 | 0.00 | 0.00 | 17.74 | 0.00 | 42.61 | 65.16 | 67.38 | 273.29 | 145.99 | 128.28 | 0.00 | 4.48 | 10.26 | 0.0 | 0.0 | 0.00 | 315.91 | 215.64 | 205.93 | 7.89 | 2.58 | 3.23 | 22.99 | 64.51 | 18.29 | 0.00 | 0.00 | 0.00 | 30.89 | 67.09 | 21.53 | 0.0 | 0.00 | 0.00 | 0.00 | 3.26 | 5.91 | 0.00 | 0.0 | 0.0 | 41.33 | 71.44 | 28.89 | 226.81 | 149.69 | 150.16 | 8.71 | 8.68 | 32.71 | 276.86 | 229.83 | 211.78 | 68.79 | 78.64 | 6.33 | 18.68 | 73.08 | 73.93 | 0.51 | 0.00 | 2.18 | 87.99 | 151.73 | 82.44 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.23 | 0.00 | 0.00 | 0.00 | 10.0 | 6.0 | 2.0 | 110.0 | 110.0 | 130.0 | 100.0 | 100.0 | 130.0 | 511 | 459.0 | 0 | 0.00 | 0.00 | -83.03 | -78.75 | -12.17 | -177.53 | -299.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
7000187447 | 1185.91 | 9.28 | 7.79 | 61.64 | 0.00 | 5.54 | 0.00 | 4.76 | 4.81 | 0.00 | 8.46 | 13.34 | 38.99 | 0.00 | 0.00 | 58.54 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 97.54 | 0.00 | 0.00 | 1146.91 | 0.81 | 0.00 | 1.55 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 1148.46 | 0.81 | 0.00 | 0.0 | 0.00 | 0.00 | 2.58 | 0.00 | 0.00 | 0.93 | 0.0 | 0.0 | 34.54 | 0.00 | 0.00 | 47.41 | 2.31 | 0.00 | 0.00 | 0.00 | 0.00 | 81.96 | 2.31 | 0.00 | 8.63 | 0.00 | 0.00 | 1.28 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 9.91 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 19.0 | 2.0 | 4.0 | 110.0 | 0.0 | 30.0 | 30.0 | 0.0 | 0.0 | 667 | 408.0 | 0 | 0.00 | 0.00 | -625.17 | -47.09 | 0.00 | -329.00 | -378.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
dummy_cols = dummy_vars.columns.to_list()
data[dummy_cols] = data[dummy_cols].astype('category')
data.shape
(30011, 142)
This following section contains
y = data.pop('Churn') # Predicted / Target Variable
X = data # Predictor variables
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, train_size=0.7, random_state=42)
y.value_counts(normalize=True).to_frame()
Churn | |
---|---|
0 | 0.913598 |
1 | 0.086402 |
# Ratio of classes
class_0 = y[y == 0].count()
class_1 = y[y == 1].count()
print(f'Class Imbalance Ratio : {round(class_1/class_0,3)}')
Class Imbalance Ratio : 0.095
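As a quick cross-check, the printed ratio follows directly from the normalized class shares in the table above. The sketch below redoes that arithmetic in plain Python; the variable names are illustrative and are not part of the notebook.

```python
# Class shares copied from the value_counts(normalize=True) table above
share_not_churned = 0.913598
share_churned = 0.086402

# Minority-to-majority ratio, the quantity the notebook prints
imbalance_ratio = round(share_churned / share_not_churned, 3)
print(f'Class Imbalance Ratio : {imbalance_ratio}')

# The same imbalance read the other way: non-churners per churner
majority_per_minority = round(share_not_churned / share_churned, 1)
print(f'Non-churners per churner : {majority_per_minority}')
```

Read this way, there are roughly ten to eleven non-churners for every churner, which is why accuracy alone is a weak yardstick for the models that follow.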
#!pip install imblearn
from imblearn.over_sampling import SMOTE
smt = SMOTE(random_state=42, k_neighbors=5)
# Resampling Train set to account for class imbalance
X_train_resampled, y_train_resampled= smt.fit_resample(X_train, y_train)
X_train_resampled.head()
onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | monthly_2g_8_0 | monthly_2g_8_1 | sachet_3g_8_0 | sachet_3g_8_1 | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 53.01 | 52.64 | 37.48 | 316.01 | 195.74 | 68.36 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 53.01 | 52.64 | 37.48 | 282.38 | 171.64 | 44.51 | 31.59 | 17.38 | 19.43 | 0.0 | 0.0 | 0.00 | 366.99 | 241.68 | 101.43 | 0.00 | 0.00 | 0.00 | 0.00 | 2.11 | 0.00 | 2.03 | 4.59 | 4.41 | 2.03 | 6.71 | 4.41 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 18.41 | 40.79 | 11.79 | 292.99 | 191.98 | 85.89 | 6.26 | 1.21 | 10.39 | 317.68 | 233.99 | 108.09 | 0.00 | 0.00 | 0.00 | 0.66 | 0.00 | 0.00 | 5.61 | 1.53 | 2.76 | 6.28 | 1.53 | 2.76 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 9.55 | 0.00 | 0.00 | 0.00 | 6.0 | 5.0 | 4.0 | 198.0 | 198.0 | 198.0 | 110.0 | 130.0 | 130.0 | 1423 | 483.0 | -791.7700 | 1077.750 | -202.870 | -159.335 | 71.085 | -172.4995 | -155.0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
1 | 91.39 | 216.14 | 150.58 | 504.19 | 301.98 | 434.41 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 40.36 | 36.21 | 27.73 | 37.26 | 36.73 | 59.61 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.58 | 77.63 | 72.94 | 87.34 | 51.03 | 179.93 | 122.84 | 465.96 | 265.24 | 356.44 | 0.00 | 0.00 | 0.00 | 516.99 | 445.18 | 479.29 | 0.96 | 0.0 | 3.89 | 0.0 | 0.0 | 14.45 | 0.0 | 0.0 | 0.0 | 104.39 | 31.98 | 35.83 | 154.11 | 147.88 | 243.53 | 0.00 | 0.76 | 0.00 | 258.51 | 180.63 | 279.36 | 4.03 | 2.99 | 0.46 | 6.36 | 12.31 | 3.91 | 0.00 | 0.00 | 0.00 | 10.39 | 15.31 | 4.38 | 0.58 | 0.0 | 0.25 | 19.66 | 21.96 | 86.63 | 0.23 | 0.56 | 1.04 | 8.0 | 11.0 | 12.0 | 110.0 | 130.0 | 130.0 | 0.0 | 130.0 | 0.0 | 189 | 454.0 | 0.0000 | 0.000 | 28.130 | 117.745 | 0.000 | 48.6160 | -94.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
2 | 11.96 | 14.13 | 0.40 | 1.51 | 0.00 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 11.96 | 14.13 | 0.40 | 1.51 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 | 0.00 | 13.48 | 14.13 | 0.40 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 20.58 | 20.39 | 97.66 | 36.84 | 21.58 | 18.66 | 5.48 | 0.73 | 1.43 | 62.91 | 42.71 | 117.76 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.0 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 5.0 | 3.0 | 4.0 | 252.0 | 252.0 | 252.0 | 252.0 | 0.0 | 252.0 | 2922 | 403.0 | -44.6300 | -5.525 | -13.405 | 64.950 | 0.000 | 75.3940 | 151.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
3 | 532.66 | 537.31 | 738.21 | 49.03 | 71.64 | 39.43 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 24.46 | 19.79 | 37.74 | 41.26 | 47.86 | 39.43 | 1.19 | 4.04 | 0.00 | 0.0 | 0.0 | 0.00 | 66.93 | 71.71 | 77.18 | 508.19 | 517.51 | 700.46 | 6.56 | 18.24 | 0.00 | 0.00 | 1.48 | 0.00 | 514.76 | 537.24 | 700.46 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 19.86 | 28.81 | 20.24 | 66.08 | 94.18 | 67.54 | 51.74 | 68.16 | 50.08 | 137.69 | 191.16 | 137.88 | 18.83 | 14.56 | 1.28 | 1.08 | 20.89 | 6.83 | 0.00 | 3.08 | 3.05 | 19.91 | 38.54 | 11.16 | 0.00 | 0.0 | 0.00 | 0.00 | 5.28 | 7.49 | 0.00 | 0.00 | 0.00 | 10.0 | 13.0 | 12.0 | 145.0 | 150.0 | 145.0 | 0.0 | 150.0 | 0.0 | 1128 | 521.0 | -10.1500 | -108.195 | 182.315 | -39.760 | 0.000 | 192.8075 | 207.0 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
4 | 122.68 | 105.51 | 149.33 | 302.23 | 211.44 | 264.11 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 122.68 | 105.51 | 149.33 | 301.04 | 194.06 | 257.14 | 0.00 | 0.66 | 0.51 | 0.0 | 0.0 | 0.00 | 423.73 | 300.24 | 406.99 | 0.00 | 0.00 | 0.00 | 1.18 | 15.75 | 6.44 | 0.00 | 0.96 | 0.00 | 1.18 | 16.71 | 6.44 | 0.00 | 0.0 | 0.00 | 0.0 | 0.0 | 0.00 | 0.0 | 0.0 | 0.0 | 228.54 | 198.24 | 231.13 | 412.99 | 392.98 | 353.86 | 81.76 | 89.69 | 88.74 | 723.31 | 680.93 | 673.74 | 0.00 | 0.00 | 1.05 | 8.14 | 5.33 | 0.70 | 11.83 | 6.58 | 10.44 | 19.98 | 11.91 | 12.19 | 0.00 | 0.0 | 0.00 | 0.43 | 0.00 | 0.48 | 0.00 | 0.00 | 0.00 | 5.0 | 5.0 | 4.0 | 325.0 | 154.0 | 164.0 | 325.0 | 154.0 | 164.0 | 2453 | 721.0 | 654.3125 | -686.915 | 42.505 | -31.855 | -433.700 | -55.1110 | -105.0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
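SMOTE, as it is usually described, grows the minority class by placing synthetic rows on the line segment between a minority sample and one of its nearest minority-class neighbours. The sketch below shows only that interpolation step in plain Python. It assumes the neighbour has already been found, and the function name and the two toy rows are hypothetical. It is not imblearn's implementation.

```python
import random

def smote_like_sample(x, neighbor, rng):
    """Return a synthetic point on the segment between x and one of its neighbours."""
    lam = rng.random()  # interpolation weight drawn from [0, 1)
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]

rng = random.Random(42)

# Two hypothetical minority-class rows with two features each
minority_row = [1.0, 10.0]
nearest_neighbor = [3.0, 20.0]

synthetic = smote_like_sample(minority_row, nearest_neighbor, rng)
print(synthetic)
```

Because the same weight is applied to every feature, the synthetic row always lies between the two real rows. One consequence is that resampled values need not be values that occur in the original data.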
# columns with numerical data (select_dtypes avoids platform-dependent 'int'/'float' aliases; category columns are excluded)
numerical_vars = data.select_dtypes(include='number').columns.to_list()
# Standard scaling
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
# Fit and transform train set
X_train_resampled[numerical_vars] = scaler.fit_transform(X_train_resampled[numerical_vars])
# Transform test set
X_test[numerical_vars] = scaler.transform(X_test[numerical_vars])
# summary statistics of standardized variables
round(X_train_resampled.describe(),2)
onnet_mou_6 | onnet_mou_7 | onnet_mou_8 | offnet_mou_6 | offnet_mou_7 | offnet_mou_8 | roam_ic_mou_6 | roam_ic_mou_7 | roam_ic_mou_8 | roam_og_mou_6 | roam_og_mou_7 | roam_og_mou_8 | loc_og_t2t_mou_6 | loc_og_t2t_mou_7 | loc_og_t2t_mou_8 | loc_og_t2m_mou_6 | loc_og_t2m_mou_7 | loc_og_t2m_mou_8 | loc_og_t2f_mou_6 | loc_og_t2f_mou_7 | loc_og_t2f_mou_8 | loc_og_t2c_mou_6 | loc_og_t2c_mou_7 | loc_og_t2c_mou_8 | loc_og_mou_6 | loc_og_mou_7 | loc_og_mou_8 | std_og_t2t_mou_6 | std_og_t2t_mou_7 | std_og_t2t_mou_8 | std_og_t2m_mou_6 | std_og_t2m_mou_7 | std_og_t2m_mou_8 | std_og_t2f_mou_6 | std_og_t2f_mou_7 | std_og_t2f_mou_8 | std_og_mou_6 | std_og_mou_7 | std_og_mou_8 | isd_og_mou_6 | isd_og_mou_7 | isd_og_mou_8 | spl_og_mou_6 | spl_og_mou_7 | spl_og_mou_8 | og_others_6 | og_others_7 | og_others_8 | loc_ic_t2t_mou_6 | loc_ic_t2t_mou_7 | loc_ic_t2t_mou_8 | loc_ic_t2m_mou_6 | loc_ic_t2m_mou_7 | loc_ic_t2m_mou_8 | loc_ic_t2f_mou_6 | loc_ic_t2f_mou_7 | loc_ic_t2f_mou_8 | loc_ic_mou_6 | loc_ic_mou_7 | loc_ic_mou_8 | std_ic_t2t_mou_6 | std_ic_t2t_mou_7 | std_ic_t2t_mou_8 | std_ic_t2m_mou_6 | std_ic_t2m_mou_7 | std_ic_t2m_mou_8 | std_ic_t2f_mou_6 | std_ic_t2f_mou_7 | std_ic_t2f_mou_8 | std_ic_mou_6 | std_ic_mou_7 | std_ic_mou_8 | spl_ic_mou_6 | spl_ic_mou_7 | spl_ic_mou_8 | isd_ic_mou_6 | isd_ic_mou_7 | isd_ic_mou_8 | ic_others_6 | ic_others_7 | ic_others_8 | total_rech_num_6 | total_rech_num_7 | total_rech_num_8 | max_rech_amt_6 | max_rech_amt_7 | max_rech_amt_8 | last_day_rch_amt_6 | last_day_rch_amt_7 | last_day_rch_amt_8 | aon | Average_rech_amt_6n7 | delta_vol_2g | delta_vol_3g | delta_total_og_mou | delta_total_ic_mou | delta_vbc_3g | delta_arpu | delta_total_rech_amt | sachet_3g_6_0 | sachet_3g_6_1 | sachet_3g_6_2 | monthly_2g_7_0 | monthly_2g_7_1 | monthly_2g_7_2 | monthly_2g_8_0 | monthly_2g_8_1 | sachet_3g_8_0 | sachet_3g_8_1 | monthly_3g_7_0 | monthly_3g_7_1 | monthly_3g_7_2 | sachet_2g_6_0 | sachet_2g_6_1 | sachet_2g_6_2 | sachet_2g_6_3 | sachet_2g_6_4 | monthly_2g_6_0 | monthly_2g_6_1 | monthly_2g_6_2 | sachet_2g_7_0 | sachet_2g_7_1 | sachet_2g_7_2 | sachet_2g_7_3 | sachet_2g_7_4 | sachet_2g_7_5 | sachet_3g_7_0 | sachet_3g_7_1 | sachet_3g_7_2 | monthly_3g_8_0 | monthly_3g_8_1 | monthly_3g_8_2 | monthly_3g_6_0 | monthly_3g_6_1 | monthly_3g_6_2 | sachet_2g_8_0 | sachet_2g_8_1 | sachet_2g_8_2 | sachet_2g_8_3 | sachet_2g_8_4 | sachet_2g_8_5 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.0 | 38374.0 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 | 38374.00 |
mean | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | -0.00 | -0.00 | -0.00 | 0.00 | -0.00 | -0.00 | 0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | -0.00 | -0.00 | -0.00 | 0.00 | -0.00 | 0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | -0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.00 | 0.0 | 0.0 | 0.00 | -0.00 | -0.00 | -0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.00 | -0.00 | 0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | -0.00 | 0.00 | -0.00 | 0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | -0.00 | 0.00 | -0.00 | 0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | -0.00 | -0.00 | -0.00 | 0.00 | 0.00 | -0.00 | -0.00 | 0.00 | -0.00 | -0.00 | 0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | -0.00 | 0.00 | -0.00 | 0.00 | 0.00 |
std | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 0.0 | 0.0 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
min | -0.73 | -0.68 | -0.53 | -0.94 | -0.89 | -0.70 | -0.31 | -0.32 | -0.33 | -0.33 | -0.36 | -0.36 | -0.50 | -0.49 | -0.42 | -0.75 | -0.73 | -0.59 | -0.38 | -0.38 | -0.33 | -0.37 | -0.37 | -0.31 | -0.76 | -0.74 | -0.60 | -0.57 | -0.54 | -0.40 | -0.60 | -0.57 | -0.43 | -0.22 | -0.22 | -0.19 | -0.79 | -0.74 | -0.53 | -0.20 | -0.18 | -0.15 | -0.51 | -0.53 | -0.43 | -0.44 | 0.0 | 0.0 | -0.61 | -0.57 | -0.48 | -0.77 | -0.75 | -0.61 | -0.40 | -0.39 | -0.35 | -0.80 | -0.77 | -0.63 | -0.45 | -0.43 | -0.33 | -0.51 | -0.47 | -0.39 | -0.26 | -0.26 | -0.22 | -0.56 | -0.52 | -0.41 | -0.47 | -0.21 | -0.20 | -0.28 | -0.27 | -0.22 | -0.27 | -0.25 | -0.22 | -1.50 | -1.37 | -1.02 | -1.08 | -1.04 | -0.86 | -0.93 | -0.85 | -0.64 | -1.01 | -0.91 | -28.14 | -27.16 | -11.65 | -14.66 | -22.68 | -14.96 | -14.68 | -3.43 | -0.16 | -0.08 | -3.10 | -0.25 | -0.09 | -3.78 | -0.23 | -4.62 | -0.14 | -2.78 | -0.23 | -0.13 | -1.95 | -0.22 | -0.14 | -0.11 | -0.09 | -2.99 | -0.24 | -0.08 | -1.99 | -0.21 | -0.14 | -0.10 | -0.09 | -0.08 | -3.44 | -0.16 | -0.07 | -3.33 | -0.22 | -0.12 | -2.66 | -0.23 | -0.12 | -2.11 | -0.23 | -0.14 | -0.11 | -0.10 | -0.09 |
25% | -0.63 | -0.60 | -0.52 | -0.66 | -0.65 | -0.66 | -0.31 | -0.32 | -0.33 | -0.33 | -0.36 | -0.36 | -0.45 | -0.45 | -0.42 | -0.63 | -0.63 | -0.59 | -0.38 | -0.38 | -0.33 | -0.37 | -0.37 | -0.31 | -0.63 | -0.63 | -0.60 | -0.57 | -0.54 | -0.40 | -0.59 | -0.56 | -0.43 | -0.22 | -0.22 | -0.19 | -0.77 | -0.72 | -0.53 | -0.20 | -0.18 | -0.15 | -0.51 | -0.53 | -0.43 | -0.44 | 0.0 | 0.0 | -0.53 | -0.51 | -0.48 | -0.62 | -0.61 | -0.61 | -0.40 | -0.39 | -0.35 | -0.63 | -0.62 | -0.63 | -0.45 | -0.43 | -0.33 | -0.49 | -0.46 | -0.39 | -0.26 | -0.26 | -0.22 | -0.51 | -0.48 | -0.41 | -0.47 | -0.21 | -0.20 | -0.28 | -0.27 | -0.22 | -0.27 | -0.25 | -0.22 | -0.68 | -0.66 | -0.63 | -0.40 | -0.45 | -0.70 | -0.64 | -0.65 | -0.64 | -0.73 | -0.68 | 0.12 | 0.11 | -0.41 | -0.22 | 0.08 | -0.51 | -0.48 | 0.29 | -0.16 | -0.08 | 0.32 | -0.25 | -0.09 | 0.26 | -0.23 | 0.22 | -0.14 | 0.36 | -0.23 | -0.13 | 0.51 | -0.22 | -0.14 | -0.11 | -0.09 | 0.33 | -0.24 | -0.08 | 0.50 | -0.21 | -0.14 | -0.10 | -0.09 | -0.08 | 0.29 | -0.16 | -0.07 | 0.30 | -0.22 | -0.12 | 0.38 | -0.23 | -0.12 | 0.47 | -0.23 | -0.14 | -0.11 | -0.10 | -0.09 |
50% | -0.42 | -0.41 | -0.40 | -0.33 | -0.33 | -0.36 | -0.31 | -0.32 | -0.33 | -0.33 | -0.36 | -0.36 | -0.32 | -0.32 | -0.35 | -0.37 | -0.37 | -0.43 | -0.38 | -0.38 | -0.33 | -0.37 | -0.37 | -0.31 | -0.36 | -0.37 | -0.43 | -0.50 | -0.48 | -0.40 | -0.45 | -0.45 | -0.41 | -0.22 | -0.22 | -0.19 | -0.42 | -0.45 | -0.49 | -0.20 | -0.18 | -0.15 | -0.43 | -0.43 | -0.43 | -0.44 | 0.0 | 0.0 | -0.34 | -0.33 | -0.37 | -0.34 | -0.35 | -0.40 | -0.36 | -0.35 | -0.35 | -0.34 | -0.35 | -0.40 | -0.38 | -0.36 | -0.33 | -0.34 | -0.34 | -0.35 | -0.26 | -0.26 | -0.22 | -0.33 | -0.33 | -0.35 | -0.47 | -0.21 | -0.20 | -0.28 | -0.27 | -0.22 | -0.27 | -0.25 | -0.22 | -0.25 | -0.30 | -0.32 | -0.33 | -0.25 | -0.07 | -0.16 | -0.34 | -0.43 | -0.39 | -0.35 | 0.15 | 0.11 | 0.23 | 0.13 | 0.08 | 0.08 | 0.03 | 0.29 | -0.16 | -0.08 | 0.32 | -0.25 | -0.09 | 0.26 | -0.23 | 0.22 | -0.14 | 0.36 | -0.23 | -0.13 | 0.51 | -0.22 | -0.14 | -0.11 | -0.09 | 0.33 | -0.24 | -0.08 | 0.50 | -0.21 | -0.14 | -0.10 | -0.09 | -0.08 | 0.29 | -0.16 | -0.07 | 0.30 | -0.22 | -0.12 | 0.38 | -0.23 | -0.12 | 0.47 | -0.23 | -0.14 | -0.11 | -0.10 | -0.09 |
75% | 0.20 | 0.15 | 0.01 | 0.27 | 0.26 | 0.23 | -0.27 | -0.25 | -0.22 | -0.28 | -0.21 | -0.20 | 0.01 | 0.00 | -0.04 | 0.23 | 0.22 | 0.16 | -0.13 | -0.13 | -0.20 | -0.24 | -0.21 | -0.31 | 0.24 | 0.24 | 0.17 | 0.08 | 0.01 | -0.20 | 0.10 | 0.07 | -0.10 | -0.22 | -0.22 | -0.19 | 0.45 | 0.39 | 0.07 | -0.20 | -0.18 | -0.15 | 0.05 | 0.07 | -0.05 | -0.11 | 0.0 | 0.0 | 0.09 | 0.06 | 0.03 | 0.20 | 0.20 | 0.17 | -0.09 | -0.11 | -0.15 | 0.23 | 0.21 | 0.20 | -0.02 | -0.04 | -0.14 | 0.02 | -0.00 | -0.07 | -0.25 | -0.26 | -0.22 | 0.05 | 0.03 | -0.04 | -0.14 | -0.21 | -0.20 | -0.26 | -0.26 | -0.22 | -0.21 | -0.23 | -0.22 | 0.37 | 0.40 | 0.28 | 0.06 | 0.02 | 0.22 | 0.14 | 0.34 | 0.58 | 0.39 | 0.30 | 0.15 | 0.11 | 0.50 | 0.36 | 0.08 | 0.58 | 0.58 | 0.29 | -0.16 | -0.08 | 0.32 | -0.25 | -0.09 | 0.26 | -0.23 | 0.22 | -0.14 | 0.36 | -0.23 | -0.13 | 0.51 | -0.22 | -0.14 | -0.11 | -0.09 | 0.33 | -0.24 | -0.08 | 0.50 | -0.21 | -0.14 | -0.10 | -0.09 | -0.08 | 0.29 | -0.16 | -0.07 | 0.30 | -0.22 | -0.12 | 0.38 | -0.23 | -0.12 | 0.47 | -0.23 | -0.14 | -0.11 | -0.10 | -0.09 |
max | 4.09 | 4.46 | 5.67 | 4.02 | 4.45 | 5.24 | 6.11 | 6.09 | 6.19 | 5.41 | 5.44 | 5.67 | 7.06 | 7.45 | 7.71 | 5.26 | 5.34 | 5.79 | 7.25 | 7.30 | 7.57 | 6.32 | 6.47 | 7.42 | 5.35 | 5.53 | 5.83 | 4.02 | 4.36 | 5.89 | 4.04 | 4.64 | 5.99 | 8.74 | 8.89 | 9.37 | 3.56 | 3.93 | 5.21 | 7.82 | 8.52 | 10.18 | 6.06 | 5.90 | 6.82 | 5.40 | 0.0 | 0.0 | 6.65 | 6.93 | 7.46 | 5.33 | 5.53 | 5.87 | 7.24 | 7.12 | 7.65 | 5.24 | 5.53 | 5.85 | 6.78 | 7.35 | 8.26 | 6.96 | 7.26 | 7.84 | 8.18 | 8.35 | 9.00 | 6.70 | 7.11 | 7.69 | 4.48 | 8.02 | 7.72 | 7.97 | 7.96 | 9.30 | 8.11 | 8.48 | 9.21 | 4.09 | 4.28 | 4.93 | 5.68 | 5.62 | 6.02 | 5.45 | 5.54 | 5.69 | 3.66 | 4.46 | 4.05 | 4.24 | 2.96 | 3.13 | 4.48 | 2.84 | 2.84 | 0.29 | 6.28 | 12.99 | 0.32 | 4.07 | 11.56 | 0.26 | 4.32 | 0.22 | 7.05 | 0.36 | 4.35 | 7.92 | 0.51 | 4.53 | 7.29 | 9.20 | 11.42 | 0.33 | 4.12 | 12.50 | 0.50 | 4.78 | 7.36 | 9.71 | 10.80 | 11.92 | 0.29 | 6.43 | 13.42 | 0.30 | 4.57 | 8.36 | 0.38 | 4.28 | 8.04 | 0.47 | 4.29 | 7.35 | 9.30 | 10.13 | 10.99 |
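The scaling step above learns its statistics on the training data only and then reuses them on the test data. The sketch below illustrates that fit-then-transform pattern for a single column using only the standard library. The helper names and the toy values are hypothetical, and the handling of constant columns is a choice made for this sketch rather than a statement about StandardScaler.

```python
from statistics import fmean, pstdev

def fit_standardizer(values):
    """Learn the mean and population standard deviation from training values."""
    return fmean(values), pstdev(values)

def standardize(values, mean, std):
    """Apply training statistics; in this sketch constant columns (std == 0) map to 0.0."""
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]

train_col = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # hypothetical feature values
test_col = [5.0, 11.0]

mean, std = fit_standardizer(train_col)          # statistics come from train only
train_scaled = standardize(train_col, mean, std)
test_scaled = standardize(test_col, mean, std)   # test reuses the train statistics

print(mean, std)
print(test_scaled)
```

A standardized column with a standard deviation of 0.0, as og_others_7 and og_others_8 show in the summary table, is constant across the training rows and carries no information for the model.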
from sklearn.linear_model import LogisticRegression
# class_weight='balanced' reweights the classes inversely to their frequencies
baseline_model = LogisticRegression(random_state=100, class_weight='balanced')
# NOTE: as written, the model is fit on X_train (the original split, neither resampled nor scaled above), while X_test has been standardized
baseline_model = baseline_model.fit(X_train, y_train)
y_train_pred = baseline_model.predict_proba(X_train)[:,1]
y_test_pred = baseline_model.predict_proba(X_test)[:,1]
# converting train and test predictions to Series to preserve the index
y_train_pred = pd.Series(y_train_pred, index=X_train.index)
y_test_pred = pd.Series(y_test_pred, index=X_test.index)
Baseline Performance
# Function for Baseline Performance Metrics
import math
def model_metrics(matrix) :
    # matrix is laid out as [[TN, FP], [FN, TP]], as returned by sklearn's confusion_matrix
    TN = matrix[0][0]
    TP = matrix[1][1]
    FP = matrix[0][1]
    FN = matrix[1][0]
    accuracy = round((TP + TN)/float(TP+TN+FP+FN),3)
    print('Accuracy :' ,accuracy )
    sensitivity = round(TP/float(FN + TP),3)
    print('Sensitivity / True Positive Rate / Recall :', sensitivity)
    specificity = round(TN/float(TN + FP),3)
    print('Specificity / True Negative Rate : ', specificity)
    precision = round(TP/float(TP + FP),3)
    print('Precision / Positive Predictive Value :', precision)
    # F1 is computed from the already-rounded precision and sensitivity
    print('F1-score :', round(2*precision*sensitivity/(precision + sensitivity),3))
# Prediction at threshold of 0.5
classification_threshold = 0.5
y_train_pred_classified = y_train_pred.map(lambda x : 1 if x > classification_threshold else 0)
y_test_pred_classified = y_test_pred.map(lambda x : 1 if x > classification_threshold else 0)
from sklearn.metrics import confusion_matrix
train_matrix = confusion_matrix(y_train, y_train_pred_classified)
print('Confusion Matrix for train:\n', train_matrix)
test_matrix = confusion_matrix(y_test, y_test_pred_classified)
print('\nConfusion Matrix for test: \n', test_matrix)
Confusion Matrix for train:
 [[16001  3186]
 [  326  1494]]
Confusion Matrix for test:
 [[6090 2141]
 [ 149  624]]
# Baseline Model Performance :
print('Train Performance : \n')
model_metrics(train_matrix)
print('\n\nTest Performance : \n')
model_metrics(test_matrix)
Train Performance :
Accuracy : 0.833
Sensitivity / True Positive Rate / Recall : 0.821
Specificity / True Negative Rate : 0.834
Precision / Positive Predictive Value : 0.319
F1-score : 0.459
Test Performance :
Accuracy : 0.746
Sensitivity / True Positive Rate / Recall : 0.807
Specificity / True Negative Rate : 0.74
Precision / Positive Predictive Value : 0.226
F1-score : 0.353
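To make the link between the confusion matrix and the reported figures explicit, the sketch below recomputes the train metrics from the raw counts printed above. It is a stand-alone illustration with a hypothetical function name, and it assumes the [[TN, FP], [FN, TP]] layout that sklearn's confusion_matrix uses for 0/1 labels.

```python
def metrics_from_confusion(matrix):
    """Derive the reported metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    return {
        'accuracy': (tp + tn) / (tp + tn + fp + fn),
        'sensitivity': tp / (tp + fn),
        'specificity': tn / (tn + fp),
        'precision': tp / (tp + fp),
        'f1': 2 * tp / (2 * tp + fp + fn),  # F1 from raw counts, no intermediate rounding
    }

# Train confusion matrix copied from the output above
train_metrics = metrics_from_confusion([[16001, 3186], [326, 1494]])
for name, value in train_metrics.items():
    print(f'{name:12s}: {value:.3f}')
```

The F1 value here comes out at 0.460 rather than the 0.459 printed above. This sketch works from the raw counts, whereas model_metrics combines precision and sensitivity after each has been rounded to three decimals.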
Baseline Performance - Finding Optimum Probability Cutoff
# Specificity / Sensitivity Tradeoff
# Classification at probability thresholds between 0 and 1
y_train_pred_thres = pd.DataFrame(index=X_train.index)
thresholds = [float(x)/10 for x in range(10)]
def thresholder(x, thresh) :
    if x > thresh :
        return 1
    else :
        return 0
for i in thresholds:
    y_train_pred_thres[i]= y_train_pred.map(lambda x : thresholder(x,i))
y_train_pred_thres.head()
0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 | |
---|---|---|---|---|---|---|---|---|---|---|
mobile_number | ||||||||||
7000166926 | 1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 |
7001343085 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7001863283 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7002275981 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
7001086221 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
# sensitivity, specificity, accuracy for each threshold
# Function for calculation of metrics for each threshold
def model_metrics_thres(matrix) :
    TN = matrix[0][0]
    TP = matrix[1][1]
    FP = matrix[0][1]
    FN = matrix[1][0]
    accuracy = round((TP + TN)/float(TP+TN+FP+FN),3)
    sensitivity = round(TP/float(FN + TP),3)
    specificity = round(TN/float(TN + FP),3)
    return sensitivity,specificity,accuracy
# generating a data frame for metrics for each threshold
# (rows are collected in a list because DataFrame.append was removed in pandas 2.0)
metrics_rows = []
for column in y_train_pred_thres.columns.to_list() :
    confusion = confusion_matrix(y_train, y_train_pred_thres.loc[:,column])
    sensitivity,specificity,accuracy = model_metrics_thres(confusion)
    metrics_rows.append({
        'sensitivity' : sensitivity,
        'specificity' : specificity,
        'accuracy' : accuracy
    })
metrics_df = pd.DataFrame(metrics_rows, columns=['sensitivity', 'specificity', 'accuracy'], index=thresholds)
metrics_df
sensitivity | specificity | accuracy | |
---|---|---|---|
0.0 | 1.000 | 0.000 | 0.087 |
0.1 | 0.974 | 0.345 | 0.399 |
0.2 | 0.947 | 0.523 | 0.560 |
0.3 | 0.910 | 0.658 | 0.680 |
0.4 | 0.868 | 0.763 | 0.772 |
0.5 | 0.821 | 0.834 | 0.833 |
0.6 | 0.770 | 0.883 | 0.873 |
0.7 | 0.677 | 0.921 | 0.899 |
0.8 | 0.493 | 0.953 | 0.913 |
0.9 | 0.234 | 0.981 | 0.916 |
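One way to turn the table above into a single cutoff is to look for the threshold at which sensitivity and specificity are equal. The sketch below estimates that crossing point by linear interpolation between neighbouring grid points. It assumes both curves are roughly straight between the tabulated thresholds, and the function name is hypothetical. The notebook does not state how its own cutoff was chosen.

```python
# (threshold, sensitivity, specificity) rows copied from the table above
rows = [
    (0.0, 1.000, 0.000), (0.1, 0.974, 0.345), (0.2, 0.947, 0.523),
    (0.3, 0.910, 0.658), (0.4, 0.868, 0.763), (0.5, 0.821, 0.834),
    (0.6, 0.770, 0.883), (0.7, 0.677, 0.921), (0.8, 0.493, 0.953),
    (0.9, 0.234, 0.981),
]

def crossing_threshold(rows):
    """Linearly interpolate the threshold where sensitivity equals specificity."""
    for (t0, sens0, spec0), (t1, sens1, spec1) in zip(rows, rows[1:]):
        gap0, gap1 = sens0 - spec0, sens1 - spec1
        if gap0 >= 0 > gap1:  # the sign of the gap flips inside this interval
            return t0 + (t1 - t0) * gap0 / (gap0 - gap1)
    return None  # the curves never cross on this grid

cutoff = crossing_threshold(rows)
print(round(cutoff, 3))
```

On these numbers the crossing falls between 0.4 and 0.5, at about 0.489. That is consistent with the cutoff of 0.49 used in the next step.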
metrics_df.plot(kind='line', figsize=(24,8), grid=True, xticks=np.arange(0,1,0.02),
title='Specificity-Sensitivity TradeOff');
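# cutoff close to where the sensitivity and specificity curves cross (between 0.4 and 0.5 in the table above)
optimum_cutoff = 0.49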
y_train_pred_final = y_train_pred.map(lambda x : 1 if x > optimum_cutoff else 0)
y_test_pred_final = y_test_pred.map(lambda x : 1 if x > optimum_cutoff else 0)
train_matrix = confusion_matrix(y_train, y_train_pred_final)
print('Confusion Matrix for train:\n', train_matrix)
test_matrix = confusion_matrix(y_test, y_test_pred_final)
print('\nConfusion Matrix for test: \n', test_matrix)
Confusion Matrix for train:
 [[15888  3299]
 [  318  1502]]
Confusion Matrix for test:
 [[1329 6902]
 [  16  757]]
print('Train Performance: \n')
model_metrics(train_matrix)
print('\n\nTest Performance : \n')
model_metrics(test_matrix)
Train Performance:
Accuracy : 0.828
Sensitivity / True Positive Rate / Recall : 0.825
Specificity / True Negative Rate : 0.828
Precision / Positive Predictive Value : 0.313
F1-score : 0.454
Test Performance :
Accuracy : 0.232
Sensitivity / True Positive Rate / Recall : 0.979
Specificity / True Negative Rate : 0.161
Precision / Positive Predictive Value : 0.099
F1-score : 0.18
# ROC_AUC score
from sklearn.metrics import roc_auc_score
print('ROC AUC score for Train : ',round(roc_auc_score(y_train, y_train_pred),3), '\n' )
print('ROC AUC score for Test : ',round(roc_auc_score(y_test, y_test_pred),3) )
ROC AUC score for Train : 0.891

ROC AUC score for Test : 0.838
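ROC AUC can be read as the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner, with ties counted as half. The sketch below computes it that way on a tiny illustrative example; `pairwise_auc` is a hypothetical helper, and the four labels and scores are made up for the demonstration rather than taken from the dataset.

```python
def pairwise_auc(y_true, scores):
    """Hypothetical helper: AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(y_true, scores) if y == 1]
    negatives = [s for y, s in zip(y_true, scores) if y == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Illustrative labels and scores only
toy_auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(toy_auc)
```

Because the measure depends only on how the scores are ordered, it does not change with the probability cutoff. A model can therefore show a reasonable AUC while a particular cutoff still produces a poor confusion matrix.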
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
lr = LogisticRegression(random_state=100 , class_weight='balanced')
rfe = RFE(lr, n_features_to_select=15)  # keyword form; recent scikit-learn versions reject the positional argument
results = rfe.fit(X_train,y_train)
results.support_
array([False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, False, False, False, False, False, False, False, False, True, True, False, False, False, False, False, False, False, False, False, False, False, False, True, False, True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, True, False, False, True, False, False, True, False, False, False, True, False, False, True, False, False, False, False, True, False, False, True, False, False, False, False, False, False, False, False, True, False, False, False, False, False, True, False, False, False, False, False])
# DataFrame with features supported by RFE
rfe_support = pd.DataFrame({'Column' : X.columns.to_list(),
                            'Rank' : rfe.ranking_,
                            'Support' : rfe.support_}).sort_values(by='Rank', ascending=True)
rfe_support
| Column | Rank | Support |
---|---|---|---|
99 | sachet_3g_6_0 | 1 | True |
120 | sachet_2g_7_0 | 1 | True |
102 | monthly_2g_7_0 | 1 | True |
135 | sachet_2g_8_0 | 1 | True |
81 | total_rech_num_6 | 1 | True |
129 | monthly_3g_8_0 | 1 | True |
105 | monthly_2g_8_0 | 1 | True |
83 | total_rech_num_8 | 1 | True |
117 | monthly_2g_6_0 | 1 | True |
68 | std_ic_t2f_mou_8 | 1 | True |
67 | std_ic_t2f_mou_7 | 1 | True |
112 | sachet_2g_6_0 | 1 | True |
109 | monthly_3g_7_0 | 1 | True |
56 | loc_ic_t2f_mou_8 | 1 | True |
35 | std_og_t2f_mou_8 | 1 | True |
40 | isd_og_mou_7 | 2 | False |
53 | loc_ic_t2m_mou_8 | 3 | False |
19 | loc_og_t2f_mou_7 | 4 | False |
62 | std_ic_t2t_mou_8 | 5 | False |
61 | std_ic_t2t_mou_7 | 6 | False |
107 | sachet_3g_8_0 | 7 | False |
41 | isd_og_mou_8 | 8 | False |
89 | last_day_rch_amt_8 | 9 | False |
11 | roam_og_mou_8 | 10 | False |
132 | monthly_3g_6_0 | 11 | False |
39 | isd_og_mou_6 | 12 | False |
79 | ic_others_7 | 13 | False |
50 | loc_ic_t2t_mou_8 | 14 | False |
7 | roam_ic_mou_7 | 15 | False |
58 | loc_ic_mou_7 | 16 | False |
71 | std_ic_mou_8 | 17 | False |
75 | isd_ic_mou_6 | 18 | False |
33 | std_og_t2f_mou_6 | 19 | False |
38 | std_og_mou_8 | 20 | False |
66 | std_ic_t2f_mou_6 | 21 | False |
29 | std_og_t2t_mou_8 | 22 | False |
32 | std_og_t2m_mou_8 | 23 | False |
78 | ic_others_6 | 24 | False |
44 | spl_og_mou_8 | 25 | False |
97 | delta_arpu | 26 | False |
85 | max_rech_amt_7 | 27 | False |
70 | std_ic_mou_7 | 28 | False |
64 | std_ic_t2m_mou_7 | 29 | False |
30 | std_og_t2m_mou_6 | 30 | False |
42 | spl_og_mou_6 | 31 | False |
27 | std_og_t2t_mou_6 | 32 | False |
18 | loc_og_t2f_mou_6 | 33 | False |
60 | std_ic_t2t_mou_6 | 34 | False |
36 | std_og_mou_6 | 35 | False |
51 | loc_ic_t2m_mou_6 | 36 | False |
15 | loc_og_t2m_mou_6 | 37 | False |
94 | delta_total_og_mou | 38 | False |
69 | std_ic_mou_6 | 39 | False |
65 | std_ic_t2m_mou_8 | 40 | False |
2 | onnet_mou_8 | 41 | False |
55 | loc_ic_t2f_mou_7 | 42 | False |
28 | std_og_t2t_mou_7 | 43 | False |
13 | loc_og_t2t_mou_7 | 44 | False |
1 | onnet_mou_7 | 45 | False |
9 | roam_og_mou_6 | 46 | False |
21 | loc_og_t2c_mou_6 | 47 | False |
14 | loc_og_t2t_mou_8 | 48 | False |
84 | max_rech_amt_6 | 49 | False |
26 | loc_og_mou_8 | 50 | False |
8 | roam_ic_mou_8 | 51 | False |
10 | roam_og_mou_7 | 52 | False |
48 | loc_ic_t2t_mou_6 | 53 | False |
57 | loc_ic_mou_6 | 54 | False |
6 | roam_ic_mou_6 | 55 | False |
106 | monthly_2g_8_1 | 56 | False |
87 | last_day_rch_amt_6 | 57 | False |
49 | loc_ic_t2t_mou_7 | 58 | False |
98 | delta_total_rech_amt | 59 | False |
88 | last_day_rch_amt_7 | 60 | False |
34 | std_og_t2f_mou_7 | 61 | False |
126 | sachet_3g_7_0 | 62 | False |
23 | loc_og_t2c_mou_8 | 63 | False |
103 | monthly_2g_7_1 | 64 | False |
118 | monthly_2g_6_1 | 65 | False |
92 | delta_vol_2g | 66 | False |
16 | loc_og_t2m_mou_7 | 67 | False |
4 | offnet_mou_7 | 68 | False |
43 | spl_og_mou_7 | 69 | False |
130 | monthly_3g_8_1 | 70 | False |
20 | loc_og_t2f_mou_8 | 71 | False |
17 | loc_og_t2m_mou_8 | 72 | False |
63 | std_ic_t2m_mou_6 | 73 | False |
93 | delta_vol_3g | 74 | False |
76 | isd_ic_mou_7 | 75 | False |
24 | loc_og_mou_6 | 76 | False |
12 | loc_og_t2t_mou_6 | 77 | False |
54 | loc_ic_t2f_mou_6 | 78 | False |
0 | onnet_mou_6 | 79 | False |
3 | offnet_mou_6 | 80 | False |
77 | isd_ic_mou_8 | 81 | False |
5 | offnet_mou_8 | 82 | False |
22 | loc_og_t2c_mou_7 | 83 | False |
95 | delta_total_ic_mou | 84 | False |
52 | loc_ic_t2m_mou_7 | 85 | False |
59 | loc_ic_mou_8 | 86 | False |
90 | aon | 87 | False |
74 | spl_ic_mou_8 | 88 | False |
136 | sachet_2g_8_1 | 89 | False |
121 | sachet_2g_7_1 | 90 | False |
113 | sachet_2g_6_1 | 91 | False |
108 | sachet_3g_8_1 | 92 | False |
80 | ic_others_8 | 93 | False |
137 | sachet_2g_8_2 | 94 | False |
138 | sachet_2g_8_3 | 95 | False |
114 | sachet_2g_6_2 | 96 | False |
123 | sachet_2g_7_3 | 97 | False |
133 | monthly_3g_6_1 | 98 | False |
125 | sachet_2g_7_5 | 99 | False |
131 | monthly_3g_8_2 | 100 | False |
119 | monthly_2g_6_2 | 101 | False |
25 | loc_og_mou_7 | 102 | False |
104 | monthly_2g_7_2 | 103 | False |
110 | monthly_3g_7_1 | 104 | False |
100 | sachet_3g_6_1 | 105 | False |
139 | sachet_2g_8_4 | 106 | False |
134 | monthly_3g_6_2 | 107 | False |
111 | monthly_3g_7_2 | 108 | False |
37 | std_og_mou_7 | 109 | False |
31 | std_og_t2m_mou_7 | 110 | False |
140 | sachet_2g_8_5 | 111 | False |
101 | sachet_3g_6_2 | 112 | False |
72 | spl_ic_mou_6 | 113 | False |
86 | max_rech_amt_8 | 114 | False |
73 | spl_ic_mou_7 | 115 | False |
96 | delta_vbc_3g | 116 | False |
82 | total_rech_num_7 | 117 | False |
115 | sachet_2g_6_3 | 118 | False |
124 | sachet_2g_7_4 | 119 | False |
127 | sachet_3g_7_1 | 120 | False |
91 | Average_rech_amt_6n7 | 121 | False |
45 | og_others_6 | 122 | False |
116 | sachet_2g_6_4 | 123 | False |
128 | sachet_3g_7_2 | 124 | False |
122 | sachet_2g_7_2 | 125 | False |
47 | og_others_8 | 126 | False |
46 | og_others_7 | 127 | False |
# RFE Selected columns
rfe_selected_columns = rfe_support.loc[rfe_support['Rank'] == 1,'Column'].to_list()
rfe_selected_columns
['sachet_3g_6_0', 'sachet_2g_7_0', 'monthly_2g_7_0', 'sachet_2g_8_0', 'total_rech_num_6', 'monthly_3g_8_0', 'monthly_2g_8_0', 'total_rech_num_8', 'monthly_2g_6_0', 'std_ic_t2f_mou_8', 'std_ic_t2f_mou_7', 'sachet_2g_6_0', 'monthly_3g_7_0', 'loc_ic_t2f_mou_8', 'std_og_t2f_mou_8']
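The ranking above reflects how recursive feature elimination works: fit the model, drop the weakest feature, and repeat until the requested number remains. Features that survive get rank 1, and the earlier a feature is dropped the higher its rank. The sketch below mirrors that loop in plain Python under a simplifying assumption: `importance_fn` is a hypothetical callback standing in for refitting the estimator and reading absolute coefficients, and the toy importances are invented for illustration.

```python
def recursive_feature_elimination(features, importance_fn, n_keep):
    """Hypothetical sketch of the RFE loop; returns (selected, ranking)."""
    remaining = list(features)
    ranking = {}
    next_rank = len(remaining) - n_keep + 1  # rank given to the first feature dropped
    while len(remaining) > n_keep:
        scores = importance_fn(remaining)    # stands in for refitting the model
        weakest = min(remaining, key=lambda f: scores[f])
        ranking[weakest] = next_rank
        next_rank -= 1
        remaining.remove(weakest)
    for feature in remaining:
        ranking[feature] = 1                 # survivors all share rank 1
    return remaining, ranking

# Invented importances, for illustration only
toy_importance = {'a': 0.9, 'b': 0.1, 'c': 0.5, 'd': 0.3}
selected, ranking = recursive_feature_elimination(
    ['a', 'b', 'c', 'd'],
    lambda remaining: {f: toy_importance[f] for f in remaining},
    n_keep=2,
)
print(selected, ranking)
```

With 141 columns and 15 kept, the same scheme gives ranks from 1 to 127, which matches the range in the table above.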
# Logistic Regression Model with RFE columns
import statsmodels.api as sm
# Note: the SMOTE-resampled train set is used with statsmodels.api.GLM because GLM does not support class_weight
logr = sm.GLM(y_train_resampled,(sm.add_constant(X_train_resampled[rfe_selected_columns])), family = sm.families.Binomial())
logr_fit = logr.fit()
logr_fit.summary()
Dep. Variable: | Churn | No. Observations: | 38374 |
---|---|---|---|
Model: | GLM | Df Residuals: | 38358 |
Model Family: | Binomial | Df Model: | 15 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -19485. |
Date: | Mon, 30 Nov 2020 | Deviance: | 38969. |
Time: | 21:57:09 | Pearson chi2: | 2.80e+05 |
No. Iterations: | 7 | ||
Covariance Type: | nonrobust |
| coef | std err | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
const | -0.2334 | 0.015 | -15.657 | 0.000 | -0.263 | -0.204 |
sachet_3g_6_0 | -0.0396 | 0.014 | -2.886 | 0.004 | -0.066 | -0.013 |
sachet_2g_7_0 | -0.0980 | 0.016 | -6.201 | 0.000 | -0.129 | -0.067 |
monthly_2g_7_0 | 0.0096 | 0.016 | 0.594 | 0.552 | -0.022 | 0.041 |
sachet_2g_8_0 | 0.0489 | 0.015 | 3.359 | 0.001 | 0.020 | 0.077 |
total_rech_num_6 | 0.6047 | 0.017 | 35.547 | 0.000 | 0.571 | 0.638 |
monthly_3g_8_0 | 0.3993 | 0.017 | 23.439 | 0.000 | 0.366 | 0.433 |
monthly_2g_8_0 | 0.3697 | 0.018 | 21.100 | 0.000 | 0.335 | 0.404 |
total_rech_num_8 | -1.2013 | 0.019 | -62.378 | 0.000 | -1.239 | -1.164 |
monthly_2g_6_0 | -0.0194 | 0.015 | -1.262 | 0.207 | -0.050 | 0.011 |
std_ic_t2f_mou_8 | -0.3364 | 0.026 | -12.792 | 0.000 | -0.388 | -0.285 |
std_ic_t2f_mou_7 | 0.1535 | 0.019 | 8.148 | 0.000 | 0.117 | 0.190 |
sachet_2g_6_0 | -0.1117 | 0.016 | -6.847 | 0.000 | -0.144 | -0.080 |
monthly_3g_7_0 | -0.2094 | 0.017 | -12.602 | 0.000 | -0.242 | -0.177 |
loc_ic_t2f_mou_8 | -1.2743 | 0.038 | -33.599 | 0.000 | -1.349 | -1.200 |
std_og_t2f_mou_8 | -0.2476 | 0.021 | -11.621 | 0.000 | -0.289 | -0.206 |
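The model above is fit on a SMOTE-resampled train set. Its 38374 observations equal twice the 19187 non-churners in the earlier train confusion matrix, which is consistent with the minority class having been oversampled to parity. The resampling step itself happens earlier in the notebook and is not shown here. The sketch below only illustrates SMOTE's core idea, placing a synthetic point on the segment between a minority sample and one of its nearest minority neighbours. `smote_like_samples` is a hypothetical helper and the points are invented; it is not the imbalanced-learn implementation.

```python
import random

def smote_like_samples(minority, n_new, k=2, seed=0):
    """Hypothetical sketch of SMOTE's interpolation step for a small minority set."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (squared Euclidean distance)
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic

# Invented minority-class points, for illustration only
minority_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.5), (1.2, 2.2)]
new_points = smote_like_samples(minority_points, n_new=3)
print(new_points)
```

Since synthetic rows are interpolations of real ones, metrics computed on the resampled train set describe a balanced population and are not directly comparable with metrics on the untouched test set.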
# Using P-value and vif for manual feature elimination
from statsmodels.stats.outliers_influence import variance_inflation_factor
def vif(X_train_resampled, logr_fit, selected_columns):
    # Use the columns passed in (not the global rfe_selected_columns) so the table always matches the fitted model
    vif_df = pd.DataFrame()
    vif_df['Features'] = selected_columns
    vif_df['VIF'] = [variance_inflation_factor(X_train_resampled[selected_columns].values, i)
                     for i in range(X_train_resampled[selected_columns].shape[1])]
    vif_df['VIF'] = round(vif_df['VIF'], 2)
    vif_df = vif_df.set_index('Features')
    vif_df['P-value'] = round(logr_fit.pvalues, 4)
    vif_df = vif_df.sort_values(by=['VIF', 'P-value'], ascending=[False, False])
    return vif_df
vif(X_train_resampled, logr_fit, rfe_selected_columns)
Features | VIF | P-value |
---|---|---|
std_ic_t2f_mou_8 | 1.66 | 0.0000 |
sachet_2g_6_0 | 1.64 | 0.0000 |
sachet_2g_7_0 | 1.57 | 0.0000 |
std_ic_t2f_mou_7 | 1.56 | 0.0000 |
monthly_2g_7_0 | 1.54 | 0.5524 |
monthly_3g_7_0 | 1.54 | 0.0000 |
monthly_3g_8_0 | 1.52 | 0.0000 |
monthly_2g_8_0 | 1.43 | 0.0000 |
monthly_2g_6_0 | 1.38 | 0.2069 |
sachet_2g_8_0 | 1.36 | 0.0008 |
total_rech_num_6 | 1.27 | 0.0000 |
total_rech_num_8 | 1.25 | 0.0000 |
std_og_t2f_mou_8 | 1.20 | 0.0000 |
sachet_3g_6_0 | 1.12 | 0.0039 |
loc_ic_t2f_mou_8 | 1.09 | 0.0000 |
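The variance inflation factor of feature j is 1 / (1 - R_j^2), where R_j^2 comes from regressing that feature on the remaining ones. With only two features, R^2 reduces to the squared Pearson correlation, which makes for a small worked example. The helpers below are hypothetical and the data invented. This toy version also centers the data, which `variance_inflation_factor` does not do on its own, so it is a conceptual sketch, not a reimplementation.

```python
def pearson_r(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (ss_x * ss_y) ** 0.5

def vif_two_features(x1, x2):
    """Hypothetical helper: VIF when the only other predictor is x2."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Invented data with correlation 0.8
toy_vif = vif_two_features([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(toy_vif, 3))
```

A common rule of thumb treats values above roughly 5 as a sign of problematic multicollinearity. Every VIF in the table above is below 2, so the elimination that follows is driven by p-values alone.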
selected_columns = rfe_selected_columns
selected_columns.remove('monthly_2g_7_0')
selected_columns
['sachet_3g_6_0', 'sachet_2g_7_0', 'sachet_2g_8_0', 'total_rech_num_6', 'monthly_3g_8_0', 'monthly_2g_8_0', 'total_rech_num_8', 'monthly_2g_6_0', 'std_ic_t2f_mou_8', 'std_ic_t2f_mou_7', 'sachet_2g_6_0', 'monthly_3g_7_0', 'loc_ic_t2f_mou_8', 'std_og_t2f_mou_8']
logr2 = sm.GLM(y_train_resampled,(sm.add_constant(X_train_resampled[selected_columns])), family = sm.families.Binomial())
logr2_fit = logr2.fit()
logr2_fit.summary()
Dep. Variable: | Churn | No. Observations: | 38374 |
---|---|---|---|
Model: | GLM | Df Residuals: | 38359 |
Model Family: | Binomial | Df Model: | 14 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -19485. |
Date: | Mon, 30 Nov 2020 | Deviance: | 38970. |
Time: | 21:57:09 | Pearson chi2: | 2.80e+05 |
No. Iterations: | 7 | ||
Covariance Type: | nonrobust |
| coef | std err | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
const | -0.2335 | 0.015 | -15.662 | 0.000 | -0.263 | -0.204 |
sachet_3g_6_0 | -0.0395 | 0.014 | -2.881 | 0.004 | -0.066 | -0.013 |
sachet_2g_7_0 | -0.0982 | 0.016 | -6.217 | 0.000 | -0.129 | -0.067 |
sachet_2g_8_0 | 0.0491 | 0.015 | 3.372 | 0.001 | 0.021 | 0.078 |
total_rech_num_6 | 0.6049 | 0.017 | 35.566 | 0.000 | 0.572 | 0.638 |
monthly_3g_8_0 | 0.4000 | 0.017 | 23.521 | 0.000 | 0.367 | 0.433 |
monthly_2g_8_0 | 0.3733 | 0.016 | 22.696 | 0.000 | 0.341 | 0.406 |
total_rech_num_8 | -1.2012 | 0.019 | -62.375 | 0.000 | -1.239 | -1.163 |
monthly_2g_6_0 | -0.0163 | 0.014 | -1.128 | 0.259 | -0.045 | 0.012 |
std_ic_t2f_mou_8 | -0.3361 | 0.026 | -12.784 | 0.000 | -0.388 | -0.285 |
std_ic_t2f_mou_7 | 0.1532 | 0.019 | 8.136 | 0.000 | 0.116 | 0.190 |
sachet_2g_6_0 | -0.1111 | 0.016 | -6.823 | 0.000 | -0.143 | -0.079 |
monthly_3g_7_0 | -0.2098 | 0.017 | -12.633 | 0.000 | -0.242 | -0.177 |
loc_ic_t2f_mou_8 | -1.2749 | 0.038 | -33.622 | 0.000 | -1.349 | -1.201 |
std_og_t2f_mou_8 | -0.2476 | 0.021 | -11.620 | 0.000 | -0.289 | -0.206 |
# vif and p-values
vif(X_train_resampled, logr2_fit, selected_columns)
Features | VIF | P-value |
---|---|---|
std_ic_t2f_mou_8 | 1.66 | 0.0000 |
sachet_2g_6_0 | 1.63 | 0.0000 |
sachet_2g_7_0 | 1.57 | 0.0000 |
std_ic_t2f_mou_7 | 1.56 | 0.0000 |
monthly_3g_7_0 | 1.54 | 0.0000 |
monthly_3g_8_0 | 1.52 | 0.0000 |
sachet_2g_8_0 | 1.36 | 0.0007 |
total_rech_num_6 | 1.27 | 0.0000 |
total_rech_num_8 | 1.25 | 0.0000 |
monthly_2g_8_0 | 1.23 | 0.0000 |
monthly_2g_6_0 | 1.21 | 0.2595 |
std_og_t2f_mou_8 | 1.20 | 0.0000 |
sachet_3g_6_0 | 1.12 | 0.0040 |
loc_ic_t2f_mou_8 | 1.09 | 0.0000 |
selected_columns.remove('monthly_2g_6_0')
selected_columns
['sachet_3g_6_0', 'sachet_2g_7_0', 'sachet_2g_8_0', 'total_rech_num_6', 'monthly_3g_8_0', 'monthly_2g_8_0', 'total_rech_num_8', 'std_ic_t2f_mou_8', 'std_ic_t2f_mou_7', 'sachet_2g_6_0', 'monthly_3g_7_0', 'loc_ic_t2f_mou_8', 'std_og_t2f_mou_8']
logr3 = sm.GLM(y_train_resampled,(sm.add_constant(X_train_resampled[selected_columns])), family = sm.families.Binomial())
logr3_fit = logr3.fit()
logr3_fit.summary()
Dep. Variable: | Churn | No. Observations: | 38374 |
---|---|---|---|
Model: | GLM | Df Residuals: | 38360 |
Model Family: | Binomial | Df Model: | 13 |
Link Function: | logit | Scale: | 1.0000 |
Method: | IRLS | Log-Likelihood: | -19486. |
Date: | Mon, 30 Nov 2020 | Deviance: | 38971. |
Time: | 21:57:10 | Pearson chi2: | 2.79e+05 |
No. Iterations: | 7 | ||
Covariance Type: | nonrobust |
| coef | std err | z | P>\|z\| | [0.025 | 0.975] |
---|---|---|---|---|---|---|
const | -0.2336 | 0.015 | -15.667 | 0.000 | -0.263 | -0.204 |
sachet_3g_6_0 | -0.0399 | 0.014 | -2.916 | 0.004 | -0.067 | -0.013 |
sachet_2g_7_0 | -0.0987 | 0.016 | -6.249 | 0.000 | -0.130 | -0.068 |
sachet_2g_8_0 | 0.0488 | 0.015 | 3.354 | 0.001 | 0.020 | 0.077 |
total_rech_num_6 | 0.6053 | 0.017 | 35.581 | 0.000 | 0.572 | 0.639 |
monthly_3g_8_0 | 0.3994 | 0.017 | 23.494 | 0.000 | 0.366 | 0.433 |
monthly_2g_8_0 | 0.3666 | 0.015 | 23.953 | 0.000 | 0.337 | 0.397 |
total_rech_num_8 | -1.2033 | 0.019 | -62.720 | 0.000 | -1.241 | -1.166 |
std_ic_t2f_mou_8 | -0.3363 | 0.026 | -12.788 | 0.000 | -0.388 | -0.285 |
std_ic_t2f_mou_7 | 0.1532 | 0.019 | 8.137 | 0.000 | 0.116 | 0.190 |
sachet_2g_6_0 | -0.1108 | 0.016 | -6.810 | 0.000 | -0.143 | -0.079 |
monthly_3g_7_0 | -0.2099 | 0.017 | -12.640 | 0.000 | -0.242 | -0.177 |
loc_ic_t2f_mou_8 | -1.2736 | 0.038 | -33.621 | 0.000 | -1.348 | -1.199 |
std_og_t2f_mou_8 | -0.2474 | 0.021 | -11.617 | 0.000 | -0.289 | -0.206 |
# vif and p-values
vif(X_train_resampled, logr3_fit, selected_columns)
Features | VIF | P-value |
---|---|---|
std_ic_t2f_mou_8 | 1.66 | 0.0000 |
sachet_2g_6_0 | 1.63 | 0.0000 |
sachet_2g_7_0 | 1.57 | 0.0000 |
std_ic_t2f_mou_7 | 1.56 | 0.0000 |
monthly_3g_7_0 | 1.54 | 0.0000 |
monthly_3g_8_0 | 1.52 | 0.0000 |
sachet_2g_8_0 | 1.36 | 0.0008 |
total_rech_num_6 | 1.27 | 0.0000 |
total_rech_num_8 | 1.24 | 0.0000 |
std_og_t2f_mou_8 | 1.20 | 0.0000 |
sachet_3g_6_0 | 1.12 | 0.0035 |
loc_ic_t2f_mou_8 | 1.09 | 0.0000 |
monthly_2g_8_0 | 1.03 | 0.0000 |
selected_columns
['sachet_3g_6_0', 'sachet_2g_7_0', 'sachet_2g_8_0', 'total_rech_num_6', 'monthly_3g_8_0', 'monthly_2g_8_0', 'total_rech_num_8', 'std_ic_t2f_mou_8', 'std_ic_t2f_mou_7', 'sachet_2g_6_0', 'monthly_3g_7_0', 'loc_ic_t2f_mou_8', 'std_og_t2f_mou_8']
# Prediction
y_train_pred_lr = logr3_fit.predict(sm.add_constant(X_train_resampled[selected_columns]))
y_train_pred_lr.head()
0    0.118916
1    0.343873
2    0.381230
3    0.015277
4    0.001595
dtype: float64
y_test_pred_lr = logr3_fit.predict(sm.add_constant(X_test[selected_columns]))
y_test_pred_lr.head()
mobile_number
7002242818    0.013556
7000517161    0.903162
7002162382    0.247123
7002152271    0.330787
7002058655    0.056105
dtype: float64
Finding Optimum Probability Cutoff
# Specificity / Sensitivity Tradeoff
# Classification at probability thresholds between 0 and 1
y_train_pred_thres = pd.DataFrame(index=X_train_resampled.index)
thresholds = [float(x)/10 for x in range(10)]
def thresholder(x, thresh):
    if x > thresh:
        return 1
    else:
        return 0

for i in thresholds:
    y_train_pred_thres[i] = y_train_pred_lr.map(lambda x: thresholder(x, i))
y_train_pred_thres.head()
| 0.0 | 0.1 | 0.2 | 0.3 | 0.4 | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
# DataFrame for Performance metrics at each threshold
logr_metrics_rows = []
for column in y_train_pred_thres.columns.to_list():
    confusion = confusion_matrix(y_train_resampled, y_train_pred_thres.loc[:, column])
    sensitivity, specificity, accuracy = model_metrics_thres(confusion)
    logr_metrics_rows.append({
        'sensitivity': sensitivity,
        'specificity': specificity,
        'accuracy': accuracy
    })
# Build the frame in one step; DataFrame.append was removed in pandas 2.0
logr_metrics_df = pd.DataFrame(logr_metrics_rows, index=thresholds)
logr_metrics_df
threshold | sensitivity | specificity | accuracy |
---|---|---|---|
0.0 | 1.000 | 0.000 | 0.500 |
0.1 | 0.976 | 0.224 | 0.600 |
0.2 | 0.947 | 0.351 | 0.649 |
0.3 | 0.916 | 0.472 | 0.694 |
0.4 | 0.864 | 0.598 | 0.731 |
0.5 | 0.794 | 0.722 | 0.758 |
0.6 | 0.703 | 0.841 | 0.772 |
0.7 | 0.550 | 0.930 | 0.740 |
0.8 | 0.310 | 0.975 | 0.642 |
0.9 | 0.095 | 0.994 | 0.544 |
logr_metrics_df.plot(kind='line', figsize=(24,8), grid=True, xticks=np.arange(0,1,0.02),
title='Specificity-Sensitivity TradeOff');
optimum_cutoff = 0.53
y_train_pred_lr_final = y_train_pred_lr.map(lambda x : 1 if x > optimum_cutoff else 0)
y_test_pred_lr_final = y_test_pred_lr.map(lambda x : 1 if x > optimum_cutoff else 0)
train_matrix = confusion_matrix(y_train_resampled, y_train_pred_lr_final)
print('Confusion Matrix for train:\n', train_matrix)
test_matrix = confusion_matrix(y_test, y_test_pred_lr_final)
print('\nConfusion Matrix for test: \n', test_matrix)
Confusion Matrix for train:
[[14531 4656]
 [ 4411 14776]]

Confusion Matrix for test:
[[6313 1918]
 [ 191 582]]
print('Train Performance: \n')
model_metrics(train_matrix)
print('\n\nTest Performance : \n')
model_metrics(test_matrix)
Train Performance:

Accuracy : 0.764
Sensitivity / True Positive Rate / Recall : 0.77
Specificity / True Negative Rate : 0.757
Precision / Positive Predictive Value : 0.76
F1-score : 0.765

Test Performance :

Accuracy : 0.766
Sensitivity / True Positive Rate / Recall : 0.753
Specificity / True Negative Rate : 0.767
Precision / Positive Predictive Value : 0.233
F1-score : 0.356
# ROC_AUC score
print('ROC AUC score for Train : ',round(roc_auc_score(y_train_resampled, y_train_pred_lr),3), '\n' )
print('ROC AUC score for Test : ',round(roc_auc_score(y_test, y_test_pred_lr),3) )
ROC AUC score for Train : 0.843

ROC AUC score for Test : 0.828
lr_summary_html = logr3_fit.summary().tables[1].as_html()
lr_results = pd.read_html(lr_summary_html, header=0, index_col=0)[0]
coef_column = lr_results.columns[0]
print('Most important predictors of Churn , in order of importance and their coefficients are as follows : \n')
lr_results.sort_values(by=coef_column, key=lambda x: abs(x), ascending=False)['coef']
Most important predictors of Churn , in order of importance and their coefficients are as follows :
loc_ic_t2f_mou_8   -1.2736
total_rech_num_8   -1.2033
total_rech_num_6    0.6053
monthly_3g_8_0      0.3994
monthly_2g_8_0      0.3666
std_ic_t2f_mou_8   -0.3363
std_og_t2f_mou_8   -0.2474
const              -0.2336
monthly_3g_7_0     -0.2099
std_ic_t2f_mou_7    0.1532
sachet_2g_6_0      -0.1108
sachet_2g_7_0      -0.0987
sachet_2g_8_0       0.0488
sachet_3g_6_0      -0.0399
Name: coef, dtype: float64
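In a logistic regression, exponentiating a coefficient gives the multiplicative change in the odds of churn for a one-unit increase in that feature, holding the others fixed. The sketch below converts the coefficients listed above into odds ratios. `const` is left out because it is the intercept, not a predictor. Ranking by absolute coefficient is only meaningful if the features are on a comparable scale (for example standardized), which is assumed here because the scaling step is not shown in this section.

```python
import math

# Coefficients copied from the output above (intercept `const` excluded)
coefficients = {
    'loc_ic_t2f_mou_8': -1.2736, 'total_rech_num_8': -1.2033,
    'total_rech_num_6': 0.6053, 'monthly_3g_8_0': 0.3994,
    'monthly_2g_8_0': 0.3666, 'std_ic_t2f_mou_8': -0.3363,
    'std_og_t2f_mou_8': -0.2474, 'monthly_3g_7_0': -0.2099,
    'std_ic_t2f_mou_7': 0.1532, 'sachet_2g_6_0': -0.1108,
    'sachet_2g_7_0': -0.0987, 'sachet_2g_8_0': 0.0488,
    'sachet_3g_6_0': -0.0399,
}

# Odds ratio below 1 lowers the odds of churn; above 1 raises them
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}
ranked = sorted(coefficients, key=lambda name: abs(coefficients[name]), reverse=True)

for name in ranked:
    print(f'{name:20s} coef={coefficients[name]:+.4f} odds_ratio={odds_ratios[name]:.3f}')
```

Read this way, higher values of `loc_ic_t2f_mou_8` and `total_rech_num_8` in month 8 are associated with markedly lower odds of churn, while a higher `total_rech_num_6` is associated with higher odds.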
from sklearn.decomposition import PCA
pca = PCA(random_state = 42)
pca.fit(X_train) # note that pca is fit on original train set instead of resampled train set.
pca.components_
array([[ 1.64887430e-01, 1.93987506e-01, 1.67239205e-01, ..., 1.43967238e-06, -1.55704675e-06, -1.88892194e-06], [ 6.48591961e-02, 9.55966684e-02, 1.20775174e-01, ..., -2.12841595e-06, -1.47944145e-06, -3.90881587e-07], [ 2.38415388e-01, 2.73645507e-01, 2.38436263e-01, ..., -1.25598531e-06, -4.37900299e-07, 6.19889336e-07], ..., [ 1.68015588e-06, 1.93600851e-06, -1.82065762e-06, ..., 4.25473944e-03, 2.56738368e-03, 3.51118176e-03], [ 0.00000000e+00, -1.11533905e-16, 1.57807487e-16, ..., 1.73764144e-15, 6.22907679e-16, 1.45339158e-16], [ 0.00000000e+00, 4.98537742e-16, -6.02718139e-16, ..., 1.27514583e-15, 1.25772226e-15, 3.41773342e-16]])
pca.explained_variance_ratio_
array([2.72067612e-01, 1.62438240e-01, 1.20827535e-01, 1.06070063e-01, 9.11349433e-02, 4.77504400e-02, 2.63978655e-02, 2.56843982e-02, 1.91789343e-02, 1.68045932e-02, 1.55523468e-02, 1.31676589e-02, 1.04552128e-02, 7.72970448e-03, 7.22746863e-03, 6.14494838e-03, 5.62073089e-03, 5.44579273e-03, 4.59009989e-03, 4.38488162e-03, 3.46703626e-03, 3.27941490e-03, 2.78099200e-03, 2.13444270e-03, 2.07542043e-03, 1.89794720e-03, 1.41383936e-03, 1.30240760e-03, 1.15369576e-03, 1.05262500e-03, 9.64293417e-04, 9.16686049e-04, 8.84067044e-04, 7.62966236e-04, 6.61794767e-04, 5.69667265e-04, 5.12585166e-04, 5.04441248e-04, 4.82396680e-04, 4.46889495e-04, 4.36441254e-04, 4.10389488e-04, 3.51844810e-04, 3.12626195e-04, 2.51673027e-04, 2.34723896e-04, 1.96950034e-04, 1.71296745e-04, 1.59882693e-04, 1.48330353e-04, 1.45919483e-04, 1.08583729e-04, 1.04038518e-04, 8.90621848e-05, 8.53009223e-05, 7.60704088e-05, 7.57150133e-05, 6.16615717e-05, 6.07777411e-05, 5.70517541e-05, 5.36161089e-05, 5.28495367e-05, 5.14887086e-05, 4.73768570e-05, 4.71283394e-05, 4.11523975e-05, 4.10392906e-05, 2.86090257e-05, 2.19793282e-05, 1.58203581e-05, 1.50969788e-05, 1.42865579e-05, 1.34537530e-05, 1.33026062e-05, 1.10239870e-05, 8.27539516e-06, 7.55845974e-06, 6.45372276e-06, 6.22570067e-06, 3.42288900e-06, 3.20804681e-06, 3.09270863e-06, 2.86608967e-06, 2.44898003e-06, 2.08230568e-06, 1.85144734e-06, 1.64714248e-06, 1.45630245e-06, 1.35265729e-06, 1.05472047e-06, 9.89133015e-07, 8.65864423e-07, 7.45065121e-07, 3.66727807e-07, 6.49277820e-08, 6.13357428e-08, 4.35995018e-08, 2.28152900e-08, 2.00441141e-08, 1.84235145e-08, 1.66102335e-08, 1.47870989e-08, 1.23390691e-08, 1.12094165e-08, 1.09702422e-08, 9.51924270e-09, 8.61596309e-09, 7.38051070e-09, 7.15370081e-09, 6.29095319e-09, 5.00739371e-09, 4.68791660e-09, 4.23376173e-09, 4.04558169e-09, 3.75847771e-09, 3.71213838e-09, 3.32806929e-09, 3.23527525e-09, 3.12734302e-09, 2.82062311e-09, 2.72602311e-09, 2.66103741e-09, 2.46562734e-09, 2.20243536e-09, 
2.15044476e-09, 1.59498492e-09, 1.47087974e-09, 1.06159357e-09, 9.33938436e-10, 8.10080735e-10, 8.04656028e-10, 6.12994365e-10, 4.82074297e-10, 4.02577318e-10, 3.58059984e-10, 3.28374076e-10, 3.03687605e-10, 7.12091816e-11, 6.13978255e-11, 1.04375208e-33, 1.04375208e-33])
var_cum = np.cumsum(pca.explained_variance_ratio_)
plt.figure(figsize=(20,8))
sns.set_style('darkgrid')
sns.lineplot(x=np.arange(1, len(var_cum) + 1), y=var_cum)  # keyword arguments; recent seaborn versions reject positional x and y
plt.xticks(np.arange(0,140,5))
plt.axhline(0.95,color='r')
plt.axhline(1.0,color='r')
plt.axvline(15,color='b')
plt.axvline(45,color='b')
plt.text(10,0.96,'0.95')
plt.title('Scree Plot of Telecom Churn Train Set');
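The number of components needed for a given share of variance can also be read directly from the cumulative sum instead of from the plot. The sketch below does that with the leading explained-variance ratios printed above; `components_needed` is a hypothetical helper. On these values a strict 0.95 target is first met at 17 components, with the cumulative share at 16 components at roughly 0.949.

```python
from itertools import accumulate

# First 18 values of pca.explained_variance_ratio_, copied from the output above
leading_ratios = [
    2.72067612e-01, 1.62438240e-01, 1.20827535e-01, 1.06070063e-01,
    9.11349433e-02, 4.77504400e-02, 2.63978655e-02, 2.56843982e-02,
    1.91789343e-02, 1.68045932e-02, 1.55523468e-02, 1.31676589e-02,
    1.04552128e-02, 7.72970448e-03, 7.22746863e-03, 6.14494838e-03,
    5.62073089e-03, 5.44579273e-03,
]

def components_needed(ratios, target):
    """Hypothetical helper: smallest number of components whose cumulative share reaches target."""
    for n, cumulative in enumerate(accumulate(ratios), start=1):
        if cumulative >= target:
            return n
    return None  # target not reached within the supplied ratios

n_95 = components_needed(leading_ratios, 0.95)
share_at_16 = sum(leading_ratios[:16])
print(n_95, round(share_at_16, 4))
```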
# Perform PCA using the first 45 components
pca_final = PCA(n_components=45, random_state=42)
transformed_data = pca_final.fit_transform(X_train)
X_train_pca = pd.DataFrame(transformed_data, columns=["PC_"+str(x) for x in range(1,46)], index = X_train.index)
data_train_pca = pd.concat([X_train_pca, y_train], axis=1)
data_train_pca.head()
mobile_number | PC_1 | PC_2 | PC_3 | PC_4 | PC_5 | PC_6 | PC_7 | PC_8 | PC_9 | PC_10 | PC_11 | PC_12 | PC_13 | PC_14 | PC_15 | PC_16 | PC_17 | PC_18 | PC_19 | PC_20 | PC_21 | PC_22 | PC_23 | PC_24 | PC_25 | PC_26 | PC_27 | PC_28 | PC_29 | PC_30 | PC_31 | PC_32 | PC_33 | PC_34 | PC_35 | PC_36 | PC_37 | PC_38 | PC_39 | PC_40 | PC_41 | PC_42 | PC_43 | PC_44 | PC_45 | Churn |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7000166926 | -907.572208 | -342.923676 | 13.094442 | 58.813506 | -95.616159 | -1050.535219 | 254.648987 | -31.445039 | 305.140339 | -216.814250 | 95.825021 | 231.408291 | -111.002572 | -2.007256 | 444.977249 | 31.541681 | 573.831941 | -278.539708 | 30.768637 | -36.915195 | -0.293915 | -83.574447 | -13.960479 | -60.930941 | -53.208613 | 56.049658 | -17.776675 | -12.624526 | 14.149393 | -30.559156 | 26.064776 | -1.080160 | -19.814893 | -3.293546 | -2.717923 | 7.470255 | 22.686838 | 28.696686 | -14.312037 | 4.959030 | -8.652543 | 2.473147 | 17.080399 | -21.824778 | -8.062901 | 0 |
7001343085 | 573.898045 | -902.385767 | -424.839214 | -331.153508 | -148.987005 | -36.955710 | -134.445130 | 265.325388 | -92.070929 | -164.203586 | 25.105150 | -36.980621 | 164.785936 | -222.908959 | -12.573878 | -50.569424 | -44.767869 | -62.984835 | -18.100729 | -86.239469 | -115.399141 | -45.776518 | 16.345395 | -21.497140 | -10.541281 | -71.754047 | 29.230830 | -20.880178 | -0.690183 | 3.220864 | -21.223298 | 65.500636 | -39.719437 | 50.424623 | 10.586150 | 43.055219 | 0.209259 | -66.107880 | 13.583016 | 25.823444 | 52.037618 | -3.272773 | 8.493995 | 19.449057 | -38.779466 | 0 |
7001863283 | -1538.198366 | 514.032564 | 846.865497 | 57.032319 | -1126.228705 | -84.209511 | -44.422495 | -88.158881 | -58.411887 | 50.518811 | 3.052703 | -229.100202 | -109.215465 | -3.253782 | 7.045279 | -85.645393 | 54.536446 | -52.292779 | 20.978943 | -90.806167 | 96.348659 | 24.280381 | -52.425262 | 42.430049 | -40.627473 | -12.715890 | -4.331719 | -4.092290 | 50.339358 | -0.777645 | -35.146663 | -121.580965 | 98.868473 | -34.068010 | -8.941074 | 22.920757 | 1.669933 | 52.644942 | -8.542762 | 9.087643 | -18.403853 | 3.672076 | 26.073078 | 27.246371 | 19.603368 | 0 |
7002275981 | 486.830772 | -224.929803 | 1130.460535 | -496.189015 | 6.009139 | 81.106845 | -148.667431 | 170.280911 | -7.375197 | -99.556793 | -159.659135 | -14.186219 | -98.682096 | 213.233743 | -34.920639 | -17.212430 | 29.644778 | 4.941994 | 2.799763 | -49.580528 | -88.567855 | 16.809461 | -9.471018 | 4.383889 | 29.532189 | 38.211558 | 32.465761 | -5.316497 | -60.149577 | 12.593305 | 20.988200 | 80.709846 | -50.975160 | -3.712583 | 65.002407 | -57.837280 | -8.312631 | -5.931175 | -5.053131 | -5.667538 | -12.102225 | -14.690148 | -32.215573 | 12.517731 | -20.158820 | 0 |
7001086221 | -1420.949314 | 794.071749 | 99.221352 | 155.118564 | 145.349456 | 784.723580 | -10.947301 | 609.724272 | -172.482377 | -42.796400 | 59.174124 | -162.912577 | -112.219187 | -55.108445 | 17.303261 | -152.111164 | -611.929832 | 181.577435 | -211.358075 | -77.180329 | 116.282095 | 83.488753 | -26.254488 | 128.490023 | -69.085253 | 4.854304 | -128.278573 | 44.328867 | -6.470515 | -28.782209 | 14.618174 | -31.359379 | 27.331179 | -25.948771 | 8.941634 | -34.840913 | -21.933848 | 17.941556 | -0.866531 | -19.428832 | -5.321193 | 6.319611 | -11.398376 | 41.907093 | -8.296132 | 0 |
## Plotting principal components
sns.pairplot(data=data_train_pca, x_vars=["PC_1"], y_vars=["PC_2"], hue="Churn", height=8);  # `size` was renamed to `height` in seaborn 0.9
# X,y Split
y_train_pca = data_train_pca.pop('Churn')
X_train_pca = data_train_pca
# Transforming test set with pca ( 45 components)
X_test_pca = pca_final.transform(X_test)
# Logistic Regression
lr_pca = LogisticRegression(random_state=100, class_weight='balanced')
lr_pca.fit(X_train_pca,y_train_pca )
LogisticRegression(class_weight='balanced', random_state=100)
# y_train predictions
y_train_pred_lr_pca = lr_pca.predict(X_train_pca)
y_train_pred_lr_pca[:5]
array([1, 0, 0, 0, 0])
# Test Prediction
X_test_pca = pca_final.transform(X_test)
y_test_pred_lr_pca = lr_pca.predict(X_test_pca)
y_test_pred_lr_pca[:5]
array([1, 1, 1, 1, 1])
Baseline Performance
train_matrix = confusion_matrix(y_train, y_train_pred_lr_pca)
test_matrix = confusion_matrix(y_test, y_test_pred_lr_pca)
print('Train Performance :\n')
model_metrics(train_matrix)
print('\nTest Performance :\n')
model_metrics(test_matrix)
Train Performance :

Accuracy : 0.645
Sensitivity / True Positive Rate / Recall : 0.905
Specificity / True Negative Rate : 0.62
Precision / Positive Predictive Value : 0.184
F1-score : 0.306

Test Performance :

Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
# Creating a Logistic regression model using pca transformed train set
from sklearn.pipeline import Pipeline
lr_pca = LogisticRegression(random_state=100, class_weight='balanced')
from sklearn.model_selection import RandomizedSearchCV, GridSearchCV , StratifiedKFold
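# Note: C=0 is not a usable regularization strength for 'l1'/'l2', and 'l1' needs a solver that
# supports it (e.g. 'liblinear' or 'saga'), so several of these combinations fail to fit.
# With penalty='none' the value of C is ignored.
params = {
    'penalty' : ['l1','l2','none'],
    'C' : [0,1,2,3,4,5,10,50]
}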
folds = StratifiedKFold(n_splits=4, shuffle=True, random_state=100)
search = GridSearchCV(cv=folds, estimator = lr_pca, param_grid=params,scoring='roc_auc', verbose=True, n_jobs=-1)
search.fit(X_train_pca, y_train_pca)
Fitting 4 folds for each of 24 candidates, totalling 96 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers. [Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 4.0s [Parallel(n_jobs=-1)]: Done 96 out of 96 | elapsed: 6.9s finished
GridSearchCV(cv=StratifiedKFold(n_splits=4, random_state=100, shuffle=True), estimator=LogisticRegression(class_weight='balanced', random_state=100), n_jobs=-1, param_grid={'C': [0, 1, 2, 3, 4, 5, 10, 50], 'penalty': ['l1', 'l2', 'none']}, scoring='roc_auc', verbose=True)
# Optimum Hyperparameters
print('Best ROC-AUC score :', search.best_score_)
print('Best Parameters :', search.best_params_)
Best ROC-AUC score : 0.8763924253372933
Best Parameters : {'C': 0, 'penalty': 'none'}
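An exhaustive grid search evaluates every combination of the listed values on every fold, which is where the "24 candidates, totalling 96 fits" in the log comes from. The sketch below enumerates the same grid in plain Python to make that count explicit. It only reproduces the bookkeeping and does not fit any model.

```python
from itertools import product

# Same grid as in the search above
param_grid = {
    'penalty': ['l1', 'l2', 'none'],
    'C': [0, 1, 2, 3, 4, 5, 10, 50],
}
n_folds = 4

# One candidate per combination of parameter values
candidates = [dict(zip(param_grid, combo)) for combo in product(*param_grid.values())]
total_fits = len(candidates) * n_folds

print(len(candidates), total_fits)
```

Since the cost grows multiplicatively with each added parameter, a randomized search over the same ranges can be a cheaper alternative when the grid becomes large.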
# Modelling using the best LR-PCA estimator
lr_pca_best = search.best_estimator_
lr_pca_best_fit = lr_pca_best.fit(X_train_pca, y_train_pca)
# Prediction on Train set
y_train_pred_lr_pca_best = lr_pca_best_fit.predict(X_train_pca)
y_train_pred_lr_pca_best[:5]
array([1, 1, 0, 0, 0])
# Prediction on test set
y_test_pred_lr_pca_best = lr_pca_best_fit.predict(X_test_pca)
y_test_pred_lr_pca_best[:5]
array([1, 1, 1, 1, 1])
## Model Performance after Hyper Parameter Tuning
train_matrix = confusion_matrix(y_train, y_train_pred_lr_pca_best)
test_matrix = confusion_matrix(y_test, y_test_pred_lr_pca_best)
print('Train Performance :\n')
model_metrics(train_matrix)
print('\nTest Performance :\n')
model_metrics(test_matrix)
Train Performance :
Accuracy : 0.627
Sensitivity / True Positive Rate / Recall : 0.918
Specificity / True Negative Rate : 0.599
Precision / Positive Predictive Value : 0.179
F1-score : 0.3
Test Performance :
Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
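The `model_metrics` helper is called throughout this section but its definition is not shown here. As a rough sketch of how such a helper might derive the five reported figures from a 2x2 confusion matrix in scikit-learn's `[[TN, FP], [FN, TP]]` layout, consider the function below. The name `model_metrics_sketch` and the example counts are hypothetical.

```python
def model_metrics_sketch(matrix):
    """Compute the reported metrics from a 2x2 matrix laid out as [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0   # recall / true positive rate
    specificity = tn / (tn + fp) if (tn + fp) else 0.0   # true negative rate
    precision = tp / (tp + fp) if (tp + fp) else 0.0     # positive predictive value
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        'accuracy': round(accuracy, 3),
        'sensitivity': round(sensitivity, 3),
        'specificity': round(specificity, 3),
        'precision': round(precision, 3),
        'f1': round(f1, 3),
    }

# Illustrative counts only (not taken from the notebook)
example_metrics = model_metrics_sketch([[90, 10], [5, 15]])
print(example_metrics)
```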
from sklearn.ensemble import RandomForestClassifier
# creating a random forest classifier using pca output
pca_rf = RandomForestClassifier(random_state=42, class_weight= {0 : class_1/(class_0 + class_1) , 1 : class_0/(class_0 + class_1) } , oob_score=True, n_jobs=-1,verbose=1)
pca_rf
RandomForestClassifier(class_weight={0: 0.08640165272733331, 1: 0.9135983472726666}, n_jobs=-1, oob_score=True, random_state=42, verbose=1)
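The class weights printed above are each class's share of the other class, so their ratio equals `class_0 / class_1`, the same quantity passed as `scale_pos_weight` to XGBoost later in this section. The short check below uses only the values printed in the estimator reprs; the variable names are illustrative.

```python
# Values copied from the estimator reprs in this section
weight_class_0 = 0.08640165272733331   # class_1 / (class_0 + class_1)
weight_class_1 = 0.9135983472726666    # class_0 / (class_0 + class_1)
scale_pos_weight = 10.573852680293097  # class_0 / class_1

# The ratio of the two weights recovers class_0 / class_1
weight_ratio = weight_class_1 / weight_class_0

# The churn share of the train set follows from the same ratio
churn_share = 1 / (1 + scale_pos_weight)

print(round(weight_ratio, 4), round(churn_share, 4))
```

Both weighting schemes therefore encode the same imbalance: roughly 8.6% of train customers are churners.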
# Hyper parameter Tuning
params = {
'n_estimators' : [30,40,50,100],
'max_depth' : [3,4,5,6,7],
'min_samples_leaf' : [15,20,25,30]
}
folds = StratifiedKFold(n_splits=4, shuffle=True, random_state=42)
pca_rf_model_search = GridSearchCV(estimator=pca_rf, param_grid=params,
cv=folds, scoring='roc_auc', verbose=True, n_jobs=-1 )
pca_rf_model_search.fit(X_train_pca, y_train)
Fitting 4 folds for each of 80 candidates, totalling 320 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 23.2s
[Parallel(n_jobs=-1)]: Done 192 tasks | elapsed: 2.7min
[Parallel(n_jobs=-1)]: Done 320 out of 320 | elapsed: 5.5min finished
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 1.2s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed: 2.6s finished
GridSearchCV(cv=StratifiedKFold(n_splits=4, random_state=42, shuffle=True), estimator=RandomForestClassifier(class_weight={0: 0.08640165272733331, 1: 0.9135983472726666}, n_jobs=-1, oob_score=True, random_state=42, verbose=1), n_jobs=-1, param_grid={'max_depth': [3, 4, 5, 6, 7], 'min_samples_leaf': [15, 20, 25, 30], 'n_estimators': [30, 40, 50, 100]}, scoring='roc_auc', verbose=True)
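The "Fitting 4 folds for each of 80 candidates, totalling 320 fits" line follows directly from the grid: the number of candidates is the product of the list lengths, and each candidate is fitted once per fold. A small sketch that reproduces the counts for the grids in this section (the helper name `count_fits` is hypothetical):

```python
from itertools import product

def count_fits(param_grid, n_splits):
    """Return (number of candidates, total fits) for an exhaustive grid search."""
    n_candidates = len(list(product(*param_grid.values())))
    return n_candidates, n_candidates * n_splits

rf_grid = {
    'n_estimators': [30, 40, 50, 100],
    'max_depth': [3, 4, 5, 6, 7],
    'min_samples_leaf': [15, 20, 25, 30],
}
lr_grid = {
    'penalty': ['l1', 'l2', 'none'],
    'C': [0, 1, 2, 3, 4, 5, 10, 50],
}

rf_counts = count_fits(rf_grid, n_splits=4)
lr_counts = count_fits(lr_grid, n_splits=4)
print(rf_counts, lr_counts)
```

Because the cost grows multiplicatively, adding one more value to any list scales the total number of fits by the size of all the other lists combined.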
# Optimum Hyperparameters
print('Best ROC-AUC score :', pca_rf_model_search.best_score_)
print('Best Parameters :', pca_rf_model_search.best_params_)
Best ROC-AUC score : 0.8861621751601011
Best Parameters : {'max_depth': 7, 'min_samples_leaf': 20, 'n_estimators': 100}
# Modelling using the best PCA-RandomForest Estimator
pca_rf_best = pca_rf_model_search.best_estimator_
pca_rf_best_fit = pca_rf_best.fit(X_train_pca, y_train)
# Prediction on Train set
y_train_pred_pca_rf_best = pca_rf_best_fit.predict(X_train_pca)
y_train_pred_pca_rf_best[:5]
[Parallel(n_jobs=-1)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 1.1s
[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed: 2.7s finished
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.0s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed: 0.1s finished
array([0, 0, 0, 0, 0])
# Prediction on test set
y_test_pred_pca_rf_best = pca_rf_best_fit.predict(X_test_pca)
y_test_pred_pca_rf_best[:5]
[Parallel(n_jobs=4)]: Using backend ThreadingBackend with 4 concurrent workers.
[Parallel(n_jobs=4)]: Done 42 tasks | elapsed: 0.1s
[Parallel(n_jobs=4)]: Done 100 out of 100 | elapsed: 0.1s finished
array([0, 0, 0, 0, 0])
## PCA - RandomForest Model Performance - Hyper Parameter Tuned
train_matrix = confusion_matrix(y_train, y_train_pred_pca_rf_best)
test_matrix = confusion_matrix(y_test, y_test_pred_pca_rf_best)
print('Train Performance :\n')
model_metrics(train_matrix)
print('\nTest Performance :\n')
model_metrics(test_matrix)
Train Performance :
Accuracy : 0.882
Sensitivity / True Positive Rate / Recall : 0.816
Specificity / True Negative Rate : 0.888
Precision / Positive Predictive Value : 0.408
F1-score : 0.544
Test Performance :
Accuracy : 0.86
Sensitivity / True Positive Rate / Recall : 0.80
Specificity / True Negative Rate : 0.78
Precision / Positive Predictive Value : 0.37
F1-score : 0.51
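Accuracy is the prevalence-weighted average of sensitivity and specificity, which gives a quick consistency check on any block of reported metrics. The sketch below applies it to the Random Forest figures above. It assumes a churn share of about 8.64% (taken from the class weights printed earlier) for both train and test; the helper name `implied_accuracy` is hypothetical.

```python
def implied_accuracy(sensitivity, specificity, prevalence):
    """Accuracy implied by sensitivity and specificity at a given positive-class share."""
    return prevalence * sensitivity + (1 - prevalence) * specificity

churn_share = 0.0864  # assumption: same share in train and test, from the printed class weights

train_implied = implied_accuracy(0.816, 0.888, churn_share)
test_implied = implied_accuracy(0.80, 0.78, churn_share)

print('Train implied accuracy :', round(train_implied, 3))
print('Test implied accuracy  :', round(test_implied, 3))
```

Under this assumption the train figures reproduce the reported 0.882, while the test figures imply about 0.782 rather than the reported 0.86. Since accuracy cannot exceed both sensitivity and specificity, the test numbers may be worth recomputing from `test_matrix`.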
## out of bag score (accuracy on out-of-bag samples); the OOB error is 1 - oob_score_
pca_rf_best_fit.oob_score_
0.8625220164707003
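For a classifier, `oob_score_` is the accuracy measured on the rows each tree did not draw in its bootstrap sample, so the value above corresponds to an out-of-bag error of roughly 0.137. The sketch below illustrates why such rows are always available: a bootstrap sample of size `n` leaves out about `1/e` (around 36.8%) of the rows. The sample size and seed are arbitrary choices for illustration.

```python
import math
import random

n_rows = 10_000                      # arbitrary illustrative sample size
rng = random.Random(42)

# Draw a bootstrap sample (with replacement) of the same size as the data
drawn = {rng.randrange(n_rows) for _ in range(n_rows)}

# Rows never drawn are "out of bag" for this tree
oob_fraction = 1 - len(drawn) / n_rows
theoretical = (1 - 1 / n_rows) ** n_rows   # approaches exp(-1) as n grows

print(round(oob_fraction, 3), round(theoretical, 3))
```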
import xgboost as xgb
pca_xgb = xgb.XGBClassifier(random_state=42, scale_pos_weight= class_0/class_1 ,
tree_method='hist',
objective='binary:logistic',
) # scale_pos_weight takes care of class imbalance
pca_xgb.fit(X_train_pca, y_train)
XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_constraints='()', n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=10.573852680293097, subsample=1, tree_method='hist', validate_parameters=1, verbosity=None)
print('Baseline Train AUC Score')
roc_auc_score(y_train, pca_xgb.predict_proba(X_train_pca)[:, 1])
Baseline Train AUC Score
0.9999996277241286
print('Baseline Test AUC Score')
roc_auc_score(y_test, pca_xgb.predict_proba(X_test_pca)[:, 1])
Baseline Test AUC Score
0.46093390352284136
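A train AUC near 1.0 alongside a test AUC below 0.5 points to more than ordinary overfitting. One possible cause (an assumption, not something the source confirms) is that the test features did not pass through the same preprocessing as the train features. A minimal sketch of guarding against that with a `Pipeline`, so that scaling and PCA are fitted on train data only and reused unchanged on test data, is shown below. The synthetic data, the `LogisticRegression` stand-in classifier, and all `_demo` names are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced stand-in for the churn data (assumption)
X_demo, y_demo = make_classification(
    n_samples=1000, n_features=20, n_informative=8, weights=[0.9, 0.1], random_state=42
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, stratify=y_demo, random_state=42
)

# Scaler and PCA are fitted inside the pipeline, on the train split only
pipe_demo = Pipeline([
    ('scale', StandardScaler()),
    ('pca', PCA(n_components=0.95)),   # keep components explaining 95% of variance
    ('clf', LogisticRegression(class_weight='balanced', max_iter=1000)),
])
pipe_demo.fit(X_tr, y_tr)

train_auc_demo = roc_auc_score(y_tr, pipe_demo.predict_proba(X_tr)[:, 1])
test_auc_demo = roc_auc_score(y_te, pipe_demo.predict_proba(X_te)[:, 1])
print(round(train_auc_demo, 3), round(test_auc_demo, 3))
```

Passing the pipeline itself to `GridSearchCV` would also keep each cross-validation fold's preprocessing separate from its validation rows.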
## Hyper parameter Tuning
parameters = {
'learning_rate': [0.1, 0.2, 0.3],
'gamma' : [10,20,50],
'max_depth': [2,3,4],
'min_child_weight': [25,50],
'n_estimators': [150,200,500]}
pca_xgb_search = GridSearchCV(estimator=pca_xgb , param_grid=parameters,scoring='roc_auc', cv=folds, n_jobs=-1, verbose=1)
pca_xgb_search.fit(X_train_pca, y_train)
Fitting 4 folds for each of 162 candidates, totalling 648 fits
[Parallel(n_jobs=-1)]: Using backend LokyBackend with 4 concurrent workers.
[Parallel(n_jobs=-1)]: Done 42 tasks | elapsed: 28.3s
[Parallel(n_jobs=-1)]: Done 192 tasks | elapsed: 2.1min
[Parallel(n_jobs=-1)]: Done 442 tasks | elapsed: 4.8min
[Parallel(n_jobs=-1)]: Done 648 out of 648 | elapsed: 8.0min finished
GridSearchCV(cv=StratifiedKFold(n_splits=4, random_state=42, shuffle=True), estimator=XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1, colsample_bynode=1, colsample_bytree=1, gamma=0, gpu_id=-1, importance_type='gain', interaction_constraints='', learning_rate=0.300000012, max_delta_step=0, max_depth=6, min_child_weight=1, missing=nan, monotone_... n_estimators=100, n_jobs=0, num_parallel_tree=1, random_state=42, reg_alpha=0, reg_lambda=1, scale_pos_weight=10.573852680293097, subsample=1, tree_method='hist', validate_parameters=1, verbosity=None), n_jobs=-1, param_grid={'gamma': [10, 20, 50], 'learning_rate': [0.1, 0.2, 0.3], 'max_depth': [2, 3, 4], 'min_child_weight': [25, 50], 'n_estimators': [150, 200, 500]}, scoring='roc_auc', verbose=1)
# Optimum Hyperparameters
print('Best ROC-AUC score :', pca_xgb_search.best_score_)
print('Best Parameters :', pca_xgb_search.best_params_)
Best ROC-AUC score : 0.8955777259491308
Best Parameters : {'gamma': 10, 'learning_rate': 0.1, 'max_depth': 2, 'min_child_weight': 50, 'n_estimators': 500}
# Modelling using the best PCA-XGBoost Estimator
pca_xgb_best = pca_xgb_search.best_estimator_
pca_xgb_best_fit = pca_xgb_best.fit(X_train_pca, y_train)
# Prediction on Train set
y_train_pred_pca_xgb_best = pca_xgb_best_fit.predict(X_train_pca)
y_train_pred_pca_xgb_best[:5]
array([0, 0, 0, 0, 0])
X_train_pca.head()
mobile_number | PC_1 | PC_2 | PC_3 | PC_4 | PC_5 | PC_6 | PC_7 | PC_8 | PC_9 | PC_10 | PC_11 | PC_12 | PC_13 | PC_14 | PC_15 | PC_16 | PC_17 | PC_18 | PC_19 | PC_20 | PC_21 | PC_22 | PC_23 | PC_24 | PC_25 | PC_26 | PC_27 | PC_28 | PC_29 | PC_30 | PC_31 | PC_32 | PC_33 | PC_34 | PC_35 | PC_36 | PC_37 | PC_38 | PC_39 | PC_40 | PC_41 | PC_42 | PC_43 | PC_44 | PC_45 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7000166926 | -907.572208 | -342.923676 | 13.094442 | 58.813506 | -95.616159 | -1050.535219 | 254.648987 | -31.445039 | 305.140339 | -216.814250 | 95.825021 | 231.408291 | -111.002572 | -2.007256 | 444.977249 | 31.541681 | 573.831941 | -278.539708 | 30.768637 | -36.915195 | -0.293915 | -83.574447 | -13.960479 | -60.930941 | -53.208613 | 56.049658 | -17.776675 | -12.624526 | 14.149393 | -30.559156 | 26.064776 | -1.080160 | -19.814893 | -3.293546 | -2.717923 | 7.470255 | 22.686838 | 28.696686 | -14.312037 | 4.959030 | -8.652543 | 2.473147 | 17.080399 | -21.824778 | -8.062901 |
7001343085 | 573.898045 | -902.385767 | -424.839214 | -331.153508 | -148.987005 | -36.955710 | -134.445130 | 265.325388 | -92.070929 | -164.203586 | 25.105150 | -36.980621 | 164.785936 | -222.908959 | -12.573878 | -50.569424 | -44.767869 | -62.984835 | -18.100729 | -86.239469 | -115.399141 | -45.776518 | 16.345395 | -21.497140 | -10.541281 | -71.754047 | 29.230830 | -20.880178 | -0.690183 | 3.220864 | -21.223298 | 65.500636 | -39.719437 | 50.424623 | 10.586150 | 43.055219 | 0.209259 | -66.107880 | 13.583016 | 25.823444 | 52.037618 | -3.272773 | 8.493995 | 19.449057 | -38.779466 |
7001863283 | -1538.198366 | 514.032564 | 846.865497 | 57.032319 | -1126.228705 | -84.209511 | -44.422495 | -88.158881 | -58.411887 | 50.518811 | 3.052703 | -229.100202 | -109.215465 | -3.253782 | 7.045279 | -85.645393 | 54.536446 | -52.292779 | 20.978943 | -90.806167 | 96.348659 | 24.280381 | -52.425262 | 42.430049 | -40.627473 | -12.715890 | -4.331719 | -4.092290 | 50.339358 | -0.777645 | -35.146663 | -121.580965 | 98.868473 | -34.068010 | -8.941074 | 22.920757 | 1.669933 | 52.644942 | -8.542762 | 9.087643 | -18.403853 | 3.672076 | 26.073078 | 27.246371 | 19.603368 |
7002275981 | 486.830772 | -224.929803 | 1130.460535 | -496.189015 | 6.009139 | 81.106845 | -148.667431 | 170.280911 | -7.375197 | -99.556793 | -159.659135 | -14.186219 | -98.682096 | 213.233743 | -34.920639 | -17.212430 | 29.644778 | 4.941994 | 2.799763 | -49.580528 | -88.567855 | 16.809461 | -9.471018 | 4.383889 | 29.532189 | 38.211558 | 32.465761 | -5.316497 | -60.149577 | 12.593305 | 20.988200 | 80.709846 | -50.975160 | -3.712583 | 65.002407 | -57.837280 | -8.312631 | -5.931175 | -5.053131 | -5.667538 | -12.102225 | -14.690148 | -32.215573 | 12.517731 | -20.158820 |
7001086221 | -1420.949314 | 794.071749 | 99.221352 | 155.118564 | 145.349456 | 784.723580 | -10.947301 | 609.724272 | -172.482377 | -42.796400 | 59.174124 | -162.912577 | -112.219187 | -55.108445 | 17.303261 | -152.111164 | -611.929832 | 181.577435 | -211.358075 | -77.180329 | 116.282095 | 83.488753 | -26.254488 | 128.490023 | -69.085253 | 4.854304 | -128.278573 | 44.328867 | -6.470515 | -28.782209 | 14.618174 | -31.359379 | 27.331179 | -25.948771 | 8.941634 | -34.840913 | -21.933848 | 17.941556 | -0.866531 | -19.428832 | -5.321193 | 6.319611 | -11.398376 | 41.907093 | -8.296132 |
# Prediction on test set
X_test_pca = pca_final.transform(X_test)
X_test_pca = pd.DataFrame(X_test_pca, index=X_test.index, columns = X_train_pca.columns)
y_test_pred_pca_xgb_best = pca_xgb_best_fit.predict(X_test_pca)
y_test_pred_pca_xgb_best[:5]
array([1, 1, 1, 1, 1])
## PCA - XGBOOST [Hyper parameter tuned] Model Performance
train_matrix = confusion_matrix(y_train, y_train_pred_pca_xgb_best)
test_matrix = confusion_matrix(y_test, y_test_pred_pca_xgb_best)
print('Train Performance :\n')
model_metrics(train_matrix)
print('\nTest Performance :\n')
model_metrics(test_matrix)
Train Performance :
Accuracy : 0.873
Sensitivity / True Positive Rate / Recall : 0.887
Specificity / True Negative Rate : 0.872
Precision / Positive Predictive Value : 0.396
F1-score : 0.548
Test Performance :
Accuracy : 0.086
Sensitivity / True Positive Rate / Recall : 1.0
Specificity / True Negative Rate : 0.0
Precision / Positive Predictive Value : 0.086
F1-score : 0.158
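The test figures above (recall 1.0, specificity 0.0, accuracy equal to precision) are exactly what results when every test row is predicted as churn, which matches the `array([1, 1, 1, 1, 1])` preview. The sketch below derives the metrics of such an all-positive predictor from the churn share alone; the function name is hypothetical and the 0.086 share is read off the reported precision.

```python
def all_positive_metrics(prevalence):
    """Metrics of a classifier that labels every row as positive."""
    recall = 1.0            # every true positive is caught
    specificity = 0.0       # no negative is ever predicted
    accuracy = prevalence   # only the positives are classified correctly
    precision = prevalence  # share of predicted positives that are real
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, recall, specificity, precision, f1

degenerate = [round(v, 3) for v in all_positive_metrics(0.086)]
print(degenerate)  # [0.086, 1.0, 0.0, 0.086, 0.158]
```

Checking the distribution of predicted labels (for example with `pd.Series(y_pred).value_counts()`) before reading the metrics would surface this situation immediately.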
## PCA - XGBOOST [Hyper parameter tuned] Train and Test AUC Scores
print('Train AUC Score')
print(roc_auc_score(y_train, pca_xgb_best.predict_proba(X_train_pca)[:, 1]))
print('Test AUC Score')
print(roc_auc_score(y_test, pca_xgb_best.predict_proba(X_test_pca)[:, 1]))
Train AUC Score
0.9442462043611259
Test AUC Score
0.6353301334697982
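ROC-AUC depends only on how the predicted scores rank churners against non-churners, not on the 0.5 cut-off that `predict` applies, so it can stay above 0.5 even when every row is assigned to the same class. A minimal pure-Python sketch of that pairwise definition follows; the function name and toy data are illustrative.

```python
def roc_auc_pairwise(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_toy = [0, 0, 1, 1]
s_toy = [0.1, 0.4, 0.35, 0.8]
auc_toy = roc_auc_pairwise(y_toy, s_toy)

# Shifting every score upward changes the predicted labels but not the ranking
auc_shifted = roc_auc_pairwise(y_toy, [s + 0.5 for s in s_toy])
print(auc_toy, auc_shifted)  # 0.75 0.75
```

This quadratic-time version is meant for intuition only; `sklearn.metrics.roc_auc_score` computes the same quantity efficiently.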
print('Most important predictors of churn, in order of importance:')
# Rank the coefficients by absolute value, largest first
lr_results.sort_values(by=coef_column, key=abs, ascending=False)['coef']
Most important predictors of churn, in order of importance:
loc_ic_t2f_mou_8    -1.2736
total_rech_num_8    -1.2033
total_rech_num_6     0.6053
monthly_3g_8_0       0.3994
monthly_2g_8_0       0.3666
std_ic_t2f_mou_8    -0.3363
std_og_t2f_mou_8    -0.2474
const               -0.2336
monthly_3g_7_0      -0.2099
std_ic_t2f_mou_7     0.1532
sachet_2g_6_0       -0.1108
sachet_2g_7_0       -0.0987
sachet_2g_8_0        0.0488
sachet_3g_6_0       -0.0399
Name: coef, dtype: float64
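Logistic regression coefficients are log-odds, so exponentiating them gives the multiplicative change in the odds of churn per one-unit increase in the feature, on whatever scale the feature had when the model was fitted. The sketch below converts the four largest coefficients listed above; the variable names are illustrative.

```python
import math

# Four largest coefficients by absolute value, copied from the output above
top_coefs = {
    'loc_ic_t2f_mou_8': -1.2736,
    'total_rech_num_8': -1.2033,
    'total_rech_num_6': 0.6053,
    'monthly_3g_8_0': 0.3994,
}

# exp(coef) < 1 lowers the odds of churn; exp(coef) > 1 raises them
odds_ratios = {name: math.exp(coef) for name, coef in top_coefs.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda item: abs(math.log(item[1])), reverse=True):
    direction = 'lowers' if ratio < 1 else 'raises'
    print(f'{name}: odds ratio {ratio:.3f} ({direction} churn odds)')
```

Read this way, higher month-8 recharge counts and local incoming usage are associated with lower churn odds, while a higher month-6 recharge count is associated with higher churn odds.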
From the above, the following are the strongest indicators of churn :
Based on the above indicators, the recommendations to the telecom company are :