Classification Models in Machine Learning
Classification is a crucial task in machine learning: it allows us to predict the labels or categories of new data from its features. There are many classification algorithms available, each with its own strengths and weaknesses. In this blog post, we will explore a range of classification models, from traditional machine learning algorithms (LogisticRegression, KNeighborsClassifier, SVC, DecisionTreeClassifier, RandomForestClassifier, GradientBoostingClassifier, AdaBoostClassifier, XGBClassifier) to more complex neural network models. We will examine how each algorithm works, its advantages and disadvantages, and when to use it given the characteristics of the data and the requirements of the problem. By the end of this post, you will have a better understanding of the different classification techniques and be able to choose the best algorithm for your task.
Context¶
An automobile company has plans to enter new markets with their existing products (P1, P2, P3, P4, and P5). After intensive market research, they’ve deduced that the behavior of the new market is similar to their existing market.
In their existing market, the sales team has classified all customers into 4 segments (A, B, C, D). They then performed segmented outreach and communication tailored to each segment. This strategy has worked exceptionally well for them. They plan to use the same strategy for the new markets and have identified 2627 new potential customers.
You are required to help the manager predict the right segment for the new customers.
Import libraries¶
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import os
Import Dataset¶
train_data=pd.read_csv("Train.csv")
test_data=pd.read_csv("Test.csv")
train_data
ID | Gender | Ever_Married | Age | Graduated | Profession | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 462809 | Male | No | 22 | No | Healthcare | 1.0 | Low | 4.0 | Cat_4 | D |
1 | 462643 | Female | Yes | 38 | Yes | Engineer | NaN | Average | 3.0 | Cat_4 | A |
2 | 466315 | Female | Yes | 67 | Yes | Engineer | 1.0 | Low | 1.0 | Cat_6 | B |
3 | 461735 | Male | Yes | 67 | Yes | Lawyer | 0.0 | High | 2.0 | Cat_6 | B |
4 | 462669 | Female | Yes | 40 | Yes | Entertainment | NaN | High | 6.0 | Cat_6 | A |
… | … | … | … | … | … | … | … | … | … | … | … |
8063 | 464018 | Male | No | 22 | No | NaN | 0.0 | Low | 7.0 | Cat_1 | D |
8064 | 464685 | Male | No | 35 | No | Executive | 3.0 | Low | 4.0 | Cat_4 | D |
8065 | 465406 | Female | No | 33 | Yes | Healthcare | 1.0 | Low | 1.0 | Cat_6 | D |
8066 | 467299 | Female | No | 27 | Yes | Healthcare | 1.0 | Low | 4.0 | Cat_6 | B |
8067 | 461879 | Male | Yes | 37 | Yes | Executive | 0.0 | Average | 3.0 | Cat_4 | B |
8068 rows × 11 columns
DATASET OVERVIEW¶
View first 10 rows
#First 10 rows
train_data.head(10)
ID | Gender | Ever_Married | Age | Graduated | Profession | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 462809 | Male | No | 22 | No | Healthcare | 1.0 | Low | 4.0 | Cat_4 | D |
1 | 462643 | Female | Yes | 38 | Yes | Engineer | NaN | Average | 3.0 | Cat_4 | A |
2 | 466315 | Female | Yes | 67 | Yes | Engineer | 1.0 | Low | 1.0 | Cat_6 | B |
3 | 461735 | Male | Yes | 67 | Yes | Lawyer | 0.0 | High | 2.0 | Cat_6 | B |
4 | 462669 | Female | Yes | 40 | Yes | Entertainment | NaN | High | 6.0 | Cat_6 | A |
5 | 461319 | Male | Yes | 56 | No | Artist | 0.0 | Average | 2.0 | Cat_6 | C |
6 | 460156 | Male | No | 32 | Yes | Healthcare | 1.0 | Low | 3.0 | Cat_6 | C |
7 | 464347 | Female | No | 33 | Yes | Healthcare | 1.0 | Low | 3.0 | Cat_6 | D |
8 | 465015 | Female | Yes | 61 | Yes | Engineer | 0.0 | Low | 3.0 | Cat_7 | D |
9 | 465176 | Female | Yes | 55 | Yes | Artist | 1.0 | Average | 4.0 | Cat_6 | C |
#Last 10 rows
train_data.tail(10)
ID | Gender | Ever_Married | Age | Graduated | Profession | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation | |
---|---|---|---|---|---|---|---|---|---|---|---|
8058 | 460674 | Female | No | 31 | Yes | Entertainment | 0.0 | Low | 3.0 | Cat_3 | A |
8059 | 460132 | Male | No | 39 | Yes | Healthcare | 3.0 | Low | 2.0 | Cat_6 | D |
8060 | 463613 | Female | Yes | 48 | Yes | Artist | 0.0 | Average | 6.0 | Cat_6 | A |
8061 | 465231 | Male | Yes | 65 | No | Artist | 0.0 | Average | 2.0 | Cat_6 | C |
8062 | 463002 | Male | Yes | 41 | Yes | Artist | 0.0 | High | 5.0 | Cat_6 | B |
8063 | 464018 | Male | No | 22 | No | NaN | 0.0 | Low | 7.0 | Cat_1 | D |
8064 | 464685 | Male | No | 35 | No | Executive | 3.0 | Low | 4.0 | Cat_4 | D |
8065 | 465406 | Female | No | 33 | Yes | Healthcare | 1.0 | Low | 1.0 | Cat_6 | D |
8066 | 467299 | Female | No | 27 | Yes | Healthcare | 1.0 | Low | 4.0 | Cat_6 | B |
8067 | 461879 | Male | Yes | 37 | Yes | Executive | 0.0 | Average | 3.0 | Cat_4 | B |
#random sample of 20 rows
train_data.sample(20)
ID | Gender | Ever_Married | Age | Graduated | Profession | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation | |
---|---|---|---|---|---|---|---|---|---|---|---|
4441 | 464444 | Male | Yes | 36 | Yes | Artist | 8.0 | Average | 2.0 | Cat_6 | C |
6805 | 466859 | Male | No | 22 | No | Healthcare | 0.0 | Low | 5.0 | Cat_2 | D |
3013 | 462280 | Male | No | 32 | Yes | Healthcare | 0.0 | Low | 4.0 | Cat_6 | B |
6233 | 460252 | Male | Yes | 41 | Yes | Artist | 0.0 | Low | 2.0 | Cat_6 | A |
1428 | 467221 | Female | No | 20 | No | Doctor | NaN | Low | 4.0 | Cat_6 | C |
2624 | 464566 | Male | Yes | 35 | Yes | Executive | 0.0 | High | 4.0 | Cat_6 | A |
6080 | 466744 | Female | Yes | 52 | Yes | Artist | 1.0 | Average | 3.0 | Cat_6 | C |
4383 | 465479 | Male | No | 27 | No | Artist | 1.0 | Low | 1.0 | Cat_4 | D |
3846 | 465129 | Female | No | 61 | Yes | Entertainment | 3.0 | Low | 1.0 | Cat_6 | A |
2357 | 461693 | Female | Yes | 36 | Yes | Artist | 1.0 | Average | 4.0 | Cat_6 | B |
3884 | 462956 | Male | Yes | 42 | No | Artist | 7.0 | Average | 5.0 | Cat_6 | B |
7163 | 467168 | Male | Yes | 46 | Yes | Artist | 8.0 | Average | 2.0 | Cat_1 | B |
1180 | 464997 | Male | Yes | 43 | Yes | Artist | 5.0 | Average | 2.0 | Cat_6 | B |
4360 | 462525 | Female | Yes | 41 | Yes | Artist | 0.0 | Average | 2.0 | Cat_6 | C |
3642 | 464243 | Female | No | 49 | Yes | Artist | 1.0 | Low | 2.0 | Cat_6 | C |
1899 | 463025 | Male | Yes | 40 | No | Executive | 7.0 | Low | 4.0 | Cat_6 | A |
3508 | 460616 | Male | Yes | 32 | Yes | Homemaker | 2.0 | Average | 2.0 | Cat_3 | C |
1100 | 461129 | Female | Yes | 63 | Yes | Entertainment | 0.0 | Average | 4.0 | Cat_6 | B |
6915 | 464716 | Male | Yes | 43 | Yes | Marketing | 0.0 | High | 6.0 | Cat_4 | D |
303 | 466597 | Female | No | 19 | No | Healthcare | 1.0 | Low | 2.0 | Cat_6 | D |
Define a function to get an overview of the data
def data_overview(data, title):
    # Build a one-column summary of the dataset
    overview_analysis = {f'{title}': [data.shape[1],                                          # number of columns
                                      data.shape[0],                                          # number of rows
                                      data.isnull().any(axis=1).sum(),                        # rows with at least one missing value
                                      data.isnull().any(axis=1).sum() / len(data) * 100,
                                      data.duplicated().sum(),                                # duplicate rows
                                      data.duplicated().sum() / len(data) * 100,
                                      sum((data.dtypes == 'object') & (data.nunique() > 2)),  # categorical (3+ levels)
                                      sum((data.dtypes == 'object') & (data.nunique() < 3)),  # boolean-like (2 levels)
                                      data.select_dtypes(include=['int64', 'float64']).shape[1]]}  # numerical columns
    overview_analysis = pd.DataFrame(overview_analysis,
                                     index=['Columns', 'Rows', 'Missing_Values', 'Missing_Values %',
                                            'Duplicates', 'Duplicates %', 'Categorical_variables',
                                            'Boolean_variables', 'Numerical_variables']).round(2)
    return overview_analysis
data_overview(train_data, "Data_Overview")
Data_Overview | |
---|---|
Columns | 11.00 |
Rows | 8068.00 |
Missing_Values | 1403.00 |
Missing_Values % | 17.39 |
Duplicates | 0.00 |
Duplicates % | 0.00 |
Categorical_variables | 4.00 |
Boolean_variables | 3.00 |
Numerical_variables | 4.00 |
Define a function to get an overview of the variables
def variables_overview1(data):
    # Per-variable summary: unique values, dtype, missing count and percentage
    variable_details = {'unique': data.nunique(),
                        'dtype': data.dtypes,
                        'null': data.isna().sum(),
                        'null %': data.isna().sum() / len(data) * 100}
    variable_details = pd.DataFrame(variable_details)
    return variable_details
variables_overview=variables_overview1(train_data)
variables_overview
unique | dtype | null | null % | |
---|---|---|---|---|
ID | 8068 | int64 | 0 | 0.000000 |
Gender | 2 | object | 0 | 0.000000 |
Ever_Married | 2 | object | 140 | 1.735250 |
Age | 67 | int64 | 0 | 0.000000 |
Graduated | 2 | object | 78 | 0.966782 |
Profession | 9 | object | 124 | 1.536936 |
Work_Experience | 15 | float64 | 829 | 10.275161 |
Spending_Score | 3 | object | 0 | 0.000000 |
Family_Size | 9 | float64 | 335 | 4.152206 |
Var_1 | 7 | object | 76 | 0.941993 |
Segmentation | 4 | object | 0 | 0.000000 |
SUMMARY STATISTICS¶
Compute summary statistics for the numerical columns in the DataFrame.
train_data.describe()
ID | Age | Work_Experience | Family_Size | |
---|---|---|---|---|
count | 8068.000000 | 8068.000000 | 7239.000000 | 7733.000000 |
mean | 463479.214551 | 43.466906 | 2.641663 | 2.850123 |
std | 2595.381232 | 16.711696 | 3.406763 | 1.531413 |
min | 458982.000000 | 18.000000 | 0.000000 | 1.000000 |
25% | 461240.750000 | 30.000000 | 0.000000 | 2.000000 |
50% | 463472.500000 | 40.000000 | 1.000000 | 3.000000 |
75% | 465744.250000 | 53.000000 | 4.000000 | 4.000000 |
max | 467974.000000 | 89.000000 | 14.000000 | 9.000000 |
Compute summary statistics for the categorical columns in the DataFrame.¶
train_data.describe(include='object')
Gender | Ever_Married | Graduated | Profession | Spending_Score | Var_1 | Segmentation | |
---|---|---|---|---|---|---|---|
count | 8068 | 7928 | 7990 | 7944 | 8068 | 7992 | 8068 |
unique | 2 | 2 | 2 | 9 | 3 | 7 | 4 |
top | Male | Yes | Yes | Artist | Low | Cat_6 | D |
freq | 4417 | 4643 | 4968 | 2516 | 4878 | 5238 | 2268 |
EXPLORATORY DATA ANALYSIS (EDA)¶
Visualise customer segmentation count (Target Variable)¶
ax = sns.countplot(x=train_data["Segmentation"],
                   order=train_data["Segmentation"].value_counts().index)
abs_values = train_data['Segmentation'].value_counts().values
rel_values = train_data['Segmentation'].value_counts(normalize=True).values * 100
lbls = [f'{p[0]} ({p[1]:.0f}%)' for p in zip(abs_values, rel_values)]
ax.bar_label(container=ax.containers[0], labels=lbls)
ax.set_xlabel('Segmentation')
ax.set_ylabel('Count')
ax.set_title('Segmentation Count')
fig, ax = plt.subplots(1, 2, figsize=(9, 5))
train_data["Segmentation"].value_counts().plot.bar(color=['blue', 'orange', 'green', 'red'], ax=ax[0])
train_data["Segmentation"].value_counts().plot(kind='pie',autopct='%.2f%%',shadow=True, ax=ax[1])
centre_circle = plt.Circle((0,0),0.80,fc='white')
fig = plt.gcf()
fig.gca().add_artist(centre_circle)
ax[1].set_title("Segmentation Analysis")
ax[0].set_title("Segmentation Analysis")
ax[1].legend(title="Segmentation", bbox_to_anchor=(1.1, 1), labels=['D', 'A', 'C', 'B'])
for i, patch in enumerate(ax[0].patches):
    count = train_data["Segmentation"].value_counts().iloc[i]
    ax[0].annotate(str(count), xy=(patch.get_x() + patch.get_width() / 2, patch.get_height() + 1),
                   ha='center', va='center', fontsize=10)
plt.xticks(rotation=90)
plt.yticks(rotation=45)
plt.show()
col= ['Gender',
'Ever_Married',
'Graduated',
'Profession',
'Spending_Score',
'Var_1']
for j in col:
    plt.figure(figsize=(10, 6))
    plt.title(f'{j} Count by segmentation')
    sns.countplot(data=train_data, x=j, hue='Segmentation')
mode_table=pd.DataFrame(train_data.groupby('Segmentation')[['Family_Size', 'Age', 'Work_Experience', 'Gender', 'Ever_Married', 'Profession', 'Spending_Score']].agg(pd.Series.mode))
mode_table
Family_Size | Age | Work_Experience | Gender | Ever_Married | Profession | Spending_Score | |
---|---|---|---|---|---|---|---|
Segmentation | |||||||
A | 2.0 | 35 | 1.0 | Male | Yes | Artist | Low |
B | 2.0 | 43 | 1.0 | Male | Yes | Artist | Low |
C | 2.0 | 50 | 1.0 | Male | Yes | Artist | Average |
D | 4.0 | 22 | 0.0 | Male | No | Healthcare | Low |
mode_table.plot(kind='bar', figsize=(10, 6))
plt.xlabel('Segmentation')
plt.ylabel('Mode')
plt.title('Mode values by segmentation')
plt.legend(title='Variable', bbox_to_anchor=(1, 1))
plt.show()
mean_table = pd.DataFrame(train_data.groupby('Segmentation')[['Family_Size', 'Age', 'Work_Experience']].mean())
mean_table
Family_Size | Age | Work_Experience | |
---|---|---|---|
Segmentation | |||
A | 2.439531 | 44.924949 | 2.874578 |
B | 2.696970 | 48.200215 | 2.378151 |
C | 2.974559 | 49.144162 | 2.240771 |
D | 3.232624 | 33.390212 | 3.021717 |
mean_table.plot(kind='bar', figsize=(10, 6))
plt.xlabel('Segmentation')
plt.ylabel('Mean')
plt.title('Mean values by segmentation')
plt.legend(title='Variable', bbox_to_anchor=(1, 1))
plt.show()
#Gender Count in each segmentation
train_data.groupby(['Segmentation','Gender'])[['Gender']].count()
Gender | ||
---|---|---|
Segmentation | Gender | |
A | Female | 909 |
Male | 1063 | |
B | Female | 861 |
Male | 997 | |
C | Female | 922 |
Male | 1048 | |
D | Female | 959 |
Male | 1309 |
train_data.groupby(['Segmentation','Gender'])[['Gender']].count().plot(kind = 'barh')
Distribution of Gender by Segmentation and Spending Score¶
train_data.groupby(['Segmentation', 'Spending_Score','Gender'])[['Gender']].count()
Gender | |||
---|---|---|---|
Segmentation | Spending_Score | Gender | |
A | Average | Female | 151 |
Male | 192 | ||
High | Female | 110 | |
Male | 161 | ||
Low | Female | 648 | |
Male | 710 | ||
B | Average | Female | 236 |
Male | 354 | ||
High | Female | 149 | |
Male | 235 | ||
Low | Female | 476 | |
Male | 408 | ||
C | Average | Female | 382 |
Male | 521 | ||
High | Female | 176 | |
Male | 229 | ||
Low | Female | 364 | |
Male | 298 | ||
D | Average | Female | 62 |
Male | 76 | ||
High | Female | 55 | |
Male | 101 | ||
Low | Female | 842 | |
Male | 1132 |
counts = train_data.groupby(['Segmentation', 'Spending_Score', 'Gender'])[['Gender']].count().unstack()
ax = counts.plot(kind='bar', stacked=True, figsize=(10, 6))
ax.set_xlabel('(Segmentation, Spending_Score)')
ax.set_ylabel('Count')
ax.set_title('Distribution of Gender by Segmentation and Spending Score')
ax.legend(title='Gender')
plt.show()
for val in ["Yes", "No"]:
    plt.title(val)
    sns.countplot(x=train_data[train_data["Ever_Married"] == val]["Segmentation"], hue=train_data["Spending_Score"])
    plt.show()
Visualization for Categorical Variables¶
cat_var_list = train_data.select_dtypes(include=['object']).columns.to_list()
cat_var_list
['Gender', 'Ever_Married', 'Graduated', 'Profession', 'Spending_Score', 'Var_1', 'Segmentation']
#Visualise the count of all categorical variables
for i in cat_var_list:
    plt.figure(figsize=(10, 6))
    plt.title(f'{i}')
    ax = sns.countplot(data=train_data, x=i)  # get the axis object
    # Get the total count of each category
    total_count = len(train_data[i])
    # Iterate over each bar in the plot
    for p in ax.patches:
        # Get the height of the bar
        height = p.get_height()
        # Calculate the percentage of the total count that this bar represents
        percentage = 100 * height / total_count
        # Add the count and percentage on top of the bar
        ax.annotate(f'{height}\n{percentage:.1f}%',
                    xy=(p.get_x() + p.get_width() / 2., height),
                    xytext=(0, 5),
                    textcoords='offset points',
                    ha='center', va='bottom',
                    fontsize=8)
plt.figure(figsize=(15, 25))
for i, var in enumerate(cat_var_list):
    plt.subplot(4, 2, i + 1)
    train_data[var].value_counts().plot(kind='pie', autopct='%.2f%%', shadow=True)
    centre_circle = plt.Circle((0, 0), 0.80, fc='white')
    fig = plt.gcf()
    fig.gca().add_artist(centre_circle)
    plt.title(var)
plt.tight_layout()
plt.show()
frames = []
for i in cat_var_list:
    value_counts = train_data[i].value_counts()
    n_obs = train_data.shape[0]
    # Collect one frame per variable; pd.concat replaces the deprecated frame.append
    frames.append(pd.DataFrame({
        'Variable': i,
        'Category': value_counts.index,
        'Count': value_counts.values,
        'Count%': value_counts.values / n_obs * 100
    }))
cat_var_count_df = pd.concat(frames, ignore_index=True)
cat_var_count_df = cat_var_count_df.sort_values(['Variable', 'Category']).reset_index(drop=True)
cat_var_count_df
Variable | Category | Count | Count% | |
---|---|---|---|---|
0 | Ever_Married | No | 3285 | 40.716411 |
1 | Ever_Married | Yes | 4643 | 57.548339 |
2 | Gender | Female | 3651 | 45.252851 |
3 | Gender | Male | 4417 | 54.747149 |
4 | Graduated | No | 3022 | 37.456619 |
5 | Graduated | Yes | 4968 | 61.576599 |
6 | Profession | Artist | 2516 | 31.184928 |
7 | Profession | Doctor | 688 | 8.527516 |
8 | Profession | Engineer | 699 | 8.663857 |
9 | Profession | Entertainment | 949 | 11.762519 |
10 | Profession | Executive | 599 | 7.424393 |
11 | Profession | Healthcare | 1332 | 16.509668 |
12 | Profession | Homemaker | 246 | 3.049083 |
13 | Profession | Lawyer | 623 | 7.721864 |
14 | Profession | Marketing | 292 | 3.619236 |
15 | Segmentation | A | 1972 | 24.442241 |
16 | Segmentation | B | 1858 | 23.029251 |
17 | Segmentation | C | 1970 | 24.417452 |
18 | Segmentation | D | 2268 | 28.111056 |
19 | Spending_Score | Average | 1974 | 24.467030 |
20 | Spending_Score | High | 1216 | 15.071889 |
21 | Spending_Score | Low | 4878 | 60.461081 |
22 | Var_1 | Cat_1 | 133 | 1.648488 |
23 | Var_1 | Cat_2 | 422 | 5.230540 |
24 | Var_1 | Cat_3 | 822 | 10.188399 |
25 | Var_1 | Cat_4 | 1089 | 13.497769 |
26 | Var_1 | Cat_5 | 85 | 1.053545 |
27 | Var_1 | Cat_6 | 5238 | 64.923153 |
28 | Var_1 | Cat_7 | 203 | 2.516113 |
# create the plot
plt.figure(figsize=(12, 8))
plt.bar(cat_var_count_df['Category'], cat_var_count_df['Count%'])
plt.xticks(rotation=90)
plt.xlabel('Category')
plt.ylabel('Count %')
plt.title('Category vs Count%')
plt.show()
NUMERICAL VALUES VISUALISATION¶
HISTOGRAM PLOT VISUALISATION¶
#get numerical columns as a list
num_miss_list = variables_overview.loc[~(variables_overview["dtype"] == "object")]
num_miss_list = num_miss_list.index.to_list()
num_miss_list.remove('ID')
num_miss_list
['Age', 'Work_Experience', 'Family_Size']
for n in num_miss_list:
    plt.figure(figsize=(10, 6))
    plt.title(f'{n}')
    sns.histplot(data=train_data, x=n, kde=True)
#Heat Map
plt.figure(figsize=(15,15))
sns.heatmap(train_data.corr(), annot=True)
DATA CLEANSING¶
Encoding Categorical Variables and missing values¶
train_data
ID | Gender | Ever_Married | Age | Graduated | Profession | Work_Experience | Spending_Score | Family_Size | Var_1 | Segmentation | |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 462809 | Male | No | 22 | No | Healthcare | 1.0 | Low | 4.0 | Cat_4 | D |
1 | 462643 | Female | Yes | 38 | Yes | Engineer | NaN | Average | 3.0 | Cat_4 | A |
2 | 466315 | Female | Yes | 67 | Yes | Engineer | 1.0 | Low | 1.0 | Cat_6 | B |
3 | 461735 | Male | Yes | 67 | Yes | Lawyer | 0.0 | High | 2.0 | Cat_6 | B |
4 | 462669 | Female | Yes | 40 | Yes | Entertainment | NaN | High | 6.0 | Cat_6 | A |
… | … | … | … | … | … | … | … | … | … | … | … |
8063 | 464018 | Male | No | 22 | No | NaN | 0.0 | Low | 7.0 | Cat_1 | D |
8064 | 464685 | Male | No | 35 | No | Executive | 3.0 | Low | 4.0 | Cat_4 | D |
8065 | 465406 | Female | No | 33 | Yes | Healthcare | 1.0 | Low | 1.0 | Cat_6 | D |
8066 | 467299 | Female | No | 27 | Yes | Healthcare | 1.0 | Low | 4.0 | Cat_6 | B |
8067 | 461879 | Male | Yes | 37 | Yes | Executive | 0.0 | Average | 3.0 | Cat_4 | B |
8068 rows × 11 columns
Encoding Categorical Variables¶
Categorical variables can be divided into two groups: nominal (no particular order among values) and ordinal (values with a natural order). This dataset contains only nominal variables, so we adopt a nominal encoding methodology (one-hot encoding).
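Before encoding the real columns, a minimal toy sketch of the mechanics: pandas' get_dummies turns each category into a 0/1 column, and its dummy_na=True option adds an indicator column for missing values, an alternative to the manual *_missing columns built below. The toy Series here is purely illustrative, not the project data.
toy = pd.Series(['Artist', 'Doctor', None, 'Artist'], name='Profession')
pd.get_dummies(toy, prefix='Profession', dummy_na=True)
#Produces Profession_Artist, Profession_Doctor and Profession_nan columns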
#Categorical Variables excluding the target variable
cat_var_list_upd=['Gender',
'Ever_Married',
'Graduated',
'Profession',
'Spending_Score',
'Var_1']
#Encode Categorical variables with one hot encoding apart from the target variable
cat_var_encoded = pd.get_dummies(train_data[cat_var_list_upd], prefix_sep='_')
#Add an explicit missing-value indicator column for each variable
for var in cat_var_list_upd:
    cat_var_encoded[f'{var}_missing'] = pd.isna(train_data[var]).astype('uint8')
cat_var_encoded
Gender_Female | Gender_Male | Ever_Married_No | Ever_Married_Yes | Graduated_No | Graduated_Yes | Profession_Artist | Profession_Doctor | Profession_Engineer | Profession_Entertainment | … | Var_1_Cat_4 | Var_1_Cat_5 | Var_1_Cat_6 | Var_1_Cat_7 | Gender_missing | Ever_Married_missing | Graduated_missing | Profession_missing | Spending_Score_missing | Var_1_missing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
8063 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
8064 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8065 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8066 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8067 | 0 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8068 rows × 31 columns
#Concatenate encoded variables with original data
train_data_encoded = pd.concat([train_data.drop(cat_var_list_upd, axis=1), cat_var_encoded], axis=1)
train_data_encoded
ID | Age | Work_Experience | Family_Size | Segmentation | Gender_Female | Gender_Male | Ever_Married_No | Ever_Married_Yes | Graduated_No | … | Var_1_Cat_4 | Var_1_Cat_5 | Var_1_Cat_6 | Var_1_Cat_7 | Gender_missing | Ever_Married_missing | Graduated_missing | Profession_missing | Spending_Score_missing | Var_1_missing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 462809 | 22 | 1.0 | 4.0 | D | 0 | 1 | 1 | 0 | 1 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
1 | 462643 | 38 | NaN | 3.0 | A | 1 | 0 | 0 | 1 | 0 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2 | 466315 | 67 | 1.0 | 1.0 | B | 1 | 0 | 0 | 1 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
3 | 461735 | 67 | 0.0 | 2.0 | B | 0 | 1 | 0 | 1 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
4 | 462669 | 40 | NaN | 6.0 | A | 1 | 0 | 0 | 1 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
8063 | 464018 | 22 | 0.0 | 7.0 | D | 0 | 1 | 1 | 0 | 1 | … | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
8064 | 464685 | 35 | 3.0 | 4.0 | D | 0 | 1 | 1 | 0 | 1 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8065 | 465406 | 33 | 1.0 | 1.0 | D | 1 | 0 | 1 | 0 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8066 | 467299 | 27 | 1.0 | 4.0 | B | 1 | 0 | 1 | 0 | 0 | … | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8067 | 461879 | 37 | 0.0 | 3.0 | B | 0 | 1 | 0 | 1 | 0 | … | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
8068 rows × 36 columns
#Encode the target variable with mapping technique
rank = {'A':0,'B':1,'C':2,'D':3}
train_data_encoded['Segmentation'] = train_data_encoded['Segmentation'].map(rank)
train_data_encoded=train_data_encoded.drop(['ID'], axis=1)
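Since the goal is to assign the 2627 new customers to segments, numeric predictions will eventually have to be translated back into the letters A to D. A minimal sketch of the inverse mapping, using the rank dictionary defined above:
#Invert the rank mapping to turn numeric predictions back into segment letters
inv_rank = {v: k for k, v in rank.items()}
pd.Series([3, 0, 2]).map(inv_rank).tolist()  # ['D', 'A', 'C']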
#check the data
variables_overview1(train_data_encoded)
unique | dtype | null | null % | |
---|---|---|---|---|
Age | 67 | int64 | 0 | 0.000000 |
Work_Experience | 15 | float64 | 829 | 10.275161 |
Family_Size | 9 | float64 | 335 | 4.152206 |
Segmentation | 4 | int64 | 0 | 0.000000 |
Gender_Female | 2 | uint8 | 0 | 0.000000 |
Gender_Male | 2 | uint8 | 0 | 0.000000 |
Ever_Married_No | 2 | uint8 | 0 | 0.000000 |
Ever_Married_Yes | 2 | uint8 | 0 | 0.000000 |
Graduated_No | 2 | uint8 | 0 | 0.000000 |
Graduated_Yes | 2 | uint8 | 0 | 0.000000 |
Profession_Artist | 2 | uint8 | 0 | 0.000000 |
Profession_Doctor | 2 | uint8 | 0 | 0.000000 |
Profession_Engineer | 2 | uint8 | 0 | 0.000000 |
Profession_Entertainment | 2 | uint8 | 0 | 0.000000 |
Profession_Executive | 2 | uint8 | 0 | 0.000000 |
Profession_Healthcare | 2 | uint8 | 0 | 0.000000 |
Profession_Homemaker | 2 | uint8 | 0 | 0.000000 |
Profession_Lawyer | 2 | uint8 | 0 | 0.000000 |
Profession_Marketing | 2 | uint8 | 0 | 0.000000 |
Spending_Score_Average | 2 | uint8 | 0 | 0.000000 |
Spending_Score_High | 2 | uint8 | 0 | 0.000000 |
Spending_Score_Low | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_1 | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_2 | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_3 | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_4 | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_5 | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_6 | 2 | uint8 | 0 | 0.000000 |
Var_1_Cat_7 | 2 | uint8 | 0 | 0.000000 |
Gender_missing | 1 | uint8 | 0 | 0.000000 |
Ever_Married_missing | 2 | uint8 | 0 | 0.000000 |
Graduated_missing | 2 | uint8 | 0 | 0.000000 |
Profession_missing | 2 | uint8 | 0 | 0.000000 |
Spending_Score_missing | 1 | uint8 | 0 | 0.000000 |
Var_1_missing | 2 | uint8 | 0 | 0.000000 |
Missing value techniques¶
- I. K-Nearest Neighbors (KNN): This method uses the k-nearest-neighbors imputation technique, replacing a missing value by finding the rows most similar to the one containing it (its nearest neighbors). By default it uses the Euclidean distance metric. Here, k is the number of neighbors considered when replacing a missing value: for instance, if k = 3, the 3 most similar rows are identified and the missing value is imputed with the mean of those 3 rows.
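To make the mechanics concrete, here is a minimal toy sketch (not the project data) with k = 2: the missing value in row 1 is filled with the mean of the corresponding feature in the two rows closest to it.
from sklearn.impute import KNNImputer
X_toy = np.array([[1.0, 2.0], [2.0, np.nan], [3.0, 4.0], [8.0, 8.0]])
KNNImputer(n_neighbors=2).fit_transform(X_toy)
#Rows 0 and 2 are nearest to row 1, so the NaN becomes mean(2.0, 4.0) = 3.0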
#Categorical Variables with missing values
cat_miss_values = variables_overview.loc[(variables_overview["dtype"] == "object") & (variables_overview["null"] > 0)]
cat_miss_values
unique | dtype | null | null % | |
---|---|---|---|---|
Ever_Married | 2 | object | 140 | 1.735250 |
Graduated | 2 | object | 78 | 0.966782 |
Profession | 9 | object | 124 | 1.536936 |
Var_1 | 7 | object | 76 | 0.941993 |
#Categorical Variables with missing values count
cat_miss_values_count = cat_miss_values.index.to_list()
for i in cat_miss_values_count:
    print(f'Value counts of {i} column')
    print(train_data[i].value_counts() / len(train_data) * 100, end="\n\n")
Value counts of Ever_Married column
Yes    57.548339
No     40.716411
Name: Ever_Married, dtype: float64

Value counts of Graduated column
Yes    61.576599
No     37.456619
Name: Graduated, dtype: float64

Value counts of Profession column
Artist           31.184928
Healthcare       16.509668
Entertainment    11.762519
Engineer          8.663857
Doctor            8.527516
Lawyer            7.721864
Executive         7.424393
Marketing         3.619236
Homemaker         3.049083
Name: Profession, dtype: float64

Value counts of Var_1 column
Cat_6    64.923153
Cat_4    13.497769
Cat_3    10.188399
Cat_2     5.230540
Cat_7     2.516113
Cat_1     1.648488
Cat_5     1.053545
Name: Var_1, dtype: float64
#Numerical Variables with missing values
num_miss_values = variables_overview.loc[~(variables_overview["dtype"] == "object") & (variables_overview["null"] > 0)]
num_miss_values
unique | dtype | null | null % | |
---|---|---|---|---|
Work_Experience | 15 | float64 | 829 | 10.275161 |
Family_Size | 9 | float64 | 335 | 4.152206 |
#Scale the data before using KNNImputer (KNN distances are sensitive to feature scale)
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()
train_data_encoded_sca = pd.DataFrame(scaler.fit_transform(train_data_encoded), columns = train_data_encoded.columns)
train_data_encoded_sca.head()
Age | Work_Experience | Family_Size | Segmentation | Gender_Female | Gender_Male | Ever_Married_No | Ever_Married_Yes | Graduated_No | Graduated_Yes | … | Var_1_Cat_4 | Var_1_Cat_5 | Var_1_Cat_6 | Var_1_Cat_7 | Gender_missing | Ever_Married_missing | Graduated_missing | Profession_missing | Spending_Score_missing | Var_1_missing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.056338 | 0.071429 | 0.375 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.281690 | NaN | 0.250 | 0.000000 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.690141 | 0.071429 | 0.000 | 0.333333 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.690141 | 0.000000 | 0.125 | 0.333333 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.309859 | NaN | 0.625 | 0.000000 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
5 rows × 35 columns
from sklearn.impute import KNNImputer
imputer = KNNImputer(n_neighbors=5)
imputed_data = pd.DataFrame(imputer.fit_transform(train_data_encoded_sca), columns=train_data_encoded_sca.columns)
imputed_data
Age | Work_Experience | Family_Size | Segmentation | Gender_Female | Gender_Male | Ever_Married_No | Ever_Married_Yes | Graduated_No | Graduated_Yes | … | Var_1_Cat_4 | Var_1_Cat_5 | Var_1_Cat_6 | Var_1_Cat_7 | Gender_missing | Ever_Married_missing | Graduated_missing | Profession_missing | Spending_Score_missing | Var_1_missing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.056338 | 0.071429 | 0.375 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.281690 | 0.057143 | 0.250 | 0.000000 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.690141 | 0.071429 | 0.000 | 0.333333 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.690141 | 0.000000 | 0.125 | 0.333333 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.309859 | 0.328571 | 0.625 | 0.000000 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
8063 | 0.056338 | 0.000000 | 0.750 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 |
8064 | 0.239437 | 0.214286 | 0.375 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8065 | 0.211268 | 0.071429 | 0.000 | 1.000000 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8066 | 0.126761 | 0.071429 | 0.375 | 0.333333 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8067 | 0.267606 | 0.000000 | 0.250 | 0.333333 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8068 rows × 35 columns
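The imputed table is still on the 0-1 MinMax scale. If the original units are needed later for interpretation or reporting, the fitted scaler can invert the transformation; a minimal sketch:
#Map the imputed, scaled values back to the original units
imputed_unscaled = pd.DataFrame(scaler.inverse_transform(imputed_data), columns=imputed_data.columns)
imputed_unscaled[['Age', 'Work_Experience', 'Family_Size']].head()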
Building Models¶
- LogisticRegression: a classification model that uses the logistic function to estimate class probabilities; originally formulated for binary targets, it extends to multiclass problems (as here) via the multinomial setting.
- KNeighborsClassifier: a classification model that uses the k-nearest neighbors algorithm to classify new instances based on the k nearest neighbors in the training set.
- SVC: a classification model that uses a support vector machine to find the hyperplane that separates the different classes.
- DecisionTreeClassifier: a classification model that uses a decision tree to split the data into smaller subgroups and make predictions based on the features that are most important for each subgroup.
- RandomForestClassifier: an ensemble model that combines multiple decision trees to make predictions.
- GradientBoostingClassifier: an ensemble model that trains weak models (e.g., decision trees) sequentially and combines their predictions to improve accuracy.
- AdaBoostClassifier: an ensemble model that trains a series of weak models (e.g., decision stumps) sequentially, with each new model focusing on the instances that were misclassified by the previous models.
- XGBClassifier: an implementation of gradient boosting that is optimized for speed and performance.
- classification_report: a function that generates a report with precision, recall, and F1-score for each class, as well as the overall accuracy, for the model.
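As a minimal, self-contained sketch of how classification_report is typically used once a model is fitted (on synthetic data here, not the customer dataset):
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
#Toy 3-class problem purely for illustration
X_toy, y_toy = make_classification(n_samples=200, n_classes=3, n_informative=4, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.2, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
#Prints per-class precision, recall and F1, plus overall accuracy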
#IMPORT MODELS
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier,GradientBoostingClassifier,AdaBoostClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report
#make a copy
model_data = imputed_data.copy()
FEATURE ENGINEERING¶
In machine learning, feature selection plays a critical role in improving the accuracy and efficiency of predictive models. It involves identifying and removing irrelevant or redundant features from the dataset so that the model can focus on the most informative ones. One common approach is to remove low-variance features. Variance measures the spread of the data, i.e., how far the points lie from the mean.
Low-variance features are those whose values do not vary much across the samples in the dataset. These features can be problematic because they often contain little information and can add noise to the data. In some cases, low-variance features can even lead to overfitting, where a model learns the noise in the data instead of the underlying patterns. Therefore, it is crucial to identify and remove such features from the dataset during feature selection.
The VarianceThreshold class from the sklearn.feature_selection module is a useful tool for removing low-variance features from the dataset. This class removes all features whose variance does not meet a certain threshold, which can be set to an arbitrary value. For example, a threshold of 0.1 means that any features with a variance less than or equal to 0.1 will be removed from the dataset.
To apply this method, one can fit the VarianceThreshold class to the training set X_train using the fit method. This computes the variance of each feature in X_train and stores it as an attribute of the transformer object. After fitting the transformer to the data, the transform method can be called to apply the transformer to any dataset with the same number of features as X_train. This will remove any features that have a variance less than or equal to the threshold specified during fitting.
Note that what counts as "low" variance depends on how a feature is distributed. For a boolean (Bernoulli) feature that takes one value with probability p, the variance is p(1 - p), so a feature that has the same value in 95% of the samples has variance 0.95 × 0.05 ≈ 0.0475.
For example, suppose our dataset contains features believed to be important in predicting the likelihood of a customer making a purchase, and upon inspection one feature is found to take the same value in 95% or more of the rows. Such a near-constant feature is unlikely to contribute much to the predictive performance of the model, so it is a natural candidate for removal during feature selection. The effectiveness of the selection step can then be evaluated by comparing prediction accuracy before and after removing the low-variance features.
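To check the arithmetic on a toy example: a 0/1 feature that takes the same value in 95% of rows has variance 0.95 × 0.05 = 0.0475, while a balanced 0/1 feature has variance 0.25, so a threshold just above 0.0475 removes the first and keeps the second.
from sklearn.feature_selection import VarianceThreshold
col_rare = np.array([1] * 95 + [0] * 5)  # same value in 95% of rows: variance 0.0475
col_even = np.array([1, 0] * 50)         # balanced: variance 0.25
X_toy = np.column_stack([col_rare, col_even])
VarianceThreshold(threshold=0.048).fit_transform(X_toy).shape  # (100, 1): only the balanced column survives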
from sklearn.feature_selection import VarianceThreshold
# Initialize VarianceThreshold with threshold=0.001
vt = VarianceThreshold(threshold=0.001)
# Fit and transform your data
df_transformed = pd.DataFrame(vt.fit_transform(model_data), columns=model_data.columns[vt.get_support()])
df_transformed
Age | Work_Experience | Family_Size | Segmentation | Gender_Female | Gender_Male | Ever_Married_No | Ever_Married_Yes | Graduated_No | Graduated_Yes | … | Var_1_Cat_2 | Var_1_Cat_3 | Var_1_Cat_4 | Var_1_Cat_5 | Var_1_Cat_6 | Var_1_Cat_7 | Ever_Married_missing | Graduated_missing | Profession_missing | Var_1_missing | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.056338 | 0.071429 | 0.375 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
1 | 0.281690 | 0.057143 | 0.250 | 0.000000 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 0.690141 | 0.071429 | 0.000 | 0.333333 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 0.690141 | 0.000000 | 0.125 | 0.333333 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
4 | 0.309859 | 0.328571 | 0.625 | 0.000000 | 1.0 | 0.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
… | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … | … |
8063 | 0.056338 | 0.000000 | 0.750 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 |
8064 | 0.239437 | 0.214286 | 0.375 | 1.000000 | 0.0 | 1.0 | 1.0 | 0.0 | 1.0 | 0.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8065 | 0.211268 | 0.071429 | 0.000 | 1.000000 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8066 | 0.126761 | 0.071429 | 0.375 | 0.333333 | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8067 | 0.267606 | 0.000000 | 0.250 | 0.333333 | 0.0 | 1.0 | 0.0 | 1.0 | 0.0 | 1.0 | … | 0.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
8068 rows × 33 columns
model_data= df_transformed.copy()
#Check for null values before feeding to the machine
variables_overview1(model_data)
unique | dtype | null | null % | |
---|---|---|---|---|
Age | 67 | float64 | 0 | 0.0 |
Work_Experience | 81 | float64 | 0 | 0.0 |
Family_Size | 28 | float64 | 0 | 0.0 |
Segmentation | 4 | float64 | 0 | 0.0 |
Gender_Female | 2 | float64 | 0 | 0.0 |
Gender_Male | 2 | float64 | 0 | 0.0 |
Ever_Married_No | 2 | float64 | 0 | 0.0 |
Ever_Married_Yes | 2 | float64 | 0 | 0.0 |
Graduated_No | 2 | float64 | 0 | 0.0 |
Graduated_Yes | 2 | float64 | 0 | 0.0 |
Profession_Artist | 2 | float64 | 0 | 0.0 |
Profession_Doctor | 2 | float64 | 0 | 0.0 |
Profession_Engineer | 2 | float64 | 0 | 0.0 |
Profession_Entertainment | 2 | float64 | 0 | 0.0 |
Profession_Executive | 2 | float64 | 0 | 0.0 |
Profession_Healthcare | 2 | float64 | 0 | 0.0 |
Profession_Homemaker | 2 | float64 | 0 | 0.0 |
Profession_Lawyer | 2 | float64 | 0 | 0.0 |
Profession_Marketing | 2 | float64 | 0 | 0.0 |
Spending_Score_Average | 2 | float64 | 0 | 0.0 |
Spending_Score_High | 2 | float64 | 0 | 0.0 |
Spending_Score_Low | 2 | float64 | 0 | 0.0 |
Var_1_Cat_1 | 2 | float64 | 0 | 0.0 |
Var_1_Cat_2 | 2 | float64 | 0 | 0.0 |
Var_1_Cat_3 | 2 | float64 | 0 | 0.0 |
Var_1_Cat_4 | 2 | float64 | 0 | 0.0 |
Var_1_Cat_5 | 2 | float64 | 0 | 0.0 |
Var_1_Cat_6 | 2 | float64 | 0 | 0.0 |
Var_1_Cat_7 | 2 | float64 | 0 | 0.0 |
Ever_Married_missing | 2 | float64 | 0 | 0.0 |
Graduated_missing | 2 | float64 | 0 | 0.0 |
Profession_missing | 2 | float64 | 0 | 0.0 |
Var_1_missing | 2 | float64 | 0 | 0.0 |
Create a dictionary called models that contains instances of the scikit-learn and XGBoost models that will be used.
- Each model is initialized with its default hyperparameters, except LogisticRegression, which is given max_iter=10000 (up to 10,000 solver iterations, to ensure convergence) and multi_class='multinomial' (since the target has four classes).
models = {'LogisticRegression': LogisticRegression(max_iter=10000, multi_class='multinomial'),
'KNeighborsClassifier': KNeighborsClassifier(),
'SVC': SVC(),
'DecisionTreeClassifier': DecisionTreeClassifier(),
'RandomForestClassifier': RandomForestClassifier(),
'GradientBoostingClassifier': GradientBoostingClassifier(),
'AdaBoostClassifier': AdaBoostClassifier(),
'XGBClassifier': XGBClassifier()}
features = model_data.drop(columns=['Segmentation'],axis=1)
Use the train_test_split function from scikit-learn to split a dataset into training and testing sets.
from sklearn.model_selection import train_test_split
target = model_data['Segmentation']
X_train,X_test,y_train,y_test = train_test_split(features,target,test_size=0.2,random_state=0)
from sklearn.preprocessing import LabelEncoder
#The scaled target holds floats (0, 1/3, 2/3, 1); re-encode it as integer class labels 0-3,
#which XGBClassifier in particular expects
encoder = LabelEncoder()
y_train = encoder.fit_transform(y_train)
y_test = encoder.transform(y_test)
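One hedged refinement, not used in the rest of this post: because segment D is somewhat more frequent than A, B and C, passing stratify=target would keep the class proportions identical in the training and test sets. A sketch using the same variables as above:
#Alternative split that preserves class proportions
X_train_s, X_test_s, y_train_s, y_test_s = train_test_split(features, target, test_size=0.2, random_state=0, stratify=target)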
FIT BASELINE MODEL¶
def fit_and_score(models, X_train, X_test, y_train, y_test):
    np.random.seed(0)
    model_scores = {}
    for name, model in models.items():
        print(f"Training model: {name}")
        model.fit(X_train, y_train)
        # Accuracy on the held-out test set
        model_scores[name] = model.score(X_test, y_test)
    model_scores = pd.DataFrame(model_scores, index=['Score']).transpose()
    model_scores = model_scores.sort_values('Score')
    return model_scores
model_scores = fit_and_score(models,X_train,X_test,y_train,y_test)
Training model: LogisticRegression
Training model: KNeighborsClassifier
Training model: SVC
Training model: DecisionTreeClassifier
Training model: RandomForestClassifier
Training model: GradientBoostingClassifier
Training model: AdaBoostClassifier
Training model: XGBClassifier
cm = sns.color_palette('PuBuGn',as_cmap=True)
score = model_scores.style.background_gradient(cmap=cm)
score
Score | |
---|---|
DecisionTreeClassifier | 0.438662 |
KNeighborsClassifier | 0.474597 |
RandomForestClassifier | 0.480793 |
SVC | 0.493185 |
XGBClassifier | 0.501239 |
LogisticRegression | 0.505576 |
AdaBoostClassifier | 0.513631 |
GradientBoostingClassifier | 0.521066 |
get_param = GradientBoostingClassifier()
# Get all the hyperparameters and their default values
params = get_param.get_params()
# Print the hyperparameters and their values
for param_name in sorted(params.keys()):
    print("%s: %r" % (param_name, params[param_name]))
ccp_alpha: 0.0
criterion: 'friedman_mse'
init: None
learning_rate: 0.1
loss: 'deviance'
max_depth: 3
max_features: None
max_leaf_nodes: None
min_impurity_decrease: 0.0
min_samples_leaf: 1
min_samples_split: 2
min_weight_fraction_leaf: 0.0
n_estimators: 100
n_iter_no_change: None
random_state: None
subsample: 1.0
tol: 0.0001
validation_fraction: 0.1
verbose: 0
warm_start: False
PARAMETER OPTIMISATION¶
param_grid = {'LogisticRegression': {'penalty': ['l2'],
                                     'C': [0.001, 0.01, 0.1, 1, 10, 100],
                                     # Note: 'liblinear' does not support the multinomial setting and
                                     # 'newton-cholesky' is unavailable in this scikit-learn version,
                                     # so those candidates are scored as NaN in the searches below
                                     'solver': ['newton-cg', 'lbfgs', 'saga', 'liblinear', 'newton-cholesky', 'sag'],
                                     'tol': [1e-4, 1e-3, 1e-2],
                                     'fit_intercept': [True, False],
                                     'class_weight': [None, 'balanced']},
              'KNeighborsClassifier': {'n_neighbors': [3, 5, 7, 9],
                                       'weights': ['uniform', 'distance'],
                                       'metric': ['euclidean', 'manhattan'],
                                       'algorithm': ['auto'],
                                       'leaf_size': [10, 30, 50],
                                       'p': [1, 2]},
              'SVC': {'kernel': ['linear', 'poly', 'rbf', 'sigmoid'],
                      'C': [0.001, 0.01, 0.1, 1, 10, 100],
                      'gamma': ['scale', 'auto'],
                      'degree': [2, 3, 4]},
              'DecisionTreeClassifier': {'criterion': ['gini', 'entropy'],
                                         'max_depth': [None, 2, 5, 10],
                                         'min_samples_split': [2, 5, 10]},
              'RandomForestClassifier': {'n_estimators': [50, 100, 200],
                                         'max_depth': [None, 2, 5, 10],
                                         'min_samples_split': [2, 5, 10]},
              # Note: the 'exponential' loss only supports binary targets, so those fits fail on this 4-class problem
              'GradientBoostingClassifier': {'loss': ['deviance', 'exponential'],
                                             'learning_rate': [0.001, 0.01, 0.1, 1],
                                             'n_estimators': [50, 100, 200],
                                             'max_depth': [2, 5, 10]},
              'AdaBoostClassifier': {'algorithm': ['SAMME', 'SAMME.R'],
                                     'n_estimators': [50, 100, 200],
                                     'learning_rate': [0.001, 0.01, 0.1, 1]},
              'XGBClassifier': {'max_depth': [3, 5, 7, 10],
                                'learning_rate': [0.001, 0.01, 0.1, 1],
                                'n_estimators': [50, 100, 200],
                                'gamma': [0, 0.1, 1]}}
RANDOMIZED SEARCH¶
Randomized Search samples a fixed number of hyperparameter combinations (n_iter) at random from the specified grid instead of evaluating all of them, trading a small risk of missing the best combination for a much lower computational cost.
from sklearn.model_selection import RandomizedSearchCV
import pandas as pd
def random_search_models(models, params, X_train, y_train, X_test, y_test):
    # Collect one result row per model; pd.concat-style construction avoids the deprecated frame.append
    rows = []
    for name, model in models.items():
        print(f"Training model: {name}")
        params_grid = params[name]
        random_search = RandomizedSearchCV(model, params_grid, cv=3, n_iter=25, verbose=1, n_jobs=-1)
        random_search.fit(X_train, y_train)
        # Best cross-validation score and held-out test score
        train_score = random_search.best_score_
        test_score = random_search.score(X_test, y_test)
        rows.append({'Model': name,
                     'Best Parameters': random_search.best_params_,
                     'Train Score': train_score,
                     'Test Score': test_score})
    return pd.DataFrame(rows, columns=['Model', 'Best Parameters', 'Train Score', 'Test Score'])
results_rand = random_search_models(models, param_grid, X_train, y_train, X_test, y_test)
Training model: LogisticRegression
Fitting 3 folds for each of 25 candidates, totalling 75 fits
[21 of 75 fits failed and were scored NaN: 12 because solver 'liblinear' does not support a multinomial backend, 9 because 'newton-cholesky' is not available in this scikit-learn version.]
Training model: KNeighborsClassifier
Fitting 3 folds for each of 25 candidates, totalling 75 fits
Training model: SVC
Fitting 3 folds for each of 25 candidates, totalling 75 fits
Training model: DecisionTreeClassifier
Fitting 3 folds for each of 24 candidates, totalling 72 fits
[The DecisionTreeClassifier grid has only 24 combinations, fewer than n_iter=25, so the search is exhaustive.]
Training model: RandomForestClassifier
Fitting 3 folds for each of 25 candidates, totalling 75 fits
Training model: GradientBoostingClassifier
Fitting 3 folds for each of 25 candidates, totalling 75 fits
[30 of 75 fits failed and were scored NaN: the 'exponential' loss requires 2 classes, but this target has 4.]
Training model: AdaBoostClassifier
Fitting 3 folds for each of 24 candidates, totalling 72 fits
[The AdaBoostClassifier grid also has only 24 combinations.]
Training model: XGBClassifier
Fitting 3 folds for each of 25 candidates, totalling 75 fits
results_rand.sort_values('Test Score', ascending=False, inplace=True)
results_rand
Model | Best Parameters | Train Score | Test Score | |
---|---|---|---|---|
4 | RandomForestClassifier | {'n_estimators': 200, 'min_samples_split': 5, … | 0.537961 | 0.530359 |
5 | GradientBoostingClassifier | {'n_estimators': 200, 'max_depth': 2, 'loss': … | 0.529285 | 0.521066 |
3 | DecisionTreeClassifier | {'min_samples_split': 2, 'max_depth': 5, 'crit… | 0.516735 | 0.508055 |
7 | XGBClassifier | {'n_estimators': 50, 'max_depth': 7, 'learning… | 0.526186 | 0.506196 |
0 | LogisticRegression | {'tol': 0.0001, 'solver': 'sag', 'penalty': 'l… | 0.512086 | 0.505576 |
6 | AdaBoostClassifier | {'n_estimators': 100, 'learning_rate': 0.1, 'a… | 0.517199 | 0.503098 |
2 | SVC | {'kernel': 'rbf', 'gamma': 'scale', 'degree': … | 0.514254 | 0.493185 |
1 | KNeighborsClassifier | {'weights': 'uniform', 'p': 1, 'n_neighbors': … | 0.489931 | 0.491945 |
GRID SEARCH¶
Grid Search evaluates every combination of the specified hyperparameters and their values, calculates the performance of each combination, and selects the best one. Because the number of combinations grows multiplicatively with the number of hyperparameters, this is time-consuming and computationally expensive.
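The cost can be computed up front: scikit-learn's ParameterGrid enumerates exactly the combinations GridSearchCV will try, so multiplying by the cv=3 folds gives the number of fits per model. A minimal sketch using the param_grid defined above:
from sklearn.model_selection import ParameterGrid
#Count the fits GridSearchCV will run for each model (3 folds per candidate)
for name, grid in param_grid.items():
    print(name, len(ParameterGrid(grid)) * 3, 'fits')
#LogisticRegression alone has 1 x 6 x 6 x 3 x 2 x 2 = 432 candidates, i.e. 1296 fits,
#matching the warning output below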
from sklearn.model_selection import GridSearchCV
import pandas as pd
def grid_search_models(models, params, X_train, y_train, X_test, y_test):
    # Collect one result row per model and build the DataFrame once at the
    # end: DataFrame.append is deprecated in recent pandas versions (hence
    # the FutureWarnings earlier), and pd.DataFrame(rows) avoids it entirely.
    rows = []
    for name, model in models.items():
        print(f"Training model: {name}")
        param_grid = params[name]
        grid_search = GridSearchCV(model, param_grid, cv=3)
        grid_search.fit(X_train, y_train)
        # Best mean cross-validation score on the training data and the
        # held-out test-set score of the refitted best estimator
        train_score = grid_search.best_score_
        test_score = grid_search.score(X_test, y_test)
        rows.append({
            'Model': name,
            'Best Parameters': grid_search.best_params_,
            'Train Score': train_score,
            'Test Score': test_score
        })
    return pd.DataFrame(rows, columns=['Model', 'Best Parameters',
                                       'Train Score', 'Test Score'])
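A small optional tweak, not used in the run below: GridSearchCV accepts n_jobs to parallelise the fits across CPU cores and verbose to echo progress, both of which help with long exhaustive searches. An illustrative stand-alone example:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Illustrative search that uses every available core and prints progress.
search = GridSearchCV(
    RandomForestClassifier(),
    {'n_estimators': [100, 200], 'max_depth': [5, 10]},
    cv=3,
    n_jobs=-1,  # parallelise across all CPU cores
    verbose=1,  # print progress as candidates are fitted
)
# search.fit(X_train, y_train)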
results = grid_search_models(models, param_grid, X_train, y_train, X_test, y_test)
Training model: LogisticRegression
(Warnings truncated: 432 of the 1296 logistic-regression fits failed and were scored as nan. 216 failed because the liblinear solver does not support a multinomial backend, and 216 because the installed scikit-learn does not recognise the newton-cholesky solver, which only exists in newer releases; the corresponding test scores are non-finite.)
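These failures come from pairing each solver with every option, including ones it does not support. One way to avoid that (a sketch, not the grid actually used above) is to pass GridSearchCV a list of parameter dictionaries, so each solver is only combined with settings it can handle:

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical compatible blocks: liblinear is one-vs-rest only, while the
# other solvers support the multinomial objective; GridSearchCV searches the
# union of the listed grids.
logreg_grid = [
    {'solver': ['liblinear'], 'penalty': ['l1', 'l2'], 'C': [0.1, 1, 10]},
    {'solver': ['lbfgs', 'newton-cg', 'sag', 'saga'],
     'penalty': ['l2'], 'C': [0.1, 1, 10]},
]
logreg_search = GridSearchCV(LogisticRegression(max_iter=1000),
                             logreg_grid, cv=3)
# logreg_search.fit(X_train, y_train)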
Training model: KNeighborsClassifier
(Warnings truncated: the k-nearest-neighbours search repeated one SciPy FutureWarning many times. scikit-learn's internal stats.mode call does not set keepdims, whose default behaviour changes in SciPy 1.11.0, so every prediction pass re-emits the notice; it has no effect on the results here.)
In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. 
mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. 
In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. 
mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. 
In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. 
mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\anaconda3\lib\site-packages\sklearn\neighbors\_classification.py:228: FutureWarning: Unlike other reduction functions (e.g. `skew`, `kurtosis`), the default behavior of `mode` typically preserves the axis it acts along. In SciPy 1.11.0, this behavior will change: the default value of `keepdims` will become False, the `axis` over which the statistic is taken will be eliminated, and the value None will no longer be accepted. Set `keepdims` to True or False to avoid this warning. mode, _ = stats.mode(_y[neigh_ind, k], axis=1) C:\Users\d\AppData\Local\Temp\ipykernel_14000\2778447527.py:18: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. results = results.append({
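The SciPy FutureWarning above comes from scikit-learn's internal call to stats.mode and can be ignored here, but in our own code the fix is simply to pass keepdims explicitly. A minimal sketch (assuming SciPy 1.9 or later; the array is illustrative only):
import numpy as np
from scipy import stats

# Pass keepdims explicitly, as the FutureWarning recommends;
# the array below is illustrative only.
a = np.array([[1, 2, 2], [3, 3, 1]])
mode, count = stats.mode(a, axis=1, keepdims=True)
print(mode)  # [[2] [3]] -- the reduced axis is kept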
Training model: SVC
Training model: DecisionTreeClassifier
Training model: RandomForestClassifier
Training model: GradientBoostingClassifier
C:\Users\d\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py:372: FitFailedWarning: 108 fits failed out of a total of 216. The score on these train-test partitions for these parameters will be set to nan. If these failures are not expected, you can try to debug them by setting error_score='raise'. 108 fits failed with the following error: ValueError: ExponentialLoss requires 2 classes; got 4 class(es)
C:\Users\d\anaconda3\lib\site-packages\sklearn\model_selection\_search.py:969: UserWarning: One or more of the test scores are non-finite (nan for the parameter combinations that used the exponential loss).
Training model: AdaBoostClassifier
Training model: XGBClassifier
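The repeated pandas FutureWarning points at the results.append call inside the training loop. A minimal sketch of the pd.concat equivalent it recommends (results_demo stands in for the real results frame, and the row values are placeholders, not scores from this run):
import pandas as pd

# pd.concat equivalent of results = results.append({...}, ignore_index=True);
# placeholder values, not the scores from the run above.
results_demo = pd.DataFrame(columns=['Model', 'Best Parameters', 'Train Score', 'Test Score'])
row = pd.DataFrame([{'Model': 'SVC', 'Best Parameters': {}, 'Train Score': 0.0, 'Test Score': 0.0}])
results_demo = pd.concat([results_demo, row], ignore_index=True)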
results.sort_values('Test Score', ascending=False, inplace=True)
results
Model | Best Parameters | Train Score | Test Score | |
---|---|---|---|---|
4 | RandomForestClassifier | {‘max_depth’: 10, ‘min_samples_split’: 10, ‘n_… | 0.536877 | 0.530979 |
5 | GradientBoostingClassifier | {‘learning_rate’: 0.1, ‘loss’: ‘deviance’, ‘ma… | 0.530060 | 0.515489 |
7 | XGBClassifier | {‘gamma’: 0, ‘learning_rate’: 0.1, ‘max_depth’… | 0.536722 | 0.511152 |
3 | DecisionTreeClassifier | {‘criterion’: ‘entropy’, ‘max_depth’: 5, ‘min_… | 0.516735 | 0.508055 |
0 | LogisticRegression | {‘C’: 10, ‘class_weight’: None, ‘fit_intercept… | 0.513480 | 0.507435 |
6 | AdaBoostClassifier | {‘algorithm’: ‘SAMME.R’, ‘learning_rate’: 0.1,… | 0.517199 | 0.503098 |
2 | SVC | {‘C’: 100, ‘degree’: 2, ‘gamma’: ‘auto’, ‘kern… | 0.524327 | 0.502478 |
1 | KNeighborsClassifier | {‘algorithm’: ‘auto’, ‘leaf_size’: 10, ‘metric… | 0.495665 | 0.489467 |
EVALUATION¶
from sklearn.metrics import classification_report, plot_confusion_matrix
from sklearn.model_selection import cross_val_score
# Select the best model and its parameters
best_model_name = results.loc[results['Test Score'].idxmax(), 'Model']
best_params = results.loc[results['Test Score'].idxmax(), 'Best Parameters']
# Instantiate a new model with the best parameters (it is fitted on the training set below)
best_model = eval(best_model_name)(**best_params)
# Get all the hyperparameters and their default values
params = best_model.get_params()
# Print the hyperparameters and their values
for param_name in sorted(params.keys()):
    print("%s: %r" % (param_name, params[param_name]))
bootstrap: True
ccp_alpha: 0.0
class_weight: None
criterion: 'gini'
max_depth: 10
max_features: 'auto'
max_leaf_nodes: None
max_samples: None
min_impurity_decrease: 0.0
min_samples_leaf: 1
min_samples_split: 10
min_weight_fraction_leaf: 0.0
n_estimators: 200
n_jobs: None
oob_score: False
random_state: None
verbose: 0
warm_start: False
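Since instantiating the winner via eval on a class-name string is fragile, an explicit mapping is a safer pattern. This is a sketch, not the notebook's original code, and it lists only a few of the classifiers used above:
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Name-to-class lookup instead of eval (sketch; extend with the remaining
# classifiers from the comparison above).
model_registry = {
    'LogisticRegression': LogisticRegression,
    'RandomForestClassifier': RandomForestClassifier,
    'GradientBoostingClassifier': GradientBoostingClassifier,
}
best_model = model_registry[best_model_name](**best_params)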
Fit Best Model¶
best_model.fit(X_train,y_train)
y_pred = best_model.predict(X_test)
print(classification_report(y_test,y_pred))
              precision    recall  f1-score   support

           0       0.44      0.50      0.47       384
           1       0.42      0.29      0.35       399
           2       0.52      0.56      0.54       380
           3       0.69      0.75      0.72       451

    accuracy                           0.53      1614
   macro avg       0.52      0.53      0.52      1614
weighted avg       0.52      0.53      0.53      1614
plot_confusion_matrix(best_model,X_test,y_test,cmap='BuPu')
C:\Users\d\anaconda3\lib\site-packages\sklearn\utils\deprecation.py:87: FutureWarning: Function plot_confusion_matrix is deprecated; Function `plot_confusion_matrix` is deprecated in 1.0 and will be removed in 1.2. Use one of the class methods: ConfusionMatrixDisplay.from_predictions or ConfusionMatrixDisplay.from_estimator. warnings.warn(msg, category=FutureWarning)
<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay at 0x158600ff3a0>
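As the deprecation warning suggests, the same plot can be produced with ConfusionMatrixDisplay; a minimal equivalent:
from sklearn.metrics import ConfusionMatrixDisplay

# Non-deprecated replacement for plot_confusion_matrix, per the warning above.
ConfusionMatrixDisplay.from_estimator(best_model, X_test, y_test, cmap='BuPu')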
cv_accuracy = cross_val_score(best_model,X_train,y_train,cv=5,scoring='accuracy')
print(f'Cross Validation accuracy Scores: {cv_accuracy}')
print(f'Cross Validation accuracy Mean Score: {cv_accuracy.mean()}')
Cross Validation accuracy Scores: [0.535244 0.54763749 0.50735864 0.53292022 0.57364341]
Cross Validation accuracy Mean Score: 0.5393607503347564
FEATURE IMPORTANCE¶
feat_importance = best_model.feature_importances_
feat_importance = pd.DataFrame(feat_importance,
columns=['Score'],
index=features.columns)
feat_importance.sort_values(by='Score',ascending=False).style.background_gradient(cmap=cm)
Score | |
---|---|
Age | 0.219271 |
Profession_Healthcare | 0.088065 |
Profession_Artist | 0.082593 |
Spending_Score_Low | 0.077411 |
Family_Size | 0.065243 |
Work_Experience | 0.053449 |
Ever_Married_Yes | 0.048015 |
Graduated_No | 0.044655 |
Ever_Married_No | 0.044490 |
Graduated_Yes | 0.044005 |
Spending_Score_Average | 0.032310 |
Profession_Entertainment | 0.018965 |
Var_1_Cat_4 | 0.018915 |
Var_1_Cat_6 | 0.017353 |
Profession_Marketing | 0.015496 |
Gender_Female | 0.014493 |
Gender_Male | 0.014491 |
Profession_Engineer | 0.014016 |
Spending_Score_High | 0.011619 |
Var_1_Cat_3 | 0.009725 |
Var_1_Cat_2 | 0.008396 |
Profession_Doctor | 0.008241 |
Profession_Executive | 0.007588 |
Profession_Homemaker | 0.006736 |
Profession_Lawyer | 0.006035 |
Var_1_Cat_7 | 0.005574 |
Ever_Married_missing | 0.005088 |
Var_1_Cat_1 | 0.004616 |
Var_1_missing | 0.003588 |
Profession_missing | 0.003487 |
Graduated_missing | 0.003106 |
Var_1_Cat_5 | 0.002963 |
plt.figure(figsize=(20,10))
plt.title('Feature Importances')
sns.barplot(x=feat_importance.Score,y=feat_importance.index)
<AxesSubplot:title={'center':'Feature Importances'}, xlabel='Score'>
DEEP NEURAL NETWORK¶
# Get an idea of the shape of the data and type of the data
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
(6454, 32) (1614, 32) (6454,) (1614,)
#Get the number of features
n_features = features.shape[1]
n_features
32
The resulting X_train_new and X_test_new datasets will have only the high-variance features, which can help improve the performance of some machine learning algorithms by reducing the noise in the data.
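For reference, a variance-based filter of that kind might look as follows. This is a sketch; the 0.01 threshold is illustrative rather than taken from the notebook:
from sklearn.feature_selection import VarianceThreshold

# Keep only features whose variance exceeds the threshold; the 0.01 value is
# illustrative, not from the original notebook.
selector = VarianceThreshold(threshold=0.01)
X_train_new = selector.fit_transform(X_train)
X_test_new = selector.transform(X_test)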
BUILD THREE DIFFERENT MODELS WITH THREE DIFFERENT LAYERS¶
For our neural network models, we built three different architectures: the first with one hidden dense layer, the second with two, and the third with three. Each dense layer starts with 200 neurons as a default. The SELU activation was chosen because it has proven very effective compared with alternatives such as ELU and ReLU, and because of its self-normalising property. We also adopted a kernel initializer (lecun_normal), as recommended by Aurelien [1]. For the optimizer, Adam was chosen, which has proven very effective compared with the popular SGD optimizer. We then created a dictionary and fitted the three models through a function; the results for all three are compared below.
Later, we implement grid search CV to tune the neuron count (one value below the 200-neuron default and one above, to get a rough idea of whether to increase or decrease it), the optimizer (Adam versus Nadam), and the number of epochs.
Code Explanation¶
This code defines a function create_custom_model that takes input_dim, output_dim, n, and a model name as arguments. The function returns another function, create_model, that builds a neural network model with the specified parameters. The create_model function takes three optional arguments, neuron1, activation, and optimizer, which can be used to customize the model architecture.
The create_model function first creates a Sequential model object with the specified name, then adds an input layer with shape input_dim.
Next, the function adds n hidden layers to the model, each with neuron1 neurons and the specified activation function. The kernel_initializer parameter is set to “lecun_normal”, which is a common weight initialization scheme in deep learning.
Finally, the function adds an output layer with output_dim neurons and a softmax activation function, which is commonly used for multi-class classification problems.
The model is compiled with the sparse_categorical_crossentropy loss function, which is used for multi-class classification problems with integer labels. The specified optimizer is used to optimize the model parameters during training, and the model’s accuracy is used as a metric during training.
The function create_custom_model returns the create_model function, which can be used to create multiple neural network models with different architectures.
The models list contains three elements, each of which is a create_custom_model function with a different value of n (1, 2, or 3). The for loop iterates over each create_custom_model function in models and calls the create_model function to build the corresponding neural network model. The summary method is called on each model to print a summary of its architecture.
The output_dim parameter is used to specify the number of output units (neurons) in the last layer of the neural network model. In this case, the output_dim is set to 4, which means that the model will have 4 output neurons in the last layer. In a multi-class classification problem like this one, the output_dim would be set to the number of classes being predicted.
The parameter n is used to specify the number of hidden layers in the neural network. In the create_custom_model function, n is a hyperparameter that can be adjusted to create neural networks with varying numbers of hidden layers.
The for loop in the code block creates three different neural networks (models) with 1, 2, and 3 hidden layers respectively, by calling create_custom_model with different values of n (i.e., 1, 2, and 3).
The purpose of creating multiple models with different numbers of hidden layers is to evaluate which model architecture performs the best on a given task.
from tensorflow import keras
# Function to build three different models
def create_custom_model(input_dim, output_dim, n=1, name='model'):
    def create_model(neuron1=200, activation='selu', optimizer='adam'):
        # Create model
        model = keras.models.Sequential(name=name)
        model.add(keras.layers.InputLayer(input_shape=input_dim))
        for i in range(n):
            model.add(keras.layers.Dense(neuron1, activation=activation,
                                         kernel_initializer="lecun_normal"))
        model.add(keras.layers.Dense(output_dim, activation='softmax'))
        # Compile model
        model.compile(loss="sparse_categorical_crossentropy",
                      optimizer=optimizer,
                      metrics=['accuracy'])
        return model
    return create_model

models = [create_custom_model(n_features, 4, i, 'model_{}'.format(i))
          for i in range(1, 4)]

for create_model in models:
    create_model().summary()
Model: "model_1" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense (Dense) (None, 200) 6600 dense_1 (Dense) (None, 4) 804 ================================================================= Total params: 7,404 Trainable params: 7,404 Non-trainable params: 0 _________________________________________________________________ Model: "model_2" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_2 (Dense) (None, 200) 6600 dense_3 (Dense) (None, 200) 40200 dense_4 (Dense) (None, 4) 804 ================================================================= Total params: 47,604 Trainable params: 47,604 Non-trainable params: 0 _________________________________________________________________ Model: "model_3" _________________________________________________________________ Layer (type) Output Shape Param # ================================================================= dense_5 (Dense) (None, 200) 6600 dense_6 (Dense) (None, 200) 40200 dense_7 (Dense) (None, 200) 40200 dense_8 (Dense) (None, 4) 804 ================================================================= Total params: 87,804 Trainable params: 87,804 Non-trainable params: 0 _________________________________________________________________
The EarlyStopping callback is used in Keras to stop the training process of a neural network when a monitored quantity stops improving for a specified number of epochs. In this case, the monitored quantity is the validation loss, and the patience parameter determines the number of epochs with no improvement that the training process will wait before stopping.
The restore_best_weights parameter is used to restore the weights of the best model obtained during training. This means that if the training process stops early due to lack of improvement, the weights of the model at the point where the validation loss was at its minimum will be restored. This can help prevent overfitting and improve the performance of the model on new data.
In this code snippet, early_stopping_cb is an instance of the EarlyStopping callback class with a patience of 15 and restore_best_weights set to True. This means that if the validation loss does not improve for 15 epochs, the training process will stop, and the weights of the best model obtained during training will be restored.
#DEFINE CALLBACK PARAMETERS
early_stopping_cb = keras.callbacks.EarlyStopping(patience=15, restore_best_weights=True)
This code defines a loop that fits the models created earlier to the training data, using early stopping as a callback. The training and validation losses and accuracies are printed for each model. Finally, a dictionary is created to store the training history and model object for each model.
Here’s what each line does:
- from keras.callbacks import TensorBoard: import the TensorBoard callback from Keras.
- history_dict = {}: create an empty dictionary to store the training history and model object for each model.
- for create_model in models:: loop over each of the models created earlier.
- model = create_model(): create an instance of the current model.
- print(‘Model name:’, model.name): print the name of the current model.
- history_callback = model.fit(X_train, y_train, batch_size=10, epochs=100, verbose=0, validation_data=(X_test, y_test), callbacks=[early_stopping_cb]): fit the current model to the training data using early stopping as a callback. The batch size is set to 10, the number of epochs is set to 100, and the verbose argument is set to 0 to suppress progress updates. The validation data is specified, as well as the early stopping callback.
- val_score_mlp = model.evaluate(X_test, y_test, verbose=0): evaluate the current model on the validation data and store the results in val_score_mlp.
- print(‘Validation loss:’, val_score_mlp[0]): print the validation loss for the current model.
- print(‘Validation accuracy:’, val_score_mlp[1]): print the validation accuracy for the current model.
- train_score_mlp = model.evaluate(X_train, y_train, verbose=0): evaluate the current model on the training data and store the results in train_score_mlp.
- print(‘Training loss:’, train_score_mlp[0]): print the training loss for the current model.
- print(‘Training accuracy:’, train_score_mlp[1]): print the training accuracy for the current model.
- history_dict[model.name] = [history_callback, model]: add an entry to the history_dict dictionary for the current model, with the model’s name as the key and a list containing the training history and the model object as the value.
from keras.callbacks import TensorBoard
history_dict = {}
results_mod_selection = pd.DataFrame(columns=['Model', 'Validation Loss', 'Validation Accuracy', 'Training Loss', 'Training Accuracy'])
for create_model in models:
    model = create_model()
    print('Model name:', model.name)
    history_callback = model.fit(X_train, y_train,
                                 batch_size=10,
                                 epochs=100,
                                 verbose=0,
                                 validation_data=(X_test, y_test),
                                 callbacks=[early_stopping_cb])
    val_score_mlp = model.evaluate(X_test, y_test, verbose=0)
    print('Validation loss:', val_score_mlp[0])
    print('Validation accuracy:', val_score_mlp[1])
    train_score_mlp = model.evaluate(X_train, y_train, verbose=0)
    print('Training loss:', train_score_mlp[0])
    print('Training accuracy:', train_score_mlp[1])
    history_dict[model.name] = [history_callback, model]
    results_mod_selection = results_mod_selection.append({
        'Model': model.name,
        'Validation Loss': val_score_mlp[0],
        'Validation Accuracy': val_score_mlp[1],
        'Training Loss': train_score_mlp[0],
        'Training Accuracy': train_score_mlp[1]
    }, ignore_index=True)
print(results_mod_selection)
Model name: model_1
Validation loss: 1.0739285945892334
Validation accuracy: 0.5173482298851013
Training loss: 1.055850863456726
Training accuracy: 0.539355456829071
Model name: model_2
C:\Users\d\AppData\Local\Temp\ipykernel_14000\1385447963.py:25: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. results_mod_selection = results_mod_selection.append({
Validation loss: 1.060271978378296
Validation accuracy: 0.5322180986404419
Training loss: 1.019758939743042
Training accuracy: 0.5568639636039734
Model name: model_3
Validation loss: 1.0672200918197632
Validation accuracy: 0.5105328559875488
Training loss: 1.0176751613616943
Training accuracy: 0.563836395740509
     Model  Validation Loss  Validation Accuracy  Training Loss  Training Accuracy
0  model_1         1.073929             0.517348       1.055851           0.539355
1  model_2         1.060272             0.532218       1.019759           0.556864
2  model_3         1.067220             0.510533       1.017675           0.563836
results_mod_selection
Model | Validation Loss | Validation Accuracy | Training Loss | Training Accuracy | |
---|---|---|---|---|---|
0 | model_1 | 1.073929 | 0.517348 | 1.055851 | 0.539355 |
1 | model_2 | 1.060272 | 0.532218 | 1.019759 | 0.556864 |
2 | model_3 | 1.067220 | 0.510533 | 1.017675 | 0.563836 |
PLOT MODEL TRAINING¶
VALIDATION DATA PLOT¶
fig, (ax1, ax2) = plt.subplots(2, figsize=(8, 6))
for model_name in history_dict:
    val_accuracy = history_dict[model_name][0].history['val_accuracy']
    val_loss = history_dict[model_name][0].history['val_loss']
    epochs = range(1, len(val_accuracy) + 1)
    # Add the number of epochs trained to the model name
    model_name = '{} ({} epochs)'.format(model_name, len(val_accuracy))
    ax1.plot(epochs, val_accuracy, label=model_name)
    ax2.plot(epochs, val_loss, label=model_name)
ax1.set_ylabel('Validation accuracy')
ax2.set_ylabel('Validation loss')
ax2.set_xlabel('epochs')
ax1.legend()
ax2.legend();
TRAINING DATA PLOT¶
fig, (ax1, ax2) = plt.subplots(2, figsize=(8, 6))
for model_name in history_dict:
    accuracy = history_dict[model_name][0].history['accuracy']
    loss = history_dict[model_name][0].history['loss']
    epochs = range(1, len(accuracy) + 1)
    # Add the number of epochs trained to the model name
    model_name = '{} ({} epochs)'.format(model_name, len(accuracy))
    ax1.plot(epochs, accuracy, label=model_name)
    ax2.plot(epochs, loss, label=model_name)
ax1.set_ylabel('Training accuracy')
ax2.set_ylabel('Training loss')
ax2.set_xlabel('epochs')
ax1.legend()
ax2.legend();
Among the three initial models created, the one with three hidden dense layers outperformed the other two on the training data.
- The model with three hidden layers was the only one which kept improving after being trained for more than 50 epochs.
- The one-hidden-layer model was then wrapped with scikit-learn's KerasClassifier with a batch size of 10 and 59 epochs, and evaluated with three-fold cross-validation.
# BUILD AND MEASURE MODEL 1 PERFORMANCE WITH CROSS VALIDATION
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import accuracy_score, confusion_matrix, recall_score, precision_score, f1_score
create_model = create_custom_model(n_features, 4, 1)
estimator = KerasClassifier(build_fn=create_model, batch_size=10,
epochs=59, verbose=0)
cv_scores_mlp = cross_val_score(estimator, X_train, y_train, cv=3)
print("Accuracy : {:0.4f} (+/- {:0.4f})".format(cv_scores_mlp.mean(), cv_scores_mlp.std()))
C:\Users\d\AppData\Local\Temp\ipykernel_14000\3525485787.py:9: DeprecationWarning: KerasClassifier is deprecated, use Sci-Keras (https://github.com/adriangb/scikeras) instead. See https://www.adriangb.com/scikeras/stable/migration.html for help migrating. estimator = KerasClassifier(build_fn=create_model, batch_size=10,
Accuracy : 0.5121 (+/- 0.0127)
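The DeprecationWarning above recommends migrating to SciKeras. A rough sketch of the equivalent wrapper, assuming the scikeras package is installed (SciKeras takes the builder via its model argument):
from scikeras.wrappers import KerasClassifier as SciKerasClassifier

# SciKeras replacement for the deprecated keras.wrappers wrapper (a sketch,
# assuming scikeras is installed; not the notebook's original code).
estimator_sk = SciKerasClassifier(model=create_model, batch_size=10,
                                  epochs=59, verbose=0)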
estimator.fit(X_train, y_train)
<keras.callbacks.History at 0x1587f5d6ee0>
valid_mlp_model = estimator.predict(X_test)
val_mlp_score= accuracy_score(y_test, valid_mlp_model)
val_mlp_score
51/51 [==============================] - 1s 2ms/step
0.5148698884758365
MODEL OPTIMISATION AND PARAMETER TUNING¶
We then implemented grid search CV to search for parameters that improve the model. For the number of neurons we searched over one value below the initial setting and one above, to get a rough idea of whether the neuron count should be increased or decreased. We also tried to optimise the choice of optimizer between Adam and Nadam, and grid searched the number of epochs.
estimator.get_params()
{'batch_size': 10, 'epochs': 59, 'verbose': 0, 'build_fn': <function __main__.create_custom_model.<locals>.create_model(neuron1=200, activation='selu', optimizer='adam')>}
from scipy.stats import reciprocal
optimizer = ['adam', 'nadam']
neuron1 = [150,200,250]
epochs = [20,26,50]
# Parameter space we want to explore
# Note that it matches up with our create_model function parameters.
param_grids = dict(optimizer=optimizer, epochs=epochs, neuron1=neuron1)
grid = GridSearchCV(estimator=estimator,
param_grid=param_grids,
cv= 3,
verbose=0)
#fit grid search
grid_results_mlp = grid.fit(X_train, y_train)
# Print the best parameters that were found
print(grid_results_mlp.best_params_)
print(grid_results_mlp.best_score_)
{'epochs': 50, 'neuron1': 150, 'optimizer': 'adam'} 0.5196779568990072
# Test the tuned model on the training and validation sets, and print the grid search best CV score
#Training score
train_results_mlp_opt = grid_results_mlp.predict(X_train)
train_accuracy_score_gridcv = accuracy_score(y_train, train_results_mlp_opt)
print('Training Score',train_accuracy_score_gridcv)
#Validation score
grid_results_mlp_pred = grid_results_mlp.predict(X_test)
val_accuracy_score_gridcv = accuracy_score(y_test, grid_results_mlp_pred)
202/202 [==============================] - 0s 2ms/step Training Score 0.5557793616361946 51/51 [==============================] - 0s 2ms/step
val_accuracy_score_gridcv
0.5216852540272615
mlp_grid_res= pd.DataFrame({'model_name':['mlp_res'],
'Grid_best_score':[grid_results_mlp.best_score_],
'Test_score':[val_accuracy_score_gridcv],
'Best_Parameters':[grid_results_mlp.best_params_]
})
mlp_grid_res
model_name | Grid_best_score | Test_score | Best_Parameters | |
---|---|---|---|---|
0 | mlp_res | 0.519678 | 0.521685 | {‘epochs’: 50, ‘neuron1’: 150, ‘optimizer’: ‘a… |
## Add results to the result dataframe
results1 = results.append({
'Model': 'Mlp_res',
'Best Parameters': grid_results_mlp.best_params_,
'Train Score': grid_results_mlp.best_score_,
'Test Score': val_accuracy_score_gridcv
}, ignore_index=True)
C:\Users\d\AppData\Local\Temp\ipykernel_14000\4103834277.py:2: FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead. results1 = results.append({
results1
Model | Best Parameters | Train Score | Test Score | |
---|---|---|---|---|
0 | RandomForestClassifier | {‘max_depth’: 10, ‘min_samples_split’: 10, ‘n_… | 0.536877 | 0.530979 |
1 | GradientBoostingClassifier | {‘learning_rate’: 0.1, ‘loss’: ‘deviance’, ‘ma… | 0.530060 | 0.515489 |
2 | XGBClassifier | {‘gamma’: 0, ‘learning_rate’: 0.1, ‘max_depth’… | 0.536722 | 0.511152 |
3 | DecisionTreeClassifier | {‘criterion’: ‘entropy’, ‘max_depth’: 5, ‘min_… | 0.516735 | 0.508055 |
4 | LogisticRegression | {‘C’: 10, ‘class_weight’: None, ‘fit_intercept… | 0.513480 | 0.507435 |
5 | AdaBoostClassifier | {‘algorithm’: ‘SAMME.R’, ‘learning_rate’: 0.1,… | 0.517199 | 0.503098 |
6 | SVC | {‘C’: 100, ‘degree’: 2, ‘gamma’: ‘auto’, ‘kern… | 0.524327 | 0.502478 |
7 | KNeighborsClassifier | {‘algorithm’: ‘auto’, ‘leaf_size’: 10, ‘metric… | 0.495665 | 0.489467 |
8 | Mlp_res | {‘epochs’: 50, ‘neuron1’: 150, ‘optimizer’: ‘a… | 0.519678 | 0.521685 |
WIDE AND DEEP NEURAL NETWORK¶
We also implemented a non-sequential neural network model, the Wide and Deep neural network. The network connects all or part of the inputs directly to the output layer, which helps it learn both deep patterns (through the deep path) and simple rules (through the short path), in contrast to a regular MLP, which forces all the data to flow through the full stack of layers. We first create an Input object with a shape equal to the number of features in our dataset. Next we create a dense layer of 100 neurons using the ELU activation function; this dense layer is called like a function and connected to the input object. We then create another hidden layer and use it as a function to connect to the previous hidden layer, followed by a Concatenate layer to concatenate the input and the output of the second hidden layer. Lastly we create an output layer with a softmax activation function, pass it the result of the concatenation, and define a Keras Model where we specify which inputs and outputs to use [2].
The first line creates an input layer that takes in data of shape n_features. The next two lines create two hidden layers, each with 100 neurons and the ELU activation function. The input_df layer is passed as input to the first hidden layer, and the output of the first hidden layer (hidden1) is passed as input to the second hidden layer (hidden2). The next line concatenates the input_df layer with the output of the second hidden layer (hidden2) using the Concatenate() layer; this combines the information learned by the wide and deep parts of the network. Finally, the output layer is created with 4 neurons (one per segment) and the softmax activation function; it takes the concatenated output from the previous layer (concat) as input.
#Build a wide and deep neural network model
input_df = keras.layers.Input(shape=n_features)
hidden1 = keras.layers.Dense(100, activation="elu")(input_df)
hidden2 = keras.layers.Dense(100, activation="elu")(hidden1)
concat = keras.layers.Concatenate()([input_df,hidden2])
output = keras.layers.Dense(4, activation="softmax")(concat)
modelwd = keras.Model(inputs=[input_df], outputs=[output])
modelwd.compile(loss="sparse_categorical_crossentropy",
optimizer="adam",
metrics=["accuracy"])
The fit method of a Keras model returns a History object, which contains information about the training history of the model. The History object has a history attribute, a dictionary of per-epoch loss and metric values, which you can use to visualize the training and validation performance of the model over epochs.
- historywd is assigned the output of modelwd.fit(), so it will contain the training history of the Wide and Deep model. The fit method is training the model for 100 epochs with a batch size of 10, using the training set (X_train and y_train). The validation set (X_test and y_test) is used for evaluation after each epoch. The verbose argument is set to 0, which means that no output will be printed during training. The early_stopping_cb callback is also provided, which will stop the training early if the validation loss does not improve after a certain number of epochs.
# Now fit the model
historywd = modelwd.fit(X_train, y_train, epochs=100, batch_size=10, validation_data=(X_test, y_test), verbose=0, callbacks=[early_stopping_cb])
#Plot the learning curves (loss and accuracy per epoch)
pd.DataFrame(historywd.history).plot(figsize=(8, 5))
plt.grid(True)
plt.gca().set_ylim(0, 1.5)  # set the vertical range to [0, 1.5]
plt.show()
[Learning-curve plot: training/validation loss and accuracy over epochs]
#Save the trained model to disk, then reload it to verify the serialized copy
modelwd.save('mlp2.h5')
from keras.models import load_model
modelwd = load_model('mlp2.h5')
train_score_wd = modelwd.evaluate(X_train, y_train, verbose=0)
print('Training loss:', train_score_wd[0])
print('Training accuracy:', train_score_wd[1])
val_score_wd = modelwd.evaluate(X_test, y_test, verbose=0)
print('Validation loss:', val_score_wd[0])
print('Validation accuracy:', val_score_wd[1])
Training loss: 1.0182280540466309
Training accuracy: 0.5585683584213257
Validation loss: 1.0593392848968506
Validation accuracy: 0.5254027247428894
# create a new row for the wide and deep model
results_df_wd = pd.DataFrame({
    'model_name': ['Wide and Deep'],
    'train_loss': [train_score_wd[0]],
    'train_acc': [train_score_wd[1]],
    'val_loss': [val_score_wd[0]],
    'val_acc': [val_score_wd[1]],
})
# set the model_name column as the index
results_df_wd = results_df_wd.set_index('model_name')
results_df_wd
train_loss | train_acc | val_loss | val_acc | |
---|---|---|---|---|
model_name | ||||
Wide and Deep | 1.018228 | 0.558568 | 1.059339 | 0.525403 |
Model Tuning and Optimisation¶
#Put the model in a function and compile
def build_wd(optimizer1='adam', neurons=100):
    input_df = keras.layers.Input(shape=(n_features,))
    hidden1 = keras.layers.Dense(neurons, activation="elu")(input_df)
    hidden2 = keras.layers.Dense(neurons, activation="elu")(hidden1)
    concat = keras.layers.Concatenate()([input_df, hidden2])
    output = keras.layers.Dense(4, activation="softmax")(concat)
    modelwd = keras.Model(inputs=[input_df], outputs=[output])
    modelwd.compile(loss="sparse_categorical_crossentropy",
                    optimizer=optimizer1,
                    metrics=["accuracy"])
    return modelwd
#Wrap the model in a scikit-learn compatible object
estimatorwd = KerasClassifier(build_fn=build_wd, batch_size=10,
                              epochs=100, verbose=0)
C:\Users\d\AppData\Local\Temp\ipykernel_14000\143830857.py:2: DeprecationWarning: KerasClassifier is deprecated, use Sci-Keras (https://github.com/adriangb/scikeras) instead. See https://www.adriangb.com/scikeras/stable/migration.html for help migrating.
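The warning above points to SciKeras as the replacement. Here is a minimal sketch of the equivalent wrapper, assuming scikeras is installed (note that custom build arguments such as optimizer1 are routed with the model__ prefix rather than passed directly; estimatorwd_sk is a hypothetical name):

#Hypothetical SciKeras equivalent of the deprecated wrapper
from scikeras.wrappers import KerasClassifier as SKKerasClassifier
estimatorwd_sk = SKKerasClassifier(model=build_wd, batch_size=10,
                                   epochs=100, verbose=0,
                                   model__optimizer1="adam")  # forwarded to build_wd(optimizer1=...)

With SciKeras, the grid search key below would likewise become model__optimizer1 instead of optimizer1.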
#Calculate cross validation score
cv_scores_wd = cross_val_score(estimatorwd, X_train, y_train, cv=3)
print("Accuracy : {:0.4f} (+/- {:0.4f})".format(cv_scores_wd.mean(), cv_scores_wd.std()))
Accuracy : 0.5042 (+/- 0.0127)
#Define hyperparameters to tweak and find which combination works best using K-fold cross-validation
optimizer1 = ['sgd', 'adam']
batches = [5, 10, 15]
param_grid = dict(batch_size=batches, optimizer1=optimizer1)
gridmw = GridSearchCV(estimator=estimatorwd, param_grid=param_grid, cv=3,
                      verbose=0)
grid_results_wd = gridmw.fit(X_train, y_train)
grid_results_wd.best_score_
0.5161137382189432
#Validation Score
grid_results_wd_pred = grid_results_wd.predict(X_test)
valid_accuracy_score_wd_grid = accuracy_score(y_test, grid_results_wd_pred)
wd_grid_res = pd.DataFrame({'model_name': ['Grid_wd_res'],
                            'Grid_score': [grid_results_wd.best_score_],
                            'Test_score': [valid_accuracy_score_wd_grid],
                            'Best_Parameters': [grid_results_wd.best_params_]
                            })
wd_grid_res = wd_grid_res.set_index('model_name')
wd_grid_res
Grid_score | Test_score | Best_Parameters | |
---|---|---|---|
model_name | |||
Grid_wd_res | 0.516114 | 0.511152 | {‘batch_size’: 15, ‘optimizer1’: ‘sgd’} |
## Add results to the result dataframe (using pd.concat, since frame.append is deprecated)
results1 = pd.concat([results1, pd.DataFrame([{
    'Model': 'Wd_res',
    'Best Parameters': grid_results_wd.best_params_,
    'Train Score': grid_results_wd.best_score_,
    'Test Score': valid_accuracy_score_wd_grid
}])], ignore_index=True)
results1.sort_values('Test Score', ascending=False, inplace=True)
results1
Model | Best Parameters | Train Score | Test Score | |
---|---|---|---|---|
0 | RandomForestClassifier | {‘max_depth’: 10, ‘min_samples_split’: 10, ‘n_… | 0.536877 | 0.530979 |
8 | Mlp_res | {‘epochs’: 50, ‘neuron1’: 150, ‘optimizer’: ‘a… | 0.519678 | 0.521685 |
1 | GradientBoostingClassifier | {‘learning_rate’: 0.1, ‘loss’: ‘deviance’, ‘ma… | 0.530060 | 0.515489 |
2 | XGBClassifier | {‘gamma’: 0, ‘learning_rate’: 0.1, ‘max_depth’… | 0.536722 | 0.511152 |
9 | Wd_res | {‘batch_size’: 15, ‘optimizer1’: ‘sgd’} | 0.516114 | 0.511152 |
3 | DecisionTreeClassifier | {‘criterion’: ‘entropy’, ‘max_depth’: 5, ‘min_… | 0.516735 | 0.508055 |
4 | LogisticRegression | {‘C’: 10, ‘class_weight’: None, ‘fit_intercept… | 0.513480 | 0.507435 |
5 | AdaBoostClassifier | {‘algorithm’: ‘SAMME.R’, ‘learning_rate’: 0.1,… | 0.517199 | 0.503098 |
6 | SVC | {‘C’: 100, ‘degree’: 2, ‘gamma’: ‘auto’, ‘kern… | 0.524327 | 0.502478 |
7 | KNeighborsClassifier | {‘algorithm’: ‘auto’, ‘leaf_size’: 10, ‘metric… | 0.495665 | 0.489467 |
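With the comparison complete, the best performer (the tuned RandomForestClassifier) can be used to assign segments to the 2627 new potential customers. A minimal sketch, assuming the RandomForest grid search object from earlier in the notebook was named grid_results_rf and that the new customers' features have been preprocessed into X_new the same way as the training data (both names are hypothetical):

#Hypothetical final step: predict segments for the new customers
best_model = grid_results_rf.best_estimator_  # assumed name of the earlier grid object
segment_preds = best_model.predict(X_new)     # X_new: preprocessed features of the new customers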