Project Introduction

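In this project we build a bank customer churn prediction model. Let's start with the data, which is stored in two files: 'Churn-Modelling.csv' contains the training data and 'Churn-Modelling-Test-Data.csv' contains the test data. The fields are described below: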

The data is real data from abroad that has been anonymized.

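RowNumber: row number
CustomerID: customer ID
Surname: customer's surname
CreditScore: credit score
Geography: customer's country/region
Gender: customer's gender
Age: age
Tenure: number of years the customer has been with the bank
Balance: account balance (deposit/loan position)
NumOfProducts: number of bank products the customer uses
HasCrCard: whether the customer holds a credit card from this bank
IsActiveMember: whether the customer is an active member
EstimatedSalary: estimated salary
Exited: whether the customer has churned; this is our label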

First, import the modules we will need:

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn import neighbors
from sklearn.metrics import classification_report
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import LabelEncoder

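Next, read the data with NumPy. Because some columns contain strings, set dtype to str when reading. (Older code wrote this as np.str, an alias for the built-in str that was deprecated in NumPy 1.20 and removed in NumPy 1.24.)

train_data = np.genfromtxt('Churn-Modelling.csv', delimiter=',', dtype=str)
test_data = np.genfromtxt('Churn-Modelling-Test-Data.csv', delimiter=',', dtype=str)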
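To see what the loaded array looks like without the real files, here is a minimal sketch that reads a tiny in-memory CSV the same way. The rows are made up for illustration and are not taken from the dataset.

```python
import io
import numpy as np

# Made-up rows for illustration only; not taken from the real CSV files.
csv_text = (
    "CreditScore,Geography,Gender,Exited\n"
    "600,France,Female,1\n"
    "700,Spain,Male,0\n"
)

# Same call pattern as above, but reading from an in-memory buffer.
sample = np.genfromtxt(io.StringIO(csv_text), delimiter=",", dtype=str)

print(sample.shape)       # the header counts as a row: 3 rows, 4 columns
print(sample.dtype.kind)  # 'U' means every cell is a Unicode string
print(sample[0])          # the header row
```

Every cell, including the numeric ones, comes back as a string, which is why a later step has to convert the arrays to float.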

Split the data. The header row is not needed; every column except the last is a feature, and the last column is the label.

x_train = train_data[1:,:-1]
y_train = train_data[1:,-1]
x_test = test_data[1:,:-1]
y_test = test_data[1:,-1]

Columns 0, 1, and 2 are the row number, customer ID, and surname. These should have little influence on the outcome, so we drop them.

x_train = np.delete(x_train,[0,1,2],axis=1)
x_test = np.delete(x_test,[0,1,2],axis=1)
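The slicing and column removal can be checked on a toy array. This is a small sketch with invented rows and a reduced set of columns; only the indexing pattern mirrors the steps above.

```python
import numpy as np

# Invented toy table: a header row plus two data rows.
table = np.array([
    ["RowNumber", "CustomerID", "Surname", "CreditScore", "Age", "Exited"],
    ["1", "1001", "Smith", "600", "40", "1"],
    ["2", "1002", "Jones", "700", "35", "0"],
])

features = table[1:, :-1]  # skip the header, keep all but the last column
labels = table[1:, -1]     # skip the header, keep only the last column

# Drop the three identifier columns; axis=1 means "delete columns".
features = np.delete(features, [0, 1, 2], axis=1)

print(features)
print(labels)
```

Note that np.delete returns a new array rather than modifying its input, so the result has to be assigned back.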

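After dropping those three columns, columns 1 and 2 of the remaining data are Geography and Gender. Both hold strings and must be converted to numbers before a model can be built.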

labelencoder1 = LabelEncoder()
x_train[:,1] = labelencoder1.fit_transform(x_train[:,1])
x_test[:,1] = labelencoder1.transform(x_test[:,1])
labelencoder2 = LabelEncoder()
x_train[:,2] = labelencoder2.fit_transform(x_train[:,2])
x_test[:,2] = labelencoder2.transform(x_test[:,2])
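To make the encoding step concrete, the sketch below runs LabelEncoder on a handful of illustrative country names (assumed values, the real column may contain different categories). It also shows why the encoder is fitted on the training data and only applied to the test data: a category never seen during fitting cannot be transformed.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Illustrative values; the real column may contain other categories.
train_geo = np.array(["France", "Spain", "Germany", "France"])
test_geo = np.array(["Germany", "France"])

encoder = LabelEncoder()
train_codes = encoder.fit_transform(train_geo)  # learn the mapping, then encode
test_codes = encoder.transform(test_geo)        # reuse the same mapping

print(encoder.classes_)  # categories in sorted order; the index is the code
print(train_codes)
print(test_codes)

# A label that was not present during fit raises ValueError.
try:
    encoder.transform(np.array(["Italy"]))
    unseen_ok = True
except ValueError:
    unseen_ok = False
print(unseen_ok)
```

One thing to keep in mind: integer codes give the categories an artificial order, which distance-based models such as KNN will take literally. One-hot encoding is a common alternative, though it is not what this walkthrough uses.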

Because the data was read in as strings, convert it to float before training the models.

x_train = x_train.astype(np.float32)
x_test = x_test.astype(np.float32)
y_train = y_train.astype(np.float32)
y_test = y_test.astype(np.float32)
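As a quick illustration of the conversion, the sketch below casts a made-up row of numeric strings to float32 and shows that a leftover text value would make the cast fail, which is why the string columns have to be encoded first.

```python
import numpy as np

# Made-up row of numeric strings, as it would look after label encoding.
row = np.array(["619", "0", "1", "42", "101348.88"], dtype=str)
as_float = row.astype(np.float32)
print(as_float.dtype)

# float32 keeps roughly 7 significant digits, so large values are rounded slightly.
print(abs(float(as_float[4]) - 101348.88) < 0.01)

# A non-numeric string cannot be cast and raises ValueError.
try:
    np.array(["France"]).astype(np.float32)
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)
```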

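Then standardize the features. The scaler is fitted on the training set only and then applied unchanged to the test set.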

sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
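The following sketch, on a toy single-feature array, checks what fit_transform and transform do here: the mean and standard deviation come from the training data alone, and the test data is scaled with those same statistics.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data with a single feature.
train = np.array([[1.0], [2.0], [3.0]])
test = np.array([[2.0], [4.0]])

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the train statistics

# Manual version: z = (x - mean) / std, using the population std (ddof=0).
mean = train.mean(axis=0)
std = train.std(axis=0)
manual_test = (test - mean) / std

print(np.allclose(test_scaled, manual_test))
print(train_scaled.mean(), train_scaled.std())
```

Fitting a second scaler on the test set would leak test statistics into preprocessing and put the two sets on different scales, so the test set is only ever transformed.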

Build a KNN model and evaluate it on the test set

knn = neighbors.KNeighborsClassifier(n_neighbors=5)
knn.fit(x_train, y_train)
predictions = knn.predict(x_test)
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

         0.0       0.80      0.95      0.87       740
         1.0       0.69      0.33      0.45       260

   micro avg       0.79      0.79      0.79      1000
   macro avg       0.75      0.64      0.66      1000
weighted avg       0.77      0.79      0.76      1000
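The columns of the report are related by simple formulas. The sketch below recomputes a few of them from the rounded KNN numbers above; because the inputs are already rounded, the results only match the report to two decimals.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values copied from the KNN report above (already rounded to 2 decimals).
report = {
    0: {"precision": 0.80, "recall": 0.95, "support": 740},
    1: {"precision": 0.69, "recall": 0.33, "support": 260},
}

total = sum(row["support"] for row in report.values())
f1_scores = {label: f1(row["precision"], row["recall"]) for label, row in report.items()}

# Macro average: plain mean over classes. Weighted average: mean weighted by support.
macro_recall = sum(row["recall"] for row in report.values()) / len(report)
weighted_recall = sum(row["recall"] * row["support"] for row in report.values()) / total

# Approximate number of churned customers the model actually caught.
churners_found = report[1]["recall"] * report[1]["support"]

print(round(f1_scores[0], 2), round(f1_scores[1], 2))
print(round(macro_recall, 2), round(weighted_recall, 2))
print(round(churners_found))
```

The recall of 0.33 for class 1.0 is the number to watch: the model finds only about a third of the 260 customers who actually left, even though the overall accuracy looks reasonable because the non-churned class is much larger.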

Build an MLP model and evaluate it on the test set

mlp = MLPClassifier(hidden_layer_sizes=(20,10) ,max_iter=500)
mlp.fit(x_train,y_train)
predictions = mlp.predict(x_test)
print(classification_report(y_test, predictions))

              precision    recall  f1-score   support

         0.0       0.82      0.96      0.88       740
         1.0       0.77      0.38      0.51       260

   micro avg       0.81      0.81      0.81      1000
   macro avg       0.79      0.67      0.70      1000
weighted avg       0.80      0.81      0.79      1000
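MLPClassifier initializes its weights randomly, so rerunning the code above without a fixed seed may give slightly different scores than the ones shown. The sketch below, which uses synthetic data from make_classification rather than the churn dataset, shows that passing random_state makes repeated runs agree. The seed value 42 is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data; not the churn dataset.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

def fit_and_predict(seed):
    """Train an MLP with the same architecture as above and a fixed seed."""
    model = MLPClassifier(hidden_layer_sizes=(20, 10), max_iter=500, random_state=seed)
    model.fit(X, y)
    return model.predict(X)

pred_a = fit_and_predict(seed=42)
pred_b = fit_and_predict(seed=42)

# With identical data and seed, the two runs produce identical predictions.
same = np.array_equal(pred_a, pred_b)
print(same)
```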

Project Files

Baidu Netdisk
Password: 4t6k