Credit Scorecard Modeling in Python in Practice (Part 2)
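The previous part covered preparing the dataset and selecting variables. This part continues with building the model and constructing the scorecard.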
5. Model Training
Credit scorecards are usually built on logistic regression, a binary classification model. In Python, import LogisticRegression from sklearn.linear_model.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Quantitative and qualitative variables that enter the model
model_data = data[np.append(quant_model_vars, qual_model_vars)]

# WOE-encoded design matrix (the *_WoE series come from the previous part)
model_data_WOE = pd.DataFrame()
model_data_WOE['duration'] = duration_WoE
model_data_WOE['amount'] = amount_WoE
model_data_WOE['age'] = age_WoE
model_data_WOE['installment_rate'] = installment_rate_WoE
model_data_WOE['status'] = status_WoE
model_data_WOE['credit_history'] = credit_history_WoE
model_data_WOE['savings'] = savings_WoE
model_data_WOE['property'] = property_WoE
model_data_WOE['employment_duration'] = employment_duration_WoE
model_data_WOE['purpose'] = purpose_WoE
# model_data_WOE['credit_risk'] = credit_risk

# Logistic regression
model = LogisticRegression()
model.fit(model_data_WOE, credit_risk)
coefficients = model.coef_.ravel()   # one coefficient per column, in column order
intercept = model.intercept_[0]
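Note: model objects in Python are less convenient to inspect than those in R. Viewing the variables, coefficients, and test statistics means looking up each item and printing it yourself, whereas in R a single summary() call displays the whole overview.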
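To make the point in the note concrete, here is a minimal sketch of pairing each variable name with its coefficient for printing. The helper `coefficient_table` and the numbers passed to it are hypothetical, introduced only for illustration; they are not results from the model above.

```python
def coefficient_table(feature_names, coefficients, intercept):
    """Format an intercept and per-feature coefficients as aligned text lines."""
    rows = [("(intercept)", float(intercept))]
    rows += [(str(name), float(coef))
             for name, coef in zip(feature_names, coefficients)]
    width = max(len(name) for name, _ in rows)
    return [f"{name:<{width}}  {value:+.4f}" for name, value in rows]


# Hypothetical numbers, used only to show the shape of the output.
lines = coefficient_table(["duration", "amount"], [0.8123, -0.4501], -0.85)
for line in lines:
    print(line)
```

With the fitted model from this section, the call would presumably be `coefficient_table(model_data_WOE.columns, model.coef_.ravel(), model.intercept_[0])`. This lists coefficients only; scikit-learn's LogisticRegression does not report significance tests.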
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

########### Custom KS functions #############
def predict_df(model, data, label, feature=None):
    """Return predicted class, true label, and score for every row of data."""
    if feature:
        df_feature = data.loc[:, feature]
    else:
        all_feature = list(data.columns.values)
        all_feature.remove(label)
        df_feature = data.loc[:, all_feature]
    # Probability of the second class (label 1 when labels are 0/1);
    # predict() would return only hard 0/1 classes, which cannot be ranked.
    df_prob = model.predict_proba(df_feature)[:, 1]
    df_pred = pd.Series(df_prob).map(lambda x: 1 if x > 0.5 else 0)
    df = pd.DataFrame()
    df['predict'] = df_pred
    df['label'] = data.loc[:, label].values
    df['score'] = df_prob
    return df

def ks(data, model, label):
    """Cumulative bad/good shares ordered by score, thinned to about 100 points."""
    data_df = predict_df(model, data, label)
    KS_data = data_df.sort_values(by='score', ascending=True)
    KS_data['Bad'] = KS_data['label'].cumsum() / KS_data['label'].sum()
    KS_data['Count'] = np.arange(1, len(KS_data['label']) + 1)
    KS_data['Good'] = (KS_data['Count'] - KS_data['label'].cumsum()) / (len(KS_data['label']) - KS_data['label'].sum())
    KS_data.index = KS_data['Count']
    step = max(1, len(KS_data) // 100)   # avoid a zero step on small samples
    ks_df = KS_data.iloc[::step, :].copy()
    ks_df.index = np.arange(len(ks_df))
    return ks_df

def ks_plot(ks_df):
    """Plot the cumulative Bad and Good curves."""
    plt.figure(figsize=(6, 5))
    plt.subplot(111)
    plt.plot(ks_df['Bad'], lw=3.5, color='r', label='Bad')    # train_ks['Bad']
    plt.plot(ks_df['Good'], lw=3.5, color='g', label='Good')  # train_ks['Good']
    plt.legend(loc=4)
    plt.grid(True)
    plt.axis('tight')
    plt.title('The KS Curve of data')
    plt.show()
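KS (Kolmogorov-Smirnov): the KS statistic evaluates how well the model separates risk. It measures the gap between the cumulative distributions of good and bad samples. The larger that gap, the larger the KS value and the stronger the model's ability to discriminate risk. As a rule of thumb, KS > 0.2 indicates reasonably good predictive accuracy. The computed KS for this model is 0.35, so the model performs fairly well.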
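The ks() function above returns the two cumulative curves rather than a single number. As a minimal sketch of how the KS value itself can be obtained, the hypothetical helper below takes the maximum gap between the cumulative bad and good shares, using the same convention as the code above (label 1 = bad). The toy scores and labels are invented for illustration only.

```python
def ks_statistic(scores, labels):
    """Return max |cumulative bad share - cumulative good share| over scores.

    labels: 1 = bad, 0 = good.
    """
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    if total_bad == 0 or total_good == 0:
        raise ValueError("need at least one good and one bad sample")

    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    for i, (score, label) in enumerate(pairs):
        cum_bad += label
        cum_good += 1 - label
        # Evaluate the gap only after all samples tied at this score are counted.
        last_of_tie = i == len(pairs) - 1 or pairs[i + 1][0] != score
        if last_of_tie:
            gap = abs(cum_bad / total_bad - cum_good / total_good)
            best = max(best, gap)
    return best


# Toy data, invented purely for illustration.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
toy_ks = ks_statistic(toy_scores, toy_labels)
print(round(toy_ks, 2))
```

A perfectly separating score gives a KS of 1, and a score unrelated to the label gives a value near 0.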
6. Scorecard
The scorecard calculation follows the method in the cited literature:
General scorecard formula: Score = A - B * log(Odds)
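Two assumptions usually have to be fixed:
(1) a specific expected score assigned to a specific odds value;
(2) the number of points that corresponds to a doubling of the odds (PDO).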
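Following this, suppose the score at odds x is P. Because the score falls as the odds rise, the score at odds 2x is then P - PDO. Substituting into the formula gives two equations: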
P = A - B * log(x)
P - PDO = A - B * log(2x)
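Solving these two equations for A and B takes a single subtraction. A short derivation, assuming log denotes the natural logarithm (as in the math.log call used in the code):

```latex
\begin{aligned}
P - (P - \mathrm{PDO}) &= B\,\bigl(\log(2x) - \log(x)\bigr) = B \log 2 \\
B &= \frac{\mathrm{PDO}}{\log 2} \\
A &= P + B \log(x)
\end{aligned}
```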
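In this article, the scorecard coefficients alpha and beta are computed by fixing a score of 50 at a specified odds value (good/bad ratio) of 1/20 and a PDO of 10: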
import math

def alpha_beta(basepoints, baseodds, pdo):
    """Return offset alpha and factor beta for Score = alpha - beta * log(odds)."""
    beta = pdo / math.log(2)
    alpha = basepoints + beta * math.log(baseodds)
    return alpha, beta
Scorecard formula: Score = 6.78 - 14.43 * log(Odds)
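As a quick check of these numbers, the sketch below evaluates the scaling with the values stated in the text (50 points at odds 1/20, PDO of 10). The helper `score_from_odds` is a hypothetical name added for illustration.

```python
import math


def alpha_beta(basepoints, baseodds, pdo):
    """Offset alpha and factor beta for Score = alpha - beta * log(odds)."""
    beta = pdo / math.log(2)
    alpha = basepoints + beta * math.log(baseodds)
    return alpha, beta


alpha, beta = alpha_beta(50, 1 / 20, 10)


def score_from_odds(odds):
    """Score implied by the scaled formula for a given odds value."""
    return alpha - beta * math.log(odds)


print(round(alpha, 2), round(beta, 2))   # offset and factor
print(round(score_from_odds(1 / 20)))    # score at the base odds
print(round(score_from_odds(1 / 10)))    # score after the odds double
```

Doubling the odds from 1/20 to 1/10 lowers the score by exactly one PDO, from 50 to 40.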
Substituting the WOE-transformed variables into log(Odds) and rearranging then gives the final scorecard formula:
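Here ω_ij is the WOE of the j-th value of variable i, a known quantity; β_i is the coefficient of variable i in the logistic regression, also known; and δ_ij is a binary indicator of whether variable i takes its j-th value.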
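Written out, one form of this formula that is consistent with the definitions here is given below. It assumes the fitted logistic regression has the form log(Odds) = β_0 + Σ_i β_i Σ_j ω_ij δ_ij, where the intercept symbol β_0 is introduced for this sketch.

```latex
\begin{aligned}
\mathrm{Score} &= A - B \log(\mathrm{Odds}) \\
&= A - B\Bigl(\beta_0 + \sum_{i}\beta_i \sum_{j}\omega_{ij}\,\delta_{ij}\Bigr) \\
&= (A - B\beta_0) - \sum_{i}\sum_{j} \bigl(B\,\beta_i\,\omega_{ij}\bigr)\,\delta_{ij}
\end{aligned}
```

In this form A - Bβ_0 plays the role of the base points, and each B β_i ω_ij term is the point value attached to one bin of one variable.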
From the table above, the score for each bin of every variable can be calculated:
# Base points
basepoint = round(alpha - beta * intercept)

# Per-variable scores (coefficient index follows the column order of model_data_WOE)
duration_score = np.round(model_data_WOE['duration'] * coefficients[0] * beta)
amount_score = np.round(model_data_WOE['amount'] * coefficients[1] * beta)
age_score = np.round(model_data_WOE['age'] * coefficients[2] * beta)
installment_rate_score = np.round(model_data_WOE['installment_rate'] * coefficients[3] * beta)
status_score = np.round(model_data_WOE['status'] * coefficients[4] * beta)
credit_history_score = np.round(model_data_WOE['credit_history'] * coefficients[5] * beta)
savings_score = np.round(model_data_WOE['savings'] * coefficients[6] * beta)
property_score = np.round(model_data_WOE['property'] * coefficients[7] * beta)
employment_duration_score = np.round(model_data_WOE['employment_duration'] * coefficients[8] * beta)
purpose_score = np.round(model_data_WOE['purpose'] * coefficients[9] * beta)

# Score table per variable: bin (or category) values, indexed by their score
duration_scoreCard = pd.DataFrame(duration_Cutpoint, duration_score).drop_duplicates()
amount_scoreCard = pd.DataFrame(amount_Cutpoint, amount_score).drop_duplicates()
age_scoreCard = pd.DataFrame(age_Cutpoint, age_score).drop_duplicates()
installment_rate_scoreCard = pd.DataFrame(installment_rate_Cutpoint, installment_rate_score).drop_duplicates()
status_scoreCard = pd.DataFrame(np.array(discrete_data['status']), status_score).drop_duplicates()
credit_history_scoreCard = pd.DataFrame(np.array(discrete_data['credit_history']), credit_history_score).drop_duplicates()
savings_scoreCard = pd.DataFrame(np.array(discrete_data['savings']), savings_score).drop_duplicates()
property_scoreCard = pd.DataFrame(np.array(discrete_data['property']), property_score).drop_duplicates()
employment_duration_scoreCard = pd.DataFrame(np.array(discrete_data['employment_duration']), employment_duration_score).drop_duplicates()
purpose_scoreCard = pd.DataFrame(np.array(discrete_data['purpose']), purpose_score).drop_duplicates()
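To show how base points and per-variable points combine into one applicant's score, here is a minimal sketch under the formula Score = A - B * log(Odds). The intercept, coefficients, and WOE values are hypothetical numbers invented for illustration. The per-variable points carry the minus sign from that formula; whether the rounded `*_score` columns above should be added or subtracted depends on how WOE and the target were coded in the previous part, which is not shown here.

```python
import math

# Scaling constants from the text: 50 points at odds 1/20, PDO of 10.
beta_scale = 10 / math.log(2)
alpha_scale = 50 + beta_scale * math.log(1 / 20)

# Hypothetical fitted model (values invented for illustration only).
intercept = -0.85
coefs = {"duration": 0.9, "status": 0.8}
# Hypothetical WOE of the bin each variable falls into for one applicant.
applicant_woe = {"duration": -0.35, "status": 0.60}

# Base points plus one (signed) contribution per variable.
base_points = alpha_scale - beta_scale * intercept
var_points = {name: -beta_scale * coefs[name] * applicant_woe[name]
              for name in coefs}
total = base_points + sum(var_points.values())

# Same score computed directly from the model's log-odds.
log_odds = intercept + sum(coefs[n] * applicant_woe[n] for n in coefs)
direct = alpha_scale - beta_scale * log_odds

print(round(total, 2), round(direct, 2))
```

Before rounding, the additive scorecard and the direct formula agree exactly; rounding each bin to whole points introduces a small discrepancy in practice.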
Reposted from https://blog.csdn.net/kxiaozhuk/article/details/84612632
That concludes this introduction to credit scorecard modeling. You are welcome to study my Python credit scorecard course.