
Android Activity Launch Flow Explained (Based on API 28)

Tags: sklearn

Problem description

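  • Official documentation: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
  • When SimpleImputer from the sklearn.impute package fills missing values (nan) in multi-dimensional (multi-column) input with strategy "mean", "median", or "most_frequent", it computes the mean, median, or mode of each column separately and fills each column's nan values with that column's own statistic. It does not compute the mean, median, or mode of the whole matrix.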
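The difference between a per-column statistic and a whole-matrix statistic can be checked directly. The following is a minimal sketch, assuming the same 4×3 sample matrix used in the verification below; the variable names (`X`, `col_means`, `global_mean`) are illustrative only. It compares NumPy's nan-aware means with the `statistics_` attribute that a fitted SimpleImputer exposes.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Sample matrix (assumed for illustration): two nan entries in different columns
X = np.array([[1, 2, 3], [2, np.nan, 4], [3, 4, 5], [4, 5, np.nan]], dtype=float)

col_means = np.nanmean(X, axis=0)  # mean of each column, ignoring nan
global_mean = np.nanmean(X)        # mean of all non-nan entries in the matrix

imp = SimpleImputer(missing_values=np.nan, strategy="mean").fit(X)

print("per-column means:", col_means)
print("whole-matrix mean:", global_mean)
print("imputer fill values:", imp.statistics_)
```

The fill values stored in `statistics_` should match the per-column means rather than the single whole-matrix mean.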

Verification

  • Method 1: impute the whole matrix directly
import numpy as np
from sklearn.impute import SimpleImputer
data_list = [[1,2,3],[2,np.nan,4],[3,4,5],[4,5,np.nan]]
data = np.asarray(data_list)
# Replace nan values with the mean
imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_data = imp_mean.fit_transform(data)
print(imputed_data)
print(imputed_data.shape)
  • Output:
[[1.         2.         3.        ]
 [2.         3.66666667 4.        ]
 [3.         4.         5.        ]
 [4.         5.         4.        ]]
(4, 3)
  • Method 2: compute the "mean" of each column separately, then fill each column's nan values with that column's mean
import numpy as np
from sklearn.impute import SimpleImputer
data_list = [[1,2,3],[2,np.nan,4],[3,4,5],[4,5,np.nan]]
data = np.asarray(data_list)
imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_data_list = []
for i in range(data.shape[1]):
    # Reshape the 1-D column into a 2-D array of shape (n_samples, 1);
    # SimpleImputer raises an error on 1-D input
    imputed_data_one_column = imp_mean.fit_transform(data[:,i].reshape(-1,1))
    print(imputed_data_one_column.shape)
    imputed_data_list.append(imputed_data_one_column)
# Stack the imputed columns side by side to rebuild the matrix
imputed_data = np.hstack(imputed_data_list)
print(imputed_data)
print(imputed_data.shape)
  • Output:
(4, 1)
(4, 1)
(4, 1)
[[1.         2.         3.        ]
 [2.         3.66666667 4.        ]
 [3.         4.         5.        ]
 [4.         5.         4.        ]]
(4, 3)
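  • PS: As the example code shows, Method 2 gives exactly the same result as Method 1, because SimpleImputer already imputes column by column. If you don't read the documentation carefully, you may end up reinventing the wheel!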
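The verification above only covers "mean". As a hedged sketch, the same comparison can be repeated for "median" and "most_frequent"; the helper `impute_per_column` and the `results` dictionary are hypothetical names introduced here for illustration and are not part of scikit-learn.

```python
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1, 2, 3], [2, np.nan, 4], [3, 4, 5], [4, 5, np.nan]], dtype=float)

def impute_per_column(X, strategy):
    """Hypothetical helper: impute each column on its own, then re-stack."""
    cols = []
    for j in range(X.shape[1]):
        imp = SimpleImputer(missing_values=np.nan, strategy=strategy)
        # X[:, [j]] keeps the column 2-D with shape (n_samples, 1)
        cols.append(imp.fit_transform(X[:, [j]]))
    return np.hstack(cols)

results = {}
for strategy in ("mean", "median", "most_frequent"):
    whole = SimpleImputer(missing_values=np.nan, strategy=strategy).fit_transform(X)
    per_col = impute_per_column(X, strategy)
    # True when whole-matrix imputation equals column-by-column imputation
    results[strategy] = bool(np.allclose(whole, per_col))
    print(strategy, results[strategy])
```

If each strategy reports True, the manual per-column loop adds nothing over a single `fit_transform` call on the full matrix.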