Android Activity Launch Process Explained (Based on API 28)
阿新 • • Published: 2020-12-12
Tags: sklearn
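Problem description
- Official documentation: https://scikit-learn.org/stable/modules/generated/sklearn.impute.SimpleImputer.html#sklearn.impute.SimpleImputer
- When SimpleImputer from the sklearn.impute package fills missing values (nan) in multi-dimensional (multi-column) input with the "mean", "median", or "most_frequent" strategy, it computes the statistic separately for each column and fills each column's nan values with that column's own mean, median, or mode. It does not compute a single mean, median, or mode over the whole matrix.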
Verification
- Method 1: impute the whole array directly
import numpy as np
from sklearn.impute import SimpleImputer

data_list = [[1, 2, 3], [2, np.nan, 4], [3, 4, 5], [4, 5, np.nan]]
data = np.asarray(data_list)

# Replace nan values with the mean
imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_data = imp_mean.fit_transform(data)
print(imputed_data)
print(imputed_data.shape)
- Output:
[[1.         2.         3.        ]
 [2.         3.66666667 4.        ]
 [3.         4.         5.        ]
 [4.         5.         4.        ]]
(4, 3)
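As a quick cross-check of the output above, the fitted imputer exposes the fill value it learned for each column through its `statistics_` attribute, and these can be compared with NumPy's nan-aware mean. The following is a minimal sketch on the same data; the variable names `column_fill_values`, `column_means`, and `global_mean` are illustrative and not part of the original post.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.asarray([[1, 2, 3], [2, np.nan, 4], [3, 4, 5], [4, 5, np.nan]])

imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imp_mean.fit(data)

# Fill values learned during fit: one entry per column
column_fill_values = imp_mean.statistics_

# Per-column means ignoring nan, and the single mean over the whole matrix
column_means = np.nanmean(data, axis=0)
global_mean = np.nanmean(data)

print(column_fill_values)
print(column_means)
print(global_mean)
```

If the imputer worked on the whole matrix, every nan would be filled with `global_mean`; instead the learned fill values should match `column_means`.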
- Method 2: compute the "mean" of each column separately, then fill each column's nan values with that column's mean
import numpy as np
from sklearn.impute import SimpleImputer

data_list = [[1, 2, 3], [2, np.nan, 4], [3, 4, 5], [4, 5, np.nan]]
data = np.asarray(data_list)

imp_mean = SimpleImputer(missing_values=np.nan, strategy="mean")
imputed_data_list = []
for i in range(data.shape[1]):
    # Reshape the single column to 2-D with shape (n_samples, 1); passing a 1-D array raises an error
    imputed_data_one_column = imp_mean.fit_transform(data[:, i].reshape(-1, 1))
    print(imputed_data_one_column.shape)
    imputed_data_list.append(imputed_data_one_column)

# Stack the imputed columns back together side by side
imputed_data = np.hstack(imputed_data_list)
print(imputed_data)
print(imputed_data.shape)
- Output:
(4, 1)
(4, 1)
(4, 1)
[[1.         2.         3.        ]
 [2.         3.66666667 4.        ]
 [3.         4.         5.        ]
 [4.         5.         4.        ]]
(4, 3)
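- PS: As the example code shows, Method 2 produces exactly the same result as Method 1. If you do not read the documentation carefully, you may end up reinventing the wheel!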
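The two methods above only exercise the "mean" strategy. As a hedged sketch of how the same per-column behavior could be checked for "median" and "most_frequent", the snippet below fits one imputer per strategy on the same data and prints the learned `statistics_`. The `fill_values` dictionary is an illustrative name, not something from the original post.

```python
import numpy as np
from sklearn.impute import SimpleImputer

data = np.asarray([[1, 2, 3], [2, np.nan, 4], [3, 4, 5], [4, 5, np.nan]])

fill_values = {}
for strategy in ("mean", "median", "most_frequent"):
    imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    imputer.fit(data)
    # statistics_ holds one fill value per column for the chosen strategy
    fill_values[strategy] = imputer.statistics_
    print(strategy, imputer.statistics_)

# Per-column references computed with NumPy, ignoring nan
print(np.nanmean(data, axis=0))
print(np.nanmedian(data, axis=0))
```

Every value in this sample occurs only once per column, so "most_frequent" is a tie in each column. The scikit-learn documentation states that in a tie only the smallest value is returned, so a dataset with repeated values would make a more telling test of that strategy.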