Taking care of missing data

  • Remove the rows that contain missing data
  • Replace each missing value with a suitable value, such as the mean or mode of its column
Country   Age   Salary   Purchased
France    44    72000    No
Spain     27    48000    Yes
Germany   30    54000    No
Spain     38    61000    No
Germany   40    NaN      Yes
France    35    58000    Yes
Spain     NaN   52000    No
France    48    79000    Yes
Germany   50    83000    No
France    37    67000    Yes
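
The first option in the list, removing rows with missing data, can be sketched in plain Python on the table above. This is only an illustration: the variable names `rows` and `complete_rows` are assumptions, and `None` is used here to mark a missing cell.

```python
# Rows of the table above; None marks a missing cell (an assumption of this sketch).
rows = [
    ("France", 44, 72000, "No"),
    ("Spain", 27, 48000, "Yes"),
    ("Germany", 30, 54000, "No"),
    ("Spain", 38, 61000, "No"),
    ("Germany", 40, None, "Yes"),   # Salary is missing
    ("France", 35, 58000, "Yes"),
    ("Spain", None, 52000, "No"),   # Age is missing
    ("France", 48, 79000, "Yes"),
    ("Germany", 50, 83000, "No"),
    ("France", 37, 67000, "Yes"),
]

# Keep only the rows in which every cell is present.
complete_rows = [row for row in rows if all(cell is not None for cell in row)]

print(len(rows), "rows before,", len(complete_rows), "rows after")
# prints: 10 rows before, 8 rows after
```

Dropping rows is simple, but here it discards 2 of 10 observations, which is one reason to consider replacing the missing values instead.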
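import numpy as np
from sklearn.impute import SimpleImputer

# Replace missing values (np.nan) with the mean of their column.
# data is assumed to be a NumPy array holding the table above.
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
# Fit on all rows of the numerical columns, Age and Salary (indices 1 and 2)
imputer.fit(data[:, 1:3])
data[:, 1:3] = imputer.transform(data[:, 1:3])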
Country   Age                 Salary              Purchased
France    44                  72000               No
Spain     27                  48000               Yes
Germany   30                  54000               No
Spain     38                  61000               No
Germany   40                  63777.77777777778   Yes
France    35                  58000               Yes
Spain     38.77777777777778   52000               No
France    48                  79000               Yes
Germany   50                  83000               No
France    37                  67000               Yes
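
The filled-in values can be checked by hand. The sketch below uses only the Python standard library to compute what mean replacement produces for the Age and Salary columns. It is not scikit-learn's implementation; the helper `fill_missing` and the small `countries` list used for the mode case are hypothetical names and data introduced for illustration.

```python
from statistics import mean, mode

# Age and Salary columns of the original table; None marks a missing value.
ages = [44, 27, 30, 38, 40, 35, None, 48, 50, 37]
salaries = [72000, 48000, 54000, 61000, None, 58000, 52000, 79000, 83000, 67000]


def fill_missing(values, statistic):
    """Replace None entries with a statistic of the observed entries."""
    observed = [v for v in values if v is not None]
    fill_value = statistic(observed)
    return [fill_value if v is None else v for v in values]


ages_filled = fill_missing(ages, mean)
salaries_filled = fill_missing(salaries, mean)

print(round(ages_filled[6], 2))      # 38.78    (349 / 9)
print(round(salaries_filled[4], 2))  # 63777.78 (574000 / 9)

# For a categorical column the mode is the natural choice.
# This list is hypothetical and not taken from the table.
countries = ["France", None, "Spain", "France"]
countries_filled = fill_missing(countries, mode)
print(countries_filled)  # ['France', 'France', 'Spain', 'France']
```

Each mean is taken over the nine observed values only, which matches the 38.78 and 63777.78 shown in the table above.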