Taking care of missing data
- Remove rows with missing data
- Replace missing data with some proper value (like mean/ mode of column)
Country | Age | Salary | Purchased |
---|---|---|---|
France | 44 | 72000 | No |
Spain | 27 | 48000 | Yes |
Germany | 30 | 54000 | No |
Spain | 38 | 61000 | No |
Germany | 40 | Yes | |
France | 35 | 58000 | Yes |
Spain | 52000 | No | |
France | 48 | 79000 | Yes |
Germany | 50 | 83000 | No |
France | 37 | 67000 | Yes |
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values = np.nan, strategy='mean')
imputer.fit(data[:, 1:3])
# fitting all rows and all numerical columns
data[:, 1:3] = imputer.transform(data[:, 1:3])
Country | Age | Salary | Purchased |
---|---|---|---|
France | 44 | 72000 | No |
Spain | 27 | 48000 | Yes |
Germany | 30 | 54000 | No |
Spain | 38 | 61000 | No |
Germany | 40 | 63777.77777777778 | Yes |
France | 35 | 58000 | Yes |
Spain | 38.77777777777778 | 52000 | No |
France | 48 | 79000 | Yes |
Germany | 50 | 83000 | No |
France | 37 | 67000 | Yes |