Model ensembling

Winning a Kaggle Competition in Python

Yauhen Babakhin

Kaggle Grandmaster

Figure: Ensemble design of a winning solution in a Kaggle competition

Model blending

Test ID	Model A prediction	Model B prediction	Arithmetic mean
1	1.2	1.5	1.35
2	0.1	0.4	0.25
3	5.4	7.2	6.30

$$\text{arithmetic mean} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\text{geometric mean} = \left(\prod_{i=1}^{n} x_i\right)^{\frac{1}{n}}$$
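As a minimal sketch, the two blending formulas can be applied to the Model A and Model B predictions from the table above. The column names `model_a` and `model_b` are illustrative choices, and the geometric mean assumes all predictions are strictly positive.

```python
import numpy as np
import pandas as pd

# Predictions of the two models, copied from the blending table
preds = pd.DataFrame(
    {"model_a": [1.2, 0.1, 5.4], "model_b": [1.5, 0.4, 7.2]},
    index=pd.Index([1, 2, 3], name="test_id"),
)

# Arithmetic mean: sum of the predictions divided by the number of models
arithmetic = preds.mean(axis=1)

# Geometric mean: n-th root of the product of the predictions
# (only meaningful for strictly positive predictions)
n_models = preds.shape[1]
geometric = preds.prod(axis=1) ** (1 / n_models)

blend = preds.assign(arithmetic=arithmetic, geometric=geometric)
print(blend.round(4))
```

The arithmetic column reproduces the values in the table (1.35, 0.25, 6.30). The geometric mean is never larger than the arithmetic mean, so it pulls the blend toward the smaller of the two predictions.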

Model stacking

Train ID	feature_1	...	feature_N	Target
1	0.55	...	1.37	1
2	0.12	...	-2.50	0
3	0.65	...	3.14	0
4	0.10	...	2.87	1
5	0.54	...	-0.10	0

Test ID	feature_1	...	feature_N	Target
11	0.49	...	-2.32	?
12	0.32	...	1.15	?
13	0.91	...	0.81	?

Part 1 of the train data

Train ID	feature_1	...	feature_N	Target
1	0.55	...	1.37	1
2	0.12	...	-2.50	0
3	0.65	...	3.14	0

Part 2 of the train data

Train ID	feature_1	...	feature_N	Target
4	0.10	...	2.87	1
5	0.54	...	-0.10	0

Train ID	feature_1	...	feature_N	Target	A_pred	B_pred	C_pred
4	0.10	...	2.87	1	0.71	0.52	0.98
5	0.54	...	-0.10	0	0.45	0.32	0.24

Test ID	feature_1	...	feature_N	Target	A_pred	B_pred	C_pred
11	0.49	...	-2.32	?	0.62	0.45	0.81
12	0.32	...	1.15	?	0.31	0.52	0.41
13	0.91	...	0.81	?	0.74	0.55	0.92
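The tables above can be read as: split the train data into two parts, fit several models on the first part, then store each model's predictions for the second part and for the test data as new columns. Below is a rough sketch of that flow with scikit-learn. The synthetic dataset, the three model types, and the 50/50 split are all assumptions made for illustration; the slides do not say which models A, B and C are.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the competition data (assumption, not the real dataset)
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
features = [f"feature_{i + 1}" for i in range(X.shape[1])]
data = pd.DataFrame(X, columns=features).assign(target=y)

# Hold out rows to play the role of the unlabeled test set
train, test = train_test_split(data, test_size=0.2, random_state=0)
test = test.drop(columns="target").copy()

# Split the train data into two parts
part_1, part_2 = train_test_split(train, test_size=0.5, random_state=0)
part_2 = part_2.copy()

# Hypothetical choices for models A, B and C
base_models = {
    "A_pred": GradientBoostingClassifier(random_state=0),
    "B_pred": RandomForestClassifier(n_estimators=100, random_state=0),
    "C_pred": LogisticRegression(max_iter=1000),
}

for column, model in base_models.items():
    # Fit on part 1 only, so the part 2 predictions are out-of-sample
    model.fit(part_1[features], part_1["target"])
    part_2[column] = model.predict_proba(part_2[features])[:, 1]
    test[column] = model.predict_proba(test[features])[:, 1]

print(part_2.head())
print(test.head())
```

Fitting on part 1 only matters here: predictions made on rows a model has already seen would look better than they are and would mislead any model trained on them later.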

Train ID	Target	A_pred	B_pred	C_pred
4	1	0.71	0.52	0.98
5	0	0.45	0.32	0.24

Test ID	Target	A_pred	B_pred	C_pred
11	?	0.62	0.45	0.81
12	?	0.31	0.52	0.41
13	?	0.74	0.55	0.92

Test ID	Target	A_pred	B_pred	C_pred	Stacking prediction
11	?	0.62	0.45	0.81	0.73
12	?	0.31	0.52	0.41	0.35
13	?	0.74	0.55	0.92	0.88
