Machine learning

1 Data preparation

Selected variables (p < 0.10):

  1. HospitalizationBeforeSurgeryMo
  2. Age
  3. BMI
  4. CardiogenicSchockYN
  5. Diabetes
  6. DiabetesPerOsTherapie
  7. DiabetesOnInsuline
  8. PerifernoArterijskoObolenje
  9. ExtracardiacArteriopathy
  10. PsychoSyndrome
  11. TherapyRelevantPsychoSyndrome
  12. PreoperativeInfectionYN
  13. KongestiveHeartFailure
  14. EjectionFractionEF
  15. EF50
  16. AtrialFibrillationYN
  17. ChronicLungDiseaseYN
  18. CockcraftGaultIndexPreop
  19. ACEInhibitors
  20. IABPPreoperatively
  21. DurationOfTheOperation
  22. NumberOfGrafts
  23. PericardDrainage
  24. RethoracotomyYN
  25. CoagulationDisorder
  26. Cardioversion
  27. SumOtherInfectYN
  28. AcuteKidneyFailure
  29. TotalDrainage
  30. NumberOfPlasmaUnits
  31. Transfusion
  32. MoreThan2UnitsOfErythrocytes
  33. RespiratoryFailureYN
  34. ProlongedMechanicalVentilation
  35. Reintubation
  36. NumberOfReintubations
  37. Tracheotomy
  38. AorticClampingTime
  39. BypassOperationTime
  40. AorticCalcificatio
  41. LeukocytesFirstPostoperativeDa
  42. LeukocytesSecondPostoperativeD
  43. HbPreop12GDl
  44. HbPreoperativelyGDl
  45. PoorGlycemicControlPrediabetes
  46. ITA
  47. BIMA
  48. SaEtAlCreatinine226MgDdlOrPost
  49. TiesselFibrinGlueMl
  50. GFRLaurisProdop60NotNormal
  51. GFRLaurisPostop1stDay60NotNorm
  52. GFRLaurisPostop2ndDay60NotNorm
  53. PleuralEffusion

Target: DSWI01 | Predictors: 53
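The section does not say which univariate test produced the p-values behind the p < 0.10 filter. As a minimal sketch, assuming binary (0/1) predictors and a Pearson chi-square test without continuity correction, the screening step could look like this. The helper names and the toy columns `A` and `B` are hypothetical, and numeric predictors would need a different test (for example a t-test or Mann-Whitney U test), which is not shown.

```python
import math


def chi2_pvalue_binary(x, y):
    """Pearson chi-square p-value (1 df, no continuity correction) for two 0/1 lists."""
    n = len(x)
    observed = [[0, 0], [0, 0]]  # rows: x = 0/1, columns: y = 0/1
    for xi, yi in zip(x, y):
        observed[xi][yi] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            if expected > 0:  # a constant column contributes nothing
                chi2 += (observed[i][j] - expected) ** 2 / expected
    # upper tail of the chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2))


def screen_predictors(data, target, alpha=0.10):
    """Return the names of predictors whose univariate p-value is below alpha."""
    y = data[target]
    return [name for name, x in data.items()
            if name != target and chi2_pvalue_binary(x, y) < alpha]


# hypothetical toy data: "A" tracks the target, "B" is unrelated to it
toy = {
    "DSWI01": [0, 0, 0, 0, 1, 1, 1, 1],
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(screen_predictors(toy, "DSWI01"))  # → ['A']
```

Here `A` gives chi-square = 8 and p = erfc(2) ≈ 0.005, while `B` gives chi-square = 0 and p = 1, so only `A` passes the 0.10 threshold.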

Numeric variables were scaled by standardization, and categorical variables were one-hot encoded.
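The standardization and one-hot encoding described above can be expressed in scikit-learn as a `ColumnTransformer` combining `StandardScaler` and `OneHotEncoder`. This is a sketch under that assumption; the column indices and toy values are hypothetical. Placing the transformer inside a `Pipeline` together with the classifier means it is refit on each training fold, so no statistics from the validation fold leak into preprocessing; the source does not state whether that was done.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_preprocessor(numeric_cols, categorical_cols):
    """Standardize numeric columns and one-hot encode categorical ones."""
    return ColumnTransformer(
        [
            ("num", StandardScaler(), numeric_cols),
            # ignore categories unseen during fit instead of raising an error
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ],
        sparse_threshold=0,  # always return a dense array
    )


# hypothetical toy data: column 0 numeric (e.g. age), column 1 a category code
X = np.array([[60.0, 0], [70.0, 1], [80.0, 2]])
preprocessor = make_preprocessor(numeric_cols=[0], categorical_cols=[1])
Z = preprocessor.fit_transform(X)
print(Z.round(3))
```

The first output column has mean 0 and standard deviation 1; the remaining three columns are the one-hot indicators for category codes 0, 1 and 2.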

2 Models: CV evaluation on TrainSet (default settings)

CV results:

Model               CV Accuracy           CV F1                 CV ROC AUC
LogisticRegression  0.913 [0.906, 0.920]  0.444 [0.397, 0.522]  0.855 [0.814, 0.866]
DecisionTree        0.894 [0.877, 0.901]  0.514 [0.471, 0.552]  0.731 [0.716, 0.759]
RandomForest        0.934 [0.927, 0.938]  0.555 [0.478, 0.603]  0.943 [0.939, 0.960]
GradientBoosting    0.943 [0.929, 0.945]  0.662 [0.551, 0.676]  0.929 [0.885, 0.937]
AdaBoost            0.893 [0.892, 0.893]  0.000 [0.000, 0.000]  0.811 [0.783, 0.852]
XGBoost             0.950 [0.945, 0.958]  0.706 [0.672, 0.761]  0.941 [0.909, 0.950]
LightGBM            0.953 [0.949, 0.963]  0.729 [0.692, 0.792]  0.944 [0.920, 0.963]
SVC                 0.933 [0.925, 0.940]  0.550 [0.483, 0.613]  0.894 [0.891, 0.934]
KNN                 0.928 [0.925, 0.930]  0.583 [0.567, 0.606]  0.858 [0.847, 0.914]
MLP                 0.932 [0.917, 0.937]  0.645 [0.608, 0.695]  0.889 [0.836, 0.930]

CV setup: StratifiedKFold with 5 folds (shuffle=True, random_state=1974).
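The CV setup above maps onto scikit-learn's `StratifiedKFold` and `cross_validate`. The following sketch, assuming scikit-learn and a synthetic imbalanced dataset in place of the real TrainSet, collects per-fold accuracy, F1 and ROC AUC for one model. The source does not state how the bracketed intervals in the table were computed, so the sketch only prints the fold mean and the raw fold scores.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

METRICS = ("accuracy", "f1", "roc_auc")


def cv_scores(model, X, y, seed=1974):
    """Per-fold accuracy, F1 and ROC AUC with stratified, shuffled 5-fold CV."""
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    result = cross_validate(model, X, y, cv=cv, scoring=list(METRICS))
    # cross_validate stores each metric under the key "test_<name>"
    return {name: result[f"test_{name}"] for name in METRICS}


# synthetic stand-in for TrainSet with roughly 10% positive cases
X, y = make_classification(n_samples=500, n_features=10, weights=[0.9], random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cv_scores(model, X, y)
for name, values in scores.items():
    print(f"{name:>8}: mean {values.mean():.3f}, folds {values.round(3)}")
```

Stratification keeps the share of positive cases roughly equal across folds, which matters for a rare target. It also explains why F1 and ROC AUC are reported next to accuracy: an F1 of 0.000, as for AdaBoost in the table, means no positive case was correctly identified in any fold, even though accuracy still looks high.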

3 Models: CV evaluation (tuned parameters)

CV results:

Model               CV Accuracy           CV F1                 CV ROC AUC
LogisticRegression  0.786 [0.763, 0.815]  0.423 [0.390, 0.484]  0.857 [0.818, 0.873]
DecisionTree        0.805 [0.767, 0.830]  0.433 [0.417, 0.458]  0.819 [0.785, 0.835]
RandomForest        0.930 [0.922, 0.935]  0.533 [0.422, 0.567]  0.945 [0.939, 0.962]
GradientBoosting    0.956 [0.952, 0.958]  0.759 [0.719, 0.770]  0.942 [0.913, 0.958]
AdaBoost            0.897 [0.893, 0.908]  0.067 [0.065, 0.245]  0.852 [0.806, 0.880]
XGBoost             0.955 [0.950, 0.957]  0.743 [0.710, 0.759]  0.950 [0.918, 0.961]
LightGBM            0.958 [0.953, 0.965]  0.757 [0.729, 0.813]  0.942 [0.909, 0.955]
SVC                 0.940 [0.937, 0.950]  0.647 [0.611, 0.710]  0.925 [0.900, 0.932]
KNN                 0.934 [0.932, 0.940]  0.602 [0.574, 0.657]  0.910 [0.882, 0.927]
MLP                 0.932 [0.917, 0.937]  0.645 [0.608, 0.695]  0.889 [0.836, 0.930]

CV setup: StratifiedKFold with 5 folds (shuffle=True, random_state=1974).
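The source does not say how the tuned parameters were found. A common approach, shown here only as a sketch, is a grid search scored with the same `StratifiedKFold` splitter. The parameter grid, the choice of `roc_auc` as the selection metric, and the use of `DecisionTreeClassifier` are illustrative assumptions, not the settings behind the table.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# synthetic stand-in for TrainSet with roughly 10% positive cases
X, y = make_classification(n_samples=400, n_features=8, weights=[0.9], random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1974)

# hypothetical search space; the real grid is not given in the source
param_grid = {
    "max_depth": [2, 4, 6],
    "min_samples_leaf": [1, 5, 20],
    "class_weight": [None, "balanced"],
}
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",  # metric used to pick the best combination
    cv=cv,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

With the default `refit=True`, `search.best_estimator_` is refit on the whole training set and is the object one would keep as the "optimal" model.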

3.1 Saving the optimal models

Saved models: 10
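The section reports only the number of saved models, not the storage format. A minimal sketch, assuming `joblib` (installed alongside scikit-learn) and one file per model named after the model; the helper `save_models` and the temporary directory are hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def save_models(models, out_dir):
    """Write each fitted model to <out_dir>/<name>.joblib and return the paths."""
    os.makedirs(out_dir, exist_ok=True)
    paths = {}
    for name, model in models.items():
        path = os.path.join(out_dir, f"{name}.joblib")
        joblib.dump(model, path)
        paths[name] = path
    return paths


X, y = make_classification(n_samples=100, random_state=0)
models = {"LogisticRegression": LogisticRegression(max_iter=1000).fit(X, y)}
paths = save_models(models, tempfile.mkdtemp())
restored = joblib.load(paths["LogisticRegression"])
print(f"Saved models: {len(paths)}")
```

Because `joblib` files are pickle-based, they should only be loaded from trusted locations, and ideally with the same scikit-learn version that created them.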