Antonio Battista^a, Rosa Alessia Battista^b, Federica Battista^c, Gerardo
Iovane^d,∗, Riccardo Emanuele Landi^e
^a A.O.U. S. Giovanni di Dio e Ruggi d’Aragona, UOC Chir Urg, UOC Laboratorio Analisi,
Salerno, Italy
^b IRCCS Ospedale San Raffaele, University Vita Salute S. Raffaele, Milan, Italy
^c IRCCS Foundation Policlinico San Matteo, University of Pavia, Pavia, Italy
^d Department of Computer Science, University of Salerno, Salerno, Italy
^e Department of Electronics, Information and Bioengineering, Politecnico di Milano, Milan,
Italy

Colorectal cancer (CRC) is one of the most common malignancies in the
general population, accounting for 10% of all diagnosed cancers. CRC is the
third most frequent malignant tumor in men and the second in women, after
breast cancer. In 2020 its estimated worldwide incidence was 19 per 100,000,
and CRC accounted for 9.4% of all cancer-related deaths [1]. Over the past
decades, a slight decrease in CRC incidence and mortality has been observed,
presumably related to the implementation of mass screening programs and the
consequent earlier recognition of the disease at its initial stages. Stool
tests detecting hemoglobin and DNA alterations, together with endoscopy, are
the most widespread and recommended screening tests in the general population.
However, stool tests suffer from rather low sensitivity, whereas endoscopy is
invasive [2]. For this reason, developing new minimally invasive, highly
sensitive, and specific approaches is crucial to recognizing the disease at
the earliest possible stage.
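For reference, the sensitivity and specificity invoked above follow the standard definitions below, where TP, FN, TN and FP count the true positives, false negatives, true negatives and false positives of a test against the reference diagnosis. A low-sensitivity test therefore misses a larger fraction of patients who actually have CRC; for instance, with purely illustrative numbers, a test that flags 80 of 100 patients with CRC has a sensitivity of 80/100 = 0.80.

```latex
\[
\text{Sensitivity} = \frac{TP}{TP + FN}, \qquad
\text{Specificity} = \frac{TN}{TN + FP}
\]
```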

Artificial Intelligence has opened relevant perspectives toward a solution,
especially through supervised machine learning, which aims to approximate
unknown patterns in relevant data. Significant parameters can support efficient
and reliable inferences about patients’ health status, and pattern recognition
for colorectal cancer diagnosis is a rapidly expanding field [3, 4, 5, 6].
In this study, we propose a further improvement of the B-index [7], a
mathematical tool based on Artificial Neural Networks and extended reality for
non-invasive early diagnosis of colorectal cancer. We addressed the prediction
of cancer presence and the classification of cancer stage by combining the
outputs of multiple models through an ensemble learning approach, namely
majority voting [8]. We performed a comparative analysis of the performance,
as binary and staging predictors, of four machine learning models: RF (Random
Forest) [9], XGB (XGBoost) [10], SVM (Support Vector Machine) [11], and ANN
(Artificial Neural Network) [12].
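The text does not detail how the votes are counted, so the following is only a minimal Python sketch of hard majority voting (the label predicted by the most models wins) over the four models' outputs. It assumes integer-encoded labels, e.g. 0 = no CRC and 1 = CRC for the binary task, or hypothetical stage codes for staging; the function names `majority_vote` and `ensemble_predict` are illustrative, not the authors' code. Because four voters can split 2 to 2, a tie-breaking rule is required: this sketch assumes ties go to the smallest label, which matches the documented hard-voting behavior of scikit-learn's `VotingClassifier` but is not necessarily the rule used in this study.

```python
from collections import Counter
from collections.abc import Mapping, Sequence


def majority_vote(votes: Sequence[int]) -> int:
    """Hard voting: return the label predicted by the most models.

    Assumption: ties (e.g. 2 vs 2 with four models) go to the smallest label.
    """
    if not votes:
        raise ValueError("at least one model prediction is required")
    counts = Counter(votes)
    top = max(counts.values())
    # Among labels sharing the highest count, keep the smallest one.
    return min(label for label, n in counts.items() if n == top)


def ensemble_predict(per_model: Mapping[str, Sequence[int]]) -> list[int]:
    """Combine per-patient predictions from several models by majority vote.

    `per_model` maps a model name (e.g. "RF") to its predicted labels,
    one label per patient, in the same patient order for every model.
    """
    columns = list(per_model.values())
    if not columns:
        raise ValueError("at least one model is required")
    n_patients = len(columns[0])
    if any(len(col) != n_patients for col in columns):
        raise ValueError("all models must predict the same patients")
    return [majority_vote([col[i] for col in columns]) for i in range(n_patients)]


if __name__ == "__main__":
    # Hypothetical binary outputs (0 = no CRC, 1 = CRC) for three patients.
    predictions = {
        "RF": [1, 0, 1],
        "XGB": [1, 0, 0],
        "SVM": [0, 0, 1],
        "ANN": [1, 1, 0],
    }
    # The third patient gets a 2-2 split, resolved to label 0 by the tie rule.
    print(ensemble_predict(predictions))  # [1, 0, 0]
```

The same function serves both tasks, since staging only changes the label set. In a screening setting one might prefer to break ties toward the positive class to favor sensitivity, which would only require changing the `min` in `majority_vote`.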