Covidex Supplementary data

Rambaut et al nomenclature

This file contains the supplementary data from the Covidex app. All graphs and some tables are interactive, so the reader can explore the data.
First, we present some basic statistics for the training and testing datasets.

Training dataset

Classification model        Training date  Sequences  Number of subtypes  Number of trees  mtry  OOB error rate
Rambaut et al nomenclature  2021-03-15     60362      882                 500              350   0.0365
Classes were excluded due to contradictions with data supplied by Rambaut et al.

Testing dataset

Classification model        Sequences  Number of subtypes  Error   Multi-class AUC
Rambaut et al nomenclature  24411      882                 0.0293  0.9472
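
As a rough reading of the two error rates reported above, each rate can be converted back into an approximate number of misclassified sequences. The snippet below is an illustrative sketch only and is not part of Covidex; the function name is hypothetical, and the resulting counts are approximate because the reported rates are rounded to four decimals.

```python
def approx_misclassified(n_sequences: int, error_rate: float) -> int:
    """Approximate count of misclassified sequences implied by a rounded error rate."""
    return round(n_sequences * error_rate)


# Sequence counts and error rates are the ones reported in this file.
oob_errors = approx_misclassified(60362, 0.0365)   # training data, out-of-bag error
test_errors = approx_misclassified(24411, 0.0293)  # testing data, error

print(oob_errors, test_errors)  # 2203 715
```

Under these figures, roughly 2,200 of the training sequences were misclassified out-of-bag and roughly 700 of the testing sequences were misclassified.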

The following graph plots the classification probability against the number of ambiguous bases for each sequence. As expected, the proportion of wrongly classified sequences (red dots) increases as the probability decreases. The proportion of wrongly classified sequences also tends to increase with the number of ambiguous bases.
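
The number of ambiguous bases in a sequence can be counted in a few lines. The sketch below is illustrative and does not reproduce the Covidex implementation: it assumes that any character other than A, C, G or T counts as ambiguous (for example N or another IUPAC code) and that alignment gaps are ignored. The function name is hypothetical.

```python
UNAMBIGUOUS_BASES = set("ACGT")


def count_ambiguous_bases(sequence: str) -> int:
    """Count characters that are not A, C, G or T (case-insensitive).

    Assumption: gap characters ('-') are skipped rather than counted as ambiguous.
    """
    return sum(
        1
        for base in sequence.upper()
        if base != "-" and base not in UNAMBIGUOUS_BASES
    )


# Toy sequence, not real data: N, N, R and Y are the ambiguous characters.
print(count_ambiguous_bases("ACGTNNRYacgt-"))  # 4
```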

The following table presents evaluation metrics for each class.
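
Per-class metrics of this kind can be derived from the expected and predicted labels alone. The sketch below computes precision, recall and F1 for each class; which metrics the interactive table actually reports is not stated here, so this choice is an assumption. The function name is hypothetical and the labels are toy data.

```python
from collections import Counter


def per_class_metrics(expected, predicted):
    """Return {class: {"precision", "recall", "f1"}} from two parallel label lists."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for exp, pred in zip(expected, predicted):
        if exp == pred:
            true_pos[exp] += 1
        else:
            false_neg[exp] += 1   # the expected class was missed
            false_pos[pred] += 1  # the predicted class was assigned wrongly

    metrics = {}
    for cls in sorted(set(expected) | set(predicted)):
        tp, fp, fn = true_pos[cls], false_pos[cls], false_neg[cls]
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        metrics[cls] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Toy labels for illustration only.
expected = ["B.1", "B.1", "B.1.1.7", "B.1.1.7", "A"]
predicted = ["B.1", "B.1.1.7", "B.1.1.7", "B.1.1.7", "A"]
for cls, values in per_class_metrics(expected, predicted).items():
    print(cls, values)
```

In this toy example, class B.1 has a precision of 1.0 but a recall of 0.5, because one of its two sequences was assigned to B.1.1.7.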


The next heatmap shows, for each class, the correlation between the expected classification and the classification obtained by Covidex. Overall, the correlation is high.
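
One common way to compare expected and obtained classes is a row-normalized confusion matrix, in which each row gives the fraction of sequences of an expected class that were assigned to each predicted class. Whether the heatmap shows exactly this quantity is an assumption; the sketch below is illustrative, uses a hypothetical function name, and runs on toy labels.

```python
from collections import Counter, defaultdict


def row_normalized_confusion(expected, predicted):
    """Return {expected_class: {predicted_class: fraction of that expected class}}."""
    counts = defaultdict(Counter)
    for exp, pred in zip(expected, predicted):
        counts[exp][pred] += 1

    matrix = {}
    for exp, row in counts.items():
        row_total = sum(row.values())
        matrix[exp] = {pred: n / row_total for pred, n in row.items()}
    return matrix


# Toy labels for illustration only.
expected_labels = ["B.1", "B.1", "B.1.1.7", "B.1.1.7", "A"]
predicted_labels = ["B.1", "B.1.1.7", "B.1.1.7", "B.1.1.7", "A"]
print(row_normalized_confusion(expected_labels, predicted_labels))
```

A classifier that agrees closely with the expected labels concentrates the values near 1.0 on the diagonal, that is, where the expected and predicted classes coincide.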
 

The precision-recall curve indicates that the method performs well.
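
For reference, the points of a precision-recall curve can be obtained by ranking sequences by their score and recomputing precision and recall at each rank. How the curve was built for the multi-class case is not described here; the sketch below assumes a single one-vs-rest problem with distinct scores and at least one positive example. The function name is hypothetical and the values are toy data.

```python
def precision_recall_points(scores, is_positive):
    """Return (recall, precision) pairs, one per threshold, highest score first.

    Assumptions: scores are distinct and at least one example is positive.
    """
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    total_positive = sum(1 for flag in is_positive if flag)

    points = []
    true_pos = 0
    for rank, (_, positive) in enumerate(ranked, start=1):
        if positive:
            true_pos += 1
        # At this threshold, the top `rank` examples are called positive.
        points.append((true_pos / total_positive, true_pos / rank))
    return points


# Toy scores and labels for illustration only.
toy_scores = [0.95, 0.80, 0.60, 0.40]
toy_labels = [True, True, False, True]
for recall, precision in precision_recall_points(toy_scores, toy_labels):
    print(f"recall={recall:.2f} precision={precision:.2f}")
```

A curve that stays close to a precision of 1.0 across most recall values corresponds to the good performance described above.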