This file contains the supplementary data for the Covidex app. All graphs and some tables are interactive, so the reader can explore the data.
First, we present some basic statistics for the training and testing datasets.
| Classification model | Training date | Sequences | Number of subtypes | Number of trees | mtry | OOB error rate |
|---|---|---|---|---|---|---|
| Rambaut et al. nomenclature | 2021-03-15 | 60362 | 882 | 500 | 350 | 0.0365 |
| Classification model | Sequences | Number of subtypes | Error | Multi-class AUC |
|---|---|---|---|---|
| Rambaut et al. nomenclature | 24411 | 882 | 0.0293 | 0.9472 |
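As an illustration of the model configuration reported in the training table, the sketch below sets up a random forest with comparable hyperparameters (500 trees, an mtry-style limit on features per split) and reads off the out-of-bag (OOB) error. This is not the Covidex training code; the data here are toy values, and the mtry analogue is scaled down to match them.

```python
# Illustrative sketch only, not the Covidex training pipeline.
# Shows how a random forest with the table's hyperparameters (500 trees,
# an mtry-style feature limit) and an OOB error estimate can be set up.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 50)).astype(float)  # toy feature matrix
y = rng.integers(0, 3, size=300)                      # toy "subtype" labels

clf = RandomForestClassifier(
    n_estimators=500,   # "Number of trees" column in the table
    max_features=10,    # analogue of mtry, scaled down for the toy data
    oob_score=True,     # enables the out-of-bag error estimate
    random_state=0,
)
clf.fit(X, y)
oob_error = 1.0 - clf.oob_score_  # OOB error rate, as reported in the table
print(round(oob_error, 4))
```

The OOB error is computed from the trees that did not see each sample during bootstrapping, which is why it serves as an internal validation estimate without a held-out set.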
The following graph plots classification probability against the number of ambiguous bases for each sequence. As expected, the proportion of wrongly classified sequences (red dots) increases at lower probability values. We also see a trend toward a larger proportion of wrongly classified sequences as the number of ambiguous bases increases.
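The x-axis quantity of the graph above can be computed directly from a sequence: ambiguous bases are IUPAC codes other than A, C, G, or T (e.g. N, R, Y). A minimal sketch, not taken from the Covidex code:

```python
# Minimal sketch (not Covidex's code): count ambiguous bases in a
# nucleotide sequence, i.e. IUPAC codes other than A, C, G, T.
def count_ambiguous(seq: str) -> int:
    unambiguous = set("ACGT")
    return sum(1 for base in seq.upper() if base not in unambiguous)

print(count_ambiguous("ACGTNNRYACGT"))  # N, N, R, Y -> 4
```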
In the following table evaluation metrics for each class are presented:
Table captions:
Sensitivity (Recall): the proportion of actual positive cases that were correctly identified.
Specificity: the proportion of actual negative cases that were correctly identified.
Positive Predictive Value (PPV): the proportion of predicted positive cases that are true positives.
Negative Predictive Value (NPV): the proportion of predicted negative cases that are true negatives.
Precision: the proportion of predicted positive cases which were correctly identified (equivalent to PPV).
F1: the harmonic mean of precision and recall.
Prevalence: the proportion of the total cases which are actual positive cases.
Detection Rate: the proportion of the total cases which are correctly identified positive cases.
Detection Prevalence: the proportion of the total cases which were predicted as positive cases.
Balanced Accuracy: the arithmetic mean of sensitivity and specificity.
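All of the metrics defined above follow from the four cells of a per-class (one-vs-rest) confusion matrix. The sketch below computes them for illustration; the counts are made up and are not Covidex results.

```python
# Sketch of the per-class (one-vs-rest) metrics defined above, computed
# from a binary confusion matrix. The counts are invented for illustration.
def class_metrics(tp, fp, tn, fn):
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn)   # recall
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)           # precision
    npv = tn / (tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "Sensitivity": sensitivity,
        "Specificity": specificity,
        "PPV": ppv,
        "NPV": npv,
        "F1": f1,
        "Prevalence": (tp + fn) / total,
        "Detection Rate": tp / total,
        "Detection Prevalence": (tp + fp) / total,
        "Balanced Accuracy": (sensitivity + specificity) / 2,
    }

m = class_metrics(tp=90, fp=10, tn=880, fn=20)
print(round(m["F1"], 4))  # -> 0.8571
```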
The next heatmap shows, for each class, the correlation between the expected classification and the classification obtained by Covidex. Overall we find high correlation values.