Measuring performance in the interpretation of chest radiographs: a pilot study

AIM: To develop a system to assess the image interpretation performance of radiologists in identifying signs of malignancy on chest radiographs.

MATERIALS AND METHODS: An experienced radiologist chose a test set of 30 chest radiographs consisting of 11 normal and 19 abnormal cases. All malignant cases had biopsy-proven pathology; all normal and benign cases had at least 2 years of imaging follow-up. Fourteen radiologists with varying levels of experience were recruited. Each participant read the test set individually on a standard reporting workstation and entered their findings directly into a laptop running specially designed reporting software. For each case, relevant clinical information was given, and the reader was asked to mark any perceived abnormality and rate their level of suspicion on a five-point scale (normal, benign, indeterminate, suspicious, or malignant). On completion, participants received instant feedback on performance parameters, including automatically calculated sensitivity and specificity. They were then given the opportunity to review the cases alongside an expert opinion and the pathology results. The time each participant took to complete the test was recorded.

RESULTS: The six consultant radiologists who took part performed significantly better than the eight specialist registrars on receiver operating characteristic (ROC) analysis (area under the ROC curve [AUC] = 0.9297 and 0.7648, respectively; p = 0.003). There was a significant correlation between years of experience in interpreting chest radiographs and performance on the test set (r = 0.573, p = 0.032). Consultant radiologists completed the test significantly more quickly than the specialist registrars (mean time 19.65 minutes compared with 26.51 minutes, p = 0.033).

CONCLUSION: A test set can be used to measure individual differences in the interpretation of chest radiographs. This has the potential to be a useful tool in performance testing.
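To illustrate how a reader's five-point ratings can be turned into sensitivity, specificity, and an AUC, the following is a minimal sketch under stated assumptions, not the study's reporting software. The abstract does not say which rating counts as a "positive" call or which ROC method was used. This sketch therefore assumes (hypothetically) that "suspicious" or "malignant" counts as positive, and it computes the empirical (nonparametric, Mann-Whitney) AUC rather than a fitted ROC curve. The function names `sensitivity_specificity` and `empirical_auc`, the `SCALE` mapping, and the toy data are all hypothetical.

```python
from typing import Sequence

# Hypothetical ordinal coding of the five-point suspicion scale.
SCALE = {"normal": 1, "benign": 2, "indeterminate": 3, "suspicious": 4, "malignant": 5}


def sensitivity_specificity(
    ratings: Sequence[int], truth: Sequence[bool], threshold: int = 4
) -> tuple[float, float]:
    """Assumed decision rule: a case is called positive when rating >= threshold.

    `truth` is True for a (biopsy-proven) malignant case, False otherwise.
    """
    tp = sum(1 for r, t in zip(ratings, truth) if t and r >= threshold)
    fn = sum(1 for r, t in zip(ratings, truth) if t and r < threshold)
    tn = sum(1 for r, t in zip(ratings, truth) if not t and r < threshold)
    fp = sum(1 for r, t in zip(ratings, truth) if not t and r >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def empirical_auc(ratings: Sequence[int], truth: Sequence[bool]) -> float:
    """Nonparametric AUC: the probability that a malignant case is rated higher
    than a non-malignant case, with tied ratings counted as one half."""
    pos = [r for r, t in zip(ratings, truth) if t]
    neg = [r for r, t in zip(ratings, truth) if not t]
    score = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                score += 1.0
            elif p == n:
                score += 0.5
    return score / (len(pos) * len(neg))


# Illustrative toy reading of six cases (invented data, not from the study).
labels = ["malignant", "suspicious", "indeterminate", "normal", "benign", "suspicious"]
ratings = [SCALE[label] for label in labels]
truth = [True, True, True, False, False, False]

sens, spec = sensitivity_specificity(ratings, truth)
auc = empirical_auc(ratings, truth)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc:.3f}")
```

On the toy data, this prints a sensitivity and specificity of 0.667 and an AUC of 0.833. Because AUC uses every rating level rather than a single cut-off, it summarises performance across all possible thresholds. This makes it a natural choice for comparing groups of readers, such as consultants and registrars.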