Detection of metallic objects on digital radiographs with convolutional neural networks

Introduction: Screening for metallic implants and foreign bodies before magnetic resonance imaging (MRI) examinations is crucial for patient safety. The health history is supplied by the patient, a family member, screening of electronic health records or the picture archiving and communication system (PACS). PACS securely stores and transmits digital radiographs (DRs) and related reports with patient information. Convolutional neural networks (CNNs) can be used to detect metallic objects in DRs stored in PACS. This study evaluates the accuracy of CNNs in the detection of metallic objects on DRs as an MRI screening tool. Methods: The musculoskeletal radiographs (MURA) dataset, consisting of 14,863 upper extremity studies, was stratified into datasets with and without metal. For each anatomical region (elbow, finger, hand, humerus, forearm, shoulder and wrist) we trained and validated CNN algorithms to classify radiographs with and without metal. Algorithm performance was evaluated with the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, predictive values and accuracy, compared with a reference standard of manual labelling. Results: Sensitivities, specificities and AUCs for the six anatomical regions ranged from 85.33% (95% CI: 78.64%–90.57%) to 100.00% (95% CI: 98.16%–100.00%), 75.44% (95% CI: 62.24%–85.87%) to 93.57% (95% CI: 88.78%–96.75%) and 0.95 to 0.99, respectively. Conclusion: CNN algorithms classify DRs with metallic objects for six different anatomical regions with near-perfect accuracy. The rapid and iterative capability of the algorithms allows for scalable expansion and use as a substitute MRI screening tool for metallic objects. Implications for practice: CNNs could assist in the detection of metal on digital radiographs prior to MRI and substantially decrease screening time. © 2022 The Author(s). Published by Elsevier Ltd on behalf of The College of Radiographers.
This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).


Introduction
Magnetic resonance imaging (MRI) is described as a safe modality because no ionizing radiation is involved. The potential diagnostic benefits of MRI are numerous, but safety restrictions in the MRI environment must be acknowledged and any hazards excluded. One or a combination of the three main components of the system (the strong static magnetic field, the pulsed gradients and the pulsed radiofrequency field) can be an intrinsic hazard to the patient. 1 Safety can be breached by interactions of these fields with metallic objects, conductors and medical devices. A safe MRI environment is a daily challenge for the staff, who must maintain up-to-date knowledge of metallic objects within the patient that can create a safety risk. Investigations need to be conducted through the literature, considering whether the objects are compatible with the MRI system or whether another system, sequence or situation is preferable. It is critical to establish an effective pre-MRI screening policy and procedure for all individuals entering the MRI environment. Biomedical devices and implants found in patients pose a considerable safety concern associated with the high electromagnetic field strengths of MRI. 2 It is crucial to adequately prepare and screen patients for metallic foreign bodies, which has been established as best practice for MRI safety screening by The Joint Commission and the American College of Radiology (ACR) in their documents "Joint Commission Sentinel Event Alert (SEA)" and "ACR Guidance Document for Safe MR Practices: 2013". 3–5 Patients are specifically questioned about metallic objects in a questionnaire sheet, as well as verbally screened by MRI technologists. 6 Deep learning, a subset of machine learning, has been successfully developed as a computer vision and automated identification tool, most notably in diabetic retinopathy screening.
7 Deep learning is well suited to big medical databases and can be used to extract useful knowledge. Such systems include several functions, including quantitative feature analysis and automatic detection of lesions or objects in medical imaging. One breakthrough algorithm in deep learning is the convolutional neural network (CNN), consisting of multiple layers of neuron-like connections, which has achieved high accuracies in computer vision. CNNs simulate the animal visual cortex in the overall learning process, 8 and successfully training a CNN can construct an algorithm with hierarchical information and object-oriented image classification for specific purposes.
Large datasets of medical images manually labelled by healthcare professionals have played a critical role in advancing the field of convolutional neural network algorithms. 9 The musculoskeletal radiograph (MURA) database was introduced in 2017 containing 40,561 images from 14,863 studies, manually labelled as normal or abnormal by radiologists. 10 The dataset is binary classified with abnormalities such as fractures, hardware, degenerative joint diseases and other miscellaneous abnormalities, including lesions and subluxations. Different approaches to abnormality detection with CNNs have been used for image classification of the MURA dataset, including ensemble algorithms, capsule networks and standard transfer learning CNNs. 11–13 Given the challenges of identifying MRI safety risks, we investigated whether a CNN could aid in the immediate detection of metal objects in several anatomical regions.
The purpose of this study is to determine whether a CNN can be trained with limited image data to detect metallic objects on radiographs.

Methods/materials
This study was retrospective and used anonymized data only; therefore, ethics approval was waived. Permission to use the MURA dataset was granted by the Stanford University School of Medicine by accepting the Terms of Use. 10

Dataset

A total of 8,940 radiographs were extracted from the musculoskeletal radiographs dataset (MURA v1.1 10 ) and used to train, validate and test the CNNs, with sample images presented in Fig. 1. Prior to this study, each image had been labelled as abnormal or normal by board-certified radiologists at the Stanford Hospital. Each radiograph was then labelled as with metal or without metal by an x-ray technologist with 8 years of experience. The dataset contains only upper extremity x-rays, where each study contains one or more images based on different radiographic views. The input for the CNN models is a set of images for each anatomical region, and the output of each model is a binary prediction, where 0 indicates presence of metal and 1 indicates absence of metal. A dataset of the hands was not created, as there was an insufficient number of images with metal. The chest x-ray came from a public chest x-ray dataset with pseudo-anonymized data, and therefore ethics approval was waived. 14
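The binary output convention above (0 = metal present, 1 = absence of metal) can be illustrated with a minimal sketch; the 0.5 decision threshold and the function name are hypothetical choices for illustration, not taken from the study's code.

```python
def classify(score: float, threshold: float = 0.5) -> int:
    """Map a model's output score to the study's label convention:
    0 = metal present, 1 = no metal.

    The 0.5 cut-off is an assumed, illustrative threshold."""
    return 0 if score < threshold else 1
```

For example, a score of 0.02 would be read as "metal present" (label 0), while a score of 0.97 would be read as "no metal" (label 1).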

Algorithm training
This study used image-based classification with CNN models to directly classify images as either with or without metal. 15 Algorithm training consisted of different CNN architectures, including parameter tuning to optimize accuracy. Training was conducted with Keras v2.4.3 and TensorFlow 2 as the backend. 16 Six sequential CNN models were developed, one for each anatomical region, and each dataset was split into 80% of images for the training and validation sets and 20% for the test set. All datasets were augmented with random horizontal flipping and random rotation of 1°, and images were standardized to values between 0 and 1 for the first layer of their respective models, with an example presented in Fig. 2. Hyperparameters such as learning rate and cycle number were optimized iteratively. For each model a learning rate of 0.001 was used together with the Adam optimizer for stochastic optimization. 17 All code was written and adapted in the Python (version 3.8) programming language. 18 A generalized gradient-based visual explanation (Grad-CAM++) was used as the visual explanation for detection of metal in elbow radiographs, as shown in Figs. 3 and 4. Grad-CAM++ uses the weighted distribution of the feature maps of the model's last convolutional layer to generate a visual explanation for any metal in the radiograph. 19
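The augmentation and standardization steps described above can be sketched as follows; this is a minimal NumPy illustration, not the study's Keras pipeline. The 1° random rotation is omitted (it could be done with e.g. scipy.ndimage.rotate), and the division by 255 assumes 8-bit input images.

```python
import numpy as np

def augment_and_standardize(image: np.ndarray, flip: bool = False) -> np.ndarray:
    """Optionally flip an image horizontally, then rescale it to [0, 1].

    Assumes an 8-bit greyscale input; the paper's 1-degree random
    rotation step is left out of this sketch."""
    img = image.astype(np.float32)
    if flip:
        img = img[:, ::-1]  # mirror the columns (horizontal flip)
    return img / 255.0      # standardize pixel values to [0, 1]
```

In the actual pipeline such transforms are applied randomly per image at training time, effectively enlarging the dataset without collecting new radiographs.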

Evaluation
All images in the test sets were analysed using the final metal detection model, resulting in scores representing the likelihood that the DR should be classified as with metal or without metal. The area under the curve (AUC) was calculated from these performance measures. 20 Sensitivities, specificities, predictive values and accuracies were derived and calculated for model performance evaluation.
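The reported measures can be derived from the 2×2 confusion matrix, and the AUC is equivalent to the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one. A minimal sketch (function names are illustrative, not from the study's code):

```python
def diagnostic_measures(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, predictive values and accuracy
    from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }

def auc_from_scores(pos_scores, neg_scores) -> float:
    """AUC computed as the probability that a random positive case
    outscores a random negative case (ties count as 0.5); this is
    equivalent to the area under the ROC curve."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))
```

In practice a library routine such as scikit-learn's `roc_auc_score` would be used for the AUC, but the pairwise formulation above makes the metric's meaning explicit.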

Results
During training of the CNNs we increased the size of each dataset with horizontal flipping and image rotation, as shown in Fig. 2. Sample predictions of metallic objects on a chest x-ray and an elbow with and without metal are presented in Fig. 3. The algorithm detected metal on the chest x-ray with 100% certainty, metal in the elbow with 99.99% and no metal with 99.91% certainty. Fig. 4 presents sample predictions with the Grad-CAM++ visual explanation for the most likely places where the CNN detects metal. The algorithms correctly detected orthopaedic metal in the elbow and in the chest x-ray. However, the chest x-ray included unexplainable detection of metal in the upper left corner of the image. The flow of images for each CNN is presented in Fig. 5, and each dataset included different types of metallic objects in the respective anatomical areas. A minimum of 1,000 images were extracted from the MURA dataset for each dataset class, which was found acceptable for CNN training. However, only very few images could be extracted for the hand dataset, which was insufficient for CNN training. A minimum of 1,000 images is expected for CNN training, but for the sparse datasets (finger and humerus) we achieved acceptable accuracy scores. Dataset and CNN performance measures, with a training time of 30 min for each CNN, are presented in Table 1 and Table 2, respectively. The highest scores on the test sets were achieved with the elbow, finger, shoulder and wrist CNNs, which also had acceptable numbers of images. The CNN diagnostic accuracy measures in Table 2 show that test performance for the forearm and humerus was significantly lower than for the other anatomical regions. However, testing on small dataset sizes could imply substantial performance variance.

Discussion
This study used several sequential CNN models to automatically detect metallic objects on a sparse radiographic dataset and achieved near-perfect accuracy.
The MURA dataset includes a limited number of radiographs with and without metal, which were used for algorithm training. Studies with small dataset sizes have shown accuracies over 80% in the detection of diseases on thorax x-rays, distal radius fractures and femur fracture classification. 21–23 Varma et al. subsequently investigated the effect of dataset size and algorithm training on the same dataset of lower extremity radiographs. The performance increased from an area under the receiver operating characteristic curve (AUC-ROC) of 0.672 for a dataset size of 1,000 images to an AUC-ROC of 0.872 for 50,000 images. 24 The authors used a DenseNet-161 algorithm pretrained on the ImageNet dataset, and achieved higher performance when the algorithm was pretrained on both the ImageNet and MURA datasets. The models of this study achieved encouraging results considering the very small dataset sizes for the forearm and the humerus, with 652 and 680 images, respectively. However, the high performance could have been achieved because of the binary classification task of metal identification and the high radiodensity with which metal manifests on x-ray images. Over the last 10 years, several CNN architectures have been developed that allow researchers to test different and more hyperparameters of each architecture. However, complex architectures with more layers have the disadvantage of greater hardware requirements and longer training times compared with their shallower counterparts. 25 As an example, fourteen different neural network architectures were trained on a Covid-19 image dataset with binary classification, with training times ranging from 20 min to over 5 h on a specialized workstation with two high-end graphics cards. 25 Varma et al. found no statistically significant differences between the performance of three neural networks of varying architecture and depth. 24
The simpler architecture used in our study had a short training time of 30 min on a low-end graphics card. Using a simpler CNN for a metal identification task is a promising strategy that requires less training time and computational power. Although this CNN was applied to a narrow area (i.e. upper extremity x-rays), the applications of this strategy throughout MRI screening are wide-ranging. CNNs have been shown to be able to classify abstract disease manifestations in a variety of settings, achieving higher performance than healthcare professionals. 27 Furthermore, one study reduced the average time to interpret the images in a dataset from 240 min for radiologists to 1.5 min for a CNN algorithm, which could substantially accelerate real-time clinical decision-making in MRI screening. 28 The increasing demand for MRI, with its superior soft-tissue contrast compared with other medical imaging, has contributed to almost 30,000 MRI scanners worldwide. More and more healthcare professionals need to be trained in MRI safety to protect patients from the potential risks of MRI and to stay updated on the compatibility of medical devices and implants. 29 Image data are plentiful, which allows CNN technology to be an assisting solution for MRI safety screening alongside existing techniques for operative report screening. 30 However, orthopaedic implants are generally considered safe or conditional for MRI scanning, and this study only shows that metallic objects can be detected in radiographic datasets. Future studies should investigate CNN algorithms that detect potentially dangerous metallic objects or subtypes of metallic objects, such as ferrous and non-ferrous objects.
Several limitations need to be addressed in this study. First, the predictions of an algorithm can become very accurate when it is trained on a limited dataset, making the algorithm overfit and not well suited to new examples. The test set was kept separate from the training process, limiting this overfitting bias. Second, images can be classified as containing metal when there are features of the image that are not related to metal within the patient, e.g. radiopaque markers for left and right. One study has shown that a pneumonia detection algorithm relied on information from radiopaque markers, which affected disease prediction. 31 Finally, for the classification task, spectrum bias can occur when the dataset does not appropriately represent the range of possible types of patients and diseases. The MURA dataset includes images with a wide range of different metallic implants. Some x-rays in the training dataset might not contain a metallic implant that was represented in the test dataset, and vice versa.
Future studies should investigate algorithm training on medical images from other modalities and on electronic patient journals, with subsequent detection of metallic subtypes.

Conclusion
CNN algorithms classify DRs with metallic objects for six different anatomical regions with near-perfect accuracy. The rapid and iterative capability of the algorithms allows for scalable expansion and use as a substitute MRI screening tool for metallic objects.

Figure 1. Sample digital radiographs extracted from the MURA dataset. 0 indicates presence of metal within the patient's anatomy; 1 indicates no presence of metal.
Short training times allow more hyperparameters to be tested and optimize the overall training process, leaving more time for increasing and labelling the amount of training data available. Shorter training times also allow better model training and adaptation strategies when new scanners are implemented and more images are added to the training dataset. 26

Figure 2. Example of the data augmentation technique with horizontal flipping and 1° rotation of an elbow radiograph.

Figure 3. Sample digital radiographs with metal and without metal within the patient's anatomy.

Figure 4. Gradient-based visual explanation (Grad-CAM++) used as the visual explanation for detection of metal in the samples from Fig. 3.

Figure 5. Flowchart of digital radiograph extraction from the MURA dataset for the six anatomical regions.
Abbreviations: AI, Artificial intelligence; AUC, Area under the curve; AUC-ROC, Area under the receiver operating characteristic curve; CNN, Convolutional neural network; DR, Digital radiography; EPJ, Electronic patient journals; MRI, Magnetic resonance imaging; MURA, Musculoskeletal radiograph dataset; PACS, Picture archiving and communication system.

Corresponding author: S. Lysdahlgaard, Finsensgade 35, 6700 Esbjerg, Denmark. E-mail address: Simon.Lysdahlgaard@rsyd.dk. Radiography, https://doi.org/10.1016/j.radi.2022.01.001.

Table 1
Convolutional neural network performance measures for six upper extremity anatomical regions. AUC-ROC = area under the receiver operating characteristic curve; CNN = convolutional neural network; NPV = negative predictive value; PPV = positive predictive value.

Table 2
Convolutional neural network diagnostic accuracy measures for six upper extremity anatomical regions. Columns: anatomic region, true positives, false positives, false negatives, true negatives. CNN = convolutional neural network.