1. Introduction
The production of high-performance PCBs involves complex technologies and manufacturing processes, which often introduce defects such as short circuits and open circuits that seriously degrade the performance and lifespan of electronic products. Developing PCB defect detection technology that is efficient and has a low miss rate is therefore crucial. PCB defect detection still faces the following challenges: (1) most defects on PCBs are small targets that are hard to identify by the human eye; (2) the semantic information in photographic images is limited; and (3) the existing PCB samples are poorly targeted. Traditional PCB defect detection relies on manual visual inspection [1], optical inspection [2], and electrical inspection [3]. Manual visual inspection is low-cost, but it is subjective, inefficient, and struggles to identify microscopic defects. Optical inspection offers speed and accuracy but requires expensive equipment and complex maintenance. Electrical inspection is cost-effective and can detect electrical performance defects, but it is constrained by the complexity of test fixtures, programming, and debugging, and it has limited ability to recognize non-electrical defects. Deep learning-based PCB defect detection, with its high precision, efficiency, and scalability, has therefore become an important detection technology in modern electronics manufacturing.
In recent years, Deep Learning (DL) has attracted much attention due to its ability in feature extraction and generalization, becoming a promising choice for target defect detection. Currently, DL-based object detection algorithms are mainly divided into two categories: two-stage detection algorithms [4] and one-stage detection algorithms [5]. Two-stage detection algorithms adopt a two-step strategy, which first generates candidate regions and then classifies these regions. They are renowned for their high detection accuracy and low miss rate, but suffer from a slow detection speed, computational complexity, and low timeliness. In contrast, one-stage detection algorithms adopt an integrated strategy that combines feature extraction from candidate regions with the localization of predicted bounding boxes, directly classifying and locating objects [6]. Typical algorithms include “You Only Look Once” (YOLO) series algorithms (YOLOv3, YOLOv4, YOLOv5). They are characterized by a fast detection speed and low computational complexity, meeting the demand for real-time detection. To further improve detection accuracy, Li [7] extended the original YOLOv3 architecture by combining virtual and real PCB image datasets, and improved detection accuracy by adding additional output layers to the original YOLOv3 architecture. However, this method has limited accuracy in identifying small-target defects and fails to achieve a balance between speed and precision. Wu [8] proposed GSC-YOLOv5, a DL-based detection method combining a lightweight network and dual attention mechanisms to effectively address small-target detection issues; however, the attention mechanism introduced was complex and slow. Xia [9] introduced contextual attention into the YOLOv5s model, significantly reducing missed and false detections in small targets and complex background textures by implementing compression training methods and integrating micro detection heads. 
Although this method improves accuracy, its compression method leads to a longer training time, which affects the efficiency of the model. Addressing the problems of low efficiency and accuracy in PCB defect classification technology, Xiong [10] proposed using YOLOv8 for PCB defect detection, which significantly improved the prediction accuracy of the model. However, this method does not achieve a balance between accuracy and real-time performance, and its performance in resource-constrained scenarios is not satisfactory. Long [11] introduced an algorithm for PCB defect detection based on an improved YOLOv8n model. By incorporating an advanced neck network structure and enhancing the model’s capability for multi-scale feature fusion, this algorithm manages to reduce the computational complexity of the model, making it more adept at detecting small objects. Although this method improves both detection speed and accuracy, it has not yet achieved a good balance between the two, and the detection results are prone to being affected by complex environments. The YOLO series algorithms indeed have certain advantages in terms of detection accuracy and speed compared to other algorithms. However, these improved algorithms, including YOLOv8, still face issues such as lower recognition accuracy for small objects, difficulty in balancing speed and accuracy, and inconvenience in deployment.
To address the issues of small ( pixel) defect detection, this paper proposes a lightweight PCB defect detection method based on the one-stage algorithm YOLOv8n. This method targets the requirement for lightweight embedding in portable devices through the following innovations and improvements:
- (1) Embedding SCConv into the C2f structure of the backbone network to reduce redundant computations while enhancing the model’s learning capability.
- (2) Integrating an adaptive feature selection module to improve the network’s ability to recognize small PCB defect targets.
- (3) Replacing the original Decoupled Head with a self-developed lightweight shared convolutional head to reduce the model’s computational complexity and increase detection accuracy.
- (4) Introducing the WIoU loss function to provide more precise evaluation results and enhance generalization capability.
2. Methodology
YOLOv8 (You Only Look Once version 8) [12] represents the latest iteration of the YOLO object detection and image segmentation model series. The YOLOv8 series offers a range of models to cater to diverse scenarios and requirements. Among them, YOLOv8n stands out as the most computationally efficient and fastest in inference speed, making it particularly suitable for deployment on resource-constrained devices. Its structure is shown in Figure 1. It consists of three components: Backbone, Neck, and Head. The Backbone module incorporates Conv, C2f, and Spatial Pyramid Pooling Fast (SPPF) components, which, respectively, enhance nonlinear expression, feature propagation, and multi-scale processing capabilities. The Neck part employs a Path Aggregation Network (PAN) [13] structure, combining the Feature Pyramid Network (FPN) [14] and PAN to achieve efficient feature fusion. In the Head module, YOLOv8n uses a Decoupled Head design to separately handle localization and classification tasks, improving prediction accuracy and efficiency, simplifying the model structure, and enhancing computational speed. Regarding the loss function, YOLOv8n employs Complete Intersection over Union loss (CIoU) combining IoU [15] and the distance between central points to measure more accurately the difference between the predicted and ground truth boxes, thereby improving localization accuracy.
When implementing PCB defect detection on mobile devices, a lightweight model is crucial to address the limited computational resources. Although YOLOv8n excels in object detection, its model size remains relatively large for embedded devices or real-time processing scenarios, necessitating further optimization. Additionally, the structural complexity of PCBs and the issue of defects at different scales pose challenges for high-precision detection with one-stage networks. Therefore, the improvements to the YOLOv8n algorithm in this paper are as follows: Firstly, SCConv is integrated into the C2f structure of the backbone network of the original model. This enhancement reduces the computational load while improving the model’s learning capability. Secondly, an adaptive feature selection module is incorporated into the model to enhance the network’s ability to detect small PCB defect targets. Thirdly, a self-developed lightweight shared convolution head is used to replace the original Decoupled Head, reducing complex computations and improving the detection accuracy of the network. Lastly, the WIoU loss function is introduced to provide refined evaluation results and enhance generalization capability. The network structure of the improved model is shown in Figure 2:
2.1. Spatial and Channel Reconstruction Convolution
The YOLOv8n network structure includes numerous C2f modules, whose core function is to learn residual features. Consequently, the overall performance of the network is closely linked to the performance of the C2f modules in feature learning, given that PCB surface defects exhibit significant varieties in shape, location, and size—especially notches, multi-solder, and broken circuit defects. In this context, the original C2f module is insufficient for extracting PCB surface defect features, resulting in a substantial amount of redundant computation. The SCConv [16] module significantly enhances the operational efficiency and performance of the model. Its working mechanism leverages the effective utilization of spatial and channel redundancy between feature maps for CNN compression. This approach reduces redundant computations and enables the model to learn more representative features. The SCConv module comprises two core units: the Spatial Reconstruction Unit (SRU) and the Channel Reconstruction Unit (CRU). These units work in tandem to process feature maps efficiently. The overall structure, as shown in Figure 3, illustrates the interaction between the SRU and CRU and their crucial role in enhancing model performance. This improvement allows the model to achieve higher efficiency and accuracy in processing complex tasks. To enhance the network’s learning efficiency and integrate multi-dimensional feature information, this paper proposes a novel module, C2f_SCConv. Its structure is shown in Figure 3.
The C2f_SCConv structure replaces all the Bottleneck modules in the original C2f module with the more efficient Bottleneck_SCConv module. The Bottleneck_SCConv module improves the original Bottleneck module by substituting the Conv module with the SCConv structure. This enhancement not only reduces redundant computations but also significantly improves the model’s ability to learn PCB surface defect features.
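The SCConv structure is illustrated in Figure 4. For the intermediate input feature in the bottleneck residual block, the spatially refined feature is first obtained by the SRU, and the channel-refined feature is then obtained by the CRU. In the SCConv module, all the parameters are concentrated in the transformation phase, so the theoretical reduction in parameters can be analyzed there. The number of parameters of the standard convolution can be calculated as:
$$P_{s} = k \times k \times C_{1} \times C_{2} \tag{1}$$
The number of parameters of the SCConv module can be calculated as:
$$P_{sc} = 1 \times 1 \times \alpha C_{1} \times \frac{\alpha C_{1}}{r} + k \times k \times \frac{\alpha C_{1}}{g\,r} \times C_{2} + 1 \times 1 \times \frac{\alpha C_{1}}{r} \times C_{2} + 1 \times 1 \times (1-\alpha) C_{1} \times \frac{(1-\alpha) C_{1}}{r} + 1 \times 1 \times \frac{(1-\alpha) C_{1}}{r} \times \left(C_{2} - \frac{(1-\alpha) C_{1}}{r}\right) \tag{2}$$
where $k$ is the convolution kernel size; $C_{1}$ and $C_{2}$ are the number of input and output feature channels; $\alpha$ denotes the segmentation ratio; $r$ denotes the squeezing ratio; and $g$ is the group size of the GWC operation. In the experiments, the general parameter set is $\alpha = 1/2$, $r = 2$, $g = 2$, $k = 3$, and the number of parameters is reduced by a factor of about 5 for $C_{1} = C_{2} = C$. In terms of parameter count, SCConv is therefore considerably lighter than the standard convolution.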
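The parameter comparison can be checked numerically. The sketch below assumes the parameter-count expressions in the form given for SCConv in [16]: a standard convolution with k·k·C1·C2 weights, and an SCConv transformation made of two 1 × 1 squeeze convolutions, one group-wise convolution, and two point-wise convolutions. The setting α = 1/2, r = 2, g = 2, k = 3 with C1 = C2 = 64 and the function names are assumptions chosen for illustration.

```python
from fractions import Fraction


def standard_conv_params(k, c1, c2):
    """Weights of a plain k x k convolution (bias ignored)."""
    return k * k * c1 * c2


def scconv_params(k, c1, c2, alpha, r, g):
    """Weights of the SCConv transformation stage (assumed form, bias ignored)."""
    up = alpha * c1          # channels routed to the upper (rich) path
    low = (1 - alpha) * c1   # channels routed to the lower (cheap) path
    squeeze_up = up * (up / r)              # 1x1 squeeze on the upper path
    gwc = k * k * (up / (g * r)) * c2       # group-wise k x k convolution
    pwc_up = (up / r) * c2                  # point-wise conv on the upper path
    squeeze_low = low * (low / r)           # 1x1 squeeze on the lower path
    pwc_low = (low / r) * (c2 - low / r)    # point-wise conv on the lower path
    return squeeze_up + gwc + pwc_up + squeeze_low + pwc_low


# Assumed illustrative setting: alpha = 1/2, r = 2, g = 2, k = 3, C1 = C2 = 64.
p_std = standard_conv_params(3, 64, 64)
p_sc = scconv_params(3, 64, 64, Fraction(1, 2), 2, 2)
ratio = float(p_std / p_sc)
print(p_std, int(p_sc), round(ratio, 2))
```

Under these assumptions the ratio comes out just below 5, which agrees with the factor-of-5 reduction quoted in the text.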
Thus, applying the improved C2f_SCConv to YOLOv8n enhances the feature extraction capability of the model and it provides a fast inference speed while maintaining high accuracy.
2.2. Adaptive Feature Selection Module Integration
Due to the diverse shapes and small sizes of PCB surface defects, recognizing and classifying PCB defects during inspection relies on a significant amount of local information. Consequently, this paper embeds an efficient adaptive feature selection module, namely Triplet Attention [17], into the Backbone part of YOLOv8n. Its structure is shown in Figure 5.
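Triplet Attention is capable of capturing interaction information between different dimensions, reducing the interference of irrelevant information, enabling the model to focus more on the extraction of target features, and improving recognition accuracy. It is a quasi-parameter-free attention mechanism that reduces the loss of spatial information by capturing the interaction between the spatial dimensions and the channel dimension of the input tensor. It consists of three parallel branches, with the input tensor $\chi \in \mathbb{R}^{C \times H \times W}$ entering all three branches simultaneously. In the first branch, tensor $\chi$ is rotated 90° counterclockwise around axis $H$, yielding a rotated tensor $\hat{\chi}_{1}$. After Z-pooling, the tensor’s shape becomes $2 \times H \times C$; this pooled tensor is denoted $\hat{\chi}_{1}^{*}$. Then, through convolutional operations and the sigmoid activation function, attention weights are generated and applied to $\hat{\chi}_{1}$. Finally, the result is rotated 90° clockwise around axis $H$ to give the output of the first branch. In the second branch, similar operations are applied to tensor $\chi$ around axis $W$ to produce a rotated tensor $\hat{\chi}_{2}$, a pooled tensor $\hat{\chi}_{2}^{*}$, and an output tensor. In the third branch, tensor $\chi$ is successively transformed by a Z-pooling layer (giving $\hat{\chi}_{3}$ of shape $2 \times H \times W$), a convolution, and a sigmoid activation function, and the resulting weights are applied to $\chi$. Finally, the three tensors are aggregated by averaging, and the final output tensor is:
$$y = \frac{1}{3}\left(\overline{\hat{\chi}_{1}\,\sigma\big(\psi_{1}(\hat{\chi}_{1}^{*})\big)} + \overline{\hat{\chi}_{2}\,\sigma\big(\psi_{2}(\hat{\chi}_{2}^{*})\big)} + \chi\,\sigma\big(\psi_{3}(\hat{\chi}_{3})\big)\right) \tag{3}$$
In this equation, $\sigma$ represents the sigmoid activation function, $\psi_{1}$, $\psi_{2}$, and $\psi_{3}$ represent the standard convolutions of the three branches, and the overline denotes the 90° clockwise rotation that restores the original $C \times H \times W$ layout.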
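The Z-pooling step used in each branch can be illustrated in isolation. The sketch below assumes the definition from the Triplet Attention paper [17], where Z-pool concatenates the max-pooled and average-pooled maps taken along the leading dimension, so a D × A × B tensor becomes 2 × A × B. Tensors are plain nested lists and the function name `z_pool` is chosen for illustration; this is not the authors' implementation.

```python
def z_pool(tensor):
    """Reduce a D x A x B nested list to 2 x A x B: [max over D, mean over D]."""
    depth = len(tensor)
    rows = len(tensor[0])
    cols = len(tensor[0][0])
    max_map = [[max(tensor[d][i][j] for d in range(depth)) for j in range(cols)]
               for i in range(rows)]
    avg_map = [[sum(tensor[d][i][j] for d in range(depth)) / depth for j in range(cols)]
               for i in range(rows)]
    return [max_map, avg_map]


# Toy 2 x 2 x 2 input: two "channels" of a 2 x 2 feature map.
x = [[[1, 2], [3, 4]],
     [[5, 0], [1, 8]]]
pooled = z_pool(x)
print(pooled)
```

Because the leading dimension always collapses to 2, the convolution that follows Z-pooling stays cheap regardless of how many channels the input has, which is why the mechanism adds very few parameters.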
2.3. Introduction of Shared Lightweight Convolutional Heads
YOLOv8 has the most changes in its Head section compared to YOLOv5, as shown in Figure 6.
YOLOv8 introduces a new head structure, moving from YOLOv5’s Anchor-Based coupled design to an Anchor-Free decoupled design. This change separates the classification and regression tasks entirely, eliminating the traditional objectness branch in favor of decoupled classification and regression branches, with the regression branch employing Distribution Focal Loss (DFL). To optimize model efficiency without compromising accuracy, this paper proposes the Shared Lightweight Convolutional Detection (SLCD) Head, integrated into YOLOv8n’s Detect layer, as depicted in Figure 7.
The network receives output feature maps (P3, P4, P5) from YOLOv8n at three different stages and scales. Each feature map first undergoes a 1 × 1 convolution with Group Normalization (GN) [18] and a stride of 1. This convolution compresses the number of channels and mixes features before further processing while preserving the spatial dimensions of the feature maps, which allows features from the three scales to undergo shared convolutional operations. Following the initial 1 × 1 convolution, the features pass through a shared convolutional stage consisting of two 3 × 3 convolutional layers with GN (Conv_GN): the first extracts spatial features while maintaining the receptive field size, and the second provides deeper feature extraction. Subsequently, each feature map stream splits into two paths: Conv_Reg for bounding box regression (estimating object locations) and Conv_Cls for category prediction (object classification). Both Conv_Reg and Conv_Cls use shared convolutional designs, further improving model efficiency. Finally, the Scale layer scales the output of the Conv_Reg layer in the detection branch. This layer defines a learnable scaling factor that is applied element-wise to the output feature map, adjusting the feature magnitudes before they reach the following layers. This adjustment stabilizes the regression outputs, thereby improving bounding box prediction accuracy during training.
Furthermore, this paper proposes an optimization strategy for weight sharing among convolutional layers in the Detect layer for classification and regression operations. The original YOLOv8 Head layer design, shown in Figure 8, processes features at three different scales (P3, P4, and P5) through independent branches. Each branch includes two 3 × 3 convolutional layers and one 1 × 1 convolutional layer. While this design enhances detection accuracy, it also significantly increases the number of convolutional layer parameters.
To address the aforementioned issue, this paper introduces the concept of shared convolution in the detection head, illustrated in Figure 9. Specifically, the original design disperses two 3 × 3 convolutional layers across multiple branches, along with convolutional layers for classification and regression, which are now consolidated for shared usage. This sharing mechanism significantly reduces model parameters, lowers complexity, and is anticipated to enhance computational efficiency.
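The effect of sharing can be estimated by counting convolution weights. The sketch below is a rough illustration only: the input channel widths (64, 128, 256), the hidden width of 64, and the output widths (64 regression channels, 6 classes) are hypothetical values, and biases and normalization parameters are ignored. The independent design gives every scale its own two 3 × 3 convolutions and one 1 × 1 convolution for each task; the shared design uses one 1 × 1 convolution per scale followed by a single shared stack.

```python
def conv_weights(kernel, c_in, c_out):
    """Weight count of a kernel x kernel convolution (bias ignored)."""
    return kernel * kernel * c_in * c_out


def independent_head_params(scale_channels, hidden, out_channels):
    """Every scale and every task owns its two 3x3 convs and one 1x1 conv."""
    total = 0
    for c_in in scale_channels:
        for c_out in out_channels:
            total += conv_weights(3, c_in, hidden)
            total += conv_weights(3, hidden, hidden)
            total += conv_weights(1, hidden, c_out)
    return total


def shared_head_params(scale_channels, hidden, out_channels):
    """Per-scale 1x1 conv, then one shared 3x3 stack and shared output convs."""
    total = sum(conv_weights(1, c_in, hidden) for c_in in scale_channels)
    total += 2 * conv_weights(3, hidden, hidden)
    total += sum(conv_weights(1, hidden, c_out) for c_out in out_channels)
    return total


# Hypothetical widths for P3, P4, P5 and for the regression/classification outputs.
scales = (64, 128, 256)
outputs = (64, 6)
independent = independent_head_params(scales, 64, outputs)
shared = shared_head_params(scales, 64, outputs)
print(independent, shared)
```

The exact saving depends on the real channel widths, but the count shows where it comes from: the expensive 3 × 3 layers are paid for once instead of once per scale and per task.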
To address the vanishing-gradient problem that may arise during network training, batch normalization is often employed in network architecture design. By transforming the input data to approximate a standard normal distribution, batch normalization enhances the sensitivity of nonlinear functions to input data. However, the effectiveness of batch normalization depends strongly on the batch size. With small batches in particular, batch normalization may increase the error of the output results, thereby degrading the model’s performance. To overcome the influence of batch size on normalization, this paper replaces the original Batch Normalization (BN) in the SLCD convolutional layers with Group Normalization (GN). GN is a normalization technique used in convolutional neural networks. Unlike BN, GN divides the channels into groups and normalizes within each group, so its computation is independent of the batch size and remains stable even with small batches, particularly for high-precision images. It calculates the mean and variance within each channel group, which also helps reduce noise. Adopting GN therefore avoids an excessive influence of batch size on normalization, helps improve the model’s convergence speed and prediction accuracy, and strengthens its robustness and generalization capabilities. The normalization process of GN is implemented as shown in Equation (4).
$$\hat{x} = \frac{x - \mu}{\sqrt{\sigma^{2} + \varepsilon}} \tag{4}$$
where $x$ is an input feature, $\mu$ is the mean computed within each group, $\sigma^{2}$ is the variance computed within each group, and $\varepsilon$ is a small constant that prevents division by zero. Specifically, if we have a feature x shaped as [N, C, H, W], where N is the batch size, C is the number of channels, and H and W are the spatial dimensions, first, the C dimension is divided into G groups with C/G channels each, and then the features within each group are normalized.
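The grouping step can be made concrete with a small, dependency-free sketch. It normalizes a single sample whose C channels are each stored as a flat list of H·W values, splitting the channels into G groups and normalizing every value by its group's mean and variance. The learnable per-channel scale and shift that GN layers normally include are omitted, and the function name and the value of `eps` are assumptions of this example.

```python
import math


def group_norm(sample, num_groups, eps=1e-5):
    """Normalize one sample (list of C channels, each a flat list) per channel group."""
    num_channels = len(sample)
    if num_channels % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = num_channels // num_groups
    normalized = []
    for g in range(num_groups):
        channels = sample[g * per_group:(g + 1) * per_group]
        values = [v for channel in channels for v in channel]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        inv_std = 1.0 / math.sqrt(var + eps)
        # Statistics depend only on this sample's group, never on the batch.
        normalized.extend([[(v - mean) * inv_std for v in channel] for channel in channels])
    return normalized


# Four channels with two values each, split into G = 2 groups of two channels.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
y = group_norm(x, num_groups=2)
group_means = [sum(v for ch in y[i:i + 2] for v in ch) / 4 for i in (0, 2)]
group_vars = [sum(v * v for ch in y[i:i + 2] for v in ch) / 4 for i in (0, 2)]
print(group_means, group_vars)
```

Since no statistic is taken across the batch dimension, the result for a sample is the same whether it is processed alone or in a batch of 16, which is the property the text relies on.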
In the literature [19], GN has been demonstrated to enhance the performance of localization and classification in the detection head as illustrated in the FCOS paper. Data in Table 5 of the literature [19] indicates that removing GN from classification and regression results in a 1% decrease in model accuracy. Therefore, to uphold the detection performance of the detection head, the GN module is integrated into all convolutional layers of the SLCD. The structure of these convolutional layers is depicted in Figure 10. From the figure, it can be seen that the original batch normalization in the Conv structure has been replaced by GN, resulting in a Conv_GN structure. This enhancement not only accelerates the convergence speed of model training but also improves detection accuracy.
2.4. WIoU Loss Function
YOLOv8n utilizes the CIoU loss function for regression, which considers three geometric factors: overlap area, distance between center points, and aspect ratio. While the CIoU loss function is more comprehensive than previous methods, it neglects directional discrepancies between the ground truth and predicted boxes. This oversight can cause predicted boxes to oscillate during model training, leading to slower convergence and reduced efficiency. To address biases introduced by the traditional IoU loss function in evaluation, the WIoU [20] loss function is introduced. When integrated with YOLOv8 for PCB defect detection, WIoU incorporates weight factors that adjust the IoU loss, enabling a more precise evaluation of the overlap between predicted and ground truth bounding boxes. This enhancement improves the model’s detection accuracy and recall rate for minute and complex defects on PCBs, strengthening the performance and reliability of the entire detection system.
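As an illustration of how a weight factor can adjust the IoU loss, the sketch below implements the v1 form of WIoU as described in [20]: the plain IoU loss (1 − IoU) is multiplied by an exponential of the squared center distance, normalized by the squared size of the smallest enclosing box. The text does not state which WIoU variant is used, so the choice of v1 is an assumption, as are the function names and the (x1, y1, x2, y2) box format. In a training framework the normalizing term is excluded from gradient computation; plain Python has no gradients, so that detail does not appear here.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def wiou_v1_loss(pred, gt):
    """Assumed WIoU v1 form: distance-based weight times the IoU loss."""
    loss_iou = 1.0 - iou(pred, gt)
    # Squared distance between the two box centers.
    dx = (pred[0] + pred[2]) / 2 - (gt[0] + gt[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (gt[1] + gt[3]) / 2
    # Width and height of the smallest box enclosing both boxes.
    enclose_w = max(pred[2], gt[2]) - min(pred[0], gt[0])
    enclose_h = max(pred[3], gt[3]) - min(pred[1], gt[1])
    denom = enclose_w ** 2 + enclose_h ** 2
    weight = math.exp((dx * dx + dy * dy) / denom) if denom > 0 else 1.0
    return weight * loss_iou


perfect = wiou_v1_loss((0, 0, 2, 2), (0, 0, 2, 2))
shifted = wiou_v1_loss((0, 0, 2, 2), (1, 1, 3, 3))
print(perfect, shifted)
```

The weight is 1 when the centers coincide and grows as they drift apart, so poorly centered predictions are penalized more than the overlap alone would suggest.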
3. Experiments and Analysis
3.1. Experimental Environment and Dataset
The DL configuration is Win10 OS, AMD Ryzen 7945HX CPU with 2.5 GHz base frequency (AMD, Santa Clara, CA, USA), NVIDIA GeForce RTX 4060 with 8 GB video memory (NVIDIA, Santa Clara, CA, USA), Python 3.8, PyTorch 2.2.0, and CUDA 11.8.
The network was trained and tested using a publicly available dataset from the Intelligent Robotics Laboratory of Peking University, which contains 684 images of PCB defects and the corresponding annotation files. The dataset was randomly divided into training, validation, and test sets in an 8:1:1 ratio. As depicted in Figure 11, the primary defect types include Mouse_bite, Missing_hole, Open-circuit, Short, Spurious_copper, and Spur. Due to the dataset’s small size, data augmentation was applied to expand it to 7316 defect images. Techniques such as image flipping, rotation, filtering, and denoising were employed to mitigate the risk of overfitting. Furthermore, this study used transfer learning and stochastic gradient descent (SGD), starting from a YOLOv8n model pre-trained on the VOC dataset. The training process used a dynamic learning rate strategy (Multi-Step-LR), initializing the learning rate appropriately, adjusting the gradient dynamically to 0.2, setting a batch size of 16, and conducting 150 iterations. These measures were implemented to enhance the model’s generalization ability and training efficiency.
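A random 8:1:1 split of this kind can be sketched as follows. The file names, the fixed seed, and the use of the augmented count of 7316 images are assumptions for illustration; the text does not say whether the split was made before or after augmentation, nor how leftover images were assigned.

```python
import random


def split_dataset(items, ratios=(8, 1, 1), seed=0):
    """Shuffle items reproducibly and split them into train/val/test by ratio."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n_train = len(shuffled) * ratios[0] // total
    n_val = len(shuffled) * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


# Hypothetical image names standing in for the augmented dataset.
images = [f"pcb_{i:05d}.jpg" for i in range(7316)]
train_set, val_set, test_set = split_dataset(images)
print(len(train_set), len(val_set), len(test_set))
```

Fixing the seed keeps the three subsets identical across runs, which matters when ablation variants are compared on the same test images.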
3.2. Evaluation Indicators
To verify the model’s defect detection performance, Precision (P), Recall (R), Average Precision (AP), and mean Average Precision (mAP) are used as evaluation metrics. In the following equations, TP stands for True Positives, which refers to the number of positive instances that are correctly predicted by the model as belonging to the positive class. FP stands for False Positives, which refers to the number of negative instances that are incorrectly predicted by the model as belonging to the positive class. TN stands for True Negatives, which refers to the number of negative instances that are correctly predicted by the model as belonging to the negative class. FN stands for False Negatives, which refers to the number of positive instances that are incorrectly predicted by the model as belonging to the negative class.
Precision: Precision measures the proportion of samples predicted by the model as positive (typically the class of interest) that actually belong to the positive class. In other words, it quantifies how many of the model’s positive predictions are correct, as shown in Equation (5).
$$P = \frac{TP}{TP + FP} \tag{5}$$
A higher precision indicates a higher recognition rate of the model.
Recall rate: Recall measures the proportion of all actual target objects that the model correctly identifies, that is, the extent to which samples that are actually positive are predicted as positive by the model. It is calculated as in Equation (6).
$$R = \frac{TP}{TP + FN} \tag{6}$$
A high recall rate implies a low likelihood of missed detections.
AP: It measures the average of the model’s detection precision across different levels of recall. This metric is obtained by plotting the precision–recall curve for each category and calculating the area under this curve. It is calculated as in Equation (7).
$$AP = \int_{0}^{1} P(R)\,dR \tag{7}$$
mAP: It measures the average detection performance of a model across all categories by averaging the AP values of each individual category, resulting in a single metric that reflects the overall detection performance of the model, as in Equation (8).
$$mAP = \frac{1}{N}\sum_{i=1}^{N} AP_{i} \tag{8}$$
where $N$ represents the total number of categories, which is also the number of AP values averaged, and $AP_{i}$ represents the AP value for the $i$-th category.
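The four metrics can be computed with a few lines of code. The sketch below is a simplified illustration: AP is approximated by the trapezoidal area under a precision–recall curve given as sorted points, whereas detection toolkits typically apply an interpolated variant, and all counts and curve points are made-up example values.

```python
def precision(tp, fp):
    """Share of predicted positives that are correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp, fn):
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def average_precision(recalls, precisions):
    """Trapezoidal area under a PR curve; recalls must be sorted ascending."""
    area = 0.0
    for i in range(1, len(recalls)):
        width = recalls[i] - recalls[i - 1]
        area += width * (precisions[i] + precisions[i - 1]) / 2
    return area


def mean_average_precision(ap_per_class):
    """Mean of the per-category AP values."""
    return sum(ap_per_class) / len(ap_per_class)


# Made-up example values.
p = precision(tp=90, fp=10)
r = recall(tp=90, fn=30)
ap = average_precision([0.0, 0.5, 1.0], [1.0, 1.0, 0.5])
m_ap = mean_average_precision([0.9, 0.8])
print(p, r, ap, m_ap)
```

Note that TN does not enter any of these formulas, which is why detection benchmarks can be scored without counting the background regions that were correctly ignored.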
3.3. Ablation Experiment
The training set serves to optimize the model’s parameters. During the training process, the difference between the input data and the corresponding labels before and after the model training is quantified by computing the loss function. The aim is to perform parameter optimization by minimizing this loss function. The loss curve of the training set provides insights into the model’s convergence speed and robustness. As depicted in Figure 12a, a comparison is made between the training set’s loss functions before and after the implementation of the improved algorithm proposed in this paper. The validation set is used for performance evaluation and parameter tuning at the end of the epoch to prevent overfitting. Here, the parameter tuning is mainly through hyperparameters, by choosing the best hyperparameter values to improve the performance such as generalization ability. The loss function example of its validation set is shown in Figure 12b. The YOLOv8n algorithm switches to the WIoU loss function, which converges faster during the training process and the reduction in the total loss is more obvious.
In order to assess the impact of the improvements made in the modules in this paper on the overall model performance, mAP was selected as the evaluation index of the model detection effect. A total of nine control groups were set up. The Precision–Recall Curve (P-R Curve) graphs before and after the improvement of YOLOv8n are shown in Figure 13; it can be seen that the mAP values of each classification target have been improved.
Figure 14 shows the comparison curves of Recall, mAP, and Precision before and after the improvement of YOLOv8n. From these curves, it can be observed that all the models have converged and that every indicator is higher for the improved model.
A comprehensive analysis of the model’s boosting effect was carried out, and the results obtained from its training are shown in Table 1. As can be seen from Table 1, Experiment I achieves an mAP of 94.8% with the original YOLOv8n model. Experiment II embeds SCConv in the C2f structure, which improves the mAP by 1.6% while reducing the computational effort. Experiment III replaces the detection head with SLCD, which reduces the computation by 25.6% while maintaining the mAP. Experiment IV introduces the Triplet structure, which improves the mAP by 1.6%, and Experiment V uses WIoU instead of the original loss function, which improves the mAP by 2.8%. Experiment IX is the improved method proposed in this paper, and it can be seen that the detection is effectively improved: the recall rate reaches 96.5%, the mAP reaches 98.6%, and the precision P reaches 99.8%, which are improvements of 4.4%, 3.8%, and 3.1%, respectively, compared with the original model.
3.4. Comparison of Performance Analysis Results of Various Algorithms
In order to verify the advantages of the selected detector head, this paper selected some commonly used ones to improve the original YOLOv8n model; the experimental results are shown in Table 2, from which it can be seen that the detector head designed in this paper compared to other ones maintains a high accuracy while reducing the amount of model calculation.
In order to verify the advantages of the improved algorithm, other improved algorithms in the literature were selected for comparison with the improved YOLOv8n model in this paper. The average accuracy comparison results of the various algorithms are shown in Table 3. Xiao [21] proposed a PCB defect detection algorithm based on CDI-YOLO, in which YOLOv7 is improved by introducing the CA (Coordinate Attention) mechanism, DSConv (Depthwise Separable Convolution), and Inner-CIoU. Zhou [22] proposed a PCB defect detection algorithm based on MSD-YOLOv5. On the basis of YOLOv5, the lightweight MobileNet-v3 network is combined with the CSPDarknet53 network, an attention mechanism is introduced to highlight important feature channels, and the coupled detection head is replaced with a decoupled detection head. Yuan [23] proposed a lightweight LW-YOLO model based on YOLOv8, which integrates a bidirectional feature pyramid network for multi-scale feature fusion, a partial convolution module for reducing redundant computations, and a minimum point distance intersection over union loss function for simplifying optimization and improving accuracy. Tang [24] proposed an improved PCB surface defect detection algorithm, PCB-YOLO, based on YOLOv5. The K-means++ algorithm is incorporated, the Swin transformer is embedded into the backbone network to construct a joint attention mechanism, and DSConv is introduced to compress the model size. Du [25] proposed an enhanced YOLOv5s network named YOLO-MBBi to detect PCB surface defects. Layers in the YOLOv5s network are replaced with Mobile Inverted Residual Bottleneck (MBConv) blocks, Convolutional Block Attention Module (CBAM) attention, a Bidirectional Feature Pyramid Network (BiFPN), and Depthwise Convolutions (DWConv), and the CIoU loss function is replaced with the SIoU loss function during training.
Moreover, in this paper, the latest YOLOv7n and YOLOv8n are selected for experiments. From Table 3, it is seen that the improved model in this paper improves the mAP by 2.8%, 2.1%, 1.3%, 3.1%, 2.2%, 3.5%, 3.3%, and 3.8% compared to other algorithms. The model volume is 4.1 M and FPS is 144.1, which meets the requirements of real-time as well as lightweight portable deployment.
In order to evaluate the impact of the improved algorithm in this paper on the overall model performance, the COCO_2017 dataset was selected to conduct a comparative experiment before and after the algorithm improvement. The experiment was used to verify the detection ability of the model for targets of various sizes. The experimental results are shown in Table 4, indicating that the improved model still has good detection performance for targets of various sizes.
The detection effects before and after the algorithm improvement are shown in Figure 15. A validation set of six defects was selected and tested on the optimized algorithm. From Figure 15a, it can be observed that the pre-improved model missed some defects on the PCB and had a lower detection accuracy for defects such as mouse_bite, spur, open_circuit, etc. In contrast, Figure 15b demonstrates that the improved model was able to better detect the location of defects, achieving a detection accuracy ranging from 90% to 100% for all six types of defects. In summary, the improved model is proven to be effective.
4. Conclusions
To address the need for replacing traditional machine detection methods with deep learning, this paper proposes a PCB defect detection algorithm based on YOLOv8n to solve the issues of missed and false detections caused by numerous tiny objects and complex background textures in PCBs, as well as the difficulty of embedding large-sized models into portable devices. Initially, the SCConv, a spatial and channel reconstruction convolution, is integrated into the C2f structure of the backbone network. This enhancement reduces redundant computations, thereby improving the model’s learning efficiency. Subsequently, an adaptive feature selection module is incorporated to enhance the network’s capability in recognizing micro-target defects on PCBs. Furthermore, the original Decoupled Head is replaced with a Lightweight Shared Convolution Head. This modification aims to streamline computational complexity and enhance detection accuracy. Finally, the WIoU loss function is introduced to refine evaluation metrics and bolster generalization capabilities. The proposed algorithm achieves a detection accuracy of 98.6% on an image-enhanced PCB dataset, outperforming other algorithms. Moreover, the network model size is optimized to 4.1 M, and the detection speed indicator FPS is 144.1, aligning it more closely with the requirements of industrial inspection.
Author Contributions
J.A.: Conceptualization, methodology, validation, writing—original draft; Z.S.: Resources, conceptualization, writing—review and editing, project administration. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data that support the findings of this study are openly available in the HRI Lab PKU-Market-PCB dataset series at the Open Laboratory of Intelligent Robots, Peking University (https://pku.edu.cn/).
Conflicts of Interest
On behalf of all authors, the corresponding author states that there are no conflicts of interest.
References
1. Lin, S. Research on Automatic Inspection System of Printed Circuit Board Based on Computer Vision. J. Phys. Conf. Ser. 2021, 1861, 012093.
2. Mahon, J. Automatic 3-D inspection of solder paste on surface mount printed circuit boards. J. Mater. Process. Technol. 1991, 26, 245–256.
3. Liu, G.; Wen, H. Printed circuit board defect detection based on MobileNet-Yolo-Fast. J. Electron. Imaging 2021, 30, 043004.
4. Qiang, W.; Ziyu, L.; Dejun, Z.; Wankou, Y. LiDAR-only 3D object detection based on spatial context. J. Vis. Commun. Image Represent. 2023, 93, 103805.
5. Jianchen, H.; Jun, C.; Han, W. A lightweight and efficient one-stage detection framework. Comput. Electr. Eng. 2023, 105, 108520.
6. Nikhil, K.; Pravendra, S. Small and Dim Target Detection in IR Imagery: A Review. arXiv 2023, arXiv:2311.16346.
7. Li, J.; Gu, J.; Huang, Z.; Wen, J. Application Research of Improved YOLO V3 Algorithm in PCB Electronic Component Detection. Appl. Sci. 2019, 9, 3750.
8. Wu, L.; Zhang, L.; Zhou, Q. Printed Circuit Board Quality Detection Method Integrating Lightweight Network and Dual Attention Mechanism. IEEE Access 2022, 10, 87617–87629.
9. Xia, K.; Lv, Z.; Liu, K.; Lu, Z.; Zhou, C.; Zhu, H.; Chen, X. Global contextual attention augmented YOLO with ConvMixer prediction heads for PCB surface defect detection. Sci. Rep. 2023, 13, 9805.
10. Xiong, Z. A Design of Bare Printed Circuit Board Defect Detection System Based on YOLOv8. Highlights Sci. Eng. Technol. 2023, 57, 203–209.
11. Long, Y.; Li, Z.; Cai, Y.; Zhang, R.; Shen, K. PCB Defect Detection Algorithm Based on Improved YOLOv8. Acad. J. Sci. Technol. 2023, 7, 297–304.
12. Joseph, R.; Ali, F. YOLO9000: Better, Faster, Stronger. arXiv 2016, arXiv:1612.08242.
13. Shu, L.; Lu, Q.; Haifang, Q.; Jianping, S.; Jiaya, J. Path Aggregation Network for Instance Segmentation. arXiv 2018, arXiv:1803.01534.
14. Wang, C.; Zhong, C. Adaptive Feature Pyramid Networks for Object Detection. IEEE Access 2021, 9, 107024–107032.
15. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020.
16. He, J.; Zhang, S.; Yang, C.; Wang, H.; Gao, J.; Huang, W.; Wang, Q.; Wang, X.; Yuan, W.; Wu, Y.; et al. Pest recognition in microstates state: An improvement of YOLOv7 based on Spatial and Channel Reconstruction Convolution for feature redundancy and vision transformer with Bi-Level Routing Attention. Front. Plant Sci. 2024, 15, 1327237.
17. Diganta, M.; Trikay, N.; Ajay Uppili, A.; Qibin, H. Rotate to Attend: Convolutional Triplet Attention Module. arXiv 2020, arXiv:2010.03045.
18. Yi, D.; Ahmedov, H.B.; Jiang, S.; Li, Y.; Flinn, S.J.; Fernandes, P.G. Coordinate-Aware Mask R-CNN with Group Normalization: A underwater marine animal instance segmentation framework. Neurocomputing 2024, 583, 127488.
19. Zhi, T.; Chunhua, S.; Hao, C.; Tong, H. FCOS: A Simple and Strong Anchor-Free Object Detector. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1922–1933.
20. Zanjia, T.; Yuhang, C.; Zewei, X.; Rong, Y. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023, arXiv:2301.10051.
21. Xiao, G.; Hou, S.; Zhou, H. PCB defect detection algorithm based on CDI-YOLO. Sci. Rep. 2024, 14, 7351.
22. Zhou, G.; Yu, L.; Su, Y.; Xu, B.; Zhou, G. Lightweight PCB defect detection algorithm based on MSD-YOLO. Clust. Comput. 2023, 27, 3559–3573.
23. Yuan, Z.; Tang, X.; Ning, H.; Yang, Z. LW-YOLO: Lightweight Deep Learning Model for Fast and Precise Defect Detection in Printed Circuit Boards. Symmetry 2024, 16, 418.
24. Tang, J.; Liu, S.; Zhao, D.; Tang, L.; Zou, W.; Zheng, B. PCB-YOLO: An Improved Detection Algorithm of PCB Surface Defects Based on YOLOv5. Sustainability 2023, 15, 5963.
25. Du, B.; Wan, F.; Lei, G.; Xu, L.; Xu, C.; Xiong, Y. YOLO-MBBi: PCB Surface Defect Detection Method Based on Enhanced YOLOv5. Electronics 2023, 12, 2821.
Figure 1. YOLOv8n network. P1–P5 represent the changes in feature scales extracted by the backbone network; X3–X5 are the output features; Upsample refers to upsampling; C2f is the feature extraction module. The detection layer is Decoupled Head.
Figure 2. Improved YOLOv8n network architecture. C2f_SCConv is the reconstructed feature extraction module, Triplet Attention serves as the adaptive feature selection module, and Detect_SLCD represents the improved detection layer structure. P1–P5 are the extracted features, and X3–X5 are the output features.
Figure 3. C2f_SCConv structure. n represents the number of Bottleneck_SCConv; Split refers to the splitting mechanism.
Figure 4. SCConv structure. Features are processed by a 1 × 1 Conv, refined spatially by the SRU and channel-wise by the CRU, extracted again by a 1 × 1 Conv, and fused with the original features.
Figure 5. Triplet Attention. The first branch performs interaction between the channel dimension and one spatial dimension, the second branch performs interaction between the channel dimension and the other spatial dimension, and the third branch does not perform any interaction.
Figure 6. Comparison of YOLOv5 and YOLOv8 Detection Heads. Conv and Conv2d refer to convolution operations. Bbox.Loss stands for the bounding box regression loss, and CIoU and DFL are regression loss functions. Cls.Loss represents the classification loss, and BCE is a classification loss function.
Figure 7. SLCD structure. A distinct 1 × 1 Conv is applied at each scale, followed by shared 3 × 3 Conv layers for feature extraction and then separate shared Conv layers for classification and regression.
Figure 8. Detect layer structure. All convolution operations are separate and independent, without sharing.
Figure 9. Shared convolutional layer structure. GN is introduced into the Conv layers, and the 3 × 3 Conv layers as well as the classification and regression Conv layers are shared.
Figure 10. Conv_GN structure. The BN in the original Conv structure is replaced with GN. SiLU serves as the activation function.
Figure 11. PCB defect image.
Figure 12. Comparison of loss function curves before and after algorithm improvement.
Figure 13. Comparison of P-R curves before and after model improvement.
Figure 14. Comparison of various metrics before and after algorithm improvement.
Figure 15. Comparison of detection results before and after algorithm improvement.
Table 1. Comparative results of ablation experiments.
Experiment | SCConv | Triplet | SLCD | WIoU | Recall | Precision | mAP | FLOPs/G |
---|---|---|---|---|---|---|---|---|
I | × | × | × | × | 0.921 | 0.967 | 0.948 | 8.2 |
II | √ | × | × | × | 0.936 | 0.983 | 0.964 | 8.1 |
III | × | √ | × | × | 0.941 | 0.986 | 0.964 | 8.2 |
IV | × | × | √ | × | 0.944 | 0.981 | 0.959 | 6.1 |
V | × | × | × | √ | 0.951 | 0.991 | 0.976 | 8.2 |
VI | √ | √ | × | × | 0.951 | 0.989 | 0.968 | 8.1 |
VII | √ | √ | √ | × | 0.949 | 0.984 | 0.969 | 5.9 |
VIII | √ | √ | × | √ | 0.954 | 0.995 | 0.981 | 5.9 |
IX | √ | √ | √ | √ | 0.965 | 0.998 | 0.986 | 5.9 |
Note: FLOPs/G denotes the number of floating-point operations in units of 10^9. √ indicates that the module is added; × indicates that it is not.
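As a worked reading of Table 1, the short sketch below compares experiment I (baseline) with experiment IX (all four modules). It is only an illustration: the F1 score is not reported in the paper and is derived here from the tabulated precision and recall, and the helper names `f1_score` and `relative_change` are hypothetical.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def relative_change(before, after):
    """Signed fractional change from `before` to `after`."""
    return (after - before) / before


# Values copied from Table 1 (experiment I = baseline, IX = all four modules).
baseline = {"recall": 0.921, "precision": 0.967, "mAP": 0.948, "gflops": 8.2}
improved = {"recall": 0.965, "precision": 0.998, "mAP": 0.986, "gflops": 5.9}

f1_base = f1_score(baseline["precision"], baseline["recall"])
f1_impr = f1_score(improved["precision"], improved["recall"])
map_gain = improved["mAP"] - baseline["mAP"]
flops_change = relative_change(baseline["gflops"], improved["gflops"])

print(f"F1: {f1_base:.3f} -> {f1_impr:.3f}")
print(f"mAP gain: {map_gain * 100:.1f} points")
print(f"FLOPs change: {flops_change * 100:.1f}%")
```

With the Table 1 values, this gives a derived F1 of roughly 0.943 for the baseline and 0.981 for the full model, an mAP gain of 3.8 points, and a FLOPs reduction of about 28%.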
Table 2. Comparison of the effect of different detection heads.
Head | Precision | mAP | FLOPs/G |
---|---|---|---|
Baseline | 0.967 | 0.948 | 8.2 |
AuxHead | 0.978 | 0.958 | 11.2 |
DynamicHead | 0.977 | 0.955 | 11.8 |
SEAMHead | 0.959 | 0.943 | 7.1 |
This work | 0.981 | 0.959 | 6.1 |
Table 3. Comparison of algorithms in each category.
Literature | Precision/% | mAP/% | Model size/M | FLOPs/G | FPS |
---|---|---|---|---|---|
Xiao (2024) [21] | 93.2 | 95.8 | 5.8 | 12.6 | 128 |
Zhou (2023) [22] | 94.7 | 96.5 | 3.8 | 5.2 | 88.4 |
Yuan (2024) [23] | 93.9 | 97.3 | 6.5 | 6.8 | 141.5 |
Tang (2023) [24] | 91.6 | 95.5 | 92.3 | 41.4 | 92.5 |
Du (2023) [25] | 93.2 | 96.4 | 20.1 | 12.8 | 48.9 |
YOLOv5s | 97.1 | 95.1 | 13.7 | 15.8 | 110.1 |
YOLOv7-tiny | 93.4 | 95.3 | 6.03 | 10.2 | 82.5 |
YOLOv8n | 97.7 | 94.8 | 5.96 | 8.2 | 124.8 |
This work | 99.8 | 98.6 | 4.1 | 5.9 | 144.1 |
Table 4. COCO_2017 dataset.
Model | Size (Pixels) | mAP val 50–95 | FLOPs/G |
---|---|---|---|
YOLOv8n | 640 | 37.3 | 8.7 |
Improved_YOLOv8n | 640 | 37.28 | 6.3 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).