CMAGAN: classifier-aided minority augmentation generative adversarial networks for industrial imbalanced data and its application to fault prediction

doi:10.1007/s40436-024-00496-y

Abstract: Class imbalance is a common characteristic of industrial data that adversely affects industrial data mining because it leads to the biased training of machine learning models. To address this issue, the augmentation of samples in minority classes based on generative adversarial networks (GANs) has been demonstrated as an effective approach. This study proposes a novel GAN-based minority class augmentation approach named classifier-aided minority augmentation generative adversarial network (CMAGAN). In the CMAGAN framework, an outlier elimination strategy is first applied to each class to minimize the negative impacts of outliers. Subsequently, a newly designed boundary-strengthening learning GAN (BSLGAN) is employed to generate additional samples for minority classes. By incorporating a supplementary classifier and innovative training mechanisms, the BSLGAN focuses on learning the distribution of samples near classification boundaries. Consequently, it can fully capture the characteristics of the target class and generate highly realistic samples with clear boundaries. Finally, the new samples are filtered based on the Mahalanobis distance to ensure that they are within the desired distribution. To evaluate the effectiveness of the proposed approach, CMAGAN was used to solve the class imbalance problem in eight real-world fault-prediction applications. The performance of CMAGAN was compared with that of seven other algorithms, including state-of-the-art GAN-based methods, and the results indicated that CMAGAN could provide higher-quality augmented results.
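The final filtering step described above, which keeps only generated samples that lie within the desired distribution according to the Mahalanobis distance, can be illustrated with a short sketch. This is not the authors' implementation. The function name `mahalanobis_filter`, the cutoff rule (a quantile of the real samples' own distances), the use of a pseudo-inverse for the covariance matrix, and the synthetic data are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def mahalanobis_filter(real, generated, quantile=0.95):
    """Keep generated rows close to the real minority-class distribution.

    The cutoff (the `quantile` of the real samples' own Mahalanobis
    distances) is an illustrative assumption; the abstract does not state
    which threshold rule is used.
    """
    real = np.asarray(real, dtype=float)
    generated = np.asarray(generated, dtype=float)

    mean = real.mean(axis=0)
    cov = np.atleast_2d(np.cov(real, rowvar=False))
    # Pseudo-inverse tolerates a singular covariance matrix
    # (few samples or collinear features).
    cov_inv = np.linalg.pinv(cov)

    def distances(x):
        diff = x - mean
        # Row-wise quadratic form diff_i^T * cov_inv * diff_i.
        sq = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
        return np.sqrt(np.maximum(sq, 0.0))

    cutoff = np.quantile(distances(real), quantile)
    keep = distances(generated) <= cutoff
    return generated[keep], keep


# Synthetic stand-ins for real minority samples and GAN output (hypothetical data).
rng = np.random.default_rng(0)
real_minority = rng.normal(loc=0.0, scale=1.0, size=(200, 3))
candidates = np.vstack([
    rng.normal(loc=0.0, scale=1.0, size=(50, 3)),  # plausible samples
    np.full((5, 3), 10.0),                         # clearly off-distribution
])
kept, mask = mahalanobis_filter(real_minority, candidates)
```

In this sketch the five off-distribution rows are rejected, while most of the plausible candidates pass the cutoff. A stricter or looser `quantile` trades sample diversity against fidelity to the real class.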

The full text can be downloaded at https://link.springer.com/article/10.1007/s40436-024-00496-y

Keywords: Class imbalance, Minority class augmentation, Generative adversarial network (GAN), Boundary strengthening learning (BSL), Fault prediction

Wen-Jie Wang, Zhao Liu, Ping Zhu. CMAGAN: classifier-aided minority augmentation generative adversarial networks for industrial imbalanced data and its application to fault prediction[J]. Advances in Manufacturing, 2024, 12(3): 603-618.
