SYNTHETIC DATA GENERATION FOR ENABLING PRIVACY-PRESERVING CYBERSECURITY RESEARCH AND MODEL TRAINING

ODARA RAPHEAL; CHUKWUMDIMMA PRECIOUS UBAH; EMMANUEL TOBA POPOOLA; JAMES OLAOLUWA ABIODUN

doi:10.70382/tijasdr.v09i2.059

VOL. 9 2025

Published 17-09-2025

ODARA RAPHEAL
University of Benin, Nigeria. Department of Chemical Engineering

CHUKWUMDIMMA PRECIOUS UBAH
Federal University of Technology, Owerri. Department of Computer Science

EMMANUEL TOBA POPOOLA
Ladoke Akintola University of Technology, Oyo State, Nigeria. Department of Computer Science and Engineering.

JAMES OLAOLUWA ABIODUN
Federal Polytechnic Bida, Niger State, Nigeria. Department of Computer Science

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

ODARA RAPHEAL, CHUKWUMDIMMA PRECIOUS UBAH, EMMANUEL TOBA POPOOLA, & JAMES OLAOLUWA ABIODUN. (2025). SYNTHETIC DATA GENERATION FOR ENABLING PRIVACY-PRESERVING CYBERSECURITY RESEARCH AND MODEL TRAINING. International Journal of African Sustainable Development Research, 9(2). https://doi.org/10.70382/tijasdr.v09i2.059

Abstract

High-quality datasets are essential to modern cybersecurity research, but they are often unavailable because of privacy policies and dataset limitations. Synthetic data generation offers a strong solution: it enables intrusion detection systems and threat models to be trained without exposing sensitive real data. This research addresses two primary questions: Can synthetic network traffic accurately emulate real-world data for cybersecurity purposes? Can privacy-preserving mechanisms defend against advanced attacks? We evaluate five generative methods, spanning CTGAN, CopulaGAN, tabular diffusion models, and their differentially private (DP-augmented) variants, on the NSL-KDD and CICIDS-2017 datasets. We quantify utility using statistical fidelity, classifier performance (AUC, F1-score), and diversity, and we quantify privacy using resistance to membership inference and reconstruction attacks. Results indicate that GAN-based models achieve more than 90% fidelity and keep classifier AUC within 3% of real-data baselines, while diffusion models deliver higher diversity at a higher computational cost. DP-SGD integration reduces attack success to near-random accuracy with little loss of utility. Some limitations remain. Synthetic data can miss intricate correlations present in real traffic, and strict privacy settings (ε ≤ 1) strongly degrade downstream performance, which highlights difficult trade-offs between fidelity, diversity, and privacy protection. Our contribution is threefold: (1) a rigorous comparative benchmark of synthetic data methods for cybersecurity; (2) empirical validation of DP-augmented synthesis as a feasible and resilient option; and (3) a best-practice framework that balances utility, diversity, and privacy against known bounds. Our work supports responsible AI and ethical data sharing for cybersecurity while making the trade-off between privacy and utility explicit.
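The abstract names statistical fidelity and resistance to membership inference as evaluation criteria but does not say how they are computed. The sketch below is one plausible reading, not the authors' procedure. It assumes fidelity is the mean of one minus the per-feature two-sample Kolmogorov-Smirnov statistic, and that the attack scores each record by its distance to the closest synthetic record. The function names (`fidelity_score`, `attack_auc`) and the toy Gaussian data are hypothetical.

```python
import bisect
import random


def ks_statistic(real, synth):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    real_sorted, synth_sorted = sorted(real), sorted(synth)
    gap = 0.0
    for x in set(real_sorted) | set(synth_sorted):
        cdf_real = bisect.bisect_right(real_sorted, x) / len(real_sorted)
        cdf_synth = bisect.bisect_right(synth_sorted, x) / len(synth_sorted)
        gap = max(gap, abs(cdf_real - cdf_synth))
    return gap


def fidelity_score(real_rows, synth_rows):
    """Assumed fidelity metric: mean of (1 - KS) over columns, in [0, 1]."""
    n_cols = len(real_rows[0])
    per_column = [
        1.0 - ks_statistic([r[j] for r in real_rows], [s[j] for s in synth_rows])
        for j in range(n_cols)
    ]
    return sum(per_column) / n_cols


def nearest_distance(row, synth_rows):
    """Euclidean distance from one record to its closest synthetic record."""
    return min(sum((a - b) ** 2 for a, b in zip(row, s)) for s in synth_rows) ** 0.5


def attack_auc(member_rows, nonmember_rows, synth_rows):
    """AUC of a distance-based membership inference attack.

    A record is scored as 'more likely a training member' when it lies closer
    to some synthetic record. An AUC of 0.5 corresponds to random guessing.
    """
    member_scores = [-nearest_distance(r, synth_rows) for r in member_rows]
    nonmember_scores = [-nearest_distance(r, synth_rows) for r in nonmember_rows]
    wins = sum(
        1.0 if m > n else 0.5 if m == n else 0.0
        for m in member_scores
        for n in nonmember_scores
    )
    return wins / (len(member_scores) * len(nonmember_scores))


# Toy illustration only: two numeric features drawn from a standard Gaussian.
rng = random.Random(0)


def draw(n):
    return [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n)]


members, nonmembers = draw(100), draw(100)
leaky_synth = [row[:] for row in members]  # generator that memorised its training set
fresh_synth = draw(100)                    # generator that only matches the distribution

print("fidelity (fresh):", fidelity_score(members, fresh_synth))
print("attack AUC (leaky):", attack_auc(members, nonmembers, leaky_synth))
print("attack AUC (fresh):", attack_auc(members, nonmembers, fresh_synth))
```

In this toy setup a generator that copies its training records is fully exposed to the attack, while one that only matches the distribution should leave the attacker close to chance. That is the behavior the abstract describes for DP-augmented synthesis. A real evaluation on NSL-KDD or CICIDS-2017 would also need to handle categorical features and feature scaling, which this sketch ignores.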

Keywords

CTGAN, conditional GAN, CopulaGAN, cybersecurity, differential privacy, diffusion model, generative models, membership inference, synthetic data
