Developing a robust cybersecurity system demands a lot of data, but keeping all that data in a single location introduces the risk of a data breach. Luckily, George Mason University researcher Zhuangdi Zhu in the Department of Cyber Security Engineering specializes in federated learning, a method of decentralized machine learning that allows multiple entities to collaboratively train a machine learning model without sharing their raw data.
Zhu is using this expertise to help Wajih Ul Hassan of the University of Virginia develop scalable host-based intrusion detection systems (HIDS). Together, Zhu and Hassan are leveraging artificial intelligence (AI) to enhance a host's cybersecurity while ensuring data privacy.
HIDS monitor and analyze activity within a single host or device. These systems use data, such as system logs, web browsing logs, and process logs, to detect irregularities that may indicate a security threat. For example, Zhu noted, a HIDS might detect malicious software that prompts a browser to download a harmful PDF file. By identifying such irregularities, the HIDS can prevent potential security breaches. Centrally managing these logs through managed security service providers (MSSPs) on cloud servers, however, could compromise user privacy.
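As a concrete illustration of the kind of check a HIDS might run over process logs, the short sketch below flags a hypothetical irregularity: a browser spawning a scripting shell that writes a disguised PDF. The event fields, process names, and the rule itself are illustrative assumptions, not details of the researchers' system.

```python
from dataclasses import dataclass

@dataclass
class ProcessEvent:
    parent: str        # process that spawned the child, e.g. a browser
    child: str         # spawned process
    file_written: str  # path the child wrote, if any

# Hypothetical rule: a browser spawning a scripting shell that then writes a
# file is irregular enough on most hosts to warrant an alert for review.
BROWSERS = {"firefox.exe", "chrome.exe", "msedge.exe"}
SUSPICIOUS_CHILDREN = {"powershell.exe", "cmd.exe", "bash"}

def flag_irregularities(events):
    """Return alert strings for parent/child pairs that match the rule."""
    return [
        f"ALERT: {e.parent} spawned {e.child}, which wrote {e.file_written}"
        for e in events
        if e.parent in BROWSERS and e.child in SUSPICIOUS_CHILDREN
    ]

if __name__ == "__main__":
    process_log = [
        ProcessEvent("firefox.exe", "pdfviewer.exe", "report.pdf"),        # benign
        ProcessEvent("firefox.exe", "powershell.exe", "invoice.pdf.exe"),  # irregular
    ]
    for alert in flag_irregularities(process_log):
        print(alert)
```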

Enter: federated learning. In a federated learning system, each entity trains a local model on its own data and shares only the resulting model updates, never the raw data, with a central server, Zhu explained. The server aggregates these parameters into a global model, which is then sent back to the participants for the next round of local training. This iterative process continues until a robust and accurate model is developed. Federated learning thus keeps sensitive data with the entities that own it, addressing privacy concerns and complying with data regulations.
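To make that training loop concrete, here is a minimal sketch of federated averaging, one standard way to aggregate client updates. The linear model, synthetic per-client data, and round count are assumptions for illustration, not the project's actual framework.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three "hosts", each with private data that never leaves the client.
def make_client(n):
    X = rng.normal(size=(n, 5))
    true_w = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
    y = X @ true_w + rng.normal(scale=0.1, size=n)
    return X, y

clients = [make_client(n) for n in (200, 150, 300)]

def local_update(w, X, y, lr=0.05, epochs=5):
    """Train locally by gradient descent; only the updated weights are shared."""
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

global_w = np.zeros(5)
for round_num in range(20):                    # iterative rounds of training
    local_ws, sizes = [], []
    for X, y in clients:
        local_ws.append(local_update(global_w.copy(), X, y))
        sizes.append(len(y))
    # The server aggregates the parameters (weighted by dataset size) into a
    # new global model, which is sent back to the clients for the next round.
    proportions = np.array(sizes) / sum(sizes)
    global_w = sum(w_i * p for w_i, p in zip(local_ws, proportions))

print("Aggregated global weights:", np.round(global_w, 2))
```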
The project, funded by the Commonwealth Cyber Initiative, is set to span one year, starting in the summer of 2025. In addition to developing a federated learning framework for HIDS, the team plans to create an open-source dataset for intrusion detection, which will serve as a valuable resource for future research and development in this field.
Zhu's work on decentralized machine learning not only addresses critical privacy concerns but also paves the way for more secure and efficient threat detection systems. The project represents a significant step forward in the ongoing effort to protect digital environments from evolving cyber threats.