AI That Keeps Data Private — Without Losing Its Informational Flexibility



A new advancement in federated learning promises improved AI performance for sensitive sectors like healthcare and finance, while preserving data privacy. Federated learning enables multiple institutions to collaboratively train artificial intelligence models without exchanging raw personal data. However, a known challenge arises when each organization fine-tunes the shared AI to its own data environment: the model can become overly specialized, losing its ability to generalize well across diverse data types—a problem known as local overfitting.

Researchers from the Korea Advanced Institute of Science and Technology (KAIST) have developed a technique that addresses this issue by introducing synthetic data during the fine-tuning phase. This synthetic data is generated by extracting core, non-personal features from each institution’s datasets, creating virtual data points that help the AI retain knowledge from all collaborators while adapting to local needs.
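The paper's exact generation procedure is not described here, but the idea of distilling non-personal "core features" into virtual data points can be sketched as follows. This is a hypothetical illustration, assuming each institution shares only per-class summary statistics (means and covariances) of feature vectors, from which synthetic points are sampled; the function names and the Gaussian sampling choice are this sketch's assumptions, not the authors' method.

```python
import numpy as np

def extract_feature_stats(features, labels):
    """Summarize a local dataset as per-class mean and covariance of
    feature vectors -- a privacy-friendlier stand-in for raw records.
    (Hypothetical stand-in for the paper's feature extraction step.)"""
    stats = {}
    for c in np.unique(labels):
        fc = features[labels == c]
        stats[c] = (fc.mean(axis=0), np.cov(fc, rowvar=False))
    return stats

def sample_synthetic(stats, n_per_class, rng):
    """Create virtual data points by sampling from the shared statistics,
    so no raw personal record ever leaves its institution."""
    xs, ys = [], []
    for c, (mu, cov) in stats.items():
        xs.append(rng.multivariate_normal(mu, cov, size=n_per_class))
        ys.append(np.full(n_per_class, c))
    return np.concatenate(xs), np.concatenate(ys)
```

In this sketch, a coordinating server could pool the statistics from every institution and hand each participant a bag of synthetic points standing in for the others' data distributions.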

The approach works by mixing local data with “global synthetic data” that represents information from the other institutions; this blend acts as a safeguard against the AI forgetting previously learned patterns. The method maintains a balance between specialized expertise and broad applicability, allowing the AI to stay versatile even as it is fine-tuned for specific tasks.
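The mixing step described above can be sketched as a training update whose gradient blends a local-data term with a synthetic-data term. This is a minimal illustration, assuming a simple logistic-regression model and a mixing weight `lam`; the model choice, function name, and weighting scheme are assumptions for the sketch, not the paper's specification.

```python
import numpy as np

def mixed_finetune_step(w, x_local, y_local, x_syn, y_syn, lr=0.1, lam=0.5):
    """One fine-tuning step on a blend of local data and global synthetic
    data. The synthetic term (weighted by `lam`) pulls the model back
    toward patterns learned across all collaborators, guarding against
    local overfitting. Logistic regression is a hypothetical stand-in
    for the shared model."""
    def grad(w, x, y):
        p = 1.0 / (1.0 + np.exp(-x @ w))     # sigmoid predictions
        return x.T @ (p - y) / len(y)        # logistic-loss gradient
    g = (1 - lam) * grad(w, x_local, y_local) + lam * grad(w, x_syn, y_syn)
    return w - lr * g
```

Setting `lam = 0` would recover plain local fine-tuning (and its overfitting risk), while larger values keep the model closer to the globally shared behavior.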

According to TechXplore, testing demonstrated that this solution enhances AI stability and accuracy in environments where privacy is critical, such as hospitals and banks. It also performs well in fast-changing domains like social media and e-commerce, where new users and products regularly enter the system. The system showed resilience to changes, including the addition of new institutions to the federated network.

This innovation marks a step forward in enabling secure, collaborative AI development without compromising privacy or performance. It is particularly promising for areas where sharing sensitive data is restricted but collaboration is needed to build effective AI solutions, such as medical environments and fraud detection in finance.

The findings were published on the arXiv preprint server.