Quantifying the Privacy/Utility Trade-off in Generative Model based Synthetic Data
This project aims to develop methods and tools to assess privacy/utility trade-offs of synthetic data generation, and to build up a comprehensive picture of privacy and utility for existing generative models (GMs) that offer privacy guarantees.
Publication and other sharing of data sets, even 'anonymised' ones, have proven to carry unacceptable privacy risks in many application areas involving significant amounts of personal data. Synthetic data generators (SDGs) are a promising alternative, but their use needs to be justified by methods that systematically assess their quality, in terms of both privacy and utility.
This will provide confidence for users of such methods, give deeper insight into the trade-offs between privacy and utility, and enable both the development of new, refined SDG methods and the improved configuration of existing SDGs for new application domains in industry and the public sector. In addition, the project will contribute to the reproducibility of research results by defining a standard methodology and standard metrics for the evaluation of new SDGs.
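To make the idea of quantifying a privacy/utility trade-off concrete, here is a minimal sketch of how one point on such a trade-off might be measured. It is an illustration only, not this project's methodology: utility is approximated by "train on synthetic, test on real" accuracy, and privacy risk by the mean distance from synthetic records to their nearest real training records. The metric choices, function names, and toy data are all assumptions.

```python
# Illustrative sketch (not the project's actual methodology): one utility
# metric and one privacy-risk proxy for a synthetic data set.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.neighbors import NearestNeighbors

def utility_score(X_syn, y_syn, X_real_test, y_real_test):
    """Downstream ML efficacy: fit on synthetic data, evaluate on held-out real data."""
    model = LogisticRegression(max_iter=1000).fit(X_syn, y_syn)
    return accuracy_score(y_real_test, model.predict(X_real_test))

def privacy_risk(X_syn, X_real_train):
    """Mean distance from each synthetic record to its nearest real training
    record; near-zero distances suggest the generator memorised records."""
    nn = NearestNeighbors(n_neighbors=1).fit(X_real_train)
    dists, _ = nn.kneighbors(X_syn)
    return float(dists.mean())

# Toy usage: a deliberately "leaky" generator that copies and perturbs real rows.
rng = np.random.default_rng(0)
X_real = rng.normal(size=(500, 5))
y_real = (X_real[:, 0] > 0).astype(int)
X_syn = X_real[:250] + rng.normal(scale=0.1, size=(250, 5))
y_syn = y_real[:250]
print("utility:", utility_score(X_syn, y_syn, X_real[250:], y_real[250:]))
print("privacy risk (mean NN distance):", privacy_risk(X_syn, X_real[:250]))
```

A real assessment would use stronger attack-based privacy measures (e.g. membership inference) and task-specific utility metrics, but the pattern is the same: fix a generator configuration, compute both scores, and trace how they move against each other.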
This project is funded by The Alan Turing Institute under the Accenture / Turing Strategic Partnership.