Synthetic Financial Data Generation Opportunities – Tearsheet


While data can be exceptionally useful for analysis and policy development, mismanaging access to it can create significant security risks for organizations and consumers. Personally identifiable information (PII) poses a particular challenge to organizations, which generally want to retain as much detail as possible without exposing customers to privacy risks.

One solution is to generate synthetic data, which mimics real datasets but contains no PII. Additionally, synthetic data bypasses the labor and costs of collecting and organizing data, allowing teams to develop algorithms faster and with less bureaucracy.

Over the past year, companies like Microsoft, Google, and Amazon have all spoken about the importance of synthetic data and its use in their current architecture. San Diego-based synthetic data startup Gretel.ai closed a $50 million Series B funding round in October, led by Anthos Capital. Its products, such as a privacy toolkit, protect synthetic data from adversarial attacks and allow teams to debias and anonymize their datasets, while letting data be shared between teams more securely.

JP Morgan AI Research has developed the following model to generate synthetic datasets:

Source: JP Morgan

The schematic representation is explained by JP Morgan as follows:

Step 1: Calculate metrics for actual data
Step 2: Develop a generator (for example, statistical methods or an agent-based simulation)
Step 3: (Optional) Calibrate generator using real data
Step 4: Run the generator to generate synthetic data
Step 5: Calculate metrics for synthetic data
Step 6: Compare the metrics for real data and synthetic data
Step 7: (Optional) Refine generator to improve against comparison metrics
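
The seven steps above can be sketched in a few lines of Python. This is only an illustrative outline, not JP Morgan's implementation: the Gaussian generator, the choice of mean and standard deviation as metrics, the tolerance value, and all function and class names are assumptions made for the example, and the input values are made-up placeholders rather than real banking data.

```python
import random
import statistics


def compute_metrics(values):
    """Steps 1 and 5: summarize a dataset with two simple metrics."""
    return {"mean": statistics.fmean(values), "stdev": statistics.stdev(values)}


class GaussianGenerator:
    """Step 2: a deliberately simple statistical generator (an assumption)."""

    def __init__(self, mean=0.0, stdev=1.0, seed=0):
        self.mean = mean
        self.stdev = stdev
        self.rng = random.Random(seed)

    def calibrate(self, real_metrics):
        """Step 3 (optional): fit the parameters to the real-data metrics."""
        self.mean = real_metrics["mean"]
        self.stdev = real_metrics["stdev"]

    def generate(self, n):
        """Step 4: draw n synthetic values."""
        return [self.rng.gauss(self.mean, self.stdev) for _ in range(n)]


def compare_metrics(real_metrics, synthetic_metrics):
    """Step 6: absolute gap per metric between real and synthetic data."""
    return {k: abs(real_metrics[k] - synthetic_metrics[k]) for k in real_metrics}


def run_pipeline(real_values, n_synthetic=1000, tolerance=5.0, max_rounds=5):
    """Run steps 1 to 7; tolerance and max_rounds are arbitrary example values."""
    real_metrics = compute_metrics(real_values)                   # Step 1
    generator = GaussianGenerator()                               # Step 2
    generator.calibrate(real_metrics)                             # Step 3
    for _ in range(max_rounds):
        synthetic = generator.generate(n_synthetic)               # Step 4
        synthetic_metrics = compute_metrics(synthetic)            # Step 5
        gaps = compare_metrics(real_metrics, synthetic_metrics)   # Step 6
        if max(gaps.values()) <= tolerance:
            break
        # Step 7 (optional): nudge the parameters toward the real metrics.
        generator.mean += real_metrics["mean"] - synthetic_metrics["mean"]
        generator.stdev *= real_metrics["stdev"] / synthetic_metrics["stdev"]
    return synthetic, gaps


if __name__ == "__main__":
    # Made-up placeholder values standing in for a real, sensitive column.
    placeholder_balances = [1200.0, 850.5, 4300.0, 2999.9, 150.0, 720.25]
    synthetic, gaps = run_pipeline(placeholder_balances)
    print(len(synthetic), gaps)
```

A single Gaussian is far too crude for real retail banking or market data; in practice the generator in Step 2 would be a richer statistical model or an agent-based simulation, and the metrics would cover correlations and distribution shape as well as the first two moments.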

In their research on the topic, JP Morgan found that tabular data in retail banking and time series of market microstructure data are the data types that financial institutions most need to protect.

Join our Data Day Conference on June 21 to learn more about how data is changing the fintech landscape.
