Synthetic Data

Synthetic data is data that is invented or automatically generated rather than recorded from events in the real world. It is a fundamental concept in new data technologies.

Contrasting real and synthetic data makes it possible to understand more about how machine learning and other new forms of artificial intelligence work.

The clearest way to explain synthetic data is that it is not “real” data created naturally in the real world (“IRL” or “in meatspace,” as pros sometimes call the non-digital world). Synthetic data is created without any actual organic events driving it.

For example, a real set of identifiers might be collected about a customer who uses a platform. An engineer could instead simply create the same kinds of identifiers for a fictional customer and load them into the system, and that would be an example of synthetic data.
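As a rough illustration of the fictional-customer example, the sketch below invents one customer record using only the Python standard library. The field names, the name lists, and the `make_fake_customer` helper are all assumptions made up for this illustration; a real platform would have its own schema.

```python
import random
import uuid

# Hypothetical name pools, invented for this sketch.
FIRST_NAMES = ["Alex", "Sam", "Jordan", "Taylor"]
LAST_NAMES = ["Rivera", "Chen", "Okafor", "Novak"]


def make_fake_customer(rng):
    """Build one fictional customer record from a seeded random generator."""
    first = rng.choice(FIRST_NAMES)
    last = rng.choice(LAST_NAMES)
    return {
        # A random version-4 UUID derived from the seeded generator.
        "customer_id": str(uuid.UUID(int=rng.getrandbits(128), version=4)),
        "name": f"{first} {last}",
        # example.com is reserved for documentation, so no real inbox is hit.
        "email": f"{first.lower()}.{last.lower()}@example.com",
        "signup_year": rng.randint(2015, 2024),
        # Flag the record so it is never mistaken for real data.
        "synthetic": True,
    }


rng = random.Random(42)  # fixed seed so the same record is produced each run
customer = make_fake_customer(rng)
print(customer)
```

Marking each record with a `synthetic` flag is one possible design choice; it makes invented rows easy to filter out later.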

A fuller understanding of synthetic data comes from looking at how it's used in machine learning and similar technologies. First of all, synthetic data can give a machine learning program more to work with. The key is in how that data is generated, because unlike real data, synthetic data has to be imagined and invented.
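One simple way to "invent" extra data, sketched here under strong assumptions, is to estimate the mean and spread of a small real sample and then draw new values from a normal distribution with those parameters. The sample values and the `synthesize` helper are hypothetical, and real projects often use far more sophisticated generators.

```python
import random
import statistics


def synthesize(real_values, n, rng):
    """Draw n synthetic values from a normal distribution fitted to real_values."""
    mu = statistics.mean(real_values)
    sigma = statistics.stdev(real_values)
    return [rng.gauss(mu, sigma) for _ in range(n)]


# Made-up "real" order totals, standing in for data collected from users.
real_order_totals = [23.5, 41.0, 37.2, 29.9, 55.1, 33.3]

rng = random.Random(0)  # fixed seed for repeatable output
synthetic_order_totals = synthesize(real_order_totals, 100, rng)

# The training set now holds the real values plus the generated ones.
training_values = real_order_totals + synthetic_order_totals
print(len(training_values))
```

The generated values are only as good as the assumption behind them: if the real data is not roughly normal, this generator will give the model a misleading picture.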

Synthetic data can also be used as a honeypot to foil hackers. Companies might create vast troves of synthetic data with non-authentic financial identifiers, for example, and put those on a system to see how they are targeted by outside attackers. That's another common use of synthetic data in IT systems.
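The honeypot idea can be sketched as follows: generate decoy account numbers that belong to no real customer, then flag any access-log entry that touches one of them. The ten-digit format, the log structure, and both helper names are assumptions for illustration only, not a description of any particular security product.

```python
import secrets


def make_decoy_accounts(n):
    """Generate n unique ten-digit decoy account numbers."""
    decoys = set()
    while len(decoys) < n:
        decoys.add("".join(str(secrets.randbelow(10)) for _ in range(10)))
    return decoys


def flag_decoy_access(access_log, decoys):
    """Return the log entries that touched a decoy account."""
    return [entry for entry in access_log if entry["account"] in decoys]


decoys = make_decoy_accounts(5)

# Hypothetical access log: one entry for a real account, one for a decoy.
some_decoy = next(iter(decoys))
access_log = [
    {"account": "REAL-0001", "source": "10.0.0.5"},
    {"account": some_decoy, "source": "203.0.113.9"},
]

suspicious = flag_decoy_access(access_log, decoys)
print(len(suspicious))
```

Because no legitimate process has any reason to read a decoy account, any hit on one is a strong signal worth investigating.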

The use of synthetic data is likely to be a major issue in the development of future test and training data sets for machine learning technologies such as neural networks.
