When fake data is a good thing – how synthetic data trains AI to solve real problems

yesterday

You’ve just finished a strenuous hike to the top of a mountain. You’re exhausted but elated. The view of the city below is gorgeous, and you want to capture the moment on camera. But it’s already quite dark, and you’re not sure you’ll get a good shot. Fortunately, your phone has an AI-powered night mode that can take stunning photos even after sunset.

Here’s something you might not know: That night mode may have been trained on synthetic nighttime images, computer-generated scenes that were never actually photographed.

As artificial intelligence researchers exhaust the supply of real data on the web and in digitized archives, they are increasingly turning to synthetic data, artificially generated examples that mimic real ones. But that creates a paradox. In science, making up data is a cardinal sin. Fake data and misinformation are already undermining trust in information online. So how can synthetic data possibly be good? Is it just a polite euphemism for deception?

As a machine learning researcher, I think the answer lies in intent and transparency. Synthetic data is generally not created to manipulate results or mislead people. In fact, ethics may require AI companies to use synthetic data: Releasing real human face images, for example, can violate privacy, whereas synthetic faces can offer similar benefits with formal privacy guarantees.

There are other reasons that........

© The Conversation