Scaling Synthetic Data with 1 Billion Personas - A Research Summary

Here's the gist:
1. The paper introduces a collection of 1 billion diverse personas created from web data, a count equal to roughly 13% of the world’s population. These personas serve as distributed carriers of world knowledge, allowing synthetic data to be created from many different perspectives.
2. The methodology prompts a large language model (LLM) with these personas to create high-quality synthetic data across domains, including math problems, logical reasoning problems, game NPCs, and more.
3. This process is adaptable to different data synthesis scenarios, potentially influencing how LLMs are developed and researched.
4. This approach to data creation could be used to simulate diverse user behaviors, predict reactions to new products or policies, and support the development of virtual societies in the metaverse.
5. The paper emphasizes the importance of ethical and responsible application to avoid misuse and ensure the technology benefits society.
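
To make the persona-driven idea in the list above concrete, here is a minimal sketch of how one persona per prompt might be turned into synthetic data. Everything in it is an assumption for illustration: the `Persona` class, the template wording, and the `generate` callable (a stand-in for whatever LLM client you use) are hypothetical names, not the paper's actual prompts or code.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Persona:
    """Hypothetical container for a one-line persona description."""
    description: str


# Assumed template wording; the paper's real prompts may differ.
PROMPT_TEMPLATE = "Create {task} with the following persona:\n{persona}"


def build_prompts(personas: List[Persona], task: str) -> List[str]:
    """Pair the same task with each persona to get one prompt per persona."""
    return [
        PROMPT_TEMPLATE.format(task=task, persona=p.description)
        for p in personas
    ]


def synthesize(
    personas: List[Persona],
    task: str,
    generate: Callable[[str], str],
) -> List[str]:
    """Send each persona-conditioned prompt to `generate` (any str -> str LLM call)."""
    return [generate(prompt) for prompt in build_prompts(personas, task)]


if __name__ == "__main__":
    sample_personas = [
        Persona("a high school chemistry teacher"),
        Persona("a long-haul truck driver"),
    ]

    # Placeholder "model" so the sketch runs without any API access.
    def fake_llm(prompt: str) -> str:
        return f"[model output for persona: {prompt.splitlines()[-1]}]"

    for sample in synthesize(sample_personas, "a math problem", fake_llm):
        print(sample)
```

The point of the design is that the task stays fixed while the persona varies, so the diversity of the output comes from the persona collection rather than from hand-written prompt variations. Swapping `fake_llm` for a real model call is the only change needed to try it against an actual LLM.
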