Background: This study investigates how effectively artificial intelligence (AI) can illustrate scientific concepts for communication and education. It evaluates three leading text-to-image Generative Artificial Intelligence (GenAI) models (Midjourney, DALL-E, and Stable Diffusion) on their ability to generate scientifically accurate visuals.
Methods: We benchmarked the models by generating 120 images from scientifically informed prompts. Each model's output was assessed on four criteria: aesthetics, representation of the core scientific concept, contextual relevance, and factual accuracy.
Results: The GenAI models excelled in aesthetic quality, scoring 90.83%, but were only moderately successful at capturing core scientific concepts, scoring 48.30%. Contextual relevance was far weaker at 9.17%, and factual accuracy scored 0%.
Conclusion: These findings show that current GenAI models fall short of producing effective educational illustrations. They point to an urgent need for targeted training on domain-specific scientific datasets to improve the precision of generated visual aids. AI holds promise for scientific communication, but substantial advances in factual and contextual accuracy are required before generated images can reliably support a clearer understanding of complex concepts.