The performance of GPT-3.5 and GPT-4 on genetic tests at PhD-level: GPT-4 as a promising tool for genomic medicine and education

Khosravi, Teymoor; Rahimzadeh, Arian; Motallebi, Farzaneh; Vaghefi, Fatemeh; Mohammad Al Sudani, Zainab; Oladnabi, Morteza

doi:10.29252/JCBR.8.4.22

Journal of Clinical and Basic Research (JCBR), Volume 8, Issue 4 (2024), pages 22-26

Khosravi T, Rahimzadeh A, Motallebi F, Vaghefi F, Mohammad Al Sudani Z, Oladnabi M. The performance of GPT-3.5 and GPT-4 on genetic tests at PhD-level: GPT-4 as a promising tool for genomic medicine and education. JCBR. 2024;8(4):22-26.
URL: http://jcbr.goums.ac.ir/article-1-476-en.html

Teymoor Khosravi¹, Arian Rahimzadeh¹, Farzaneh Motallebi¹, Fatemeh Vaghefi¹, Zainab Mohammad Al Sudani¹, Morteza Oladnabi*²

1- Student Research Committee, Golestan University of Medical Sciences, Gorgan, Iran
2- Gorgan Congenital Malformations Research Center, Golestan University of Medical Sciences, Gorgan, Iran; Department of Medical Genetics, School of Advanced Technologies in Medicine, Golestan University of Medical Sciences, Gorgan, Iran; Ischemic Disorders Research Center, Golestan University of Medical Sciences, Gorgan, Iran; oladnabidozin@yahoo.com

Abstract

Background: Natural Language Processing (NLP) has empowered AI models to understand and generate human language, with transformer-based architectures like GPT-3 and GPT-4 marking significant advancements. GPT-4, equipped with a larger parameter count and multimodal capabilities, offers enhanced accuracy and contextual understanding over its predecessor, GPT-3.5. However, challenges such as factual inaccuracies remain. This study aims to evaluate GPT-4’s performance on genetics-related tasks, assessing its strengths and limitations compared to GPT-3.5.
Methods: We assessed GPT-4's performance across five key genetic tasks: (1) understanding basic genetic concepts, (2) interpreting family pedigrees, (3) analyzing genetic mutations, (4) solving population genetics problems, and (5) answering medical genetics Ph.D. entrance exam questions. Both open-ended and multiple-choice questions (MCQs) were used; for some of them, the model was required to justify its answer so that its reasoning could be evaluated. GPT-4's multimodal capabilities were also tested by supplying pedigree images for inheritance pattern analysis.
Results: GPT-4 demonstrated perfect accuracy in Task 1 (basic genetic concepts) and Task 3 (genetic mutation interpretation), correctly answering all 10 and all 16 questions, respectively. In Task 2 (pedigree analysis), GPT-4 answered only 24 of 71 questions correctly (47 incorrect). In Task 4 (population genetics problems), GPT-4 provided 30 correct answers out of 34. In Task 5, which assessed performance on a Ph.D. entrance exam, GPT-4 correctly answered 58 of 80 questions. Performance was notably higher for MCQs than for open-ended questions.
Conclusion: GPT-4 substantially improves over GPT-3.5, particularly in understanding genetic concepts and interpreting genetic mutations. Despite these advances, its performance in more complex tasks, such as pedigree analysis, reveals areas that require further refinement. These findings highlight GPT-4's potential in advancing genetic education and research. Future studies should further explore GPT-4's capabilities and address its limitations in tasks that demand higher reasoning and factual accuracy.
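The counts reported in the Results can be turned into per-task accuracy percentages with simple arithmetic. The sketch below is illustrative only: the task labels, the `accuracy_percent` helper, and the pooled figure across all five tasks are assumptions introduced here and are not reported in the abstract.

```python
# Correct/total counts per task, copied from the Results paragraph.
# The dictionary keys are shorthand labels chosen for this sketch.
results = {
    "Task 1: basic genetic concepts": (10, 10),
    "Task 2: pedigree analysis": (24, 71),
    "Task 3: genetic mutation interpretation": (16, 16),
    "Task 4: population genetics problems": (30, 34),
    "Task 5: Ph.D. entrance exam": (58, 80),
}


def accuracy_percent(correct, total):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100.0 * correct / total, 1)


# Per-task accuracy.
per_task = {task: accuracy_percent(c, t) for task, (c, t) in results.items()}

# Pooled accuracy over all questions. This is a derived, illustrative
# figure: the abstract does not report an overall score.
total_correct = sum(c for c, _ in results.values())
total_questions = sum(t for _, t in results.values())
pooled = accuracy_percent(total_correct, total_questions)

for task, pct in per_task.items():
    c, t = results[task]
    print(f"{task}: {c}/{t} = {pct}%")
print(f"Pooled: {total_correct}/{total_questions} = {pooled}%")
```

On these counts, pedigree analysis comes out at roughly a third of questions answered correctly, well below the other four tasks, which matches the Conclusion's remark that this task needs further refinement.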

Keywords: Natural Language Processing, Generative Artificial Intelligence, Genetics

Article Type: Research | Subject: Informatics

Rights and permissions
	This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
