LLMs Fail to Generalize Reverse Relationships

• Researchers discovered the Reversal Curse: LLMs trained on a relationship in one direction fail to learn its reverse.
• Fine-tuning tests on GPT-3 and Llama-1 show that models trained on 'A is B' cannot infer 'B is A'.
• GPT-4 correctly names a celebrity's mother 79% of the time, but names the celebrity when given the mother only 33% of the time.

Researchers identified a failure in large language models known as the Reversal Curse, where models trained on a statement of the form "A is B" fail to generalize to the reverse "B is A." This phenomenon prevents models from automatically inferring a bidirectional relationship even when the training data contains strong patterns indicating that such relationships are usually symmetric. For example, a model trained on the statement "Valentina Tereshkova was the first woman to travel to space" fails to correctly answer the query, "Who was the first woman to travel to space?"

The study provides empirical evidence for this limitation by fine-tuning GPT-3 and Llama-1 models on fictitious statements like "Uriah Hawthorne is the composer of Abyssal Melodies." These models consistently failed to name the composer when prompted in the reverse direction, for example with "Who composed Abyssal Melodies?" The Reversal Curse is robust across model sizes and model families, and data augmentation does not alleviate it. Models can deduce the reverse relationship only when the original "A is B" statement appears in the context window, rather than only in the training data.
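
The setup described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the study's code: the `Fact` class and the helpers `forward_statement`, `reverse_prompt`, and `is_correct` are hypothetical names, and the substring check is an assumed grading rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    """One fictitious fact (hypothetical structure, for illustration only)."""
    name: str         # the "A" side, e.g. a made-up person
    description: str  # the "B" side, e.g. "the composer of Abyssal Melodies"


def forward_statement(fact: Fact) -> str:
    """Fine-tuning text in the 'A is B' direction."""
    return f"{fact.name} is {fact.description}."


def reverse_prompt(fact: Fact) -> str:
    """Test prompt in the 'B is A' direction; the model should continue with the name."""
    capitalized = fact.description[0].upper() + fact.description[1:]
    return f"{capitalized} is"


def is_correct(completion: str, fact: Fact) -> bool:
    """Assumed grading rule: the completion must mention the name."""
    return fact.name.lower() in completion.lower()


fact = Fact("Uriah Hawthorne", "the composer of Abyssal Melodies")
print(forward_statement(fact))  # text the model is fine-tuned on
print(reverse_prompt(fact))     # prompt used to test the reverse direction
```

The Reversal Curse is the observation that a model fine-tuned only on the output of `forward_statement` tends not to produce completions for `reverse_prompt` that pass a check like `is_correct`.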

Evaluations of ChatGPT, specifically GPT-3.5 and GPT-4, on real-world celebrity data further confirm this failure. When asked about parent-child relationships in the forward direction, such as naming a celebrity's mother, GPT-4 answered correctly 79% of the time. When the question was reversed to ask for the child of that mother, it answered correctly only 33% of the time, a clear asymmetry in the model's learned associations.
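
A forward-versus-reverse comparison of this kind could be measured roughly as follows. This is a hedged sketch, not the study's evaluation code: the prompt wording, the `ask` callable, and the placeholder pairs are all assumptions, and `toy_ask` is a toy stand-in that stores only forward associations, not a real model.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (child, parent); placeholder names, not real data


def forward_accuracy(pairs: List[Pair], ask: Callable[[str], str]) -> float:
    """Share of 'who is X's mother?' questions whose answer names the parent."""
    hits = sum(
        parent.lower() in ask(f"Who is {child}'s mother?").lower()
        for child, parent in pairs
    )
    return hits / len(pairs)


def reverse_accuracy(pairs: List[Pair], ask: Callable[[str], str]) -> float:
    """Share of 'name a child of Y' questions whose answer names the child."""
    hits = sum(
        child.lower() in ask(f"Name a child of {parent}.").lower()
        for child, parent in pairs
    )
    return hits / len(pairs)


# Toy stand-in for a model: it has memorized forward questions only.
pairs = [("Celebrity A", "Parent A"), ("Celebrity B", "Parent B")]
forward_only = {f"Who is {child}'s mother?": parent for child, parent in pairs}


def toy_ask(prompt: str) -> str:
    return forward_only.get(prompt, "I don't know.")


print(forward_accuracy(pairs, toy_ask))  # 1.0 for this toy lookup table
print(reverse_accuracy(pairs, toy_ask))  # 0.0 for this toy lookup table
```

The toy numbers only show how the two scores are computed; the 79% and 33% figures come from the evaluation of GPT-4 itself.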
