LongMemEval
The LongMemEval dataset consists of 500 human-curated question-answer pairs, with the answers embedded within a scalable set of user-assistant chat histories. It is designed to test more than simple fact recall: many tasks require complex temporal reasoning.