GateMem Benchmarks Shared-Memory AI Agent Governance

HuggingFace
Tuesday, June 23, 2026
  • GateMem evaluates shared-memory AI agents across medical, office, educational, and household domains.
  • The benchmark includes 91 multi-party episodes and 2,218 hidden evaluation checkpoints for rigorous testing.
  • Current models fail to balance utility, secure access control, and reliable forgetting of deleted information.

Researchers have introduced GateMem, a benchmark that evaluates how AI agents manage shared memory in multi-user settings. Unlike standard benchmarks, which assume a single user, GateMem examines agents deployed in institutional contexts such as hospitals, workplaces, schools, and households. In these environments, multiple users interact with a common memory pool, so an agent must not only recall information but also enforce strict governance over access rights and privacy.

The benchmark assesses three core competencies: utility on long-horizon requests, access control based on user authorization, and active forgetting (the ability to securely remove information after an explicit deletion request). It comprises 91 long-form multi-party episodes and 2,218 hidden evaluation checkpoints across four domains: medical, office, educational, and household. Results across seven memory-agent baselines and six backbone models indicate that current systems struggle to balance these requirements. Long-context prompting provides stronger governance but incurs high costs. Retrieval-based and external-memory methods cost less but remain prone to leaking deleted or unauthorized information.
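
To make the access-control and active-forgetting requirements concrete, here is a minimal Python sketch of a shared memory pool. It is not GateMem's code or evaluation harness. The names `SharedMemory`, `MemoryEntry`, `write`, `recall`, and `forget`, and the rule that only the owner may delete an entry, are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MemoryEntry:
    # Hypothetical record: who wrote it, what it says, who may read it.
    owner: str
    text: str
    readers: set = field(default_factory=set)


class SharedMemory:
    """Toy shared memory pool with per-entry read permissions and hard deletion."""

    def __init__(self) -> None:
        self._entries = {}  # entry id -> MemoryEntry
        self._next_id = 0

    def write(self, owner: str, text: str, readers: Optional[set] = None) -> int:
        """Store an entry; the owner can always read their own entry."""
        entry_id = self._next_id
        self._next_id += 1
        allowed = set(readers or ()) | {owner}
        self._entries[entry_id] = MemoryEntry(owner, text, allowed)
        return entry_id

    def recall(self, user: str, keyword: str) -> list:
        """Return matching texts, filtered by the requesting user's permissions."""
        return [
            entry.text
            for entry in self._entries.values()
            if user in entry.readers and keyword.lower() in entry.text.lower()
        ]

    def forget(self, user: str, entry_id: int) -> bool:
        """Delete an entry on request; assumed rule: only the owner may delete."""
        entry = self._entries.get(entry_id)
        if entry is None or entry.owner != user:
            return False
        del self._entries[entry_id]
        return True


memory = SharedMemory()
note_id = memory.write("dr_kim", "Patient A is allergic to penicillin", {"nurse_lee"})
authorized = memory.recall("nurse_lee", "allergic")   # one matching entry
unauthorized = memory.recall("visitor", "allergic")   # empty list
memory.forget("dr_kim", note_id)
after_delete = memory.recall("nurse_lee", "allergic")  # empty list
```

In this sketch the permission check runs at recall time and deletion removes the entry outright, so a deleted note cannot resurface. The leaks the article attributes to retrieval-based and external-memory methods would correspond to copies of the text surviving outside such a store, for example in an index or a summary, which this toy does not model.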

#benchmark#memory#agentic ai#access control#privacy