- The rapid increase in AI usage has led to a rise in harmful outcomes, including hate speech and copyright infringement, a problem exacerbated by insufficient regulation and testing.
- Current research indicates that steering AI models toward desired behavior remains difficult, with limited progress over the past 15 years in understanding why models behave as they do.
- Red teaming, in which external experts rigorously probe models for failures, is advocated as a way to better evaluate AI risks, but qualified personnel in this field are in short supply.
- Project Moonshot seeks to improve AI evaluation through a toolkit that combines benchmarking with continuous assessment, with the aim of supporting customization for specific industries.
- Experts call for stricter evaluation standards for AI, akin to those in pharmaceuticals, to prevent misuse and ensure models are safe before deployment.