Meta strengthens the security of artificial intelligence (AI)

Meta has announced a series of new security tools for its artificial intelligence models, with the aim of making the development and use of AI safer, both for creators and defenders in the field of cybersecurity.

The news particularly concerns the Llama model family, which now has updated and more sophisticated resources to tackle emerging threats.

Llama Guard 4: multimodal security for text and images in the new AI program by Meta

One of the main updates is represented by Llama Guard 4, the evolution of Meta’s customizable security filter.

The great innovation of this version is its multimodal capability, meaning the ability to analyze and apply security rules not only to text but also to images. This step is crucial, considering that AI applications are becoming increasingly visual.

Llama Guard 4 is already integrated into the new API Llama by Meta, currently available in limited preview. Developers can access this tool through the official Llama protections page, or via the Hugging Face and GitHub platforms.

Another significant innovation is LlamaFirewall, a system designed to serve as the command center of security in artificial intelligence systems. This tool allows for the coordination of different protection models and integrates with other security tools from Meta.

LlamaFirewall is designed to counter sophisticated threats such as prompt injection, the generation of potentially dangerous code, and risky behaviors of AI plug-ins.

In essence, it represents a bulwark against the most insidious attacks that can compromise the integrity of systems based on artificial intelligence.

Meta has also updated its system for detecting jailbreak attempts and prompt injection with the new Prompt Guard 2 (86M). This model has been designed to more accurately identify attempts to manipulate the AI.

Alongside this, Prompt Guard 2 22M has been introduced, a lighter and faster version. With a reduction in latency and computing costs of up to 75%, this version is ideal for those working with limited resources but who do not want to compromise on security.

“`html

CyberSec Eval 4: new benchmarks for AI security

“`

Meta has not only provided tools for developers, but has also updated its CyberSec Eval 4 benchmark suite, designed to evaluate the capabilities of AI systems in the field of cybersecurity.

This open source suite helps organizations measure the effectiveness of artificial intelligence in detecting and responding to digital threats.

Two new tools enrich this suite:

– CyberSOC Evaluation: developed in collaboration with CrowdStrike, this framework evaluates the performance of AI in a real Security Operation Center (SOC) context, offering a concrete view of the operational effectiveness of artificial intelligence.
– AutoPatchBench: a benchmark that tests the ability of AI models, including those from the Llama family, to automatically identify and correct vulnerabilities in code before they are exploited by malicious actors.

To facilitate the adoption of these tools, Meta has launched the Llama Defenders program, which offers privileged access to a selection of AI solutions – some open source, others in preview or proprietary – designed to tackle specific challenges in the field of security.

Among the shared tools is also the automatic classifier of sensitive documents, used internally by Meta.

This system applies security labels to documents within an organization, preventing confidential information from being accidentally entered into AI systems where it could be exposed.

Meta has also addressed the growing problem of fake audio generated by artificial intelligence, increasingly used in scams and phishing attempts. Two new tools have been made available to partners:

– Llama Generated Audio Detector
– Llama Audio Watermark Detector

These tools help to identify synthetic voices in suspicious calls. Companies like ZenDesk, Bell Canada, and AT&T are already evaluating the integration of these technologies into their security systems.

Private Processing: Useful AI Without Compromising Privacy

Finally, Meta provided a preview of a technology under development for WhatsApp, called private processing.

The goal is to enable artificial intelligence to provide useful features – such as summarizing unread messages or suggesting replies – without either Meta or WhatsApp being able to access the content of the messages.

This technology represents an important step towards a privacy-respecting AI. Meta is adopting a transparent approach, publishing its own threat model and inviting the research community to test its robustness before the official release.

With this series of announcements, Meta demonstrates a concrete commitment to strengthening the security of artificial intelligence, both from the development and defense perspectives.

The objective is twofold. Namely, to protect end users and provide developers and security professionals with advanced tools to tackle the ever-evolving digital threats.

In a rapidly changing technological landscape, where AI plays an increasingly central role, initiatives like these are essential to ensure a safer, more transparent, and responsible digital future.

Source link