Scoring AI models: Endor Labs unveils evaluation tool

Endor Labs has begun scoring AI models based on their security, popularity, quality, and activity.

Dubbed ‘Endor Scores for AI Models,’ the capability aims to simplify identifying the most secure open-source AI models currently available on Hugging Face – a platform for sharing large language models (LLMs), machine learning models, and other open-source AI models and datasets – by providing straightforward scores.

The announcement comes as developers increasingly turn to platforms like Hugging Face for ready-made AI models, mirroring the early days of readily available open-source software (OSS). The new release improves AI governance by enabling developers to “start clean” with AI models, a goal that has so far proved elusive.

Varun Badhwar, Co-Founder and CEO of Endor Labs, said: “It’s always been our mission to secure everything your code depends on, and AI models are the next great frontier in that critical task.

“Every organisation is experimenting with AI models, whether to power particular applications or build entire AI-based businesses. Security has to keep pace, and there’s a rare opportunity here to start clean and avoid risks and high maintenance costs down the road.”

George Apostolopoulos, Founding Engineer at Endor Labs, added: “Everybody is experimenting with AI models right now. Some teams are building brand new AI-based businesses while others are looking for ways to slap a ‘powered by AI’ sticker on their product. One thing is for sure, your developers are playing with AI models.”

However, this convenience does not come without risks. Apostolopoulos warns that the current landscape resembles “the wild west,” with people grabbing models that fit their needs without considering potential vulnerabilities.

Endor Labs’ approach treats AI models as dependencies within the software supply chain

“Our mission at Endor Labs is to ‘secure everything your code depends on,’” Apostolopoulos states. This perspective allows organisations to apply similar risk evaluation methodologies to AI models as they do to other open-source components.

Endor’s tool for scoring AI models focuses on several key risk areas:

  • Security vulnerabilities: Pre-trained models can harbour malicious code or vulnerabilities within model weights, potentially leading to security breaches when integrated into an organisation’s environment (see the sketch after this list).
  • Legal and licensing issues: Compliance with licensing terms is crucial, especially considering the complex lineage of AI models and their training sets.
  • Operational risks: The dependency on pre-trained models creates a complex graph that can be challenging to manage and secure.
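
On the first of these risks the danger is concrete: PyTorch checkpoints serialized with Python’s pickle module can execute arbitrary code the moment they are deserialized. As a minimal sketch of the standard mitigation (the checkpoint file name is hypothetical):

```python
import torch

# Pickle-based checkpoints (.bin, .pt) can run arbitrary Python code
# during deserialization, which is why model weights are a supply-chain
# risk in their own right.
#
# weights_only=True (PyTorch >= 1.13) restricts unpickling to plain
# tensors and primitive containers, refusing any object whose
# construction would execute code. The file name below is hypothetical.
state_dict = torch.load("downloaded_model.bin", weights_only=True)

# The default torch.load(...) without weights_only would execute
# whatever the checkpoint author embedded in it -- the behaviour that
# unsafe weight formats are penalised for.
```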

To combat these issues, Endor Labs’ evaluation tool applies 50 out-of-the-box checks to AI models on Hugging Face. The system generates an “Endor Score” based on factors such as the number of maintainers, corporate sponsorship, release frequency, and known vulnerabilities.
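
Endor Labs has not published the exact formula behind the score, but conceptually an aggregate of this kind can be sketched as a weighted combination of per-factor checks. The factor names, caps, and weights below are illustrative assumptions, not Endor’s actual checks:

```python
from dataclasses import dataclass

# Hypothetical factor-weighted score; Endor Labs has not published
# its formula, so every factor, cap, and weight here is an assumption.

@dataclass
class ModelFacts:
    maintainers: int
    corporate_sponsor: bool
    releases_last_year: int
    known_vulnerabilities: int

def illustrative_score(facts: ModelFacts) -> float:
    """Combine sub-scores (each normalised to 0..1) into a 0..100 score."""
    sub_scores = {
        # More maintainers suggests a healthier project; capped at 10.
        "maintainers": min(facts.maintainers, 10) / 10,
        "sponsorship": 1.0 if facts.corporate_sponsor else 0.0,
        # Regular releases signal activity; capped at a monthly cadence.
        "activity": min(facts.releases_last_year, 12) / 12,
        # Any known vulnerability zeroes the security sub-score.
        "security": 0.0 if facts.known_vulnerabilities else 1.0,
    }
    weights = {"maintainers": 0.2, "sponsorship": 0.1,
               "activity": 0.2, "security": 0.5}
    return 100 * sum(weights[k] * v for k, v in sub_scores.items())

print(illustrative_score(ModelFacts(4, True, 6, 0)))  # 78.0
```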

Screenshot of Endor Labs' tool for scoring AI models.

Positive factors in the system for scoring AI models include the use of safe weight formats, the presence of licensing information, and high download and engagement metrics. Negative factors encompass incomplete documentation, lack of performance data, and the use of unsafe weight formats.
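
The weight-format signal is one developers can sanity-check themselves. A minimal sketch using the huggingface_hub client, under the assumption that a coarse file-extension check is good enough (the repo id is only an example):

```python
from huggingface_hub import HfApi

# Safetensors files store raw tensors and cannot execute code on load;
# pickle-based formats (.bin, .pt, .pkl) can. Listing a repo's files
# gives a coarse, extension-based view of which formats it ships.
UNSAFE_SUFFIXES = (".bin", ".pt", ".pkl", ".pickle")

def weight_format_report(repo_id: str) -> dict:
    files = HfApi().list_repo_files(repo_id)
    return {
        "safetensors": [f for f in files if f.endswith(".safetensors")],
        "pickle_based": [f for f in files if f.endswith(UNSAFE_SUFFIXES)],
    }

# Example repo id -- substitute the model you are evaluating.
print(weight_format_report("distilbert-base-uncased"))
```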

A key feature of Endor Scores is its user-friendly approach. Developers don’t need to know specific model names; they can start their search with general questions like “What models can I use to classify sentiments?” or “What are the most popular models from Meta?” The tool then provides clear scores ranking both positive and negative aspects of each model, allowing developers to select the most appropriate options for their needs.
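
Endor’s tool answers those natural-language questions itself; for comparison, here is a sketch of how the same intent maps onto the public Hugging Face Hub API (an assumption about equivalent filters, not how Endor implements its search):

```python
from huggingface_hub import HfApi

api = HfApi()

# "What models can I use to classify sentiments?" roughly maps onto a
# pipeline-tag filter plus a free-text search, ordered by popularity.
for m in api.list_models(search="sentiment",
                         pipeline_tag="text-classification",
                         sort="downloads", direction=-1, limit=5):
    print(m.id, m.downloads)

# "What are the most popular models from Meta?" maps onto an author
# filter ("facebook" is Meta's organisation name on the Hub).
for m in api.list_models(author="facebook", sort="downloads",
                         direction=-1, limit=5):
    print(m.id, m.downloads)
```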

“Your teams are being asked about AI every single day, and they’ll look for the models they can use to accelerate innovation,” Apostolopoulos notes. “Evaluating Open Source AI models with Endor Labs helps you make sure the models you’re using do what you expect them to do, and are safe to use.”

(Photo by Element5 Digital)

See also: China Telecom trains AI model with 1 trillion parameters on domestic chips

Want to learn more about AI and big data from industry leaders? Check out AI & Big Data Expo taking place in Amsterdam, California, and London. The comprehensive event is co-located with other leading events including Intelligent Automation Conference, BlockX, Digital Transformation Week, and Cyber Security & Cloud Expo.

Explore other upcoming enterprise technology events and webinars powered by TechForge here.

Tags: ai, artificial intelligence, endor labs, evaluation, machine learning, model evaluation, models, scores