Giving Superpowers to Small Language Models with Model Context Protocol (MCP)
Introduction
Large Language Models (LLMs) have revolutionized how we interact with artificial intelligence (AI), but their intelligence comes at a cost. The best models handle complex operations seamlessly because they have deeply embedded knowledge of programming, APIs, and structured data formats. That same depth makes them expensive to run: they require significant computational resources, which translates into high operational costs.
What if we could give smaller, more efficient LLMs the same superpowers without requiring them to know everything upfront? Enter Model Context Protocol (MCP) servers, a game-changing approach that equips small language models (SLMs) with "superpowers" by abstracting away complexity.
The Challenge: Knowledge vs. Efficiency
When an LLM is asked to perform a task, it needs to understand the syntax, structure, and nuances of that task.
For example:
- Creating a Spotify playlist requires knowledge of the Spotify API, the playlist creation process, and crafting the appropriate HTTP API request (see the sketch after this list).
- Downloading a file from Google Drive requires knowledge of the Google Drive API and the file download process.
- Writing an SQL query requires knowledge of the database schema and the SQL query syntax.
- Managing Kubernetes resources requires knowledge of the Kubernetes API and the resource configuration format (YAML).
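To make the first item concrete: "crafting the appropriate HTTP API request" means the model would have to produce something like the following entirely on its own. This is a minimal Go sketch against Spotify's public playlist-creation endpoint; the user ID, access token, and payload values are placeholders:

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

func main() {
	// The model must know the endpoint, the auth scheme, and the exact
	// JSON payload format. USER_ID and ACCESS_TOKEN are placeholders.
	body := strings.NewReader(`{"name": "My Playlist", "public": false}`)
	req, err := http.NewRequest("POST",
		"https://api.spotify.com/v1/users/USER_ID/playlists", body)
	if err != nil {
		panic(err)
	}
	req.Header.Set("Authorization", "Bearer ACCESS_TOKEN")
	req.Header.Set("Content-Type", "application/json")

	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println(resp.Status)
}
```

Getting any of these details wrong, from the URL path to a header, produces a failed request, and a small model has little embedded knowledge to draw on to get them right.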
Larger LLMs can handle many of these tasks because they've been trained on extensive datasets with detailed technical knowledge. However, this comes at the expense of high computational requirements, making them impractical for many use cases.
Smaller models, on the other hand, struggle because they lack the necessary embedded knowledge. They might not know the intricacies of crafting an HTTP request, specific product APIs, writing an optimized SQL query, or configuring a Kubernetes deployment. But what if we could bridge that gap?
MCP Servers: The Missing Piece
MCP servers act as middleware between the LLM and the real world. Instead of requiring the model to generate precise low-level instructions, MCP servers can provide tools that abstract complex actions.
These tools can operate at different levels of abstraction:
1. Low-level Tools
Provide basic building blocks for common tasks, such as sending an HTTP request or performing an SQL query. These tools require the model to have detailed knowledge of the task at hand.
For example, an MCP server might provide a tool to perform a SELECT query on a database, but the model must know the SQL syntax and the database schema to use it correctly.
These tools are better suited for LLMs with some domain-specific knowledge.
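On the server side, such a low-level tool can be little more than a pass-through to the database. The sketch below assumes Go with database/sql and a SQLite driver; the plumbing that registers the function as an MCP tool is omitted, and the table in the usage example is hypothetical:

```go
package main

import (
	"database/sql"
	"encoding/json"
	"fmt"

	_ "github.com/mattn/go-sqlite3" // driver choice is an assumption
)

// sqlQuery runs whatever SELECT statement the model supplies and returns the
// rows as JSON. The server adds no abstraction: the model must already know
// the schema and the SQL syntax.
func sqlQuery(db *sql.DB, query string) (string, error) {
	rows, err := db.Query(query)
	if err != nil {
		return "", err
	}
	defer rows.Close()

	cols, err := rows.Columns()
	if err != nil {
		return "", err
	}
	var results []map[string]any
	for rows.Next() {
		values := make([]any, len(cols))
		ptrs := make([]any, len(cols))
		for i := range values {
			ptrs[i] = &values[i]
		}
		if err := rows.Scan(ptrs...); err != nil {
			return "", err
		}
		row := make(map[string]any, len(cols))
		for i, c := range cols {
			row[c] = values[i]
		}
		results = append(results, row)
	}
	out, err := json.Marshal(results)
	return string(out), err
}

func main() {
	db, err := sql.Open("sqlite3", "app.db") // hypothetical database file
	if err != nil {
		panic(err)
	}
	defer db.Close()

	// The model supplies the raw SQL; getting syntax and schema right is on it.
	out, err := sqlQuery(db, "SELECT id, name FROM users")
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```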
2. High-level Tools
Abstract away the complexity of a task, allowing the model to focus on the intent rather than the implementation.
For example, an MCP server might provide a tool to fetch a user's profile information from a database without requiring the model to know the SQL syntax or the database schema.
These tools are better suited for SLMs.
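Continuing the database example, the high-level counterpart might look like the sketch below. It reuses the database/sql setup from the previous sketch; the users table and its columns are hypothetical. The point is that the SQL lives on the server rather than in the model's output:

```go
import "database/sql"

// Profile is what the tool returns to the model: plain data, no SQL.
type Profile struct {
	Username  string `json:"username"`
	Email     string `json:"email"`
	CreatedAt string `json:"created_at"`
}

// getUserProfile hides the schema and the query behind a single
// intent-level call; the model only supplies a username.
func getUserProfile(db *sql.DB, username string) (*Profile, error) {
	const q = `SELECT username, email, created_at FROM users WHERE username = ?`
	var p Profile
	if err := db.QueryRow(q, username).Scan(&p.Username, &p.Email, &p.CreatedAt); err != nil {
		return nil, err
	}
	return &p, nil
}
```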
By offloading complexity to MCP servers, even smaller language models can achieve expert-level performance on tasks they were never explicitly trained for.
A Real-World Example: Deploying to Kubernetes
Let's consider a common scenario: deploying a containerized application to a Kubernetes cluster. According to the abstraction levels I defined earlier, there are two ways to approach this task with an MCP server:
1. Low-Level Approach
The MCP server can provide a low-level tool such as resources_create_or_update(yaml string) that allows the model to create or update Kubernetes resources by providing their YAML definition.
On the server side, the implementation of this tool would be extremely simple:
```go
// pseudo-code
func resources_create_or_update(yaml string) {
    resources := parseYAML(yaml)
    kubernetes.Apply(resources)
}
```
However, the model needs to fully understand the Kubernetes YAML syntax to use this tool effectively. It would need to know how to define a deployment, a service, and an ingress, as well as how to configure them correctly.
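To give a sense of that burden, here is a minimal Deployment manifest the model would have to emit correctly, before even getting to the Service and Ingress. The names and image reference are examples:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: registry.example.com/my-app:latest
          ports:
            - containerPort: 8080
```

A single mis-indented field, or a selector that does not match the pod template's labels, is enough to make the apply fail.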
2. High-Level Approach
The MCP server provides a higher-level tool such as deploy_application(image string) that abstracts away the complexity of deploying an application to Kubernetes.
On the server side, the implementation of this tool would be more involved:
```go
// pseudo-code
func deploy_application(image string) {
    // Fetch image details from the registry (exposed ports, environment variables, etc.)
    imageData := fetchImage(image)
    // Create a Kubernetes Deployment based on the image metadata
    deployment := createDeployment(imageData)
    // May return a Service object if the image exposes a port
    service := createService(imageData)
    // May return an Ingress object to expose that Service outside the cluster
    ingress := createIngress(imageData)
    // Apply all of the generated resources to the cluster
    applyResources(deployment, service, ingress)
}
```
In this scenario, a small model with zero knowledge of Kubernetes can still deploy applications effortlessly to a Kubernetes cluster. It's the MCP server that performs the heavy lifting, allowing the model to focus on the high-level task of deploying an application.
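From the model's point of view, the whole interaction collapses into a single tool call. Over MCP this would be a tools/call request along the following lines; the image reference is a placeholder:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "deploy_application",
    "arguments": {
      "image": "registry.example.com/my-app:latest"
    }
  }
}
```

Everything after that, from registry inspection to applying the manifests, happens inside the server.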
Why This Matters
- Reduces LLM training costs: Smaller models can achieve expert-level performance without requiring extensive training on domain-specific knowledge by leveraging MCP servers.
- Reduces operational costs: Smaller models are more cost-effective to run, making AI automation more accessible to a wider audience.
- Makes AI more accessible: Even lightweight models can perform complex tasks, making powerful AI accessible in resource-constrained environments.
- Improves efficiency: Instead of generating verbose code or structured data, models can interact with high-level tools, reducing error rates and execution time.
Tip
Considering the pricing (at the time of writing) of OpenAI GPT-4o vs. GPT-4o mini:
- Input tokens:
  - GPT-4o: $2.50 / 1M tokens
  - GPT-4o mini: $0.15 / 1M tokens
- Output tokens:
  - GPT-4o: $10.00 / 1M tokens
  - GPT-4o mini: $0.60 / 1M tokens
At $2.50 / $0.15 ≈ 16.7 for input and $10.00 / $0.60 ≈ 16.7 for output, the larger model costs almost 17x more to run than the smaller one.
Conclusion
MCP servers are transforming how we think about AI capabilities by bridging the gap between small and large language models. Instead of building ever-larger models with increasingly unsustainable costs, we can focus on making existing systems smarter through collaboration between LLMs and intelligent infrastructure like MCP servers.
As the MCP ecosystem evolves, the line between "small" and "large" models will blur. The real power won't come from the LLM itself but from the tools and context we provide. With MCP, even the smallest models can become AI superheroes.