PeerLLM Developer Guide

OpenAI-compatible REST API — use any OpenAI SDK or HTTP client

← Back to PeerLLM

Overview

PeerLLM exposes an OpenAI-compatible REST API so you can use any OpenAI SDK or HTTP client to interact with the PeerLLM network. Requests are routed to distributed community hosts running open-weight models, or to LLooMA1.0 — PeerLLM's network-native orchestration model that splits complex tasks across multiple hosts in parallel.

You can access PeerLLM in three ways:


Remote API

The Remote API connects you to the public PeerLLM cloud. This is the recommended way to use PeerLLM for production applications.

Base URL

https://api.peerllm.com

All endpoints are prefixed with /v1.

Authentication

Every request must include a valid API key in the Authorization header:

Authorization: Bearer <YOUR_API_KEY>

Generating an API Key

  1. Go to the Hosts Portal at hosts.peerllm.com.
  2. Sign in or create an account.
  3. Navigate to the API Management section.
  4. Click Generate New Key.
  5. Copy and securely store your key — it will not be shown again.
Important: Treat your API key like a password. Do not commit it to source control or share it publicly.

Tokens & Billing

PeerLLM uses a token balance system. You must have a positive token balance before you can make API calls.

How to Get Tokens

MethodSteps
Purchase Go to hosts.peerllm.comAPI Management → click Purchase Tokens and select a package and complete payment.
Redeem a Token Code Go to hosts.peerllm.comAPI ManagementRedeem Code → enter your code.

If your balance reaches zero, all /v1/chat/completions requests will be rejected with a 402 Payment Required error until you add more tokens.

Endpoints

GET /v1/models

Returns the list of all approved models currently available on the PeerLLM network.

Note: This endpoint does not require authentication.

Request

GET /v1/models HTTP/1.1
Host: api.peerllm.com

Response 200 OK

{
  "object": "list",
  "data": [
    {
      "id": "LLooMA1.0",
      "object": "model",
      "created": 1776572039,
      "owned_by": "peerllm",
      "metadata": {
        "source": "orchestration",
        "repo": null,
        "file": null,
        "qtype": null,
        "size": null,
        "ram": null,
        "gpu": null,
        "checksum": null,
        "description": "LLooMA Orchestration Model - Centralized AI orchestration with distributed execution across PeerLLM hosts. Automatically splits complex tasks into parallel subtasks for optimal performance."
      }
    },
    {
      "id": "mistral-7b-instruct-v0.2.Q8_0",
      "object": "model",
      "created": 1776572039,
      "owned_by": "peerllm",
      "metadata": {
        "source": "huggingface",
        "repo": "TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        "file": "mistral-7b-instruct-v0.2.Q8_0.gguf",
        "qtype": "Q8_0",
        "size": 7241732096,
        "ram": "16GB+",
        "gpu": "RTX 4060 or higher",
        "checksum": "sha256:3a6fbf4a41a1d52e415a4958cde6856d34b2db93",
        "description": "Quantized Mistral 7B Instruct v0.2 model hosted by TheBloke. Improved reasoning, context depth (32K tokens), and conversational performance."
      }
    }
  ]
}
Model Metadata Fields
FieldTypeDescription
idstringThe model identifier — use this value in the model field of chat completions.
objectstringAlways "model".
createdintegerUnix timestamp.
owned_bystring "owner of model".
metadata.sourcestringOrigin of the model ("huggingface", "orchestration", etc.).
metadata.repostring?Source repository (e.g., HuggingFace repo).
metadata.filestring?Model filename.
metadata.qtypestring?Quantization type (e.g., Q8_0, Q4_K_M).
metadata.sizeinteger?Model file size in bytes.
metadata.ramstring?Recommended RAM.
metadata.gpustring?GPU requirement.
metadata.checksumstring?File checksum for integrity verification.
metadata.descriptionstring?Human-readable model description.

POST /v1/chat/completions

Send a chat completion request. Supports both streaming (stream: true, the default) and non-streaming (stream: false) modes.

Request
POST /v1/chat/completions HTTP/1.1
Host: api.peerllm.com
Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

{
  "model": "LLooMA1.0",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Explain quantum computing in simple terms." }
  ],
  "stream": true,
  "temperature": 0.7,
  "max_tokens": 1024
}
Request Body
FieldTypeRequiredDefaultDescription
modelstringYesModel ID from /v1/models. Use "LLooMA1.0" for orchestrated multi-host responses.
messagesarrayYesArray of message objects with role ("system", "user", "assistant") and content.
streambooleanNotrueWhether to stream the response via SSE.
temperaturenumberNonullSampling temperature (0.0–2.0).
max_tokensnumberNonullMaximum tokens to generate.
Non-Streaming Response 200 OK
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1750000000,
  "model": "LLooMA1.0",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing uses qubits..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 25,
    "completion_tokens": 150,
    "total_tokens": 175
  }
}
Streaming Response 200 OK

The response is sent as Server-Sent Events (text/event-stream). Each event contains a JSON chunk:

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,"delta":{"role":"assistant","content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","created":1750000000,"model":"LLooMA1.0","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

...

data: [DONE]
About LLooMA1.0

LLooMA1.0 is a network-native orchestration model. It does not run on any single host. Instead, it:

  1. Analyzes your prompt and determines if it can be split into parallel subtasks.
  2. If splittable, it distributes subtasks across multiple PeerLLM hosts and synthesizes the results.
  3. If not splittable, it races two PeerLLM hosts and uses the fastest response. Strict latency enforcement (1 s first-token deadline, 1 s inter-token timeout, 5 s total cap) ensures quality — if hosts are slow, a centralized AI fallback kicks in automatically.

For all other model IDs, the request is sent directly to a host running that specific model.

Error Reference

All errors follow a consistent JSON structure:

{
  "error": "Error message string"
}

Or in some cases the OpenAI-style structured format:

{
  "error": {
    "message": "Detailed error message",
    "type": "error_type",
    "code": "error_code"
  }
}
HTTP StatusErrorCauseResolution
400 "Missing model name." The model field is empty or missing. Provide a valid model ID.
400 "Unknown model '{model}'." The requested model is not in the PeerLLM catalog. Does not apply to LLooMA1.0. Call GET /v1/models to see available models.
400 "Missing messages." The messages array is null or empty. Provide at least one message with role "user".
401 "Invalid or expired API key." The Authorization header is missing or the key is invalid/expired. Generate a new API key at hosts.peerllm.com.
402 "Insufficient token balance." Your account's token balance is zero or negative. Purchase tokens or redeem a code at hosts.peerllm.com.
404 "No hosts available." No PeerLLM hosts are online for the requested model. Try again later, or use LLooMA1.0 which has centralized AI fallback.
500 "Failed to process request with centralized AI." The centralized AI fallback encountered an internal error. Retry the request. If persistent, check PeerLLM status.

Quick Start

Using curl

# List available models
curl https://api.peerllm.com/v1/models

# Chat completion (streaming)
curl https://api.peerllm.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LLooMA1.0",
    "messages": [{"role": "user", "content": "Hello!"}],
    "stream": true
  }'

Using an OpenAI SDK

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.peerllm.com/v1"
)

response = client.chat.completions.create(
    model="LLooMA1.0",
    messages=[
        {"role": "user", "content": "What is PeerLLM?"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "https://api.peerllm.com/v1",
});

const stream = await client.chat.completions.create({
  model: "LLooMA1.0",
  messages: [{ role: "user", content: "What is PeerLLM?" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
using OpenAI;
using OpenAI.Chat;

var client = new ChatClient(
    model: "LLooMA1.0",
    credential: new ApiKeyCredential("YOUR_API_KEY"),
    options: new OpenAIClientOptions
    {
        Endpoint = new Uri("https://api.peerllm.com/v1")
    });

var stream = client.CompleteChatStreamingAsync(
    new List<ChatMessage>
    {
        new UserChatMessage("What is PeerLLM?")
    });

await foreach (var update in stream)
{
    foreach (var part in update.ContentUpdate)
    {
        Console.Write(part.Text);
    }
}
using System.Text;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

namespace PeerLLM.Demos.SemanticKernel
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            var builder = Kernel.CreateBuilder();

            builder.AddOpenAIChatCompletion(
                modelId: "LLooMA1.0",
                apiKey: "YOUR_API_KEY",
                endpoint: new Uri("https://api.peerllm.com/v1"));

            var kernel = builder.Build();
            var chat = kernel.GetRequiredService<IChatCompletionService>();
            var chatHistory = new ChatHistory();
            var fullResponse = new StringBuilder();

            chatHistory.AddUserMessage("What is PeerLLM?");

            await foreach (var message in chat.GetStreamingChatMessageContentsAsync(
                chatHistory,
                kernel: kernel))
            {
                Console.Write(message.ToString());
                fullResponse.Append(message.ToString());
            }
        }
    }
}

Network API

The Network API allows you to connect to a PeerLLM host running on your local network. This is useful for:

Enable Server Capabilities

Network Server Mode
  1. Open the PeerLLM application on the host machine.
  2. Navigate to Server.
  3. Click Start Server
  4. Note the port number (default is usually 3000 or similar).

Generate API Key

  1. In the PeerLLM application, go to SettingsAPI Management.
  2. Click Generate New Key.
  3. Copy and securely store your key — you'll need it to authenticate requests.
Security Note: Make sure your network is secure and trusted. Exposing the API server to untrusted networks can pose security risks.

Find Your IP Address

To connect to the host from another device on your network, you need the host's local IP address.

On macOS/Linux

ifconfig | grep "inet " | grep -v 127.0.0.1

Look for an address like 192.168.1.x or 10.0.0.x.

On Windows

ipconfig

Look for the IPv4 Address under your active network adapter (e.g., 192.168.1.x).

Usage

Once you have the IP address and port, construct the base URL:

http://<HOST_IP>:<PORT>

For example:

http://192.168.1.100:3000

Then use it with any OpenAI SDK or HTTP client:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="http://192.168.1.100:3000/v1"
)

response = client.chat.completions.create(
    model="LLooMA1.0",
    messages=[
        {"role": "user", "content": "Hello from my network!"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "http://192.168.1.100:3000/v1",
});

const stream = await client.chat.completions.create({
  model: "LLooMA1.0",
  messages: [{ role: "user", content: "Hello from my network!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
using System.Text;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

namespace PeerLLM.Demos.SemanticKernel
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            var builder = Kernel.CreateBuilder();

            builder.AddOpenAIChatCompletion(
                modelId: "LLooMA1.0",
                apiKey: "YOUR_API_KEY",
                endpoint: new Uri("http://192.168.1.100:3000/v1"));

            var kernel = builder.Build();
            var chat = kernel.GetRequiredService<IChatCompletionService>();
            var chatHistory = new ChatHistory();
            var fullResponse = new StringBuilder();

            chatHistory.AddUserMessage("Hello from my network!");

            await foreach (var message in chat.GetStreamingChatMessageContentsAsync(
                chatHistory,
                kernel: kernel))
            {
                Console.Write(message.ToString());
                fullResponse.Append(message.ToString());
            }
        }
    }
}
curl http://192.168.1.100:3000/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LLooMA1.0",
    "messages": [{"role": "user", "content": "Hello from my network!"}],
    "stream": true
  }'

Local Development

The Local Development setup allows you to connect to PeerLLM running on the same machine. This is ideal for:

The setup is identical to the Network API section above, but instead of using your machine's IP address, you'll use localhost or 0.0.0.0 with the configured port number.

Usage

Use localhost with the port number:

http://localhost:<PORT>

For example:

http://localhost:3000
Note: Some machines may require http://0.0.0.0:<PORT> instead of localhost.

Then use it with any OpenAI SDK or HTTP client:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="http://localhost:3000/v1"
)

response = client.chat.completions.create(
    model="LLooMA1.0",
    messages=[
        {"role": "user", "content": "Hello locally!"}
    ],
    stream=True
)

for chunk in response:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")
import OpenAI from "openai";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  baseURL: "http://localhost:3000/v1",
});

const stream = await client.chat.completions.create({
  model: "LLooMA1.0",
  messages: [{ role: "user", content: "Hello locally!" }],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
using System.Text;
using System.Threading.Tasks;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

namespace PeerLLM.Demos.SemanticKernel
{
    internal class Program
    {
        static async Task Main(string[] args)
        {
            var builder = Kernel.CreateBuilder();

            builder.AddOpenAIChatCompletion(
                modelId: "LLooMA1.0",
                apiKey: "YOUR_API_KEY",
                endpoint: new Uri("http://localhost:3000/v1"));

            var kernel = builder.Build();
            var chat = kernel.GetRequiredService<IChatCompletionService>();
            var chatHistory = new ChatHistory();
            var fullResponse = new StringBuilder();

            chatHistory.AddUserMessage("Hello locally!");

            await foreach (var message in chat.GetStreamingChatMessageContentsAsync(
                chatHistory,
                kernel: kernel))
            {
                Console.Write(message.ToString());
                fullResponse.Append(message.ToString());
            }
        }
    }
}
curl http://localhost:3000/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "LLooMA1.0",
    "messages": [{"role": "user", "content": "Hello locally!"}],
    "stream": true
  }'

Support

For questions, issues, or feedback, visit the Hosts Portal at hosts.peerllm.com.