FyreToolkit API Reference
Production-grade ML infrastructure APIs. Test everything right here with our interactive playground featuring realistic mock responses and state management.
Quickstart
Get started in 60 seconds. Follow these steps to make your first successful request.
1) Get an API Key
Click Get API Key in the top navigation. For this playground, use the demo key: sk_live_demo_12345
2) Make your first request
List your deployed models to verify authentication:
GET /models
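Outside the playground, the same call works from any HTTP client. A minimal sketch in Python (the base URL below is a placeholder; substitute the one for your environment, and see Authentication below for how the key is passed):

```python
import requests

API_KEY = "sk_live_demo_12345"                # demo key from step 1
BASE_URL = "https://api.fyretoolkit.example"  # placeholder base URL

# The API key goes in as the Basic Auth username; the password stays empty
response = requests.get(f"{BASE_URL}/models", auth=(API_KEY, ""))
response.raise_for_status()
print(response.json())
```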
3) Try it in the playground
Click List Models in the sidebar, then hit Send Request. You'll get a realistic response showing available models.
4) Explore capabilities
Use the Quick Test wizard to:
- Run predictions on sentiment models
- Create and monitor batch jobs
- Deploy new models
- View performance metrics
Authentication
The FyreToolkit API uses API keys to authenticate requests. API keys carry privileges, so keep them secure and never expose them in client-side code.
Authentication is performed via HTTP Basic Auth: provide your API key as the Basic Auth username and leave the password empty.
| Header | Type | Description |
|---|---|---|
| Authorization | string | HTTP Basic Auth value. Format: Basic base64(api_key:). Example: Basic c2tfbGl2ZV9kZW1vXzEyMzQ1Og== |
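If your HTTP client does not build Basic Auth headers for you, the value can be constructed directly. A minimal sketch of the encoding (note the trailing colon, which stands in for the empty password):

```python
import base64

api_key = "sk_live_demo_12345"
# Encode "api_key:" (the trailing colon means the password is empty)
token = base64.b64encode(f"{api_key}:".encode()).decode()
headers = {"Authorization": f"Basic {token}"}
# headers["Authorization"] == "Basic c2tfbGl2ZV9kZW1vXzEyMzQ1Og=="
```

However you build the header, keep the underlying key secure: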
- Never commit API keys to version control
- Use environment variables for key storage
- Rotate keys immediately if exposed
- Use separate keys for dev/staging/production
Rate Limits
Rate limits ensure platform stability and fair usage. Limits vary by endpoint type and subscription plan.
| Category | Default Limit | Applies To |
|---|---|---|
| Read operations | 120 requests/min | GET endpoints (models, metrics, job status) |
| Write operations | 60 requests/min | POST/DELETE (deployments, batch jobs) |
| Real-time inference | Plan-dependent | POST /predict/* endpoints |
Rate Limit Headers
Response headers provide limit information:
| Header | Description |
|---|---|
| X-RateLimit-Limit | Maximum requests per time window |
| X-RateLimit-Remaining | Requests remaining in current window |
| X-RateLimit-Reset | Unix timestamp when limit resets |
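These headers make it straightforward to back off before retrying after a 429. One possible approach, sketched in Python (the retry policy itself is an assumption, not something the API mandates):

```python
import time
import requests

def get_with_backoff(url, auth, max_attempts=5):
    """GET that waits for the rate-limit window to reset on 429 responses."""
    for _ in range(max_attempts):
        resp = requests.get(url, auth=auth)
        if resp.status_code != 429:
            return resp
        # X-RateLimit-Reset is a Unix timestamp; sleep until then (at least 1s)
        reset = int(resp.headers.get("X-RateLimit-Reset", "0"))
        time.sleep(max(reset - time.time(), 1))
    return resp
```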
Pagination
List endpoints return paginated results using cursor-based pagination for consistency and performance.
Request Parameters
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Max records to return (default: 20, max: 100) |
| starting_after | string | Cursor ID. Returns items after this ID |
Response Format
| Field | Type | Description |
|---|---|---|
| object | string | Always "list" |
| data | array | Array of resource objects |
| has_more | boolean | Whether more records exist |
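Putting the two parameters together, you can walk a full collection by passing the id of the last item on each page as starting_after until has_more comes back false. A sketch against GET /models (base URL and auth are supplied by the caller):

```python
import requests

def list_all_models(base_url, auth):
    """Yield every model, fetching one page at a time."""
    params = {"limit": 100}
    while True:
        page = requests.get(f"{base_url}/models", auth=auth, params=params).json()
        yield from page["data"]
        if not page["has_more"]:
            break
        # Cursor: resume after the last item of the current page
        params["starting_after"] = page["data"][-1]["id"]
```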
Errors
FyreToolkit uses conventional HTTP response codes and provides detailed error objects for debugging.
HTTP Status Codes
| Code | Meaning | Description |
|---|---|---|
| 200 | OK | Request succeeded |
| 201 | Created | Resource created successfully |
| 202 | Accepted | Request accepted for async processing |
| 400 | Bad Request | Invalid request (missing params, malformed JSON) |
| 401 | Unauthorized | Invalid or missing API key |
| 404 | Not Found | Resource doesn't exist |
| 409 | Conflict | Request conflicts with current state |
| 422 | Unprocessable Entity | Valid syntax but semantic errors |
| 429 | Too Many Requests | Rate limit exceeded |
| 500 | Internal Server Error | Server error (contact support with request_id) |
Error Response Schema
| Field | Type | Description |
|---|---|---|
| error.code | string | Machine-readable error code (e.g., "model_not_found") |
| error.message | string | Human-readable error description |
| error.request_id | string | Unique request identifier for support |
| error.details | object | Optional additional context (field errors, etc.) |
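In practice it helps to branch on the status code and to surface error.request_id when escalating to support. A sketch (the base URL is a placeholder and the model id is illustrative, chosen to trigger a 404):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder
MODEL_ID = "mdl_does_not_exist"               # illustrative, triggers a 404

resp = requests.get(f"{BASE_URL}/models/{MODEL_ID}", auth=(API_KEY, ""))
if not resp.ok:
    err = resp.json()["error"]
    if resp.status_code >= 500:
        # Include error.request_id when contacting support
        print(f"server error {err['code']} (request_id={err['request_id']})")
    else:
        print(f"{resp.status_code} {err['code']}: {err['message']}")
```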
Real-time Prediction
Generate low-latency predictions from deployed models. Optimized for real-time inference with sub-100ms response times for most models.
Request
POST /predict/{model_id}
Path Parameters
| Name | Type | Description |
|---|---|---|
| model_id | string | Unique identifier of the deployed model |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| inputs | array | Yes | Input data (text, JSON, tensors). Format depends on model. |
| metadata | object | No | Optional metadata (correlation_id, user_id, etc.) |
Response
| Field | Type | Description |
|---|---|---|
| id | string | Unique prediction identifier |
| object | string | Always "prediction" |
| created | integer | Unix timestamp (seconds) |
| model | string | Model ID used for prediction |
| results | array | Model outputs (one per input) |
| usage | object | Latency and compute usage metadata |
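For example, scoring a couple of text inputs against a deployed sentiment model might look like the sketch below (the base URL, model id, and input format are illustrative; the expected input format depends on the model):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder
MODEL_ID = "mdl_sentiment_v2"                 # illustrative model id

payload = {
    "inputs": ["great product, fast shipping", "never buying again"],
    "metadata": {"correlation_id": "req-42"},
}
resp = requests.post(f"{BASE_URL}/predict/{MODEL_ID}", auth=(API_KEY, ""), json=payload)
prediction = resp.json()
# "results" holds one output per input, in order
for text, result in zip(payload["inputs"], prediction["results"]):
    print(text, "->", result)
```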
Create Batch Job
Process large datasets asynchronously. Ideal for non-time-sensitive workloads requiring high throughput.
Request
POST /batch/{model_id}
Path Parameters
| Name | Type | Description |
|---|---|---|
| model_id | string | Model to use for batch processing |
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| input_url | string | Yes | URL to CSV or JSONL file containing inputs |
| webhook | string | No | Callback URL for completion notification |
| idempotency_key | string | No | Prevents duplicate job creation on retries |
Response
| Field | Type | Description |
|---|---|---|
| id | string | Job identifier |
| status | string | "queued" initially |
| created | integer | Unix timestamp |
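A sketch of creating a job from a hosted JSONL file (the base URL, model id, URLs, and idempotency key are all illustrative):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder
MODEL_ID = "mdl_sentiment_v2"                 # illustrative

job = requests.post(
    f"{BASE_URL}/batch/{MODEL_ID}",
    auth=(API_KEY, ""),
    json={
        "input_url": "https://example.com/reviews.jsonl",
        "webhook": "https://example.com/hooks/fyre-batch-done",
        "idempotency_key": "reviews-2024-06-01",  # safe to retry with the same key
    },
).json()
print(job["id"], job["status"])  # status starts as "queued"
```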
Get Batch Job
Retrieve status and results for a batch job. Poll this endpoint to monitor progress.
Request
GET /jobs/{job_id}
Response
| Field | Type | Description |
|---|---|---|
| id | string | Job identifier |
| status | string | One of "queued", "processing", "completed", "failed", "canceled" |
| progress | object | Processed/total record counts |
| output_url | string | Results URL (when status=completed) |
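A minimal polling loop might look like the sketch below (the polling interval and job id are illustrative; supplying a webhook at creation time avoids polling altogether):

```python
import time
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder

def wait_for_job(job_id, interval=10):
    """Poll GET /jobs/{job_id} until the job reaches a terminal state."""
    while True:
        job = requests.get(f"{BASE_URL}/jobs/{job_id}", auth=(API_KEY, "")).json()
        if job["status"] in ("completed", "failed", "canceled"):
            return job
        time.sleep(interval)

job = wait_for_job("job_123")  # illustrative job id
if job["status"] == "completed":
    print("results at:", job["output_url"])
```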
Cancel Batch Job
Cancel a queued or processing batch job. Completed jobs cannot be canceled.
Request
POST /jobs/{job_id}/cancel
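A sketch (the base URL and job id are illustrative):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder

# Only queued or processing jobs can be canceled
resp = requests.post(f"{BASE_URL}/jobs/job_123/cancel", auth=(API_KEY, ""))
print(resp.status_code)
```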
List Models
Retrieve all models available in your organization. Supports pagination.
Request
GET /models
Query Parameters
| Parameter | Type | Description |
|---|---|---|
| limit | integer | Max models to return (default: 20) |
| starting_after | string | Cursor for pagination |
Response
| Field | Type | Description |
|---|---|---|
| object | string | Always "list" |
| data | array | Array of model objects |
| has_more | boolean | Whether more models exist |
Get Model
Retrieve detailed information about a specific model.
Request
GET /models/{model_id}
Response
| Field | Type | Description |
|---|---|---|
| id | string | Model identifier |
| status | string | One of "ready", "training", "archived" |
| created | integer | Unix timestamp |
| description | string | Model description |
Create Deployment
Deploy a model to production infrastructure. Provisions resources and creates endpoints.
Request
POST /deployments
Request Body
| Field | Type | Required | Description |
|---|---|---|---|
| model_uri | string | Yes | Model artifact URI (e.g., s3://...) |
| region | string | Yes | Target region (us-east-1, eu-west-1, etc.) |
| replicas | integer | No | Initial replica count (default: 1) |
| idempotency_key | string | No | Prevents duplicate deployments |
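A sketch of a deployment request (the base URL, artifact URI, and idempotency key are illustrative):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder

deployment = requests.post(
    f"{BASE_URL}/deployments",
    auth=(API_KEY, ""),
    json={
        "model_uri": "s3://my-bucket/models/sentiment-v2.tar.gz",  # illustrative
        "region": "us-east-1",
        "replicas": 2,
        "idempotency_key": "sentiment-v2-prod",  # safe to retry with the same key
    },
)
print(deployment.json())
```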
List Deployments
Retrieve all active deployments in your organization.
Request
GET /deployments
Delete Deployment
Remove a deployment and stop serving traffic. Model artifacts are preserved.
Request
DELETE /deployments/{deployment_id}
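A sketch (the base URL and deployment id are illustrative):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder

# Stops serving traffic; the underlying model artifacts remain intact
resp = requests.delete(f"{BASE_URL}/deployments/dep_123", auth=(API_KEY, ""))
print(resp.status_code)
```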
Get Metrics
Retrieve performance metrics for a deployment (latency, throughput, errors).
Request
GET /metrics/{deployment_id}
Response
| Field | Type | Description |
|---|---|---|
| deployment_id | string | Deployment identifier |
| period | string | Time window (e.g., "1h", "24h") |
| metrics | object | Performance metrics (requests, latency, errors) |
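A sketch of fetching the latest metrics for a deployment (the base URL and deployment id are illustrative; only the path parameter is specified above):

```python
import requests

API_KEY = "sk_live_demo_12345"
BASE_URL = "https://api.fyretoolkit.example"  # placeholder

metrics = requests.get(f"{BASE_URL}/metrics/dep_123", auth=(API_KEY, "")).json()
print(metrics["period"], metrics["metrics"])
```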