Image Annotation API Reference

API documentation for the Image Annotation service.

Overview

The Image Annotation API provides endpoints for annotating images using Vision-Language Models (VLMs).

Work in Progress

This API is under active development. See the GitHub repository for the latest updates.

REST API Endpoints

POST /api/annotate

Annotate an image using a VLM.

Request:

{
  "image_path": "path/to/image.png",
  "model": "gpt-4-vision",
  "prompt": "Describe this image for HED annotation"
}

Response:

{
  "description": "A person riding a bicycle...",
  "objects": ["person", "bicycle", "street"],
  "hed_annotation": "Sensory-event, Visual-presentation, ..."
}
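As a sketch, the endpoint can be called with any HTTP client. The example below uses Python's requests library and assumes the service is running locally on port 8000; the host, port, and lack of authentication are assumptions, not part of this reference:

import requests

# Assumed local development server; adjust host/port for your deployment.
BASE_URL = "http://localhost:8000"

payload = {
    "image_path": "path/to/image.png",
    "model": "gpt-4-vision",
    "prompt": "Describe this image for HED annotation",
}

response = requests.post(f"{BASE_URL}/api/annotate", json=payload)
response.raise_for_status()

annotation = response.json()
print(annotation["description"])
print(annotation["hed_annotation"])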

GET /api/annotations/{image_id}

Retrieve stored annotations for an image.
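A minimal sketch of calling this endpoint, under the same local-server assumption as above; the image identifier is a hypothetical placeholder:

import requests

BASE_URL = "http://localhost:8000"  # assumed local server

# "img-123" is a placeholder; use an identifier from your own deployment.
response = requests.get(f"{BASE_URL}/api/annotations/img-123")
response.raise_for_status()
print(response.json())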

GET /api/models

List available VLM models.
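For example, under the same local-server assumption:

import requests

BASE_URL = "http://localhost:8000"  # assumed local server

response = requests.get(f"{BASE_URL}/api/models")
response.raise_for_status()
print(response.json())  # e.g. a list of available model identifiers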

Services

VLM Service

The VLM service handles communication with vision-language models (see the sketch after this list):

  • Ollama: Local models via Ollama
  • OpenAI: GPT-4 Vision
  • Anthropic: Claude Vision
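A minimal sketch of selecting a backend through the model argument. Only "gpt-4-vision" appears in this reference; the other model identifiers are assumptions, so consult GET /api/models for the names your deployment actually exposes:

from image_annotation.services import VLMService

# "gpt-4-vision" is documented above; the commented-out identifiers are
# assumed names for the Ollama and Anthropic backends.
service = VLMService(model="gpt-4-vision")     # OpenAI GPT-4 Vision
# service = VLMService(model="llava")          # a local model via Ollama (assumed name)
# service = VLMService(model="claude-vision")  # Anthropic Claude Vision (assumed name)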

Annotation Storage

Annotations are stored as JSON files in the annotations/ directory.
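A minimal sketch of reading a stored annotation back. One JSON file per image, named after its image ID, is an assumed convention; check the files your deployment actually writes:

import json
from pathlib import Path

ANNOTATION_DIR = Path("./annotations")

# The filename convention here is an assumption for illustration.
with open(ANNOTATION_DIR / "img-123.json") as f:
    annotation = json.load(f)

print(annotation["hed_annotation"])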

Configuration

Environment variables:

Variable          Description         Default
----------------  ------------------  ----------------------
OPENAI_API_KEY    OpenAI API key      -
OLLAMA_BASE_URL   Ollama server URL   http://localhost:11434
ANNOTATION_DIR    Output directory    ./annotations
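For example, the variables can be set before the service is started. The sketch below does so from Python for a local setup; the values are illustrative placeholders, and real credentials should be kept out of source control:

import os

# Illustrative values only; supply real credentials via your environment.
os.environ["OPENAI_API_KEY"] = "sk-..."  # leave unset if using Ollama only
os.environ["OLLAMA_BASE_URL"] = "http://localhost:11434"
os.environ["ANNOTATION_DIR"] = "./annotations"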

Python API

import asyncio

from image_annotation.services import VLMService

async def main():
    # Initialize the service with the desired VLM backend
    service = VLMService(model="gpt-4-vision")

    # annotate() is a coroutine, so it must be awaited
    result = await service.annotate("path/to/image.png")
    print(result.description)
    print(result.hed_annotation)

asyncio.run(main())

Full API Reference

Detailed Python API documentation will be auto-generated once the package structure is finalized. See the source code for the current implementation.