Multimodal

On this page, we’ll dive into the different types of multimodal tasks you can run on PeerAI.

All of these tasks are powered by Hugging Face's Transformers.js library. You can use any pre-trained ONNX model from the Hugging Face Hub that is compatible with Transformers.js, or supply your own.

Headers

x-api-group (optional)
Type: string, default 'main'
Description: The ID of the PeerAI compute group to run this computation on.

x-api-key
Type: string
Description: The API key for your PeerAI account.
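As a sketch of how a programmatic client might assemble these headers, here is a small TypeScript helper. The name `buildHeaders` is hypothetical and not part of any PeerAI SDK. It omits `X-API-Group` when no group is given and relies on the documented default of 'main'. It also adds `Content-Type: application/json`, as the curl examples on this page do.

```typescript
// Hypothetical helper: assembles the request headers documented above.
function buildHeaders(apiKey: string, group?: string): Record<string, string> {
  if (!apiKey) {
    throw new Error("An API key is required (X-API-Key header).");
  }
  const headers: Record<string, string> = {
    "X-API-Key": apiKey,
    "Content-Type": "application/json",
  };
  // X-API-Group is optional; when omitted, the service uses the 'main' group.
  if (group !== undefined) {
    headers["X-API-Group"] = group;
  }
  return headers;
}

// Usage: headers for the default group and for a named compute group.
console.log(buildHeaders("YOUR_API_KEY"));
console.log(buildHeaders("YOUR_API_KEY", "YOUR_COMPUTE_GROUP"));
```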

Image Captioning

The Image Captioning pipeline uses a pre-trained model to generate a text caption for an image.

Body

task
Type: string
Description: The task of the pipeline. Use 'image-to-text' for this pipeline.

model (optional)
Type: string, default null
Description: The name of the pre-trained model to use. If not specified, the default model for the task is used.

inputs.0
Type: string
Description: The URL of the image to analyze.

Request

curl -X POST https://api.peer-ai.com/v1/pipeline \
  -H "X-API-Group: {YOUR_COMPUTE_GROUP}" \
  -H "X-API-Key: {YOUR_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{\"task\": \"image-to-text\", \"inputs\": [\"https://example.com/image.jpg\"]}"
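Here is the same request from TypeScript, as a sketch that assumes a runtime with a global `fetch` (Node.js 18+ or a browser). `buildCaptionRequest` and `captionImage` are hypothetical names. The optional `model` field is sent only when you pass one, matching the Body parameters above. The model ID in the commented usage line is an example Hugging Face Hub ONNX model and has not been verified against PeerAI.

```typescript
// Hypothetical shapes for the image-to-text request and response.
interface PipelineRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

interface CaptionResult {
  generated_text: string;
}

const PIPELINE_URL = "https://api.peer-ai.com/v1/pipeline";

// Builds the request without sending it, so it is easy to inspect and test.
function buildCaptionRequest(
  apiKey: string,
  imageUrl: string,
  options: { group?: string; model?: string } = {},
): PipelineRequest {
  const headers: Record<string, string> = {
    "X-API-Key": apiKey,
    "Content-Type": "application/json",
  };
  if (options.group !== undefined) headers["X-API-Group"] = options.group;

  // 'model' is optional; leaving it out lets the service pick the task's default model.
  const body: { task: string; model?: string; inputs: [string] } = {
    task: "image-to-text",
    inputs: [imageUrl],
  };
  if (options.model !== undefined) body.model = options.model;

  return { url: PIPELINE_URL, init: { method: "POST", headers, body: JSON.stringify(body) } };
}

// Sends the request and returns the parsed JSON array of caption objects.
async function captionImage(
  apiKey: string,
  imageUrl: string,
  options: { group?: string; model?: string } = {},
): Promise<CaptionResult[]> {
  const { url, init } = buildCaptionRequest(apiKey, imageUrl, options);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`PeerAI request failed: ${response.status} ${response.statusText}`);
  }
  return (await response.json()) as CaptionResult[];
}

// Usage (performs a network call):
// const captions = await captionImage("YOUR_API_KEY", "https://example.com/image.jpg", {
//   model: "Xenova/vit-gpt2-image-captioning",
// });
```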

Response

[
  {
    "generated_text": "a brown and white striped zebra laying on a tree stump"
  }
]
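The response is a JSON array of objects, each with a `generated_text` field. Below is a small sketch for extracting the caption strings from that array. `extractCaptions` is a hypothetical helper. Its per-element check only guards against shapes other than the one documented here.

```typescript
// Shape of one element of the image-to-text response shown above.
interface CaptionResult {
  generated_text: string;
}

// Returns the caption strings, failing loudly if an element lacks generated_text.
function extractCaptions(results: unknown): string[] {
  if (!Array.isArray(results)) {
    throw new Error("Expected an array of caption results");
  }
  return results.map((item, i) => {
    const text = (item as Partial<CaptionResult> | null)?.generated_text;
    if (typeof text !== "string") {
      throw new Error(`Result ${i} has no generated_text string`);
    }
    return text;
  });
}

// Usage with the sample response above.
const sampleCaptions = [{ generated_text: "a brown and white striped zebra laying on a tree stump" }];
console.log(extractCaptions(sampleCaptions)[0]); // → a brown and white striped zebra laying on a tree stump
```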

Zero-Shot Image Classification

Zero-shot image classification scores an image against a set of candidate labels that you supply with the request, without training on labeled examples of those categories. The labels can be categories that never appeared in the model's training data.

Body

task
Type: string
Description: The task of the pipeline. Use 'zero-shot-image-classification' for this pipeline.

model (optional)
Type: string, default null
Description: The name of the pre-trained model to use. If not specified, the default model for the task is used.

inputs.0
Type: string
Description: The URL of the image to classify.

inputs.1
Type: array of strings
Description: The candidate labels to classify the image into.

Request

curl -X POST https://api.peer-ai.com/v1/pipeline \
  -H "X-API-Group: {YOUR_COMPUTE_GROUP}" \
  -H "X-API-Key: {YOUR_API_KEY}" \
  -H "Content-Type: application/json" \
  -d "{\"task\": \"zero-shot-image-classification\", \"inputs\": [\"https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg\", [\"tiger\", \"cat\", \"dog\"]]}"
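The same zero-shot request can be written in TypeScript. This sketch assumes a global `fetch`. Here `inputs` is a two-element array whose second element is itself an array of labels, so it is typed as a tuple. `buildZeroShotBody` and `classifyImage` are hypothetical names. Rejecting an empty label list is a client-side guard chosen for this sketch, not documented service behavior.

```typescript
interface ZeroShotResult {
  score: number;
  label: string;
}

const PIPELINE_URL = "https://api.peer-ai.com/v1/pipeline";

// Builds the JSON body: inputs[0] is the image URL, inputs[1] the candidate labels.
function buildZeroShotBody(imageUrl: string, labels: string[], model?: string): string {
  // Client-side guard (an assumption of this sketch): at least one label is needed.
  if (labels.length === 0) {
    throw new Error("At least one candidate label is required");
  }
  const body: { task: string; model?: string; inputs: [string, string[]] } = {
    task: "zero-shot-image-classification",
    inputs: [imageUrl, labels],
  };
  if (model !== undefined) body.model = model;
  return JSON.stringify(body);
}

// Sends the request; the response is an array of { score, label } objects.
async function classifyImage(
  apiKey: string,
  imageUrl: string,
  labels: string[],
  options: { group?: string; model?: string } = {},
): Promise<ZeroShotResult[]> {
  const headers: Record<string, string> = {
    "X-API-Key": apiKey,
    "Content-Type": "application/json",
  };
  if (options.group !== undefined) headers["X-API-Group"] = options.group;
  const response = await fetch(PIPELINE_URL, {
    method: "POST",
    headers,
    body: buildZeroShotBody(imageUrl, labels, options.model),
  });
  if (!response.ok) {
    throw new Error(`PeerAI request failed: ${response.status} ${response.statusText}`);
  }
  return (await response.json()) as ZeroShotResult[];
}
```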

Response

[
  {
    "score": 0.995894193649292,
    "label": "tiger"
  },
  {
    "score": 0.003875702852383256,
    "label": "cat"
  },
  {
    "score": 0.00023012972087599337,
    "label": "dog"
  }
]
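The response contains one `{ score, label }` entry per candidate label. In the sample, the scores sum to roughly 1 across the supplied labels. Each score therefore appears to be best read relative to the other labels you passed, not as an absolute confidence. The helpers below are a hypothetical sketch. `topLabel` picks the best match without assuming the array is sorted, and `labelsAbove` applies a threshold that you choose yourself; the threshold is not a service parameter.

```typescript
interface ZeroShotResult {
  score: number;
  label: string;
}

// Returns the highest-scoring entry, without assuming the array is already sorted.
function topLabel(results: ZeroShotResult[]): ZeroShotResult {
  if (results.length === 0) {
    throw new Error("No classification results");
  }
  return results.reduce((best, r) => (r.score > best.score ? r : best));
}

// Keeps only labels whose score meets a caller-chosen threshold, highest first.
function labelsAbove(results: ZeroShotResult[], threshold: number): string[] {
  return results
    .filter((r) => r.score >= threshold)
    .sort((a, b) => b.score - a.score)
    .map((r) => r.label);
}

// Usage with the sample response above.
const sample: ZeroShotResult[] = [
  { score: 0.995894193649292, label: "tiger" },
  { score: 0.003875702852383256, label: "cat" },
  { score: 0.00023012972087599337, label: "dog" },
];
console.log(topLabel(sample).label); // → tiger
console.log(labelsAbove(sample, 0.001)); // → [ 'tiger', 'cat' ]
```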

Feature Extraction

Transforming raw data into numerical features that can be processed while preserving the information in the original dataset.

Coming Soon

Document Question Answering

Answering questions on document images.

Coming Soon

Visual Question Answering

Answering questions on images.

Coming Soon