Multimodal

On this page, we’ll dive into the different types of multimodal tasks you can run on PeerAI.

These tasks are all powered by Hugging Face's Transformers.js library. You can use any of the pre-trained ONNX models from the Hugging Face model hub, or use your own.

Headers

  • Name
    x-api-group (optional)
    Type
    string, default 'main'
    Description

    The ID of the PeerAI compute group that should run the request.

  • Name
    x-api-key
    Type
    string
    Description

    The API key for your PeerAI account.
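
As a rough illustration of how these headers fit together, the sketch below builds (but does not send) a request using only the Python standard library. The endpoint URL is the one used in the curl examples on this page; the helper name `build_request` and the choice to leave out `x-api-group` when no group is given are assumptions made for the example, not part of the PeerAI API.

```python
import json
import urllib.request

# Endpoint taken from the curl examples on this page.
API_URL = "https://api.peer-ai.com/v1/pipeline"


def build_request(payload, api_key, group=None):
    """Build (but do not send) a POST request carrying the PeerAI headers.

    `build_request` is a hypothetical helper, not part of any PeerAI SDK.
    """
    headers = {
        "X-API-Key": api_key,
        "Content-Type": "application/json",
    }
    # Assumption: omitting x-api-group lets the server apply its default ('main').
    if group is not None:
        headers["X-API-Group"] = group
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_request(
    {"task": "image-to-text", "inputs": ["https://example.com/image.jpg"]},
    api_key="YOUR_API_KEY",
)
```

Passing `req` to `urllib.request.urlopen` would send it. Note that `urllib` normalizes the capitalization of header names, which is harmless because HTTP header names are case-insensitive.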


Image Captioning

The Image Captioning pipeline uses a pre-trained model to generate a text caption for an image.

Body

  • Name
    task
    Type
    string
    Description

    The task of the pipeline. Use 'image-to-text' for this pipeline.

  • Name
    model (optional)
    Type
    string, default null
    Description

    The name of the pre-trained model to use. If not specified, the default model for the task will be used.

  • Name
    inputs.0
    Type
    string
    Description

    The URL of the image to analyze.

Request

curl -X POST https://api.peer-ai.com/v1/pipeline \
  -H "X-API-Group: {YOUR_COMPUTE_GROUP}" \
  -H "X-API-Key: {YOUR_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"task": "image-to-text", "inputs": ["https://example.com/image.jpg"]}'

Response

[
  {
    "generated_text": "a brown and white striped zebra laying on a tree stump"
  }
]
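
The request body and response shown above could be handled on the client side roughly as follows. This is a minimal sketch: the helper names `caption_payload` and `first_caption` are hypothetical, `your-org/your-model` is a placeholder rather than a real model name, and omitting `model` from the body (instead of sending null) is an assumption about how the default is selected.

```python
import json


def caption_payload(image_url, model=None):
    """Build the JSON body for an image-to-text request (hypothetical helper)."""
    payload = {"task": "image-to-text", "inputs": [image_url]}
    # Assumption: leaving `model` out lets the server pick the task's default model.
    if model is not None:
        payload["model"] = model
    return payload


def first_caption(response_body):
    """Return the first generated caption from a response, or None if it is empty."""
    results = json.loads(response_body)
    return results[0]["generated_text"] if results else None


# Body using the default model, and one naming a placeholder model.
default_body = caption_payload("https://example.com/image.jpg")
custom_body = caption_payload("https://example.com/image.jpg", model="your-org/your-model")

# Sample response copied from this page.
sample_response = '[{"generated_text": "a brown and white striped zebra laying on a tree stump"}]'
caption = first_caption(sample_response)
```

The response is a JSON array, so the sketch reads the first element and treats an empty array as "no caption".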

Zero-Shot Image Classification

Zero-shot image classification assigns an image to one of a set of candidate labels that you supply with the request, without training a model on labeled examples of those categories. The labels do not need to have appeared in the model's training data.

Body

  • Name
    task
    Type
    string
    Description

    The task of the pipeline. Use 'zero-shot-image-classification' for this pipeline.

  • Name
    model (optional)
    Type
    string, default null
    Description

    The name of the pre-trained model to use. If not specified, the default model for the task will be used.

  • Name
    inputs.0
    Type
    string
    Description

    The URL of the image to classify.

  • Name
    inputs.1
    Type
    array
    Description

    An array of target labels to classify the image into.

Request

curl -X POST https://api.peer-ai.com/v1/pipeline \
  -H "X-API-Group: {YOUR_COMPUTE_GROUP}" \
  -H "X-API-Key: {YOUR_API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"task": "zero-shot-image-classification", "inputs": ["https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg", ["tiger", "cat", "dog"]]}'

Response

[
  {
    "score": 0.995894193649292,
    "label": "tiger"
  },
  {
    "score": 0.003875702852383256,
    "label": "cat"
  },
  {
    "score": 0.00023012972087599337,
    "label": "dog"
  }
]
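
To make the shape of `inputs` and the response concrete, here is a small sketch that builds the request body (image URL first, then the array of labels) and picks the highest-scoring label from a response. The helper names `zero_shot_payload` and `top_label` are hypothetical. The sketch selects the maximum score explicitly rather than assuming the response is always sorted.

```python
import json


def zero_shot_payload(image_url, labels, model=None):
    """Build the JSON body: inputs[0] is the image URL, inputs[1] the candidate labels."""
    payload = {
        "task": "zero-shot-image-classification",
        "inputs": [image_url, list(labels)],
    }
    # Assumption: leaving `model` out lets the server pick the task's default model.
    if model is not None:
        payload["model"] = model
    return payload


def top_label(response_body):
    """Return (label, score) for the highest-scoring entry in the response."""
    results = json.loads(response_body)
    best = max(results, key=lambda item: item["score"])
    return best["label"], best["score"]


body = zero_shot_payload(
    "https://huggingface.co/datasets/Xenova/transformers.js-docs/resolve/main/tiger.jpg",
    ["tiger", "cat", "dog"],
)

# Sample response copied from this page.
sample_response = """
[
  {"score": 0.995894193649292, "label": "tiger"},
  {"score": 0.003875702852383256, "label": "cat"},
  {"score": 0.00023012972087599337, "label": "dog"}
]
"""
label, score = top_label(sample_response)
```

In the sample response the scores sum to approximately 1, so each can be read as the model's relative confidence among the labels supplied.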

Feature Extraction

Feature extraction transforms raw data into numerical features that can be processed while preserving the information in the original data.

Coming Soon


Document Question Answering

Document question answering answers questions about document images.

Coming Soon


Visual Question Answering

Visual question answering answers questions about images.

Coming Soon