Overview
Endpoints AI offers serverless GPU computing for AI inference, training, and general compute tasks, billed by the second of compute used. The platform scales dynamically to match the demands of AI workloads, from small jobs to large-scale deployments.
Usage Methods:
Quick Deploy: Instantly deploy pre-built endpoints for popular AI models.
Handler Functions: Bring your own functions and execute them in the cloud.
For pre-built endpoints of popular AI models, refer to AI APIs.
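The handler-function workflow above can be sketched as follows. This is a minimal illustration, not the platform's documented API: the `endpoints_ai` module and its `serverless.start` entry point are hypothetical placeholder names, and the `event["input"]` payload shape is an assumption.

```python
# Minimal handler-function sketch (hypothetical SDK names).
# A handler receives one request's event and returns a JSON-serializable result.

def handler(event):
    """Process a single request; event["input"] carries the caller's payload."""
    payload = event.get("input", {})       # assumed event shape
    name = payload.get("name", "world")
    return {"greeting": f"Hello, {name}!"}

if __name__ == "__main__":
    # In production, the (hypothetical) SDK would poll for jobs and invoke
    # the handler for each one, e.g.:
    #   import endpoints_ai
    #   endpoints_ai.serverless.start({"handler": handler})
    # Here we just call it directly to show the input/output contract.
    print(handler({"input": {"name": "GPU"}}))
```

The key idea is that you supply only the function body; the platform handles queuing, scaling, and invocation.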
Why Choose Endpoints AI Serverless?
Endpoints AI Serverless instances are preferred for the following reasons:
AI Inference: Handles millions of inference requests daily and can scale to billions, making it well suited for machine learning inference while keeping costs low.
AI Training: Flexible solution for machine learning training needs, capable of accommodating tasks lasting up to 12 hours. GPUs can be provisioned per request and scaled down upon task completion.
Autoscaling: Dynamically scale workers from 0 to 100 on the Secure Cloud platform, ensuring highly available and globally distributed computational resources precisely when needed.
Container Support: Bring any Docker container to Endpoints AI, supporting both public and private image repositories, allowing users to configure their environment according to their preferences.
3s Cold-Start: Proactively pre-warms workers to reduce cold-start times. Total start time varies with the runtime; for example, Stable Diffusion has a total start time of about 8 seconds (3 seconds cold start plus 5 seconds runtime).
Metrics and Debugging: Provides access to GPU, CPU, Memory, and other metrics for transparency in debugging. Full debugging capabilities for workers are available through logs and SSH, with a web terminal for easier access.
Webhooks: Use webhooks to receive output as soon as a request completes; results are pushed directly to your webhook URL for instant access.
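A webhook receiver for completed requests might process its payload as sketched below. The payload fields (`id`, `status`, `output`) are assumptions for illustration, not a documented schema; consult the platform's webhook reference for the real shape.

```python
import json

# Sketch of parsing a completion webhook's JSON body (assumed schema).
def handle_webhook(body: bytes) -> dict:
    """Return the job output if the request completed, else an empty dict."""
    payload = json.loads(body)
    if payload.get("status") == "COMPLETED":   # assumed status value
        return payload.get("output", {})
    return {}
```

In practice this function would sit behind an HTTP endpoint that the platform POSTs to when a job finishes.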
Endpoints AI Serverless GPUs aren't limited to AI inference and training; they suit a range of use cases such as rendering, molecular dynamics, or any other computational task.