Together AI
AI Acceleration Cloud for fast inference, fine-tuning, and training.
Why Choose Together AI?
Choose Together AI if you want a single platform that covers the full workflow from training to fine-tuning and inference. It suits teams that need scalable GPU clusters and support for a wide range of generative AI models, and its OpenAI-compatible APIs make integration with existing tooling straightforward. Serverless inference and dedicated endpoints simplify deploying on custom hardware.
Together AI Introduction
What is Together AI?
Together AI is an AI Acceleration Cloud: an end-to-end platform for the full generative AI lifecycle. It provides fast inference, fine-tuning, and training for generative AI models through easy-to-use APIs and highly scalable infrastructure. Users can run and fine-tune open-source models, train and deploy models at scale on GPU clusters, and optimize for both performance and cost. The platform supports over 200 generative AI models across modalities such as chat, images, and code, all behind OpenAI-compatible APIs.
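Because the APIs are OpenAI-compatible, an existing OpenAI-style chat request only needs its base URL pointed at Together AI. The sketch below uses just the Python standard library; the endpoint path and the model id are assumptions drawn from Together's public documentation, so verify both before relying on them.

```python
# Minimal sketch of an OpenAI-compatible chat completion call to Together AI.
# BASE_URL and the model id are assumptions; check Together's docs before use.
import json
import os
import urllib.request

BASE_URL = "https://api.together.xyz/v1"  # assumed OpenAI-compatible endpoint


def chat_payload(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(api_key: str, model: str, prompt: str) -> str:
    """POST the payload and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(chat_payload(model, prompt)).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


if __name__ == "__main__":
    # Hypothetical model id; any serverless model from the library would work.
    print(chat(os.environ["TOGETHER_API_KEY"],
               "meta-llama/Llama-3.3-70B-Instruct-Turbo",
               "Say hello in one sentence."))
```

The same shape works for any client that lets you override the base URL, which is the practical benefit of OpenAI compatibility.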
How to use Together AI?
Users interact with Together AI through easy-to-use APIs for serverless inference, or deploy models on custom hardware via dedicated endpoints. Fine-tuning is available through simple commands or with full hyperparameter control via the API. GPU clusters can be requested for large-scale training, and endpoints and other services can be started, stopped, and managed from a web UI, API, or CLI. Code execution environments are available for building and running AI development tasks.
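To make "controlling hyperparameters via API" concrete, here is a sketch of assembling a fine-tuning request body. The field names (`training_file`, `n_epochs`, `learning_rate`, `lora`) are illustrative assumptions about the API shape, not Together's confirmed schema; consult the fine-tuning documentation for the real parameter names.

```python
# Sketch of a fine-tuning request body with explicit hyperparameters.
# All field names here are illustrative assumptions, not a confirmed schema.
def fine_tune_request(model: str, training_file: str,
                      n_epochs: int = 3, learning_rate: float = 1e-5,
                      lora: bool = True) -> dict:
    """Build a request body for a fine-tuning job.

    training_file is assumed to be the id of a previously uploaded dataset;
    lora toggles between LoRA and full fine-tuning, both of which the
    platform advertises.
    """
    if n_epochs < 1:
        raise ValueError("n_epochs must be at least 1")
    return {
        "model": model,
        "training_file": training_file,
        "n_epochs": n_epochs,
        "learning_rate": learning_rate,
        "lora": lora,
    }
```

The body would then be POSTed to the fine-tuning endpoint with the same bearer-token header as an inference call.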
Together AI Features
AI API
- ✓ Serverless Inference API for open-source models
- ✓ Dedicated Endpoints for custom hardware deployment
- ✓ Fine-Tuning (LoRA and full fine-tuning)
- ✓ Together Chat app for open-source AI
- ✓ Code Sandbox for AI development environments
- ✓ Code Interpreter for executing LLM-generated code
- ✓ GPU Clusters (Instant and Reserved) with NVIDIA GPUs (GB200, B200, H200, H100, A100)
- ✓ Extensive Model Library (200+ generative AI models)
- ✓ OpenAI-compatible APIs
- ✓ Accelerated Software Stack (e.g., FlashAttention-3, custom CUDA kernels)
- ✓ High-Speed Interconnects (InfiniBand, NVLink)
- ✓ Robust Management Tools (Slurm, Kubernetes)
Pricing
Serverless Inference
Prices are per 1 million tokens (input and output for Chat, Multimodal, Language, Code; input only for Embedding; image size/steps for Image models). Batch inference is available at an introductory 50% discount. Specific model prices range from $0.06 to $7.00 per 1M tokens depending on model size and type.
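The per-token pricing above reduces to simple arithmetic. The $0.60-per-1M-token price in this sketch is purely illustrative (actual prices run from $0.06 to $7.00 per 1M tokens depending on the model); the 50% batch discount matches the introductory rate quoted above.

```python
# Estimate serverless inference cost from token counts.
# price_per_million is illustrative; real prices range $0.06-$7.00 per 1M
# tokens depending on model size and type.
def inference_cost_usd(input_tokens: int, output_tokens: int,
                       price_per_million: float, batch: bool = False) -> float:
    cost = (input_tokens + output_tokens) / 1_000_000 * price_per_million
    if batch:
        cost *= 0.5  # introductory 50% batch-inference discount
    return cost

# 1.5M input + 0.5M output tokens at an illustrative $0.60/1M:
# inference_cost_usd(1_500_000, 500_000, 0.60)       -> 1.2
# inference_cost_usd(1_500_000, 500_000, 0.60, True) -> 0.6
```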
Dedicated Endpoints
Deploy models on customizable GPU endpoints with per-minute billing. Supports various NVIDIA GPUs like RTX-6000, L40, A100, H100, H200. Prices range from $0.025/minute ($1.49/hour) for RTX-6000/L40 to $0.083/minute ($4.99/hour) for H200.