Together AI
AI Acceleration Cloud for fast inference, fine-tuning, and training.
Together AI is an AI Acceleration Cloud providing an end-to-end platform for the full generative AI lifecycle: fast inference, fine-tuning, and training of generative AI models through easy-to-use APIs and highly scalable infrastructure. Users can run and fine-tune open-source models, train and deploy models at scale on GPU clusters, and optimize for performance and cost. The platform supports over 200 generative AI models across modalities such as chat, images, and code, all behind OpenAI-compatible APIs.
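Because the APIs are OpenAI-compatible, existing OpenAI client code can typically be repointed at Together AI by changing only the base URL and key. A minimal sketch using the openai Python SDK; the model name is just one illustrative example of the 200+ hosted models:

```python
# Minimal sketch: calling Together AI through its OpenAI-compatible chat API.
# Assumes the `openai` Python package and a TOGETHER_API_KEY environment
# variable; the model name is illustrative and may change over time.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # Together's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # example model
    messages=[{"role": "user", "content": "Summarize what an AI Acceleration Cloud is."}],
)
print(response.choices[0].message.content)
```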
Users interact with Together AI through easy-to-use APIs for serverless inference, or deploy models on custom hardware via dedicated endpoints. Fine-tuning is available through simple commands or with full control over hyperparameters via the API, and GPU clusters can be requested for large-scale training. Endpoints and services can be started, stopped, and managed from the web UI, API, or CLI, and code execution environments are available for building and running AI development tasks.
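As one illustration of controlling fine-tuning hyperparameters via the API, here is a hedged sketch using the together Python SDK. The parameter names (n_epochs, learning_rate, lora) and the model name are assumptions based on common usage and should be checked against the current SDK documentation:

```python
# Hedged sketch: launching a fine-tuning job via the together Python SDK.
# Assumes the `together` package; parameter names (n_epochs, learning_rate,
# lora) and the base-model name are assumptions, not confirmed API.
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

# Upload a JSONL training file, then start a LoRA fine-tuning job.
train_file = client.files.upload(file="training_data.jsonl")
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # example base model
    n_epochs=3,
    learning_rate=1e-5,
    lora=True,
)
print(job.id)  # poll this job ID to track training progress
```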
Choose this if you want a powerful, all-in-one AI acceleration platform that handles everything from training to fine-tuning and inference. It is well suited to teams that need scalable GPU clusters and support for a wide range of generative AI models, and its OpenAI-compatible APIs make integrating with existing code straightforward. Serverless inference covers quick starts, while dedicated endpoints make deploying on custom hardware simple.
Serverless inference is priced per 1 million tokens: input and output tokens for Chat, Multimodal, Language, and Code models; input tokens only for Embedding models; and image size/steps for Image models. Batch inference is available at an introductory 50% discount. Model prices range from $0.06 to $7.00 per 1M tokens, depending on model size and type.
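To make the per-token pricing concrete, here is a small worked example; the $0.60 rate is a hypothetical mid-range price, not a quoted one:

```python
# Worked example: estimating serverless inference cost.
# The per-1M-token rate is hypothetical; actual rates span $0.06-$7.00.
PRICE_PER_1M = 0.60        # hypothetical $/1M tokens (input + output)
BATCH_DISCOUNT = 0.50      # introductory 50% discount for batch inference

tokens = 8_000_000 + 2_000_000   # e.g. 8M input + 2M output tokens
online_cost = tokens / 1_000_000 * PRICE_PER_1M
batch_cost = online_cost * (1 - BATCH_DISCOUNT)
print(f"online: ${online_cost:.2f}, batch: ${batch_cost:.2f}")  # $6.00 vs $3.00
```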
Dedicated endpoints deploy models on customizable GPU hardware with per-minute billing. Supported NVIDIA GPUs include RTX-6000, L40, A100, H100, and H200, with prices ranging from $0.025/minute ($1.49/hour) for RTX-6000 and L40 to $0.083/minute ($4.99/hour) for H200.
Fine-tuning is priced by model size, dataset size, and number of epochs. Per 1M tokens, supervised fine-tuning ranges from $0.48 to $2.90 with LoRA and $0.54 to $3.20 for full fine-tuning; DPO ranges from $1.20 to $7.25 with LoRA and $1.35 to $8.00 for full fine-tuning.
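Since fine-tuning is billed on dataset tokens times epochs, the cost formula is straightforward; the dataset size and rate below are illustrative:

```python
# Worked example: fine-tuning cost = dataset tokens x epochs x per-1M rate.
# Dataset size is hypothetical; $0.48/1M is the low end for LoRA SFT.
DATASET_TOKENS = 5_000_000   # hypothetical 5M-token training set
EPOCHS = 3
RATE_PER_1M = 0.48           # $/1M tokens, LoRA supervised fine-tuning

cost = DATASET_TOKENS / 1_000_000 * EPOCHS * RATE_PER_1M
print(f"estimated fine-tuning cost: ${cost:.2f}")  # $7.20
```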
State-of-the-art clusters with NVIDIA Blackwell (GB200, B200), Hopper (H200, H100), and Ampere (A100) GPUs for optimal AI training and inference. H200 starts at $2.09/hr, H100 at $1.75/hr, and A100 at $1.30/hr; GB200 and B200 pricing requires contacting sales.
Together Code Sandbox is priced per vCPU ($0.0446/hour) and per GiB RAM ($0.0149/hour). Together Code Interpreter is priced per session ($0.03 for 60 minutes).
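Assuming the two Code Sandbox meters simply add together, an hourly estimate is vCPU cost plus RAM cost; the 2 vCPU / 4 GiB machine shape below is illustrative:

```python
# Worked example: Together Code Sandbox hourly cost for one machine shape.
# Assumes vCPU and RAM meters are additive; the 2 vCPU / 4 GiB shape is illustrative.
VCPU_RATE = 0.0446   # $/vCPU/hour
RAM_RATE = 0.0149    # $/GiB/hour

vcpus, ram_gib = 2, 4
hourly = vcpus * VCPU_RATE + ram_gib * RAM_RATE
print(f"${hourly:.4f}/hour")  # $0.1488/hour
```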