Together AI
AI Acceleration Cloud for fast inference, fine-tuning, and training.
Together AI is an AI Acceleration Cloud providing an end-to-end platform for the full generative AI lifecycle: fast inference, fine-tuning, and training of generative AI models through easy-to-use APIs and highly scalable infrastructure. Users can run and fine-tune open-source models, train and deploy models at scale on GPU clusters, and optimize for performance and cost. The platform supports over 200 generative AI models across modalities such as chat, images, and code, all behind OpenAI-compatible APIs.
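Because the APIs are OpenAI-compatible, existing OpenAI client code can typically be repointed at Together AI by changing only the base URL and key. A minimal sketch using the openai Python SDK; the model name is just one illustrative example of the 200+ hosted models:

```python
# Minimal sketch: calling Together AI through its OpenAI-compatible chat API.
# Assumes the `openai` Python package and a TOGETHER_API_KEY environment
# variable; the model name is illustrative and may change over time.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["TOGETHER_API_KEY"],
    base_url="https://api.together.xyz/v1",  # Together's OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",  # example model
    messages=[{"role": "user", "content": "Summarize what an AI Acceleration Cloud is."}],
)
print(response.choices[0].message.content)
```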
Users interact with Together AI through easy-to-use APIs for serverless inference, or deploy models on custom hardware via dedicated endpoints. Fine-tuning is available through simple commands or with full control over hyperparameters via the API, and GPU clusters can be requested for large-scale training. Endpoints and services can be started, stopped, and managed from the web UI, API, or CLI, and code execution environments are available for building and running AI development tasks.
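As one illustration of controlling fine-tuning hyperparameters via the API, here is a hedged sketch using the together Python SDK. The parameter names (n_epochs, learning_rate, lora) and the model name are assumptions based on common usage and should be checked against the current SDK documentation:

```python
# Hedged sketch: launching a fine-tuning job via the together Python SDK.
# Assumes the `together` package; parameter names (n_epochs, learning_rate,
# lora) and the base-model name are assumptions, not confirmed API.
import os
from together import Together

client = Together(api_key=os.environ["TOGETHER_API_KEY"])

# Upload a JSONL training file, then start a LoRA fine-tuning job.
train_file = client.files.upload(file="training_data.jsonl")
job = client.fine_tuning.create(
    training_file=train_file.id,
    model="meta-llama/Meta-Llama-3.1-8B-Instruct-Reference",  # example base model
    n_epochs=3,
    learning_rate=1e-5,
    lora=True,
)
print(job.id)  # poll this job ID to track training progress
```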
Choose this if you want a powerful, all-in-one AI acceleration platform that handles everything from training to fine-tuning and inference. It is well suited to teams that need scalable GPU clusters and support for a wide range of generative AI models, and its OpenAI-compatible APIs make integrating with existing code straightforward. Serverless inference covers quick starts, while dedicated endpoints make deploying on custom hardware simple.
Serverless inference is priced per 1 million tokens: input and output tokens for Chat, Multimodal, Language, and Code models; input tokens only for Embedding models; and image size/steps for Image models. Batch inference is available at an introductory 50% discount. Model prices range from $0.06 to $7.00 per 1M tokens, depending on model size and type.
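To make the per-token pricing concrete, here is a small worked example; the $0.60 rate is a hypothetical mid-range price, not a quoted one:

```python
# Worked example: estimating serverless inference cost.
# The per-1M-token rate is hypothetical; actual rates span $0.06-$7.00.
PRICE_PER_1M = 0.60        # hypothetical $/1M tokens (input + output)
BATCH_DISCOUNT = 0.50      # introductory 50% discount for batch inference

tokens = 8_000_000 + 2_000_000   # e.g. 8M input + 2M output tokens
online_cost = tokens / 1_000_000 * PRICE_PER_1M
batch_cost = online_cost * (1 - BATCH_DISCOUNT)
print(f"online: ${online_cost:.2f}, batch: ${batch_cost:.2f}")  # $6.00 vs $3.00
```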
Dedicated endpoints deploy models on customizable GPU hardware with per-minute billing. Supported NVIDIA GPUs include RTX-6000, L40, A100, H100, and H200, with prices ranging from $0.025/minute ($1.49/hour) for RTX-6000 and L40 to $0.083/minute ($4.99/hour) for H200.
Fine-tuning is priced by model size, dataset size, and number of epochs. Per 1M tokens, supervised fine-tuning ranges from $0.48 to $2.90 with LoRA and $0.54 to $3.20 for full fine-tuning; DPO ranges from $1.20 to $7.25 with LoRA and $1.35 to $8.00 for full fine-tuning.
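Since fine-tuning is billed on dataset tokens times epochs, the cost formula is straightforward; the dataset size and rate below are illustrative:

```python
# Worked example: fine-tuning cost = dataset tokens x epochs x per-1M rate.
# Dataset size is hypothetical; $0.48/1M is the low end for LoRA SFT.
DATASET_TOKENS = 5_000_000   # hypothetical 5M-token training set
EPOCHS = 3
RATE_PER_1M = 0.48           # $/1M tokens, LoRA supervised fine-tuning

cost = DATASET_TOKENS / 1_000_000 * EPOCHS * RATE_PER_1M
print(f"estimated fine-tuning cost: ${cost:.2f}")  # $7.20
```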
State-of-the-art clusters with NVIDIA Blackwell (GB200, B200), Hopper (H200, H100), and Ampere (A100) GPUs for optimal AI training and inference. H200 starts at $2.09/hr, H100 at $1.75/hr, and A100 at $1.30/hr; GB200 and B200 pricing requires contacting sales.
Together Code Sandbox is priced per vCPU ($0.0446/hour) and per GiB RAM ($0.0149/hour). Together Code Interpreter is priced per session ($0.03 for 60 minutes).
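Assuming the two Code Sandbox meters simply add together, an hourly estimate is vCPU cost plus RAM cost; the 2 vCPU / 4 GiB machine shape below is illustrative:

```python
# Worked example: Together Code Sandbox hourly cost for one machine shape.
# Assumes vCPU and RAM meters are additive; the 2 vCPU / 4 GiB shape is illustrative.
VCPU_RATE = 0.0446   # $/vCPU/hour
RAM_RATE = 0.0149    # $/GiB/hour

vcpus, ram_gib = 2, 4
hourly = vcpus * VCPU_RATE + ram_gib * RAM_RATE
print(f"${hourly:.4f}/hour")  # $0.1488/hour
```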