---
title: 🤖 Large language models (LLMs)
---

## Overview

Embedchain comes with built-in support for various popular large language models. We handle the complexity of integrating these models for you, allowing you to easily customize your language model interactions through a user-friendly interface.

## OpenAI

To use OpenAI's models, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

```python
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

app = App()
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file.

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: openai
  config:
    model: 'gpt-3.5-turbo'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

### Function Calling

Embedchain supports OpenAI [Function calling](https://platform.openai.com/docs/guides/function-calling) with a single function. It accepts inputs in accordance with the [Langchain interface](https://python.langchain.com/docs/modules/model_io/chat/function_calling#legacy-args-functions-and-function_call). You can define the function as a Pydantic model, as a plain Python function, or as an OpenAI tool dictionary:

```python
from pydantic import BaseModel, Field

class multiply(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")
```

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b
```

```python
multiply = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "Multiply two integers together.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"description": "First integer", "type": "integer"},
                "b": {"description": "Second integer", "type": "integer"},
            },
            "required": ["a", "b"],
        },
    },
}
```

With any of the previous inputs, the OpenAI LLM can be queried to provide the appropriate arguments for the function.

```python
import os
from embedchain import App
from embedchain.llm.openai import OpenAILlm

os.environ["OPENAI_API_KEY"] = "sk-xxx"

llm = OpenAILlm(tools=multiply)
app = App(llm=llm)

result = app.query("What is the result of 125 multiplied by fifteen?")
```
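Note that the model does not execute the function itself; it only supplies the arguments for it. A minimal sketch of acting on the result, where the argument dict is a hypothetical stand-in for whatever your Embedchain version surfaces in `result` (check its exact shape for your version):

```python
# Hypothetical follow-up: the literal below stands in for tool-call
# arguments parsed from `result`; the exact shape of `result` depends
# on your Embedchain version.
args = {"a": 125, "b": 15}

# Execute the call using the plain-function form of `multiply` above.
print(multiply(**args))  # 1875
```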
## Google AI

To use Google AI models, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from [Google MakerSuite](https://makersuite.google.com/app/apikey).

```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("What is the net worth of Elon Musk?")
if app.llm.config.stream:  # if stream is enabled, response is a generator
    for chunk in response:
        print(chunk)
else:
    print(response)
```

```yaml config.yaml
llm:
  provider: google
  config:
    model: gemini-pro
    max_tokens: 1000
    temperature: 0.5
    top_p: 1
    stream: false

embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

## Azure OpenAI

To use Azure OpenAI models, set the Azure OpenAI-related environment variables as shown in the code block below:

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-3.5-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).

## Anthropic

To use Anthropic's models, set the `ANTHROPIC_API_KEY`, which you can find on their [Account Settings page](https://console.anthropic.com/account/keys).

```python main.py
import os
from embedchain import App

os.environ["ANTHROPIC_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: anthropic
  config:
    model: 'claude-instant-1'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

## Cohere

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` as an environment variable; you can find it on their [Account settings page](https://dashboard.cohere.com/api-keys).

Once you have the API key, you are all set to use it with Embedchain.

```python main.py
import os
from embedchain import App

os.environ["COHERE_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: cohere
  config:
    model: large
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

## Together

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[together]'
```

Set the `TOGETHER_API_KEY` as an environment variable; you can find it on their [Account settings page](https://api.together.xyz/settings/api-keys).

Once you have the API key, you are all set to use it with Embedchain.
```python main.py
import os
from embedchain import App

os.environ["TOGETHER_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: together
  config:
    model: togethercomputer/RedPajama-INCITE-7B-Base
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

## Ollama

Set up Ollama by following the instructions at https://github.com/jmorganca/ollama and pull the model you want to use (e.g. `ollama pull llama2`).

```python main.py
import os

os.environ["OLLAMA_HOST"] = "http://127.0.0.1:11434"

from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: ollama
  config:
    model: 'llama2'
    temperature: 0.5
    top_p: 1
    stream: true
    base_url: 'http://localhost:11434'

embedder:
  provider: ollama
  config:
    model: znbang/bge:small-en-v1.5-q8_0
    base_url: http://localhost:11434
```

## vLLM

Set up vLLM by following the instructions in [their docs](https://docs.vllm.ai/en/latest/getting_started/installation.html).

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vllm
  config:
    model: 'meta-llama/Llama-2-70b-hf'
    temperature: 0.5
    top_p: 1
    top_k: 10
    stream: true
    trust_remote_code: true
```

## Clarifai

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[clarifai]'
```

Set the `CLARIFAI_PAT` as an environment variable; you can find it on the [Security page](https://clarifai.com/settings/security). Optionally, you can also pass the PAT as a parameter to the LLM/Embedder class.

Now you are all set to explore Embedchain.

```python main.py
import os
from embedchain import App

os.environ["CLARIFAI_PAT"] = "XXX"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

# Now let's add some data.
app.add("https://www.forbes.com/profile/elon-musk")

# Query the app
response = app.query("what college degrees does elon musk have?")
```

Head to the [Clarifai Platform](https://clarifai.com/explore/models?page=1&perPage=24&filterData=%5B%7B%22field%22%3A%22use_cases%22%2C%22value%22%3A%5B%22llm%22%5D%7D%5D) to browse various state-of-the-art LLMs for your use case.

To pass model inference parameters, use the `model_kwargs` argument in the config file. You can also use the `api_key` argument to pass the `CLARIFAI_PAT` in the config.

```yaml config.yaml
llm:
  provider: clarifai
  config:
    model: "https://clarifai.com/mistralai/completion/models/mistral-7B-Instruct"
    model_kwargs:
      temperature: 0.5
      max_tokens: 1000

embedder:
  provider: clarifai
  config:
    model: "https://clarifai.com/clarifai/main/models/BAAI-bge-base-en-v15"
```

## GPT4All

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensource]'
```

GPT4All is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet is required. You can use it with Embedchain using the following code:

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

## JinaChat

First, set `JINACHAT_API_KEY` as an environment variable; you can obtain it from [their platform](https://chat.jina.ai/api).
Once you have the key, load the app using the config yaml file:

```python main.py
import os
from embedchain import App

os.environ["JINACHAT_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: jina
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

## Hugging Face

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[huggingface-hub]'
```

First, set `HUGGINGFACE_ACCESS_TOKEN` as an environment variable; you can obtain it from [their platform](https://huggingface.co/settings/tokens).

You can load LLMs from Hugging Face in three ways:

- [Hugging Face Hub](#hugging-face-hub)
- [Hugging Face Local Pipelines](#hugging-face-local-pipelines)
- [Hugging Face Inference Endpoint](#hugging-face-inference-endpoint)

### Hugging Face Hub

To load the model from the Hugging Face Hub, use the following code:

```python main.py
import os
from embedchain import App

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "bigscience/bloom-1b7",
            "top_p": 0.5,
            "max_length": 200,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```

### Hugging Face Local Pipelines

If you want to load a locally downloaded model from Hugging Face, you can do so with the following code:

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "Trendyol/Trendyol-LLM-7b-chat-v0.1",
            "local": True,  # Necessary if you want to run the model locally
            "top_p": 0.5,
            "max_tokens": 1000,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```

### Hugging Face Inference Endpoint

You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to access custom endpoints. First, set the `HUGGINGFACE_ACCESS_TOKEN` as above.

Then, load the app using the config:

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "endpoint": "https://api-inference.huggingface.co/models/gpt2",
            "model_params": {"temperature": 0.1, "max_new_tokens": 100},
        },
    },
}

app = App.from_config(config=config)
```

Currently, only `text-generation` and `text2text-generation` tasks are supported [[ref](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html?highlight=huggingfaceendpoint#)]. See Langchain's [Hugging Face endpoint](https://python.langchain.com/docs/integrations/chat/huggingface#huggingfaceendpoint) documentation for more information.

## Llama2

Llama2 is integrated through [Replicate](https://replicate.com/). Set `REPLICATE_API_TOKEN` as an environment variable; you can obtain it from [their platform](https://replicate.com/account/api-tokens).
Once you have the token, load the app using the config yaml file:

```python main.py
import os
from embedchain import App

os.environ["REPLICATE_API_TOKEN"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: llama2
  config:
    model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false
```

## Vertex AI

Set up Google Cloud Platform application credentials by following the instructions on [GCP](https://cloud.google.com/docs/authentication/external/set-up-adc). Once the setup is done, use the following code to create an app with Vertex AI as the provider:

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5
```

## Mistral AI

Obtain the Mistral AI API key from their [console](https://console.mistral.ai/).

```python main.py
import os
from embedchain import App

os.environ["MISTRAL_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# As of January 16, 2024, Elon Musk's net worth is $225.4 billion.

response = app.chat("which companies does elon own?")
# Elon Musk owns Tesla, SpaceX, Boring Company, Twitter, and X.

response = app.chat("what question did I ask you already?")
# You have asked me several times already which companies Elon Musk owns, specifically Tesla, SpaceX, Boring Company, Twitter, and X.
```

```yaml config.yaml
llm:
  provider: mistralai
  config:
    model: mistral-tiny
    temperature: 0.5
    max_tokens: 1000
    top_p: 1

embedder:
  provider: mistralai
  config:
    model: mistral-embed
```

## AWS Bedrock

### Setup

- Before using the AWS Bedrock LLM, make sure you have the appropriate model access from the [Bedrock Console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess).
- You will also need to authenticate the `boto3` client using one of the methods in the [AWS documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials).
- You can optionally export an `AWS_REGION`.

### Usage

```python main.py
import os
from embedchain import App

os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx"
os.environ["AWS_REGION"] = "us-west-2"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: amazon.titan-text-express-v1
    # check notes below for model_kwargs
    model_kwargs:
      temperature: 0.5
      topP: 1
      maxTokenCount: 1000
```
The model arguments differ for each provider. Please refer to the [AWS Bedrock Documentation](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/providers) to find the appropriate arguments for your model.
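For example, an Anthropic model on Bedrock takes differently named arguments than the Amazon Titan model above. The following is an illustrative sketch only; the model id and argument names are assumptions, so verify them against the Bedrock documentation for your model:

```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: anthropic.claude-v2  # assumed model id, for illustration only
    model_kwargs:
      temperature: 0.5
      top_p: 1
      max_tokens_to_sample: 1000  # Anthropic's name for the token limit on Bedrock
```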
## Groq

[Groq](https://groq.com/) is the creator of the world's first Language Processing Unit (LPU), providing exceptional speed performance for AI workloads running on their LPU Inference Engine.

### Usage

To use LLMs from Groq, go to their [platform](https://console.groq.com/keys) and get the API key. Set it as the `GROQ_API_KEY` environment variable or pass it in your app configuration, as shown in the example below.

```python main.py
import os
from embedchain import App

# Set your API key here or pass it as an environment variable
groq_api_key = "gsk_xxxx"

config = {
    "llm": {
        "provider": "groq",
        "config": {
            "model": "mixtral-8x7b-32768",
            "api_key": groq_api_key,
            "stream": True
        }
    }
}

app = App.from_config(config=config)

# Add your data source here
app.add("https://docs.embedchain.ai/sitemap.xml", data_type="sitemap")
app.query("Write a poem about Embedchain")

# In the realm of data, vast and wide,
# Embedchain stands with knowledge as its guide.
# A platform open, for all to try,
# Building bots that can truly fly.

# With REST API, data in reach,
# Deployment a breeze, as easy as a speech.
# Updating data sources, anytime, anyday,
# Embedchain's power, never sway.

# A knowledge base, an assistant so grand,
# Connecting to platforms, near and far.
# Discord, WhatsApp, Slack, and more,
# Embedchain's potential, never a bore.
```

## NVIDIA AI

[NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) let you quickly use NVIDIA's AI models, such as Mixtral 8x7B and Llama 2, through an API. These models are available in the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), fully optimized and ready to use on NVIDIA's AI platform. They are designed for high speed and easy customization, ensuring smooth performance on any accelerated setup.

### Usage

To use LLMs from NVIDIA AI, create an account on the [NVIDIA NGC Service](https://catalog.ngc.nvidia.com/) and generate an API key from the dashboard. Set the API key as the `NVIDIA_API_KEY` environment variable. Note that the `NVIDIA_API_KEY` will start with `nvapi-`.

Below is an example of how to use an LLM and an embedding model from NVIDIA AI:

```python main.py
import os
from embedchain import App

os.environ['NVIDIA_API_KEY'] = 'nvapi-xxxx'

config = {
    "app": {
        "config": {
            "id": "my-app",
        },
    },
    "llm": {
        "provider": "nvidia",
        "config": {
            "model": "nemotron_steerlm_8b",
        },
    },
    "embedder": {
        "provider": "nvidia",
        "config": {
            "model": "nvolveqa_40k",
            "vector_dimension": 1024,
        },
    },
}

app = App.from_config(config=config)

app.add("https://www.forbes.com/profile/elon-musk")
answer = app.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk is subject to fluctuations based on the market value of his holdings in various companies.
# As of March 1, 2024, his net worth is estimated to be approximately $210 billion. However, this figure can change rapidly due to stock market fluctuations and other factors.
# Additionally, his net worth may include other assets such as real estate and art, which are not reflected in his stock portfolio.
```

## Token Usage

You can get the cost of a query by setting `token_usage` to `true` in the config file. The response will then include the token details: `prompt_tokens`, `completion_tokens`, `total_tokens`, `total_cost`, and `cost_currency`.
The paid LLM providers that support token usage are:

- OpenAI
- Vertex AI
- Anthropic
- Cohere
- Together
- Groq
- Mistral AI
- NVIDIA AI

Here is an example of how to use token usage:

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# {'answer': "Elon Musk's net worth is $209.9 billion as of 6/9/24.",
#  'usage': {'prompt_tokens': 1228,
#            'completion_tokens': 21,
#            'total_tokens': 1249,
#            'total_cost': 0.001884,
#            'cost_currency': 'USD'}
# }

response = app.chat("Which companies did Elon Musk found?")
# {'answer': 'Elon Musk founded six companies, including Tesla, which is an electric car maker, SpaceX, a rocket producer, and the Boring Company, a tunneling startup.',
#  'usage': {'prompt_tokens': 1616,
#            'completion_tokens': 34,
#            'total_tokens': 1650,
#            'total_cost': 0.002492,
#            'cost_currency': 'USD'}
# }
```

```yaml config.yaml
llm:
  provider: openai
  config:
    model: gpt-3.5-turbo
    temperature: 0.5
    max_tokens: 1000
    token_usage: true
```

If a model is missing and you'd like to add it to `model_prices_and_context_window.json`, please feel free to open a PR.
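Because each response carries its own `usage` dict, you can also track spend across a session. A minimal sketch, assuming responses follow the dict shape shown above:

```python
# Minimal sketch: accumulate the cost of several queries, assuming each
# response is a dict with the 'usage' structure shown above.
total_cost = 0.0
for question in ["what is the net worth of Elon Musk?",
                 "Which companies did Elon Musk found?"]:
    response = app.query(question)
    total_cost += response["usage"]["total_cost"]

print(f"Session cost: {total_cost:.6f} USD")
```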