---
title: 🤖 Large language models (LLMs)
---

## Overview

Embedchain comes with built-in support for various popular large language models. We handle the complexity of integrating these models for you, allowing you to easily customize your language model interactions through a user-friendly interface.

## OpenAI

To use OpenAI's models, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

```python
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

app = App()
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

If you are looking to configure the different parameters of the LLM, you can do so by loading the app using a [yaml config](https://github.com/embedchain/embedchain/blob/main/configs/chroma.yaml) file.

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: openai
  config:
    model: 'gpt-3.5-turbo'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

### Function Calling

Embedchain supports OpenAI [Function calling](https://platform.openai.com/docs/guides/function-calling) with a single function. It accepts inputs in accordance with the [Langchain interface](https://python.langchain.com/docs/modules/model_io/chat/function_calling#legacy-args-functions-and-function_call). You can define the function as a Pydantic model, as a plain Python function, or as an OpenAI tool dictionary:

```python
from pydantic import BaseModel, Field

class multiply(BaseModel):
    """Multiply two integers together."""

    a: int = Field(..., description="First integer")
    b: int = Field(..., description="Second integer")
```

```python
def multiply(a: int, b: int) -> int:
    """Multiply two integers together.

    Args:
        a: First integer
        b: Second integer
    """
    return a * b
```

```python
multiply = {
    "type": "function",
    "function": {
        "name": "multiply",
        "description": "Multiply two integers together.",
        "parameters": {
            "type": "object",
            "properties": {
                "a": {"description": "First integer", "type": "integer"},
                "b": {"description": "Second integer", "type": "integer"},
            },
            "required": ["a", "b"],
        },
    },
}
```

With any of the previous inputs, the OpenAI LLM can be queried to provide the appropriate arguments for the function.

```python
import os
from embedchain import App
from embedchain.llm.openai import OpenAILlm

os.environ["OPENAI_API_KEY"] = "sk-xxx"

llm = OpenAILlm(tools=multiply)
app = App(llm=llm)

result = app.query("What is the result of 125 multiplied by fifteen?")
```
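Note that the model does not execute the function itself; it only supplies the arguments for it. A minimal sketch of acting on the result, where the argument dict is a hypothetical stand-in for whatever your Embedchain version surfaces in `result` (check its exact shape for your version):

```python
# Hypothetical follow-up: the literal below stands in for tool-call
# arguments parsed from `result`; the exact shape of `result` depends
# on your Embedchain version.
args = {"a": 125, "b": 15}

# Execute the call using the plain-function form of `multiply` above.
print(multiply(**args))  # 1875
```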
## Google AI

To use Google AI models, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from [Google MakerSuite](https://makersuite.google.com/app/apikey).

```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"
app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("What is the net worth of Elon Musk?")
if app.llm.config.stream:  # if stream is enabled, response is a generator
    for chunk in response:
        print(chunk)
else:
    print(response)
```

```yaml config.yaml
llm:
  provider: google
  config:
    model: gemini-pro
    max_tokens: 1000
    temperature: 0.5
    top_p: 1
    stream: false

embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

## Azure OpenAI

To use Azure OpenAI models, set the Azure OpenAI-related environment variables as shown in the code block below:

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-3.5-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).

## Anthropic

To use Anthropic's models, set the `ANTHROPIC_API_KEY`, which you can find on their [Account Settings page](https://console.anthropic.com/account/keys).

```python main.py
import os
from embedchain import App

os.environ["ANTHROPIC_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: anthropic
  config:
    model: 'claude-instant-1'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

## Cohere

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[cohere]'
```

Set the `COHERE_API_KEY` as an environment variable; you can find it on their [Account settings page](https://dashboard.cohere.com/api-keys).

Once you have the API key, you are all set to use it with Embedchain.

```python main.py
import os
from embedchain import App

os.environ["COHERE_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: cohere
  config:
    model: large
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

## Together

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[together]'
```

Set the `TOGETHER_API_KEY` as an environment variable; you can find it on their [Account settings page](https://api.together.xyz/settings/api-keys).

Once you have the API key, you are all set to use it with Embedchain.
```python main.py
import os
from embedchain import App

os.environ["TOGETHER_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: together
  config:
    model: togethercomputer/RedPajama-INCITE-7B-Base
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
```

## Ollama

Set up Ollama by following the instructions at https://github.com/jmorganca/ollama and pull the model you want to use (e.g. `ollama pull llama2`).

```python main.py
import os

os.environ["OLLAMA_HOST"] = "http://127.0.0.1:11434"

from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: ollama
  config:
    model: 'llama2'
    temperature: 0.5
    top_p: 1
    stream: true
    base_url: 'http://localhost:11434'

embedder:
  provider: ollama
  config:
    model: znbang/bge:small-en-v1.5-q8_0
    base_url: http://localhost:11434
```

## vLLM

Set up vLLM by following the instructions in [their docs](https://docs.vllm.ai/en/latest/getting_started/installation.html).

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vllm
  config:
    model: 'meta-llama/Llama-2-70b-hf'
    temperature: 0.5
    top_p: 1
    top_k: 10
    stream: true
    trust_remote_code: true
```

## Clarifai

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[clarifai]'
```

Set the `CLARIFAI_PAT` as an environment variable; you can find it on the [Security page](https://clarifai.com/settings/security). Optionally, you can also pass the PAT as a parameter to the LLM/Embedder class.

Now you are all set to explore Embedchain.

```python main.py
import os
from embedchain import App

os.environ["CLARIFAI_PAT"] = "XXX"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

# Now let's add some data.
app.add("https://www.forbes.com/profile/elon-musk")

# Query the app
response = app.query("what college degrees does elon musk have?")
```

Head to the [Clarifai Platform](https://clarifai.com/explore/models?page=1&perPage=24&filterData=%5B%7B%22field%22%3A%22use_cases%22%2C%22value%22%3A%5B%22llm%22%5D%7D%5D) to browse various state-of-the-art LLMs for your use case.

To pass model inference parameters, use the `model_kwargs` argument in the config file. You can also use the `api_key` argument to pass the `CLARIFAI_PAT` in the config.

```yaml config.yaml
llm:
  provider: clarifai
  config:
    model: "https://clarifai.com/mistralai/completion/models/mistral-7B-Instruct"
    model_kwargs:
      temperature: 0.5
      max_tokens: 1000

embedder:
  provider: clarifai
  config:
    model: "https://clarifai.com/clarifai/main/models/BAAI-bge-base-en-v15"
```

## GPT4All

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensource]'
```

GPT4All is a free-to-use, locally running, privacy-aware chatbot. No GPU or internet is required. You can use it with Embedchain using the following code:

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

## JinaChat

First, set `JINACHAT_API_KEY` as an environment variable; you can obtain it from [their platform](https://chat.jina.ai/api).
Once you have the key, load the app using the config yaml file:

```python main.py
import os
from embedchain import App

os.environ["JINACHAT_API_KEY"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: jina
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false
```

## Hugging Face

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[huggingface-hub]'
```

First, set `HUGGINGFACE_ACCESS_TOKEN` as an environment variable; you can obtain it from [their platform](https://huggingface.co/settings/tokens).

You can load LLMs from Hugging Face in three ways:

- [Hugging Face Hub](#hugging-face-hub)
- [Hugging Face Local Pipelines](#hugging-face-local-pipelines)
- [Hugging Face Inference Endpoint](#hugging-face-inference-endpoint)

### Hugging Face Hub

To load the model from the Hugging Face Hub, use the following code:

```python main.py
import os
from embedchain import App

os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "xxx"

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "bigscience/bloom-1b7",
            "top_p": 0.5,
            "max_length": 200,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```

### Hugging Face Local Pipelines

If you want to load a locally downloaded model from Hugging Face, you can do so with the following code:

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "model": "Trendyol/Trendyol-LLM-7b-chat-v0.1",
            "local": True,  # Necessary if you want to run the model locally
            "top_p": 0.5,
            "max_tokens": 1000,
            "temperature": 0.1,
        },
    },
}

app = App.from_config(config=config)
```

### Hugging Face Inference Endpoint

You can also use [Hugging Face Inference Endpoints](https://huggingface.co/docs/inference-endpoints/index#-inference-endpoints) to access custom endpoints. First, set the `HUGGINGFACE_ACCESS_TOKEN` as above.

Then, load the app using the config:

```python main.py
from embedchain import App

config = {
    "app": {"config": {"id": "my-app"}},
    "llm": {
        "provider": "huggingface",
        "config": {
            "endpoint": "https://api-inference.huggingface.co/models/gpt2",
            "model_params": {"temperature": 0.1, "max_new_tokens": 100},
        },
    },
}

app = App.from_config(config=config)
```

Currently, only `text-generation` and `text2text-generation` tasks are supported [[ref](https://api.python.langchain.com/en/latest/llms/langchain_community.llms.huggingface_endpoint.HuggingFaceEndpoint.html?highlight=huggingfaceendpoint#)]. See Langchain's [Hugging Face endpoint](https://python.langchain.com/docs/integrations/chat/huggingface#huggingfaceendpoint) documentation for more information.

## Llama2

Llama2 is integrated through [Replicate](https://replicate.com/). Set `REPLICATE_API_TOKEN` as an environment variable; you can obtain it from [their platform](https://replicate.com/account/api-tokens).
Once you have the token, load the app using the config yaml file:

```python main.py
import os
from embedchain import App

os.environ["REPLICATE_API_TOKEN"] = "xxx"

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: llama2
  config:
    model: 'a16z-infra/llama13b-v2-chat:df7690f1994d94e96ad9d568eac121aecf50684a0b0963b25a41cc40061269e5'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false
```

## Vertex AI

Set up Google Cloud Platform application credentials by following the instructions on [GCP](https://cloud.google.com/docs/authentication/external/set-up-adc). Once the setup is done, use the following code to create an app with Vertex AI as the provider:

```python main.py
from embedchain import App

# load llm configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5
```

## Mistral AI

Obtain the Mistral AI API key from their [console](https://console.mistral.ai/).

```python main.py
import os
from embedchain import App

os.environ["MISTRAL_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# As of January 16, 2024, Elon Musk's net worth is $225.4 billion.

response = app.chat("which companies does elon own?")
# Elon Musk owns Tesla, SpaceX, Boring Company, Twitter, and X.

response = app.chat("what question did I ask you already?")
# You have asked me several times already which companies Elon Musk owns, specifically Tesla, SpaceX, Boring Company, Twitter, and X.
```

```yaml config.yaml
llm:
  provider: mistralai
  config:
    model: mistral-tiny
    temperature: 0.5
    max_tokens: 1000
    top_p: 1

embedder:
  provider: mistralai
  config:
    model: mistral-embed
```

## AWS Bedrock

### Setup

- Before using the AWS Bedrock LLM, make sure you have the appropriate model access from the [Bedrock Console](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/modelaccess).
- You will also need to authenticate the `boto3` client using one of the methods in the [AWS documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html#configuring-credentials).
- You can optionally export an `AWS_REGION`.

### Usage

```python main.py
import os
from embedchain import App

os.environ["AWS_ACCESS_KEY_ID"] = "xxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxx"
os.environ["AWS_REGION"] = "us-west-2"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: amazon.titan-text-express-v1
    # check notes below for model_kwargs
    model_kwargs:
      temperature: 0.5
      topP: 1
      maxTokenCount: 1000
```
The model arguments differ for each provider. Please refer to the [AWS Bedrock Documentation](https://us-east-1.console.aws.amazon.com/bedrock/home?region=us-east-1#/providers) to find the appropriate arguments for your model.
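For example, an Anthropic model on Bedrock takes differently named arguments than the Amazon Titan model above. The following is an illustrative sketch only; the model id and argument names are assumptions, so verify them against the Bedrock documentation for your model:

```yaml config.yaml
llm:
  provider: aws_bedrock
  config:
    model: anthropic.claude-v2  # assumed model id, for illustration only
    model_kwargs:
      temperature: 0.5
      top_p: 1
      max_tokens_to_sample: 1000  # Anthropic's name for the token limit on Bedrock
```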
## Groq

[Groq](https://groq.com/) is the creator of the world's first Language Processing Unit (LPU), providing exceptional speed performance for AI workloads running on their LPU Inference Engine.

### Usage

To use LLMs from Groq, go to their [platform](https://console.groq.com/keys) and get the API key. Set it as the `GROQ_API_KEY` environment variable or pass it in your app configuration, as shown in the example below.

```python main.py
import os
from embedchain import App

# Set your API key here or pass it as an environment variable
groq_api_key = "gsk_xxxx"

config = {
    "llm": {
        "provider": "groq",
        "config": {
            "model": "mixtral-8x7b-32768",
            "api_key": groq_api_key,
            "stream": True
        }
    }
}

app = App.from_config(config=config)

# Add your data source here
app.add("https://docs.embedchain.ai/sitemap.xml", data_type="sitemap")
app.query("Write a poem about Embedchain")

# In the realm of data, vast and wide,
# Embedchain stands with knowledge as its guide.
# A platform open, for all to try,
# Building bots that can truly fly.

# With REST API, data in reach,
# Deployment a breeze, as easy as a speech.
# Updating data sources, anytime, anyday,
# Embedchain's power, never sway.

# A knowledge base, an assistant so grand,
# Connecting to platforms, near and far.
# Discord, WhatsApp, Slack, and more,
# Embedchain's potential, never a bore.
```

## NVIDIA AI

[NVIDIA AI Foundation Endpoints](https://www.nvidia.com/en-us/ai-data-science/foundation-models/) let you quickly use NVIDIA's AI models, such as Mixtral 8x7B and Llama 2, through an API. These models are available in the [NVIDIA NGC catalog](https://catalog.ngc.nvidia.com/ai-foundation-models), fully optimized and ready to use on NVIDIA's AI platform. They are designed for high speed and easy customization, ensuring smooth performance on any accelerated setup.

### Usage

To use LLMs from NVIDIA AI, create an account on the [NVIDIA NGC Service](https://catalog.ngc.nvidia.com/) and generate an API key from the dashboard. Set the API key as the `NVIDIA_API_KEY` environment variable. Note that the `NVIDIA_API_KEY` will start with `nvapi-`.

Below is an example of how to use an LLM and an embedding model from NVIDIA AI:

```python main.py
import os
from embedchain import App

os.environ['NVIDIA_API_KEY'] = 'nvapi-xxxx'

config = {
    "app": {
        "config": {
            "id": "my-app",
        },
    },
    "llm": {
        "provider": "nvidia",
        "config": {
            "model": "nemotron_steerlm_8b",
        },
    },
    "embedder": {
        "provider": "nvidia",
        "config": {
            "model": "nvolveqa_40k",
            "vector_dimension": 1024,
        },
    },
}

app = App.from_config(config=config)

app.add("https://www.forbes.com/profile/elon-musk")
answer = app.query("What is the net worth of Elon Musk today?")
# Answer: The net worth of Elon Musk is subject to fluctuations based on the market value of his holdings in various companies.
# As of March 1, 2024, his net worth is estimated to be approximately $210 billion. However, this figure can change rapidly due to stock market fluctuations and other factors.
# Additionally, his net worth may include other assets such as real estate and art, which are not reflected in his stock portfolio.
```

## Token Usage

You can get the cost of a query by setting `token_usage` to `true` in the config file. The response will then include the token details: `prompt_tokens`, `completion_tokens`, `total_tokens`, `total_cost`, and `cost_currency`.
The paid LLM providers that support token usage are:

- OpenAI
- Vertex AI
- Anthropic
- Cohere
- Together
- Groq
- Mistral AI
- NVIDIA AI

Here is an example of how to use token usage:

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

app.add("https://www.forbes.com/profile/elon-musk")

response = app.query("what is the net worth of Elon Musk?")
# {'answer': "Elon Musk's net worth is $209.9 billion as of 6/9/24.",
#  'usage': {'prompt_tokens': 1228,
#            'completion_tokens': 21,
#            'total_tokens': 1249,
#            'total_cost': 0.001884,
#            'cost_currency': 'USD'}
# }

response = app.chat("Which companies did Elon Musk found?")
# {'answer': 'Elon Musk founded six companies, including Tesla, which is an electric car maker, SpaceX, a rocket producer, and the Boring Company, a tunneling startup.',
#  'usage': {'prompt_tokens': 1616,
#            'completion_tokens': 34,
#            'total_tokens': 1650,
#            'total_cost': 0.002492,
#            'cost_currency': 'USD'}
# }
```

```yaml config.yaml
llm:
  provider: openai
  config:
    model: gpt-3.5-turbo
    temperature: 0.5
    max_tokens: 1000
    token_usage: true
```

If a model is missing and you'd like to add it to `model_prices_and_context_window.json`, please feel free to open a PR.
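Because each response carries its own `usage` dict, you can also track spend across a session. A minimal sketch, assuming responses follow the dict shape shown above:

```python
# Minimal sketch: accumulate the cost of several queries, assuming each
# response is a dict with the 'usage' structure shown above.
total_cost = 0.0
for question in ["what is the net worth of Elon Musk?",
                 "Which companies did Elon Musk found?"]:
    response = app.query(question)
    total_cost += response["usage"]["total_cost"]

print(f"Session cost: {total_cost:.6f} USD")
```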