---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>
## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:
<CodeGroup>

```python main.py
import os
from embedchain import Pipeline as App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-ada-002'
```

</CodeGroup>
## Azure OpenAI

To use an Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:
<CodeGroup>

```python main.py
import os
from embedchain import Pipeline as App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

# load LLM and embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>
You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
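
Once the environment variables and `config.yaml` are in place, the configured app is used like any other Embedchain app. A minimal sketch, assuming your Azure deployments are live; the data source URL and question are only illustrative placeholders:

```python
import os
from embedchain import Pipeline as App

# Azure OpenAI credentials (placeholders; use your own values)
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(yaml_path="config.yaml")

# add a data source and query it (example URL only)
app.add("https://en.wikipedia.org/wiki/Microsoft_Azure")
print(app.query("What is Azure OpenAI Service?"))
```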
## GPT4All

GPT4All generates high-quality embeddings for text documents of arbitrary length using a CPU-optimized, contrastively trained Sentence Transformer.
<CodeGroup>

```python main.py
from embedchain import Pipeline as App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
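
Since both the GPT4All LLM and the GPT4All embedder run locally on CPU, no API key is needed. A minimal usage sketch under that assumption; the URL is only an illustrative placeholder:

```python
from embedchain import Pipeline as App

# no API key required: the LLM and embedder both run locally
app = App.from_config(yaml_path="config.yaml")

# index a page and ask a question about it (example URL only)
app.add("https://en.wikipedia.org/wiki/Word_embedding")
app.query("What is a word embedding?")
```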
## Hugging Face

Hugging Face generates embeddings for text documents of arbitrary length using the Sentence Transformers library. An example of how to generate embeddings with Hugging Face is given below:
<CodeGroup>

```python main.py
from embedchain import Pipeline as App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
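
The `sentence-transformers/all-mpnet-base-v2` embedding model is downloaded and run locally, so the embedder itself needs no API token; only the Hugging Face LLM above requires one. A minimal usage sketch; the token environment variable name and the URL are assumptions for illustration:

```python
import os
from embedchain import Pipeline as App

# token for the Hugging Face LLM defined in config.yaml
# (variable name assumed; the embedder runs locally and needs no token)
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_xxx"

app = App.from_config(yaml_path="config.yaml")

# embeddings are computed locally with sentence-transformers
app.add("https://en.wikipedia.org/wiki/Sentence_embedding")
app.query("What is a sentence embedding?")
```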
## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to pass the model name (`model`) in the config YAML and it works out of the box.
<CodeGroup>

```python main.py
from embedchain import Pipeline as App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>
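
Vertex AI calls are authenticated through your Google Cloud credentials rather than an API key in the config; a common setup is `gcloud auth application-default login` with a project that has the Vertex AI API enabled (this setup is an assumption, not part of the Embedchain config). A minimal usage sketch with an illustrative data source:

```python
from embedchain import Pipeline as App

# assumes Google Cloud application-default credentials are already configured
app = App.from_config(yaml_path="config.yaml")

# the URL is only an example data source
app.add("https://cloud.google.com/vertex-ai")
app.query("What is Vertex AI?")
```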