---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GoogleAI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>

## OpenAI

To use the OpenAI embedding function, set the `OPENAI_API_KEY` environment variable. You can obtain an API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have the key, you can use it like this:

<CodeGroup>

```python main.py
import os
from embedchain import App

# replace 'xxx' with your OpenAI API key
os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

# add a data source, then query it
app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

</CodeGroup>

* OpenAI has announced two new embedding models, `text-embedding-3-small` and `text-embedding-3-large`, and Embedchain supports both. The YAML config for each is given below:

<CodeGroup>

```yaml text-embedding-3-small.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-small'
```

```yaml text-embedding-3-large.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-3-large'
```

</CodeGroup>
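
Switching between the two models only changes the `model` value. As a minimal sketch, the same embedder settings can be written as a Python dictionary. `openai_embedder_config` and `DOCUMENTED_MODELS` are hypothetical names used for illustration and are not part of Embedchain. Passing the resulting dictionary to `App.from_config(config=...)` instead of a YAML path is an assumption that you should check against your installed Embedchain version.

```python
# Hypothetical helper (not an Embedchain API): builds the same structure
# as the two YAML files for the OpenAI embedder as a plain dictionary.
DOCUMENTED_MODELS = ("text-embedding-3-small", "text-embedding-3-large")


def openai_embedder_config(model: str = "text-embedding-3-small") -> dict:
    """Return an embedder config dict for one of the two documented models."""
    if model not in DOCUMENTED_MODELS:
        raise ValueError(
            f"unexpected model {model!r}; expected one of {DOCUMENTED_MODELS}"
        )
    # `provider` and `config` are nested under `embedder`, as in the YAML.
    return {"embedder": {"provider": "openai", "config": {"model": model}}}


if __name__ == "__main__":
    print(openai_embedder_config("text-embedding-3-large"))
```

Rejecting unknown names early is a design choice of this sketch: it surfaces a typo in the model name before any API call is made.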

## Google AI

To use the Google AI embedding function, set the `GOOGLE_API_KEY` environment variable. You can obtain an API key from [Google MakerSuite](https://makersuite.google.com/app/apikey).

<CodeGroup>

```python main.py
import os
from embedchain import App

# replace "xxx" with your Google API key
os.environ["GOOGLE_API_KEY"] = "xxx"

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>
<br/>
<Note>
  For more details regarding the Google AI embedding model, please refer to the [Google AI documentation](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
</Note>
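
The `task_type` and `title` fields in the config above are related: as far as Google's embedding documentation describes it, a `title` applies only when the task type is `retrieval_document`. The sketch below encodes that rule in a small builder. `google_embedder_config` is a hypothetical helper, not an Embedchain or Google API, and the check reflects an assumption about the Google AI service rather than anything Embedchain is known to enforce.

```python
from typing import Optional


def google_embedder_config(
    model: str = "models/embedding-001",
    task_type: str = "retrieval_document",
    title: Optional[str] = None,
) -> dict:
    """Build an embedder config dict mirroring the Google AI config.yaml."""
    # Assumption: a title is only meaningful for the retrieval_document task.
    if title is not None and task_type.lower() != "retrieval_document":
        raise ValueError("title is only applicable to task_type 'retrieval_document'")
    config = {"model": model, "task_type": task_type}
    if title is not None:
        config["title"] = title
    return {"embedder": {"provider": "google", "config": config}}


if __name__ == "__main__":
    print(google_embedder_config(title="Embeddings for Embedchain"))
```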

## Azure OpenAI

To use an Azure OpenAI embedding model, set the Azure OpenAI environment variables shown in the code block below:

<CodeGroup>

```python main.py
import os
from embedchain import App

# replace the "xxx" placeholders with the values for your Azure OpenAI resource
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

# load LLM and embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
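
Because the Azure setup depends on four environment variables, it can help to check them before creating the app. The following is a minimal sketch using only the standard library. `missing_azure_vars` and `REQUIRED_AZURE_VARS` are hypothetical names introduced here. The variable list is taken from the `main.py` example above, not from an authoritative list of what Embedchain requires.

```python
import os

# Variable names copied from the Azure OpenAI example above.
REQUIRED_AZURE_VARS = (
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
)


def missing_azure_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AZURE_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_azure_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    else:
        print("All Azure OpenAI variables are set.")
```

Accepting the environment as an argument keeps the check easy to exercise with a plain dictionary, without touching the real process environment.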

## GPT4All

GPT4All supports generating high-quality embeddings of arbitrary-length text documents using a CPU-optimized, contrastively trained Sentence Transformer.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>

## Hugging Face

Hugging Face supports generating embeddings of arbitrary-length text documents using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>

## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. Set the model name under `model` in the config YAML, as shown below, and it should work out of the box.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>