---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GoogleAI" href="#google-ai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>

## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-ada-002'
```

</CodeGroup>
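
If you prefer to keep everything in Python, the same embedder settings can also be passed inline instead of via a YAML file. A minimal sketch, assuming `App.from_config` also accepts a `config` dict (the values mirror `config.yaml` above):

```python
import os
from embedchain import App

os.environ["OPENAI_API_KEY"] = "xxx"

# Same embedder settings as config.yaml, expressed as an inline dict
config = {
    "embedder": {
        "provider": "openai",
        "config": {"model": "text-embedding-ada-002"},
    }
}

app = App.from_config(config=config)
app.add("https://en.wikipedia.org/wiki/OpenAI")
print(app.query("What is OpenAI?"))
```
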

## Google AI

To use the Google AI embedding model, you have to set the `GOOGLE_API_KEY` environment variable. You can obtain the Google API key from [Google MakerSuite](https://makersuite.google.com/app/apikey).

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
embedder:
  provider: google
  config:
    model: 'models/embedding-001'
    task_type: "retrieval_document"
    title: "Embeddings for Embedchain"
```

</CodeGroup>

<br/>

<Note>
For more details regarding the Google AI embedding model, please refer to the [Google AI documentation](https://ai.google.dev/tutorials/python_quickstart#use_embeddings).
</Note>
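
Once the app is created with the Google embedder, adding and querying data works the same as with any other provider. A short usage sketch (the source URL and question are only illustrative):

```python
import os
from embedchain import App

os.environ["GOOGLE_API_KEY"] = "xxx"

app = App.from_config(config_path="config.yaml")

# Documents added here are embedded with the Google embedding model configured above
app.add("https://en.wikipedia.org/wiki/Google")
print(app.query("What is Google?"))
```
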

## Azure OpenAI

To use the Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["AZURE_OPENAI_ENDPOINT"] = "https://xxx.openai.azure.com/"
os.environ["AZURE_OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
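
Since this setup depends on four environment variables, a small pre-flight check can save a confusing failure later. The check below is not part of Embedchain, just a hypothetical convenience:

```python
import os
from embedchain import App

# Hypothetical pre-flight check: fail early if any required variable is missing
required = [
    "OPENAI_API_TYPE",
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "OPENAI_API_VERSION",
]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing Azure OpenAI environment variables: {', '.join(missing)}")

app = App.from_config(config_path="config.yaml")
```
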

## GPT4All

GPT4All supports generating high-quality embeddings of arbitrary-length text documents using a CPU-optimized, contrastively trained Sentence Transformer.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>
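
Because GPT4All runs locally, no API key environment variable is needed; the model file is typically fetched on first use (so network access is assumed for that initial download). A usage sketch with an illustrative source and question:

```python
from embedchain import App

# No API key needed: both the LLM and the embedder run locally via GPT4All
app = App.from_config(config_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Sentence_embedding")  # embedded on the CPU
print(app.query("What is a sentence embedding?"))
```
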

## Hugging Face

Hugging Face supports generating embeddings of arbitrary-length text documents using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
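
Any Sentence Transformers checkpoint from the Hugging Face Hub can be used as the embedder model; a smaller checkpoint, for example, trades some quality for speed. A minimal sketch, assuming `App.from_config` also accepts an inline `config` dict (the model choice is illustrative):

```python
from embedchain import App

# Swap in a smaller sentence-transformers checkpoint for faster, lighter embeddings
config = {
    "embedder": {
        "provider": "huggingface",
        "config": {"model": "sentence-transformers/all-MiniLM-L6-v2"},
    }
}

app = App.from_config(config=config)
```
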

## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to set the model name in the config YAML and it works out of the box.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>
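
Vertex AI authenticates through Google Cloud credentials rather than an API key environment variable; a common local setup is Application Default Credentials (for example via `gcloud auth application-default login`). A usage sketch under that assumption, with an illustrative source and question:

```python
from embedchain import App

# Assumes Google Cloud Application Default Credentials are already configured,
# e.g. via `gcloud auth application-default login`
app = App.from_config(config_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Vertex_AI")
print(app.query("What is Vertex AI?"))
```
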