---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>

## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys). Once you have obtained the key, you can use it like this:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-ada-002'
```

</CodeGroup>

## Azure OpenAI

To use the Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).
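
Once the environment variables and the `config.yaml` above are in place, usage is the same as with any other app. Below is a minimal sketch; the credentials are placeholders and the Wikipedia URL is just example data:

```python
import os
from embedchain import App

# placeholder Azure OpenAI credentials; replace with your own values
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

# reuse the Azure OpenAI config.yaml shown above
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Microsoft_Azure")  # example source
app.query("What is Microsoft Azure?")
```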

## GPT4All

GPT4All supports generating high-quality embeddings of arbitrary-length text documents using a CPU-optimized, contrastively trained Sentence Transformer.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b.ggmlv3.q4_0.bin'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
  config:
    model: 'all-MiniLM-L6-v2'
```

</CodeGroup>
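
Both the GPT4All LLM and the `all-MiniLM-L6-v2` embedder in the config above run locally on CPU, so no API key is required. A minimal usage sketch, with a Wikipedia page used purely as example data:

```python
from embedchain import App

# both the LLM and the embedder defined in config.yaml run locally on CPU,
# so no API key needs to be set
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Machine_learning")  # example source
app.query("What is machine learning?")
```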

## Hugging Face

Hugging Face supports generating embeddings of arbitrary-length text documents using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
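
A minimal usage sketch with the config above. The `sentence-transformers/all-mpnet-base-v2` embedder is downloaded and run locally; the hosted Hugging Face LLM may additionally need an access token in your environment (the variable name below is an assumption, check the LLM documentation for your version):

```python
import os
from embedchain import App

# assumption: the hosted Hugging Face LLM needs an access token;
# the exact variable name may differ between embedchain versions
os.environ["HUGGINGFACE_ACCESS_TOKEN"] = "hf_xxx"

# the embedder (all-mpnet-base-v2) is downloaded and run locally
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Sentence_embedding")  # example source
app.query("What is a sentence embedding?")
```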

## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to pass the embedding model name under the `model` key in the config YAML and it works out of the box.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>
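
Vertex AI authenticates through your Google Cloud credentials rather than an API-key environment variable. The sketch below assumes Application Default Credentials are already configured (for example via `gcloud auth application-default login`) for a project with the Vertex AI API enabled; the Wikipedia URL is just example data:

```python
from embedchain import App

# assumes Google Cloud Application Default Credentials are already set up,
# e.g. via `gcloud auth application-default login`, with Vertex AI enabled
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Google_Cloud_Platform")  # example source
app.query("What is Google Cloud Platform?")
```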