---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="Azure OpenAI" href="#azure-openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>

## OpenAI

To use the OpenAI embedding model, you have to set the `OPENAI_API_KEY` environment variable. You can obtain the OpenAI API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

<CodeGroup>

```python main.py
import os
from embedchain import Pipeline as App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-ada-002'
```

</CodeGroup>

## Azure OpenAI

To use the Azure OpenAI embedding model, you have to set the Azure OpenAI related environment variables, as shown in the code block below:

<CodeGroup>

```python main.py
import os
from embedchain import Pipeline as App

os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: azure_openai
  config:
    model: gpt-35-turbo
    deployment_name: your_llm_deployment_name
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: azure_openai
  config:
    model: text-embedding-ada-002
    deployment_name: your_embedding_model_deployment_name
```

</CodeGroup>

You can find the list of models and deployment names on the [Azure OpenAI Platform](https://oai.azure.com/portal).

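Once the environment variables and deployments are in place, the app behaves like any other Embedchain app. Below is a minimal usage sketch, assuming the `config.yaml` above and valid deployment names; the credential values, data source, and question are placeholders:

```python
import os
from embedchain import Pipeline as App

# Azure OpenAI credentials for your resource (placeholder values)
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_BASE"] = "https://xxx.openai.azure.com/"
os.environ["OPENAI_API_KEY"] = "xxx"
os.environ["OPENAI_API_VERSION"] = "xxx"

app = App.from_config(yaml_path="config.yaml")

# documents are embedded by the Azure-hosted text-embedding-ada-002 deployment
app.add("https://en.wikipedia.org/wiki/Microsoft_Azure")
answer = app.query("What is Microsoft Azure?")
print(answer)
```
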
## GPT4All

GPT4All supports generating high-quality embeddings for text documents of arbitrary length using a CPU-optimized, contrastively trained Sentence Transformer.

<CodeGroup>

```python main.py
from embedchain import Pipeline as App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  config:
    model: 'orca-mini-3b-gguf2-q4_0.gguf'
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
```

</CodeGroup>

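Since both the LLM and the embedder run locally, no API key is required. Below is a minimal usage sketch, assuming the `config.yaml` above; the data source and question are placeholders:

```python
from embedchain import Pipeline as App

# both the orca-mini LLM and the GPT4All embedder run on the local CPU
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Sentence_embedding")
answer = app.query("What is a sentence embedding?")
print(answer)
```
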
## Hugging Face

Hugging Face supports generating embeddings for text documents of arbitrary length using the Sentence Transformers library. An example of how to generate embeddings using Hugging Face is given below:

<CodeGroup>

```python main.py
from embedchain import Pipeline as App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  config:
    model: 'google/flan-t5-xxl'
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>

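The embedder downloads `sentence-transformers/all-mpnet-base-v2` and runs it locally, while the hosted `flan-t5-xxl` LLM may additionally require a Hugging Face access token. Below is a minimal usage sketch, assuming the `config.yaml` above; the data source and question are placeholders:

```python
from embedchain import Pipeline as App

# embeddings are computed locally with the Sentence Transformers model
# configured under `embedder` in config.yaml
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/Word_embedding")
answer = app.query("What is a word embedding?")
print(answer)
```
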
## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to pass the model name in the config YAML and it works out of the box.

<CodeGroup>

```python main.py
from embedchain import Pipeline as App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  config:
    model: 'chat-bison'
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>

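Vertex AI authenticates through Google Cloud credentials rather than an API key in the config. The sketch below assumes Application Default Credentials are already set up (for example via `gcloud auth application-default login`) and that your project has access to the `textembedding-gecko` and `chat-bison` models; the data source and question are placeholders:

```python
from embedchain import Pipeline as App

# requires Google Cloud Application Default Credentials with Vertex AI access
app = App.from_config(yaml_path="config.yaml")

# documents are embedded with textembedding-gecko; answers come from chat-bison
app.add("https://en.wikipedia.org/wiki/Google_Cloud_Platform")
answer = app.query("What is Google Cloud Platform?")
print(answer)
```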