---
title: 🧩 Embedding models
---

## Overview

Embedchain supports several embedding models from the following providers:

<CardGroup cols={4}>
  <Card title="OpenAI" href="#openai"></Card>
  <Card title="GPT4All" href="#gpt4all"></Card>
  <Card title="Hugging Face" href="#hugging-face"></Card>
  <Card title="Vertex AI" href="#vertex-ai"></Card>
</CardGroup>
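Whichever provider you pick, the embedder maps text to a fixed-length vector, and retrieval works by comparing those vectors with a similarity measure (commonly cosine similarity). A minimal, provider-agnostic sketch using only the standard library; the vectors below are invented for illustration, not real model output:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding output
v_cat = [0.9, 0.1, 0.0]
v_kitten = [0.8, 0.2, 0.0]
v_car = [0.0, 0.1, 0.9]

# Semantically close texts should score higher than unrelated ones
print(cosine_similarity(v_cat, v_kitten) > cosine_similarity(v_cat, v_car))  # True
```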
## OpenAI

To use the OpenAI embedding model, set the `OPENAI_API_KEY` environment variable. You can obtain an API key from the [OpenAI Platform](https://platform.openai.com/account/api-keys).

Once you have obtained the key, you can use it like this:

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['OPENAI_API_KEY'] = 'xxx'

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")

app.add("https://en.wikipedia.org/wiki/OpenAI")
app.query("What is OpenAI?")
```

```yaml config.yaml
embedder:
  provider: openai
  config:
    model: 'text-embedding-ada-002'
```

</CodeGroup>
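Because the key is read from the environment at runtime, a missing `OPENAI_API_KEY` may only surface when the first request is made. A small guard that fails fast instead; the helper name is ours for illustration, not part of Embedchain:

```python
import os

def require_openai_key() -> str:
    """Hypothetical helper: return OPENAI_API_KEY or raise before building the app."""
    key = os.environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENAI_API_KEY is not set; create one at "
            "https://platform.openai.com/account/api-keys"
        )
    return key
```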
## GPT4All

GPT4All can generate high-quality embeddings for text documents of arbitrary length using a CPU-optimized, contrastively trained Sentence Transformer.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: gpt4all
  model: 'orca-mini-3b.ggmlv3.q4_0.bin'
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 1
    stream: false

embedder:
  provider: gpt4all
  config:
    model: 'all-MiniLM-L6-v2'
```

</CodeGroup>
## Hugging Face

Hugging Face can generate embeddings for text documents of arbitrary length using the Sentence Transformers library. An example of how to generate embeddings with Hugging Face is given below:

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: huggingface
  model: 'google/flan-t5-xxl'
  config:
    temperature: 0.5
    max_tokens: 1000
    top_p: 0.5
    stream: false

embedder:
  provider: huggingface
  config:
    model: 'sentence-transformers/all-mpnet-base-v2'
```

</CodeGroup>
## Vertex AI

Embedchain supports Google's Vertex AI embedding models through a simple interface. You just have to set the `model` key in the config yaml and it works out of the box.

<CodeGroup>

```python main.py
from embedchain import App

# load embedding model configuration from config.yaml file
app = App.from_config(yaml_path="config.yaml")
```

```yaml config.yaml
llm:
  provider: vertexai
  model: 'chat-bison'
  config:
    temperature: 0.5
    top_p: 0.5

embedder:
  provider: vertexai
  config:
    model: 'textembedding-gecko'
```

</CodeGroup>
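Conceptually, `app.query(...)` embeds the question with the configured embedder and retrieves the stored chunks whose vectors score highest against it. A toy sketch of that ranking step; the chunks and vectors are invented for illustration, and the real lookup happens inside the vector database:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Pretend these (chunk, vector) pairs were produced by app.add(...)
index = [
    ("OpenAI is an AI research lab.", [0.9, 0.1, 0.0]),
    ("GPT4All runs on CPUs.",         [0.1, 0.9, 0.0]),
    ("Gecko is a Vertex AI model.",   [0.0, 0.1, 0.9]),
]

def top_chunk(query_vector):
    """Return the stored chunk whose embedding is most similar to the query."""
    return max(index, key=lambda item: cosine(query_vector, item[1]))[0]

print(top_chunk([0.8, 0.2, 0.0]))  # → "OpenAI is an AI research lab."
```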