---
title: 🗄️ Vector databases
---

## Overview

To use a vector database with Embedchain, you only need to configure it in the YAML configuration file. Examples for each database are provided below:

<CardGroup cols={4}>
  <Card title="ChromaDB" href="#chromadb"></Card>
  <Card title="Elasticsearch" href="#elasticsearch"></Card>
  <Card title="OpenSearch" href="#opensearch"></Card>
  <Card title="Zilliz" href="#zilliz"></Card>
  <Card title="LanceDB" href="#lancedb"></Card>
  <Card title="Pinecone" href="#pinecone"></Card>
  <Card title="Qdrant" href="#qdrant"></Card>
  <Card title="Weaviate" href="#weaviate"></Card>
</CardGroup>

## ChromaDB

<CodeGroup>

```python main.py
from embedchain import App

# load chroma configuration from yaml file
app = App.from_config(config_path="config1.yaml")
```

```yaml config1.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    dir: db
    allow_reset: true
```

```yaml config2.yaml
vectordb:
  provider: chroma
  config:
    collection_name: 'my-collection'
    host: localhost
    port: 5200
    allow_reset: true
```

</CodeGroup>
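
The two YAML files above differ only in how Chroma is reached: `config1.yaml` points at a local directory, while `config2.yaml` points at a host and port. The following is a minimal sketch of building either variant as a Python dictionary with the same shape as the YAML. The helper name `chroma_vectordb_config` is hypothetical, and treating `dir` and `host`/`port` as mutually exclusive is an assumption drawn from the two examples, not a documented rule.

```python
from typing import Any, Dict, Optional


def chroma_vectordb_config(
    collection_name: str,
    dir: Optional[str] = None,
    host: Optional[str] = None,
    port: Optional[int] = None,
    allow_reset: bool = True,
) -> Dict[str, Any]:
    """Build a dict shaped like config1.yaml (local dir) or config2.yaml (host/port)."""
    remote = host is not None or port is not None
    # Assumption: the two modes are alternatives, as the two YAML examples suggest.
    if remote and dir is not None:
        raise ValueError("pass either dir or host/port, not both")
    if remote and (host is None or port is None):
        raise ValueError("host and port must be given together")
    if not remote and dir is None:
        raise ValueError("pass either dir, or host and port")

    config: Dict[str, Any] = {"collection_name": collection_name}
    if remote:
        config["host"] = host
        config["port"] = port
    else:
        config["dir"] = dir
    config["allow_reset"] = allow_reset
    return {"vectordb": {"provider": "chroma", "config": config}}


local = chroma_vectordb_config("my-collection", dir="db")
server = chroma_vectordb_config("my-collection", host="localhost", port=5200)
print(local)
print(server)
```

Serialising either dictionary to YAML should give a file equivalent to the corresponding example above.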

## Elasticsearch

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[elasticsearch]'
```

<Note>
You can configure the Elasticsearch connection by providing either `es_url` or `cloud_id`. If you are using the Elasticsearch Service on Elastic Cloud, you can find the `cloud_id` on the [Elastic Cloud dashboard](https://cloud.elastic.co/deployments).
</Note>

You can authenticate the connection to Elasticsearch by providing one of `basic_auth`, `api_key`, or `bearer_auth`.

<CodeGroup>

```python main.py
from embedchain import App

# load elasticsearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: elasticsearch
  config:
    collection_name: 'es-index'
    cloud_id: 'deployment-name:xxxx'
    basic_auth:
      - elastic
      - <your_password>
    verify_certs: false
```

</CodeGroup>
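
The connection and authentication rules described above can be checked before the app is loaded. This is a minimal sketch: the function name `check_elasticsearch_config` is hypothetical, it inspects only the keys named on this page, and it reads "either ... or" as "exactly one of", which is an assumption rather than documented behaviour.

```python
from typing import Any, Dict, List

# Keys named in this section; exactly one from each group is expected (assumption).
CONNECTION_KEYS = ("es_url", "cloud_id")
AUTH_KEYS = ("basic_auth", "api_key", "bearer_auth")


def check_elasticsearch_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems in the `config` mapping; empty means it looks fine."""
    problems: List[str] = []

    connection = [key for key in CONNECTION_KEYS if config.get(key)]
    if len(connection) != 1:
        problems.append(
            f"expected exactly one of {list(CONNECTION_KEYS)}, found {connection}"
        )

    auth = [key for key in AUTH_KEYS if config.get(key)]
    if len(auth) != 1:
        problems.append(
            f"expected exactly one of {list(AUTH_KEYS)}, found {auth}"
        )

    return problems


# Mirrors the `config` mapping from config.yaml above.
example = {
    "collection_name": "es-index",
    "cloud_id": "deployment-name:xxxx",
    "basic_auth": ["elastic", "<your_password>"],
    "verify_certs": False,
}
print(check_elasticsearch_config(example))
```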

## OpenSearch

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[opensearch]'
```

<CodeGroup>

```python main.py
from embedchain import App

# load opensearch configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: opensearch
  config:
    collection_name: 'my-app'
    opensearch_url: 'https://localhost:9200'
    http_auth:
      - admin
      - admin
    vector_dimension: 1536
    use_ssl: false
    verify_certs: false
```

</CodeGroup>

## Zilliz

Install the related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[milvus]'
```

Set the Zilliz environment variables `ZILLIZ_CLOUD_URI` and `ZILLIZ_CLOUD_TOKEN`, which you can find on the [Zilliz cloud platform](https://cloud.zilliz.com/).

<CodeGroup>

```python main.py
import os
from embedchain import App

os.environ['ZILLIZ_CLOUD_URI'] = 'https://xxx.zillizcloud.com'
os.environ['ZILLIZ_CLOUD_TOKEN'] = 'xxx'

# load zilliz configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: zilliz
  config:
    collection_name: 'zilliz_app'
    uri: https://xxxx.api.gcp-region.zillizcloud.com
    token: xxx
    vector_dim: 1536
    metric_type: L2
```

</CodeGroup>

## LanceDB

_Coming soon_

## Pinecone

Install the Pinecone-related dependencies using the following command:

```bash
pip install --upgrade 'embedchain[pinecone]'
```

To use Pinecone as a vector database, set the environment variable `PINECONE_API_KEY`, which you can find on the [Pinecone dashboard](https://app.pinecone.io/).

<CodeGroup>

```python main.py
from embedchain import App

# load pinecone configuration from yaml file
app = App.from_config(config_path="pod_config.yaml")
# or
app = App.from_config(config_path="serverless_config.yaml")
```

```yaml pod_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    pod_config:
      environment: gcp-starter
      metadata_config:
        indexed:
          - "url"
          - "hash"
```

```yaml serverless_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    serverless_config:
      cloud: aws
      region: us-west-2
```

</CodeGroup>

<br />

<Note>
You can find more information about Pinecone configuration [here](https://docs.pinecone.io/docs/manage-indexes#create-a-pod-based-index).
You can optionally set `index_name` in the YAML config to specify the index name. If it is not provided, the index name will be `{collection_name}-{vector_dimension}`.
</Note>
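
The index naming rule in the note above can be illustrated with a few lines of Python. This is a sketch of the documented rule only: `resolve_index_name` is a hypothetical helper, and the library may apply further normalisation that is not described on this page.

```python
from typing import Optional


def resolve_index_name(
    collection_name: str,
    vector_dimension: int,
    index_name: Optional[str] = None,
) -> str:
    """Use `index_name` when given, else fall back to the documented default."""
    if index_name:
        return index_name
    # Documented default: {collection_name}-{vector_dimension}
    return f"{collection_name}-{vector_dimension}"


# Explicit name, as in pod_config.yaml and serverless_config.yaml.
print(resolve_index_name("my-collection", 1536, "my-pinecone-index"))
# No index_name set: falls back to the default pattern.
print(resolve_index_name("my-collection", 1536))
```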

## Qdrant

To use Qdrant as a vector database, set the environment variables `QDRANT_URL` and `QDRANT_API_KEY`, which you can find on the [Qdrant dashboard](https://cloud.qdrant.io/).

<CodeGroup>

```python main.py
from embedchain import App

# load qdrant configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: qdrant
  config:
    collection_name: my_qdrant_index
```

</CodeGroup>

## Weaviate

To use Weaviate as a vector database, set the environment variables `WEAVIATE_ENDPOINT` and `WEAVIATE_API_KEY`, which you can find on the [Weaviate dashboard](https://console.weaviate.cloud/dashboard).

<CodeGroup>

```python main.py
from embedchain import App

# load weaviate configuration from yaml file
app = App.from_config(config_path="config.yaml")
```

```yaml config.yaml
vectordb:
  provider: weaviate
  config:
    collection_name: my_weaviate_index
```

</CodeGroup>
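
Several providers on this page (Zilliz, Pinecone, Qdrant, and Weaviate) read their credentials from environment variables. The sketch below checks that the variables named on this page are set before the app is loaded. The helper `missing_env` and the `REQUIRED_ENV` table are hypothetical and are not part of Embedchain; the variable names are the ones listed in the sections above.

```python
import os
from typing import Dict, List, Mapping, Optional

# Environment variables named on this page, keyed by the `provider` value.
REQUIRED_ENV: Dict[str, List[str]] = {
    "zilliz": ["ZILLIZ_CLOUD_URI", "ZILLIZ_CLOUD_TOKEN"],
    "pinecone": ["PINECONE_API_KEY"],
    "qdrant": ["QDRANT_URL", "QDRANT_API_KEY"],
    "weaviate": ["WEAVIATE_ENDPOINT", "WEAVIATE_API_KEY"],
}


def missing_env(provider: str, env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variables for `provider` that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]


missing = missing_env("weaviate")
if missing:
    print("Set these environment variables first:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.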

<Snippet file="missing-vector-db-tip.mdx" />