---
title: Pinecone
---
## Overview

Install the Pinecone-related dependencies with the following command:

```bash
pip install --upgrade pinecone-client pinecone-text
```

To use Pinecone as the vector database, set the `PINECONE_API_KEY` environment variable. You can find the key on the [Pinecone dashboard](https://app.pinecone.io/).
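
It can help to confirm that the variable is set before creating the app. The snippet below is a minimal sketch of such a check; `require_env` is a hypothetical helper written for this example and is not part of Embedchain or the Pinecone client.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a clear error.

    Hypothetical helper for illustration; not an Embedchain or Pinecone API.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set. Copy the key from the Pinecone dashboard "
            "and export it before starting the app."
        )
    return value
```

Calling `require_env("PINECONE_API_KEY")` at the top of `main.py` makes a missing key fail early with a readable message.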
<CodeGroup>

```python main.py
from embedchain import App

# Load pinecone configuration from yaml file
app = App.from_config(config_path="pod_config.yaml")
# Or
app = App.from_config(config_path="serverless_config.yaml")
```

```yaml pod_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    pod_config:
      environment: gcp-starter
      metadata_config:
        indexed:
          - "url"
          - "hash"
```

```yaml serverless_config.yaml
vectordb:
  provider: pinecone
  config:
    metric: cosine
    vector_dimension: 1536
    index_name: my-pinecone-index
    serverless_config:
      cloud: aws
      region: us-west-2
```

</CodeGroup>

<br />
<Note>
You can find more information about Pinecone configuration in the [Pinecone index management documentation](https://docs.pinecone.io/docs/manage-indexes#create-a-pod-based-index).
The `index_name` config parameter in the yaml file is optional. If it is not provided, the index name defaults to `{collection_name}-{vector_dimension}`.
</Note>
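
The naming rule in the note can be illustrated with a short sketch. `resolve_index_name` and the collection name `my-collection` are hypothetical and exist only for this example; Embedchain's own implementation may differ in detail.

```python
from typing import Optional


def resolve_index_name(
    collection_name: str,
    vector_dimension: int,
    index_name: Optional[str] = None,
) -> str:
    """Mirror the documented rule: use index_name when it is given,
    otherwise fall back to "{collection_name}-{vector_dimension}".

    Illustrative helper only; not an Embedchain API.
    """
    if index_name:
        return index_name
    return f"{collection_name}-{vector_dimension}"


# No index_name configured: the default pattern applies.
default_name = resolve_index_name("my-collection", 1536)

# index_name configured explicitly: it takes precedence.
explicit_name = resolve_index_name("my-collection", 1536, "my-pinecone-index")
```

Here `default_name` is `my-collection-1536` and `explicit_name` is `my-pinecone-index`.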
## Usage

### Hybrid search

The following example shows how to run hybrid search with Pinecone as the vector database through Embedchain.
```python
import os

from embedchain import App

config = {
    'app': {
        "config": {
            "id": "ec-docs-hybrid-search"
        }
    },
    'vectordb': {
        'provider': 'pinecone',
        'config': {
            'metric': 'dotproduct',
            'vector_dimension': 1536,
            'index_name': 'my-index',
            'serverless_config': {
                'cloud': 'aws',
                'region': 'us-west-2'
            },
            'hybrid_search': True,  # Remember to set this for hybrid search
        }
    }
}

# Initialize app
app = App.from_config(config=config)

# Add documents
app.add("/path/to/file.pdf", data_type="pdf_file", namespace="my-namespace")

# Query
app.query("<YOUR QUESTION HERE>", namespace="my-namespace")
```
Under the hood, Embedchain retrieves the relevant chunks from the documents you added by running a hybrid search on the Pinecone index.

If you have questions about how Pinecone hybrid search works, refer to the [official documentation](https://docs.pinecone.io/docs/hybrid-search).
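
As background, hybrid search scores each document against both a dense (semantic) vector and a sparse (keyword) vector. One common way to balance the two, described in Pinecone's hybrid search documentation, is to scale the dense query by a weight `alpha` and the sparse query by `1 - alpha` before querying a `dotproduct` index. The sketch below illustrates only that weighting step in plain Python. `weight_hybrid_query` is a hypothetical helper, and this page does not state whether or how Embedchain applies such a weight internally.

```python
from typing import Dict, List, Tuple


def weight_hybrid_query(
    dense: List[float],
    sparse: Dict[str, list],
    alpha: float,
) -> Tuple[List[float], Dict[str, list]]:
    """Scale a dense/sparse query pair by a convex weight.

    alpha = 1.0 keeps only the dense (semantic) part,
    alpha = 0.0 keeps only the sparse (keyword) part.
    Illustrative helper only; not an Embedchain or Pinecone API.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")

    weighted_dense = [value * alpha for value in dense]
    weighted_sparse = {
        "indices": list(sparse["indices"]),
        "values": [value * (1.0 - alpha) for value in sparse["values"]],
    }
    return weighted_dense, weighted_sparse


# Toy vectors, chosen only to keep the arithmetic easy to follow.
dense_query = [0.5, 1.0]
sparse_query = {"indices": [10, 45], "values": [2.0, 4.0]}

# Favor the dense part: 75% semantic, 25% keyword.
weighted_dense, weighted_sparse = weight_hybrid_query(
    dense_query, sparse_query, alpha=0.75
)
```

Because the dot product is linear, scaling the query vectors this way is equivalent to weighting the dense and sparse similarity scores themselves.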
<Snippet file="missing-vector-db-tip.mdx" />