---
title: 📚 Introduction
description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
---
## 🌐 What is Embedchain?

Embedchain simplifies data handling by automatically processing unstructured data: it breaks the data into chunks, generates embeddings, and stores them in a vector database.

Through its APIs, you can retrieve contextual information for a query, get answers to specific questions, and hold chat conversations over your data.

## 🔍 Search

Embedchain returns the most relevant context for a query by running a semantic search over your data sources. See the example below:
```python
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant context using semantic search
context = app.search("What is the net worth of Elon?", num_documents=2)
print(context)
# Context:
# [
#   {
#     'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worthas of 10/29/23Reflects change since 5 pm ET of prior trading day. 1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5 billion.SpaceX, founded in',
#     'source': 'https://www.forbes.com/profile/elon-musk',
#     'document_id': 'some_document_id'
#   },
#   {
#     'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes Lists 1Forbes 400 (2023)The Richest Person In Every State (2023) 2Billionaires (2023) 1Innovative Leaders (2019) 25Powerful People (2018) 12Richest In Tech (2017)Global Game Changers (2016)More ListsPersonal StatsAge52Source of WealthTesla, SpaceX, Self MadeSelf-Made Score8Philanthropy Score1ResidenceAustin, TexasCitizenshipUnited StatesMarital StatusSingleChildren11EducationBachelor of Arts/Science, University',
#     'source': 'https://www.forbes.com/profile/elon-musk',
#     'document_id': 'some_document_id'
#   }
# ]
```
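Each search result is a dictionary with `context`, `source`, and `document_id` keys, as in the output above. The following is a minimal sketch of post-processing such results. It uses hand-written sample data instead of a live `app.search()` call, and the `summarize_results` helper and its `max_chars` parameter are hypothetical names introduced here for illustration, not part of Embedchain:

```python
def summarize_results(results, max_chars=80):
    """Return one short line per search result: '<source> [<id>]: <snippet>'."""
    lines = []
    for item in results:
        snippet = item["context"][:max_chars]
        if len(item["context"]) > max_chars:
            snippet += "..."  # mark truncated context
        lines.append(f"{item['source']} [{item['document_id']}]: {snippet}")
    return lines


# Sample data shaped like the search output (not fetched from a live app)
sample_results = [
    {
        "context": "Elon Musk cofounded six companies, including electric car maker Tesla.",
        "source": "https://www.forbes.com/profile/elon-musk",
        "document_id": "some_document_id",
    },
]

for line in summarize_results(sample_results, max_chars=40):
    print(line)
```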
## ❓ Query

Embedchain lets developers ask questions about their data and receive relevant answers through the query API. The following example shows how to use it:
<CodeGroup>

```python With Citations
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer, sources = app.query("What is the net worth of Elon?", citations=True)
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.

print(sources)
# [
#   (
#     'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...',
#     'https://www.forbes.com/profile/elon-musk',
#     '4651b266--4aa78839fe97'
#   ),
#   (
#     '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...',
#     'https://www.forbes.com/profile/elon-musk',
#     '4651b266--4aa78839fe97'
#   ),
#   (
#     'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...',
#     'https://www.forbes.com/profile/elon-musk',
#     '4651b266--4aa78839fe97'
#   )
# ]
```

```python Without Citations
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer = app.query("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```

</CodeGroup>
When `citations=True`, the returned `sources` is a list of tuples, each with three elements in the following order:

1. source chunk
2. link to the source document
3. document id (used for bookkeeping)
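Because each entry in `sources` is a `(chunk, link, document id)` tuple, it can be unpacked directly in a loop. The sketch below groups cited chunks by source link. It runs on hand-written sample tuples, not on a live `app.query()` result. The `group_chunks_by_source` helper and the `doc-1` ids are hypothetical, introduced only for illustration:

```python
from collections import defaultdict


def group_chunks_by_source(sources):
    """Map each source link to the list of chunks cited from it."""
    grouped = defaultdict(list)
    for chunk, link, _doc_id in sources:  # tuple order: chunk, link, document id
        grouped[link].append(chunk)
    return dict(grouped)


# Sample data shaped like the `sources` output (not from a live query)
sample_sources = [
    ("Elon Musk PROFILE ...", "https://www.forbes.com/profile/elon-musk", "doc-1"),
    ("74% of the company ...", "https://www.forbes.com/profile/elon-musk", "doc-1"),
]

for link, chunks in group_chunks_by_source(sample_sources).items():
    print(f"{link}: {len(chunks)} cited chunk(s)")
```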
## 💬 Chat

Embedchain lets you chat over your data sources through the chat API. The example below shows how to use it:
<CodeGroup>

```python With Citations
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer, sources = app.chat("What is the net worth of Elon?", citations=True)
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.

print(sources)
# [
#   (
#     'Elon Musk PROFILEElon MuskCEO, Tesla$247.1B$2.3B (0.96%)Real Time Net Worthas of 12/7/23 ...',
#     'https://www.forbes.com/profile/elon-musk',
#     '4651b266--4aa78839fe97'
#   ),
#   (
#     '74% of the company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes ...',
#     'https://www.forbes.com/profile/elon-musk',
#     '4651b266--4aa78839fe97'
#   ),
#   (
#     'founded in 2002, is worth nearly $150 billion after a $750 million tender offer in June 2023 ...',
#     'https://www.forbes.com/profile/elon-musk',
#     '4651b266--4aa78839fe97'
#   )
# ]
```

```python Without Citations
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Chat on your data using `.chat()`
answer = app.chat("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```

</CodeGroup>
As with `query()`, when `citations=True` the returned `sources` is a list of tuples, each with three elements in the following order:

1. source chunk
2. link to the source document
3. document id (used for bookkeeping)
## 🚀 Deploy

Embedchain lets developers deploy their LLM-powered apps to production on the Embedchain platform. The platform offers free access to context on your data through its REST API. You can update your data sources at any time after the pipeline is deployed.

See the example below on how to use the deploy API:
```python
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Deploy your pipeline to Embedchain Platform
app.deploy()
# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
# ec-xxxxxx
# 🛠️ Creating pipeline on the platform...
# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
# 🛠️ Adding data to your pipeline...
# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
```
## 🛠️ How it works

Embedchain abstracts away the following steps so that you can easily create LLM-powered apps:

1. Detect the data type and load data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store chunks in a vector database
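To make steps 2 to 4 concrete, here is a toy, self-contained sketch of chunking, embedding, and storing. It is not Embedchain's implementation: `chunk_text`, `toy_embed`, and `ingest` are hypothetical helpers, the "embedding" is a hashed bag-of-words vector standing in for a real embedding model, and the "vector database" is a plain Python list. Step 1 (data type detection and loading) is skipped by passing the text in directly.

```python
import hashlib
import math


def chunk_text(text, chunk_size=8):
    """Split text into chunks of at most `chunk_size` words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]


def toy_embed(text, dim=16):
    """Toy stand-in for an embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        bucket = int(hashlib.sha256(word.encode()).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def ingest(text, store, source):
    """Chunk the text, embed each chunk, and append it to an in-memory store."""
    for chunk in chunk_text(text):
        store.append({"embedding": toy_embed(chunk), "chunk": chunk, "source": source})


store = []  # stands in for a vector database
ingest(
    "Embedchain loads data, splits it into chunks, embeds each chunk, "
    "and stores the result in a vector database.",
    store,
    source="example.txt",
)
print(len(store), "chunks stored")
```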
When a user submits a query, Embedchain finds the answer as follows:

1. Create an embedding for the query
2. Find similar documents for the query from the vector database
3. Pass the similar documents as context to the LLM to get the final answer
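The query-time steps can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not Embedchain's internals: `toy_embed` is a bag-of-words count vector standing in for a real embedding model, `retrieve` ranks chunks by cosine similarity over an in-memory list, and `build_prompt` only assembles the text that would be sent to an LLM (no model is called). All three names are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Steps 1 and 2: embed the query and return the k most similar chunks."""
    q = toy_embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, toy_embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Step 3: assemble the retrieved chunks and the question into a prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Use the context to answer.\nContext:\n{context}\nQuestion: {query}"


chunks = [
    "tesla makes electric cars",
    "spacex builds rockets",
    "the boring company digs tunnels",
]
top = retrieve("who builds rockets", chunks, k=1)
print(build_prompt("who builds rockets", top))
```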
Loading a dataset and querying it involve multiple steps, each with its own nuances:

- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?
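As an illustration of the first question, the sketch below shows how chunk size and overlap change the chunks produced from the same text. It is a generic word-window splitter written for this page, not the chunker Embedchain uses, and `chunk_words` with its `chunk_size` and `overlap` parameters is a hypothetical helper:

```python
def chunk_words(text, chunk_size, overlap=0):
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


text = "one two three four five six seven eight nine ten"
for size, overlap in [(5, 0), (4, 2)]:
    print(size, overlap, chunk_words(text, size, overlap))
```

Larger chunks keep more surrounding context in each piece, while overlap reduces the chance that a relevant sentence is split across a chunk boundary, at the cost of storing more chunks.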
Embedchain takes care of all these nuances and provides a simple interface to create apps on any data.

## [🚀 Get started](https://docs.embedchain.ai/get-started/quickstart)