---
title: 📚 Introduction
description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
---

## 🌐 What is Embedchain?

Embedchain simplifies working with unstructured data: it automatically processes the data, breaks it into chunks, generates embeddings, and stores them in a vector database.

Through its APIs, you can retrieve relevant context for a query, get answers to specific questions, and chat over your data.

## 🔍 Search

Embedchain lets you retrieve the most relevant context for a query by running semantic search over your data sources. See the example below:

```python
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant context using semantic search
context = app.search("What is the net worth of Elon?", num_documents=2)
print(context)
# Context:
# [
#     {
#         'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worthas of 10/29/23Reflects change since 5 pm ET of prior trading day. 1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5 billion.SpaceX, founded in',
#         'source': 'https://www.forbes.com/profile/elon-musk',
#         'document_id': 'some_document_id'
#     },
#     {
#         'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes Lists 1Forbes 400 (2023)The Richest Person In Every State (2023) 2Billionaires (2023) 1Innovative Leaders (2019) 25Powerful People (2018) 12Richest In Tech (2017)Global Game Changers (2016)More ListsPersonal StatsAge52Source of WealthTesla, SpaceX, Self MadeSelf-Made Score8Philanthropy Score1ResidenceAustin, TexasCitizenshipUnited StatesMarital StatusSingleChildren11EducationBachelor of Arts/Science, University',
#         'source': 'https://www.forbes.com/profile/elon-musk',
#         'document_id': 'some_document_id'
#     }
# ]
```

## ❓ Query

Embedchain lets developers ask questions about their data and get relevant answers through its query API. The following example shows how to use it:

```python
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Get relevant answer for your query
answer = app.query("What is the net worth of Elon?")
print(answer)
# Answer: The net worth of Elon Musk is $221.9 billion.
```

## 💬 Chat

Embedchain lets you chat over your data sources through its chat API. The example below shows how to use it:

```python
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Chat on your data using `.chat()`
answer = app.chat("How much did Elon pay for Twitter?")
print(answer)
# Answer: Elon Musk paid $44 billion for Twitter.
```

## 🚀 Deploy

Embedchain enables developers to deploy their LLM-powered apps in production using the Embedchain platform. The platform offers free access to context on your data through its REST API. Once the pipeline is deployed, you can update your data sources at any time.

The example below shows how to use the deploy API:

```python
from embedchain import Pipeline as App

# Initialize app
app = App()

# Add data source
app.add("https://www.forbes.com/profile/elon-musk")

# Deploy your pipeline to Embedchain Platform
app.deploy()
# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
# ec-xxxxxx
# 🛠️ Creating pipeline on the platform...
# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
# 🛠️ Adding data to your pipeline...
# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
```

## 🚀 How does it work?

Embedchain abstracts away the following steps so that you can easily create LLM-powered apps:

1. Detect the data type and load data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store chunks in a vector database
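
To make the four ingestion steps concrete, here is a minimal, dependency-free sketch. It is not Embedchain's implementation: the function names (`detect_data_type`, `chunk_text`, `embed`, `build_index`) are hypothetical, the "embedding" is a simple hashed bag of words instead of a learned model, and the "vector database" is a plain Python list.

```python
import hashlib
import math


def detect_data_type(source: str) -> str:
    """Hypothetical detection rule: URLs are web pages, anything else is raw text."""
    return "web_page" if source.startswith(("http://", "https://")) else "text"


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into character windows of chunk_size that share `overlap` characters."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once a window reaches the end, so no chunk is fully contained in the previous one.
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding (an assumption, not a real model): hash each word into one of dim buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        vec[int.from_bytes(digest[:4], "big") % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def build_index(text: str, source: str = "inline") -> list[dict]:
    """Chunk the text, embed each chunk, and keep the records in a list standing in for a vector database."""
    data_type = detect_data_type(source)
    return [
        {"text": chunk, "embedding": embed(chunk), "source": source, "data_type": data_type}
        for chunk in chunk_text(text)
    ]


document = "Embedchain loads data, splits it into chunks, embeds each chunk, and stores the result."
index = build_index(document)
print(len(index), "chunks stored")
```

The overlap between neighboring chunks is one common way to avoid cutting a sentence's meaning in half at a chunk boundary. The sizes used here are arbitrary illustration values.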

When a user submits a query, the following steps produce the answer:

1. Create an embedding for the query
2. Find similar documents for the query from the vector database
3. Pass the similar documents as context to the LLM to get the final answer
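
The query-time steps can be sketched in the same spirit. This is an illustration under stated assumptions, not Embedchain's code: `embed` uses sparse word counts in place of a real embedding model, `retrieve` scans a list instead of querying a vector database, and `build_prompt` only assembles the text that would be sent to an LLM (no model is called). All names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: sparse word counts (a real system would use a learned model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Steps 1 and 2: embed the query, then return the k most similar chunks."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda chunk: cosine(query_vec, embed(chunk)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 3: place the retrieved chunks in front of the question for the LLM."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer the question using only this context.\n\nContext:\n{joined}\n\nQuestion: {query}"


chunks = [
    "Embedchain stores chunks in a vector database",
    "Bananas are rich in potassium",
    "A vector database finds similar chunks for a query",
]
query = "which database stores chunks"
top_chunks = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top_chunks)
print(prompt)
```

Only the chunks that share vocabulary with the query make it into the prompt, which is the core idea behind retrieval: the LLM sees a small, relevant slice of the data instead of the whole dataset.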

Loading a dataset and querying it involve multiple steps, each with its own nuances:

- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?

Embedchain takes care of all these nuances and provides a simple interface to create apps on any data.
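
As one example of why these choices matter, consider the metadata question. The sketch below is hypothetical and is not Embedchain's API: the `Record` class, the `search` function, and its `where` parameter are made up for illustration, and the two-dimensional embeddings are hand-written. It shows how keeping metadata next to each embedding lets a search be restricted to a single source before ranking.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """One stored chunk: its text, its embedding, and free-form metadata."""
    text: str
    embedding: list[float]
    metadata: dict[str, str] = field(default_factory=dict)


def dot(a: list[float], b: list[float]) -> float:
    """Dot product, used here as the similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search(records, query_embedding, where=None, k=1):
    """Keep records whose metadata matches every key in `where`, then rank by similarity."""
    where = where or {}
    candidates = [
        r for r in records
        if all(r.metadata.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda r: dot(r.embedding, query_embedding), reverse=True)
    return candidates[:k]


# Hand-written toy vectors; a real app would get these from an embedding model.
records = [
    Record("chunk one", [1.0, 0.0], {"source": "docs-a"}),
    Record("chunk two", [0.9, 0.1], {"source": "docs-b"}),
    Record("chunk three", [0.0, 1.0], {"source": "docs-b"}),
]

best_overall = search(records, [1.0, 0.0])
best_in_b = search(records, [1.0, 0.0], where={"source": "docs-b"})
print(best_overall[0].text, "|", best_in_b[0].text)
```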