
---
title: 📚 Introduction
description: '📝 Embedchain is a framework to easily create LLM powered bots over any dataset.'
---
## 🤔 What is Embedchain?

Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings, and storing them in a vector database.

You can add one or more datasets using the `.add` method, then use the `.query` method to find answers from the added datasets.

If you want to create a Naval Ravikant bot from a YouTube video, a book in PDF format, two blog posts, and a question-and-answer pair, all you need to do is add the respective links. Embedchain will take care of the rest, creating a bot for you.
```python
from embedchain import App

naval_chat_bot = App()

# Embed online resources
naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_chat_bot.add("https://nav.al/feedback")
naval_chat_bot.add("https://nav.al/agi")

# Embed local resources
naval_chat_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
## 🚀 How does it work?

Creating a chat bot over any dataset involves the following steps:

1. Detect the data type and load the data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store the chunks in a vector database
When a user asks a query, the following process happens to find the answer:

1. Create an embedding for the query
2. Find the most similar documents in the vector database
3. Pass those documents as context to an LLM to get the final answer
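The query-time steps above can be sketched in plain Python. Everything here is illustrative, not Embedchain's internals: a toy bag-of-words counter stands in for a real embedding model, and a plain list stands in for the vector database.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Step 1: create an "embedding" for the text (toy bag-of-words counts).
    return Counter(text.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Standard cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, store: list[tuple[Counter, str]], k: int = 2) -> list[str]:
    # Step 2: rank stored chunks by similarity to the query embedding.
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine_similarity(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

# Index a few chunks (stand-in for the vector database).
chunks = [
    "Naval Ravikant is an entrepreneur and investor.",
    "Wealth is assets that earn while you sleep.",
    "Reading is the ultimate meta-skill.",
]
store = [(embed(c), c) for c in chunks]

# Step 3 would pass the retrieved chunks as context to an LLM.
context = retrieve("Who is Naval Ravikant?", store, k=1)
```

A production system replaces `embed` with a learned embedding model and the sorted list scan with an approximate-nearest-neighbor index, but the shape of the flow is the same.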
The process of loading the dataset and querying involves multiple steps, each with its own nuances:

- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?
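To make the first of these nuances concrete, here is a naive fixed-size chunker with overlap. The 200-character size and 50-character overlap are arbitrary choices for illustration, not Embedchain defaults.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks that overlap, so a sentence cut at
    a chunk boundary still appears whole in the neighboring chunk."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

doc = "Naval Ravikant on wealth: " + "seek wealth, not money or status. " * 20
pieces = chunk_text(doc)
```

Overlap trades some storage for recall: without it, an answer straddling a chunk boundary may never be retrieved in one piece. Smarter chunkers split on sentence or section boundaries instead of raw character counts.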
Embedchain takes care of all these nuances and provides a simple interface to create bots over any dataset.
In the first release, we make it easy for anyone to get a chatbot over any dataset up and running in less than a minute. Just create an app instance, add your datasets with the `.add` method, and use the `.query` method to get relevant answers.