
---
title: 📚 Introduction
description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
---
## 🤔 What is Embedchain?

Embedchain abstracts the entire process of loading data, chunking it, creating embeddings, and storing it in a vector database.

You can add data from different data sources using the `.add()` method. Then, simply use the `.query()` method to find answers from the added datasets.

If you want to create a Naval Ravikant bot from a YouTube video, a book in PDF format, two blog posts, and a question-and-answer pair, all you need to do is add the respective links. Embedchain will take care of the rest, creating a bot for you.
```python
from embedchain import App

naval_bot = App()

# Add online data
naval_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_bot.add("https://nav.al/feedback")
naval_bot.add("https://nav.al/agi")
naval_bot.add("The Meanings of Life", 'text', metadata={'chapter': 'philosophy'})

# Add local resources
naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```
## 🚀 How does it work?

Embedchain abstracts the following steps so you can easily create LLM-powered apps:

1. Detect the data type and load the data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store the chunks in a vector database
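As a rough, self-contained illustration of these four steps (a toy sketch, not Embedchain's actual internals), here is a pipeline that uses a bag-of-words count as a stand-in for a real embedding model and a Python list as a stand-in for a vector database:

```python
import re
from collections import Counter

def load(source):
    # Step 1: in a real app, loaders fetch and parse URLs, PDFs, etc.
    # Here we assume the source is already plain text.
    return source

def chunk(text, size=8):
    # Step 2: split the text into fixed-size word chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    # Step 3: toy "embedding" as a bag-of-words count vector.
    # Real pipelines call an embedding model instead.
    return Counter(re.findall(r"\w+", text.lower()))

vector_db = []  # Step 4: an in-memory stand-in for a vector database

def add(source):
    text = load(source)
    for piece in chunk(text):
        vector_db.append((piece, embed(piece)))

add("Naval Ravikant is an entrepreneur and investor known for his ideas "
    "about wealth, specific knowledge, and long-term thinking.")
```

After `add()` runs, each stored entry pairs a chunk of text with its vector, which is what makes similarity search possible at query time.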
When a user asks a query, the following process happens to find the answer:

1. Create an embedding for the query
2. Find similar documents for the query from the vector database
3. Pass the similar documents as context to the LLM to get the final answer
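The query side of the pipeline can be sketched the same way (again a hedged toy example, not Embedchain's implementation): embed the question, rank stored chunks by cosine similarity, and assemble the top chunks into a prompt for the LLM:

```python
import math
import re
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; real apps use an embedding model.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Pretend these chunks were indexed earlier, embeddings precomputed.
chunks = [
    "Naval Ravikant is an entrepreneur and investor.",
    "Specific knowledge is found by pursuing your genuine curiosity.",
]
index = [(c, embed(c)) for c in chunks]

def retrieve(question, top_k=1):
    q = embed(question)                                   # 1. embed the query
    ranked = sorted(index, key=lambda x: cosine(q, x[1]), reverse=True)
    context = "\n".join(c for c, _ in ranked[:top_k])     # 2. similar documents
    # 3. in a real app, this prompt is sent to the LLM for the final answer
    return f"Context:\n{context}\n\nQuestion: {question}"

prompt = retrieve("Who is Naval Ravikant?")
```

The chunk about Naval ranks highest because it shares the most terms with the question, so it becomes the context the LLM answers from.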
The process of loading the dataset and querying involves multiple steps, each with its own nuances:

- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?

Embedchain takes care of all these nuances and provides a simple interface to create apps on any data.
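To see why chunk size alone is a meaningful decision, consider a toy word-based chunker (an illustrative sketch, not Embedchain's chunking code):

```python
def chunk(text, size):
    # Split text into chunks of roughly `size` words each.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

doc = " ".join(f"word{i}" for i in range(100))  # a 100-word document

small = chunk(doc, size=25)   # many small, precise chunks
large = chunk(doc, size=50)   # few large, context-rich chunks
```

Smaller chunks retrieve more precisely but may lose surrounding context; larger chunks carry more context but dilute similarity scores. Frameworks like Embedchain pick sensible defaults so you don't have to tune this up front.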