---
title: 📚 Introduction
description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
---

## 🤔 What is Embedchain?

Embedchain abstracts the entire process of loading data, chunking it, creating embeddings, and storing it in a vector database.

You can add data from different data sources using the `.add()` method. Then, simply use the `.query()` method to find answers from the added datasets.

If you want to create a Naval Ravikant bot with a YouTube video, a book in PDF format, two blog posts, and a question and answer pair, all you need to do is add the respective sources. Embedchain will take care of the rest, creating a bot for you.

```python
from embedchain import App

naval_bot = App()

# Add online data
naval_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
naval_bot.add("https://nav.al/feedback")
naval_bot.add("https://nav.al/agi")

# Add local resources
naval_bot.add("The Meanings of Life", 'text', metadata={'chapter': 'philosophy'})
naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))

naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
```

## 🚀 How does it work?

Embedchain handles the following steps for you, so you can easily create LLM-powered apps:

1. Detect the data type and load the data
2. Create meaningful chunks
3. Create embeddings for each chunk
4. Store the chunks in a vector database

When a user asks a question, the following process happens to find the answer:

1. Create an embedding for the query
2. Retrieve documents similar to the query from the vector database
3. Pass the similar documents as context to the LLM to get the final answer

The process of loading the dataset and querying involves multiple steps, each with its own nuances:

- How should I chunk the data? What is a meaningful chunk size?
- How should I create embeddings for each chunk? Which embedding model should I use?
- How should I store the chunks in a vector database? Which vector database should I use?
- Should I store metadata along with the embeddings?
- How should I find similar documents for a query? Which ranking model should I use?

Embedchain takes care of all these nuances and provides a simple interface to create apps on any data.
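To make the ingestion and query steps listed above concrete, here is a minimal, self-contained sketch in plain Python. It is an illustration under stated assumptions, not Embedchain's implementation or API. `chunk_text`, `embed`, `cosine`, `ToyVectorStore`, and `build_prompt` are hypothetical names. The "embedding" is a bag-of-words term-frequency vector standing in for a real embedding model. The "vector database" is an in-memory list. The sketch stops at the prompt that would be sent to an LLM rather than calling one.

```python
import math
import re
from collections import Counter
from typing import Optional


def chunk_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Hypothetical stand-in for an embedding model: a lowercase bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors (missing terms count as 0)."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a vector database holding (embedding, chunk, metadata) rows."""

    def __init__(self) -> None:
        self.rows: list[tuple[Counter, str, dict]] = []

    def add(self, text: str, metadata: Optional[dict] = None) -> None:
        # Ingestion: chunk the text, embed each chunk, and store it with its metadata.
        for chunk in chunk_text(text):
            self.rows.append((embed(chunk), chunk, metadata or {}))

    def search(self, query: str, k: int = 2) -> list[str]:
        # Retrieval: embed the query and return the k stored chunks most similar to it.
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [chunk for _, chunk, _ in ranked[:k]]


def build_prompt(query: str, store: ToyVectorStore, k: int = 2) -> str:
    """Put the retrieved chunks into the prompt as context; a real app would send this to an LLM."""
    context = "\n".join(store.search(query, k))
    return f"Answer the question using this context.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    store = ToyVectorStore()
    store.add("Naval Ravikant is an Indian-American entrepreneur and investor.", {"source": "qna"})
    store.add("Embedchain chunks data, embeds each chunk, and stores it in a vector database.", {"source": "docs"})
    print(store.search("Who is Naval Ravikant?", k=1))
    print(build_prompt("Who is Naval Ravikant?", store, k=1))
```

Replacing `embed` with a real embedding model and `ToyVectorStore` with an actual vector database would change retrieval quality but not the overall structure of the pipeline. The `chunk_size` and `overlap` defaults here are arbitrary. They are exactly the kind of choices listed among the nuances above.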