--- title: 📚 Introduction description: '📝 Embedchain is a framework to easily create LLM powered bots over any dataset.' --- ## 🤔 What is Embedchain? Embedchain abstracts the entire process of loading a dataset, chunking it, creating embeddings, and storing it in a vector database. You can add a single or multiple datasets using the .add and .add_local functions. Then, simply use the .query function to find answers from the added datasets. If you want to create a Naval Ravikant bot with a YouTube video, a book in PDF format, two blog posts, and a question and answer pair, all you need to do is add the respective links. Embedchain will take care of the rest, creating a bot for you. ```python from embedchain import App naval_chat_bot = App() # Embed Online Resources naval_chat_bot.add("youtube_video", "https://www.youtube.com/watch?v=3qHkcs3kG44") naval_chat_bot.add("pdf_file", "https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf") naval_chat_bot.add("web_page", "https://nav.al/feedback") naval_chat_bot.add("web_page", "https://nav.al/agi") # Embed Local Resources naval_chat_bot.add_local("qna_pair", ("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor.")) naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?") # Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality. ``` ## 🚀 How it works? Creating a chat bot over any dataset involves the following steps: 1. Load the data 2. Create meaningful chunks 3. Create embeddings for each chunk 4. Store the chunks in a vector database When a user asks a query, the following process happens to find the answer: 1. Create an embedding for the query 2. Find similar documents for the query from the vector database 3. Pass the similar documents as context to LLM to get the final answer. The process of loading the dataset and querying involves multiple steps, each with its own nuances: - How should I chunk the data? What is a meaningful chunk size? - How should I create embeddings for each chunk? Which embedding model should I use? - How should I store the chunks in a vector database? Which vector database should I use? - Should I store metadata along with the embeddings? - How should I find similar documents for a query? Which ranking model should I use? Embedchain takes care of all these nuances and provides a simple interface to create bots over any dataset. In the first release, we make it easier for anyone to get a chatbot over any dataset up and running in less than a minute. Just create an app instance, add the datasets using the `.add()` function, and use the `.query()` function to get the relevant answers.