[Docs] Update docs and minor improvements in search API (#869)

Deshraj Yadav 1 year ago
parent
commit
d3726134b2

+ 12 - 1
README.md

@@ -64,7 +64,7 @@ For example, you can use Embedchain to create an Elon Musk bot using the followi
 
 ```python
 import os
-from embedchain import App
+from embedchain import Pipeline as App
 
 # Create a bot instance
 os.environ["OPENAI_API_KEY"] = "YOUR API KEY"
@@ -78,6 +78,17 @@ elon_bot.add("https://www.youtube.com/watch?v=RcYjXbSJBN8")
 # Query the bot
 elon_bot.query("How many companies does Elon Musk run and name those?")
 # Answer: Elon Musk currently runs several companies. As of my knowledge, he is the CEO and lead designer of SpaceX, the CEO and product architect of Tesla, Inc., the CEO and founder of Neuralink, and the CEO and founder of The Boring Company. However, please note that this information may change over time, so it's always good to verify the latest updates.
+
+# (Optional): Deploy app to Embedchain Platform
+elon_bot.deploy()
+# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
+# ec-xxxxxx
+
+# 🛠️ Creating pipeline on the platform...
+# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
+
+# 🛠️ Adding data to your pipeline...
+# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
 ```
 
 ## Examples

BIN
docs/background.png


BIN
docs/favicon.png


+ 91 - 15
docs/get-started/introduction.mdx

@@ -3,30 +3,106 @@ title: 📚 Introduction
 description: '📝 Embedchain is a Data Platform for LLMs - load, index, retrieve, and sync any unstructured data'
 ---
 
-## 🤔 What is Embedchain?
+## 🌐 What is Embedchain?
 
-Embedchain abstracts the entire process of loading data, chunking it, creating embeddings, and storing it in a vector database.
+Embedchain simplifies data handling by automatically processing unstructured data, breaking it into chunks, generating embeddings, and storing it in a vector database.
 
-You can add data from different data sources using the `.add()` method. Then, simply use the `.query()` method to find answers from the added datasets.
+Through various APIs, you can obtain contextual information for queries, find answers to specific questions, and engage in chat conversations using your data.
+## 🔍 Search
 
-If you want to create a Naval Ravikant bot with a YouTube video, a book in PDF format, two blog posts, and a question and answer pair, all you need to do is add the respective links. Embedchain will take care of the rest, creating a bot for you.
+Embedchain lets you retrieve the most relevant context for a query by running a semantic search over your data sources. See the example below:
 
 ```python
 from embedchain import Pipeline as App
 
-naval_bot = App()
-# Add online data
-naval_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
-naval_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
-naval_bot.add("https://nav.al/feedback")
-naval_bot.add("https://nav.al/agi")
-naval_bot.add("The Meanings of Life", 'text', metadata={'chapter': 'philosphy'})
+# Initialize app
+app = App()
+
+# Add data source
+app.add("https://www.forbes.com/profile/elon-musk")
+
+# Get relevant context using semantic search
+context = app.search("What is the net worth of Elon?", num_documents=2)
+print(context)
+# Context:
+# [
+#     {
+#         'context': 'Elon Musk PROFILEElon MuskCEO, Tesla$221.9BReal Time Net Worthas of 10/29/23Reflects change since 5 pm ET of prior trading day. 1 in the world todayPhoto by Martin Schoeller for ForbesAbout Elon MuskElon Musk cofounded six companies, including electric car maker Tesla, rocket producer SpaceX and tunneling startup Boring Company.He owns about 21% of Tesla between stock and options, but has pledged more than half his shares as collateral for personal loans of up to $3.5 billion.SpaceX, founded in',
+#         'source': 'https://www.forbes.com/profile/elon-musk',
+#         'document_id': 'some_document_id'
+#     },
+#     {
+#         'context': 'company, which is now called X.Wealth HistoryHOVER TO REVEAL NET WORTH BY YEARForbes Lists 1Forbes 400 (2023)The Richest Person In Every State (2023) 2Billionaires (2023) 1Innovative Leaders (2019) 25Powerful People (2018) 12Richest In Tech (2017)Global Game Changers (2016)More ListsPersonal StatsAge52Source of WealthTesla, SpaceX, Self MadeSelf-Made Score8Philanthropy Score1ResidenceAustin, TexasCitizenshipUnited StatesMarital StatusSingleChildren11EducationBachelor of Arts/Science, University',
+#         'source': 'https://www.forbes.com/profile/elon-musk',
+#         'document_id': 'some_document_id'
+#     }
+# ]
+```
+
+## ❓ Query
+
+Embedchain empowers developers to ask questions and receive relevant answers through a user-friendly query API. Refer to the following example to learn how to utilize the query API:
+
+```python
+from embedchain import Pipeline as App
+
+# Initialize app
+app = App()
+
+# Add data source
+app.add("https://www.forbes.com/profile/elon-musk")
+
+# Get relevant answer for your query
+answer = app.query("What is the net worth of Elon?")
+print(answer)
+# Answer: The net worth of Elon Musk is $221.9 billion.
+```
+
+## 💬 Chat
+
+Embedchain allows easy chatting over your data sources using a user-friendly chat API. Check out the example below to understand how to use the chat API:
+
+```python
+from embedchain import Pipeline as App
+
+# Initialize app
+app = App()
+
+# Add data source
+app.add("https://www.forbes.com/profile/elon-musk")
+
+# Chat on your data using `.chat()`
+answer = app.chat("How much did Elon pay for Twitter?")
+print(answer)
+# Answer: Elon Musk paid $44 billion for Twitter.
+```
+
+## 🚀 Deploy
+
+Embedchain enables developers to deploy their LLM-powered apps in production using the Embedchain platform. The platform offers free access to context on your data through its REST API. Once the pipeline is deployed, you can update your data sources at any time.
+
+See the example below on how to use the deploy API:
+
+```python
+from embedchain import Pipeline as App
+
+# Initialize app
+app = App()
+
+# Add data source
+app.add("https://www.forbes.com/profile/elon-musk")
+
+# Deploy your pipeline to Embedchain Platform
+app.deploy()
+
+# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
+# ec-xxxxxx
 
-# Add local resources
-naval_bot.add(("Who is Naval Ravikant?", "Naval Ravikant is an Indian-American entrepreneur and investor."))
+# 🛠️ Creating pipeline on the platform...
+# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
 
-naval_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?")
-# Answer: Naval argues that humans possess the unique capacity to understand explanations or concepts to the maximum extent possible in this physical reality.
+# 🛠️ Adding data to your pipeline...
+# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
 ```
 
 ## 🚀 How it works?
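
The `search` example in this file returns a list of dicts with `context`, `source`, and `document_id` keys. As a minimal sketch of how a caller might consume that shape, the snippet below joins the retrieved chunks into a single prompt string. `build_prompt` and the sample values are hypothetical and not part of embedchain; only the three dict keys come from the documented output.

```python
from typing import Dict, List


def build_prompt(question: str, results: List[Dict[str, str]]) -> str:
    """Join retrieved chunks into one prompt, citing each chunk's source.

    `results` is assumed to have the shape shown in the search example:
    a list of dicts with 'context', 'source' and 'document_id' keys.
    """
    blocks = [
        f"[{i}] {r['context']} (source: {r['source']})"
        for i, r in enumerate(results, start=1)
    ]
    context = "\n".join(blocks)
    return f"Context:\n{context}\n\nQuestion: {question}"


# Hypothetical search results, standing in for `app.search(...)` output.
sample_results = [
    {"context": "A", "source": "s1", "document_id": "d1"},
    {"context": "B", "source": "s2", "document_id": "d2"},
]
prompt = build_prompt("Q?", sample_results)
print(prompt)
```

Numbering the chunks keeps each passage traceable to its `source`, which is the main reason to prefer the dict shape over raw tuples.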

+ 23 - 0
docs/get-started/quickstart.mdx

@@ -33,6 +33,19 @@ app.add("https://www.forbes.com/profile/elon-musk")
 ```python
 app.query("What is the net worth of Elon Musk today?")
 # Answer: The net worth of Elon Musk today is $258.7 billion.
+```
+  </Step>
+  <Step title="🚀 (Optional) Deploy your pipeline to Embedchain Platform">
+```python
+app.deploy()
+# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
+# ec-xxxxxx
+
+# 🛠️ Creating pipeline on the platform...
+# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
+
+# 🛠️ Adding data to your pipeline...
+# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
 ```
   </Step>
 </Steps>
@@ -55,4 +68,14 @@ app.add("https://www.forbes.com/profile/elon-musk")
 response = app.query("What is the net worth of Elon Musk today?")
 print(response)
 # Answer: The net worth of Elon Musk today is $258.7 billion.
+
+app.deploy()
+# 🔑 Enter your Embedchain API key. You can find the API key at https://app.embedchain.ai/settings/keys/
+# ec-xxxxxx
+
+# 🛠️ Creating pipeline on the platform...
+# 🎉🎉🎉 Pipeline created successfully! View your pipeline: https://app.embedchain.ai/pipelines/xxxxx
+
+# 🛠️ Adding data to your pipeline...
+# ✅ Data of type: web_page, value: https://www.forbes.com/profile/elon-musk added successfully.
 ```

File diff suppressed because it is too large
+ 0 - 4
docs/logo/dark.svg


File diff suppressed because it is too large
+ 0 - 4
docs/logo/light.svg


+ 27 - 6
docs/mint.json

@@ -7,9 +7,20 @@
   },
   "favicon": "/favicon.png",
   "colors": {
-    "primary": "#12A7D3",
-    "light": "#81D7F7",
-    "dark": "#004E7A"
+    "primary": "#2B48EE",
+    "light": "#2B48EE",
+    "dark": "#2B48EE",
+    "background": {
+      "dark": "#020415"
+    }
+  },
+  "metadata": {
+    "og:image": "/og.png",
+    "twitter:site": "@embedchain"
+  },
+  "topAnchor": {
+    "name": "Documentation",
+    "icon": "book-open"
   },
   "topbarLinks": [
     {
@@ -26,8 +37,11 @@
     }
   ],
   "topbarCtaButton": {
-    "name": "GitHub",
-    "url": "https://embedchain.ai"
+    "name": "Get started",
+    "url": "https://app.embedchain.ai"
+  },
+  "primaryTab": {
+    "name": "Docs"
   },
   "navigation": [
     {
@@ -111,5 +125,12 @@
   },
   "backgroundImage": "/background.png",
   "isWhiteLabeled": true,
-  "feedback.thumbsRating": true
+  "feedback": {
+    "suggestEdit": true,
+    "raiseIssue": true,
+    "thumbsRating": true
+  },
+  "search": {
+    "prompt": "✨ Search embedchain docs..."
+  }
 }

BIN
docs/og.png


+ 11 - 1
embedchain/pipeline.py

@@ -229,12 +229,22 @@ class Pipeline(EmbedChain):
         # TODO: Search will call the endpoint rather than fetching the data from the db itself when deploy=True.
         if self.id is None:
             where = {"app_id": self.local_id}
-            return self.db.query(
+            context = self.db.query(
                 query,
                 n_results=num_documents,
                 where=where,
                 skip_embedding=False,
             )
+            result = []
+            for c in context:
+                result.append(
+                    {
+                        "context": c[0],
+                        "source": c[1],
+                        "document_id": c[2],
+                    }
+                )
+            return result
         else:
             # Make API call to the backend to get the results
             raise NotImplementedError("Search is not implemented yet for the prod mode.")
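
The reshaping added in this hunk can be illustrated in isolation. This is a minimal sketch, assuming the database query yields `(context, source, document_id)` tuples, as the `c[0]`, `c[1]`, `c[2]` indexing suggests. `format_search_results` is a hypothetical helper name and the sample row is made up; neither is part of embedchain.

```python
from typing import Dict, List, Sequence, Tuple


def format_search_results(
    rows: Sequence[Tuple[str, str, str]],
) -> List[Dict[str, str]]:
    """Turn (context, source, document_id) tuples into keyed dicts."""
    return [
        {"context": context, "source": source, "document_id": document_id}
        for context, source, document_id in rows
    ]


# Stand-in for what the vector database query is assumed to return.
rows = [
    (
        "Elon Musk cofounded six companies",
        "https://www.forbes.com/profile/elon-musk",
        "doc-1",
    ),
]
results = format_search_results(rows)
print(results[0]["source"])
```

Unpacking the tuple by name in the comprehension makes the field order explicit, so a change in the database's return shape fails loudly instead of silently swapping fields.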

+ 1 - 1
pyproject.toml

@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "embedchain"
-version = "0.0.83"
+version = "0.0.84"
 description = "Data platform for LLMs - Load, index, retrieve and sync any unstructured data"
 authors = [
     "Taranjeet Singh <taranjeet@embedchain.ai>",

Some files were not shown because too many files changed in this diff