123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354 |
- ---
- title: 'Data Type Handling'
- ---
- ## Automatic data type detection
- The add method automatically tries to detect the data_type, based on your input for the source argument. So `app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ')` is enough to embed a YouTube video.
- This detection is implemented for all formats. It is based on factors such as whether it's a URL, a local file, the source data type, etc.
- ### Debugging automatic detection
- Set `log_level=DEBUG` (in [AppConfig](http://localhost:3000/advanced/query_configuration#appconfig)) and make sure it's working as intended.
- Otherwise, you will not know when, for instance, an invalid filepath is interpreted as raw text instead.
- ### Forcing a data type
- To omit any issues with the data type detection, you can **force** a data_type by adding it as a `add` method argument.
- The examples below show you the keyword to force the respective `data_type`.
- Forcing can also be used for edge cases, such as interpreting a sitemap as a web_page, for reading its raw text instead of following links.
- ## Remote Data Types
- <Tip>
- **Use local files in remote data types**
- Some data_types are meant for remote content and only work with URLs.
- You can pass local files by formatting the path using the `file:` [URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme), e.g. `file:///info.pdf`.
- </Tip>
- ## Reusing a vector database
- Default behavior is to create a persistent vector DB in the directory **./db**. You can split your application into two Python scripts: one to create a local vector DB and the other to reuse this local persistent vector DB. This is useful when you want to index hundreds of documents and separately implement a chat interface.
- Create a local index:
- ```python
- from embedchain import App
- naval_chat_bot = App()
- naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
- naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
- ```
- You can reuse the local index with the same code, but without adding new documents:
- ```python
- from embedchain import App
- naval_chat_bot = App()
- print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
- ```
|