---
title: 'Data Type Handling'
---

## Automatic data type detection

The `add` method tries to detect the `data_type` automatically from the value you pass as the `source` argument. For example, `app.add('https://www.youtube.com/watch?v=dQw4w9WgXcQ')` is enough to embed a YouTube video.

Detection is implemented for all formats. It relies on factors such as whether the source is a URL or a local file, what type of data the source contains, and so on.

### Debugging automatic detection

Set `log_level=DEBUG` (in [AppConfig](http://localhost:3000/advanced/query_configuration#appconfig)) and check that detection works as intended.
Otherwise you will not notice when, for instance, an invalid file path is interpreted as raw text.
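
To illustrate why debug logging matters here, the following is a deliberately simplified sketch of source-based detection. It is not embedchain's actual implementation: the function `guess_data_type`, the logger name, and the returned labels are all hypothetical and chosen only for illustration. It shows how a mistyped file path can silently fall through to the raw-text branch unless the decision is logged.

```python
import logging
import os
from urllib.parse import urlparse

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("detection_sketch")


def guess_data_type(source: str) -> str:
    """Simplified, hypothetical detector; NOT embedchain's real logic."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https") and parsed.netloc:
        # The source looks like a remote URL.
        if "youtube.com" in parsed.netloc or "youtu.be" in parsed.netloc:
            guess = "youtube_video"
        elif parsed.path.endswith(".pdf"):
            guess = "pdf_file"
        else:
            guess = "web_page"
    elif os.path.isfile(source):
        # The source is an existing file on disk.
        guess = "local_file"
    else:
        # Anything unrecognized, including a mistyped path, ends up here.
        guess = "text"
    # Without this debug line, the fallback above would go unnoticed.
    logger.debug("source=%r detected as %s", source, guess)
    return guess


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    guess_data_type("https://www.youtube.com/watch?v=dQw4w9WgXcQ")
    guess_data_type("./no-such-file.txt")  # falls through to raw text
```

With logging at `DEBUG`, the second call makes the fallback visible instead of quietly indexing the path string itself as text.
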

### Forcing a data type

To avoid any issues with data type detection, you can **force** a `data_type` by passing it as an argument to the `add` method.
The examples below show the keyword that forces each `data_type`.

Forcing is also useful for edge cases, such as treating a sitemap as a `web_page` so that its raw text is read instead of its links being followed.
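
As a minimal sketch of the sitemap edge case, the helper below passes `data_type` explicitly so that detection is bypassed. The helper name `add_sitemap_as_page` and the example URL are hypothetical. It assumes `app` is any object whose `add` method accepts a source and a `data_type` keyword, as described above.

```python
def add_sitemap_as_page(app, sitemap_url: str) -> None:
    """Index the sitemap document itself rather than the pages it links to."""
    # data_type is given explicitly, so automatic detection is not relied on.
    app.add(sitemap_url, data_type="web_page")


# Usage, assuming embedchain is installed and configured:
# from embedchain import App
# add_sitemap_as_page(App(), "https://example.com/sitemap.xml")
```
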

## Remote Data Types

<Tip>
**Use local files in remote data types**

Some data_types are meant for remote content and only work with URLs.
You can pass local files by formatting the path using the `file:` [URI scheme](https://en.wikipedia.org/wiki/File_URI_scheme), e.g. `file:///info.pdf`.
</Tip>
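
Rather than writing the `file:` URI by hand, you can build it with Python's standard library. The sketch below uses `pathlib.Path.as_uri()`. The helper name `to_file_uri` is hypothetical, and the commented-out `add` call assumes a configured embedchain `App` and that `pdf_file` is the data type you want to force.

```python
from pathlib import Path


def to_file_uri(path: str) -> str:
    """Convert a local path to an absolute file: URI."""
    # as_uri() requires an absolute path, so resolve it first.
    # The file does not need to exist for resolve() to succeed.
    return Path(path).expanduser().resolve().as_uri()


if __name__ == "__main__":
    print(to_file_uri("info.pdf"))
    # Usage with embedchain (assumed, not executed here):
    # app.add(to_file_uri("info.pdf"), data_type="pdf_file")
```

Special characters such as spaces are percent-encoded by `as_uri()`, which is easy to get wrong when formatting the string manually.
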

## Reusing a vector database

By default, a persistent vector DB is created in the directory **./db**. This lets you split your application into two Python scripts: one that creates the local vector DB and another that reuses it. This is useful when you want to index hundreds of documents and implement a chat interface separately.

Create a local index:

```python
from embedchain import App

# Adding sources populates the persistent vector DB (./db by default).
naval_chat_bot = App()
naval_chat_bot.add("https://www.youtube.com/watch?v=3qHkcs3kG44")
naval_chat_bot.add("https://navalmanack.s3.amazonaws.com/Eric-Jorgenson_The-Almanack-of-Naval-Ravikant_Final.pdf")
```

You can reuse the local index with the same code, but without adding new documents:

```python
from embedchain import App

# No add() calls: the existing local index is reused for the query.
naval_chat_bot = App()
print(naval_chat_bot.query("What unique capacity does Naval argue humans possess when it comes to understanding explanations or concepts?"))
```