---
title: '🚀 Pipelines'
description: '💡 Start building LLM powered data pipelines in 1 minute'
---

Embedchain lets you build data pipelines on your own data sources and deploy them to production in less than a minute. It can load, index, retrieve, and sync any unstructured data.

Install the embedchain Python package:

```bash
pip install embedchain
```

Creating a pipeline involves 3 steps:

<Steps>
<Step title="⚙️ Import pipeline instance">
```python
from embedchain import Pipeline
p = Pipeline(name="Elon Musk")
```
</Step>
<Step title="🗃️ Add data sources">
```python
# Add different data sources
p.add("https://en.wikipedia.org/wiki/Elon_Musk")
p.add("https://www.forbes.com/profile/elon-musk")
# You can also add local data sources such as PDF or CSV files.
# p.add("/path/to/file.pdf")
```
</Step>
<Step title="💬 Deploy your pipeline to the Embedchain platform">
```python
p.deploy()
```
</Step>
</Steps>

That's it. Now head to the [Embedchain platform](https://app.embedchain.ai), where your pipeline is available. Make sure the `OPENAI_API_KEY` 🔑 environment variable is set in your code.
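
One way to handle the `OPENAI_API_KEY` requirement is sketched below. The `require_env` helper is hypothetical (it is not part of embedchain) and the key value is a placeholder. The idea is to fail early with a readable message instead of hitting an authentication error halfway through the pipeline.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message.

    Hypothetical helper for illustration; not part of the embedchain package.
    """
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or assign "
            f"os.environ[{name!r}] before creating the pipeline"
        )
    return value


# Option 1: set the key directly in code (placeholder value, use your own key).
# os.environ["OPENAI_API_KEY"] = "sk-xxx"

# Option 2: export the key in your shell and only verify it here.
# require_env("OPENAI_API_KEY")
```

Either option would go at the top of your script, before the `Pipeline` instance is created. Setting the key in the shell keeps it out of source control, which is usually preferable for anything you plan to share.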

After you deploy your pipeline to the Embedchain platform, you can still add more data sources and update the pipeline as many times as you need.

Here is a Google Colab notebook for you to get started: [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1YVXaBO4yqlHZY4ho67GCJ6aD4CHNiScD?usp=sharing)