File Loaders
Only available on Node.js.
These loaders are used to load files given a filesystem path or a Blob object.
📄️ Folders with multiple files
This example goes over how to load data from folders with multiple files. The second argument is a map of file extensions to loader factories. Each file will be passed to the matching loader, and the resulting documents will be concatenated together.
📄️ Multiple individual files
This example goes over how to load data from multiple file paths. The second argument is a map of file extensions to loader factories. Each file will be passed to the matching loader, and the resulting documents will be concatenated together.
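The extension-to-loader map described above can be sketched as follows. This is a minimal, synchronous illustration under stated assumptions, not the library's implementation: `Doc`, `Loader`, `LoaderFactory`, `fakeFolder`, `extname`, and `loadAll` are hypothetical names, files live in memory instead of on disk, and files with no matching extension are simply skipped.

```typescript
// Hypothetical types for illustration; not the library's actual classes.
interface Doc {
  pageContent: string;
  metadata: { source: string };
}

interface Loader {
  load(): Doc[];
}

// A loader factory receives a file path and returns a loader for that file.
type LoaderFactory = (path: string) => Loader;

// In-memory stand-in for files on disk, so the sketch runs without I/O.
const fakeFolder: Record<string, string> = {
  "notes/a.txt": "hello",
  "notes/b.csv": "name\nada\ngrace",
  "notes/c.bin": "ignored",
};

// Returns the extension including the dot, or "" if there is none.
function extname(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot > 0 ? base.slice(dot) : "";
}

// One document per text file.
const textLoader: LoaderFactory = (path) => ({
  load: () => [{ pageContent: fakeFolder[path] ?? "", metadata: { source: path } }],
});

// One document per CSV row, skipping the header line.
const csvLoader: LoaderFactory = (path) => ({
  load: () =>
    (fakeFolder[path] ?? "")
      .split("\n")
      .slice(1)
      .map((row) => ({ pageContent: row, metadata: { source: path } })),
});

// Pass each file to the loader matching its extension and concatenate the results.
function loadAll(paths: string[], loaders: Record<string, LoaderFactory>): Doc[] {
  const docs: Doc[] = [];
  for (const path of paths) {
    const factory = loaders[extname(path)];
    if (!factory) continue; // this sketch skips unknown extensions
    docs.push(...factory(path).load());
  }
  return docs;
}

const docs = loadAll(Object.keys(fakeFolder), {
  ".txt": textLoader,
  ".csv": csvLoader,
});
```

Here `docs` holds one document for the text file followed by one per CSV data row; the `.bin` file has no matching factory and contributes nothing.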
📄️ ChatGPT files
This example goes over how to load conversations.json from your ChatGPT data export folder. You can get your data export by email by going to: ChatGPT -> (Profile) - Settings -> Export data -> Confirm export -> Check email.
📄️ CSV
This notebook provides a quick overview for getting started with
📄️ Docx files
This example goes over how to load data from docx files.
📄️ EPUB files
This example goes over how to load data from EPUB files. By default, one document will be created for each chapter in the EPUB file. You can change this behavior by setting the splitChapters option to false.
📄️ JSON files
The JSON loader uses JSON pointers to target the keys you want to extract from your JSON files.
📄️ JSONLines files
This example goes over how to load data from JSONLines or JSONL files. The second argument is a JSONPointer to the property to extract from each JSON object in the file. One document will be created for each JSON object in the file.
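To illustrate how a JSON pointer selects one property per line, here is a minimal sketch, assuming RFC 6901 pointer syntax and string-valued targets. `resolvePointer` and `loadJsonLines` are hypothetical helpers written for this example, not the loader's actual API, and the sample data is made up.

```typescript
// Resolve an RFC 6901 JSON Pointer such as "/user/name" against a parsed value.
function resolvePointer(value: unknown, pointer: string): unknown {
  if (pointer === "") return value; // the empty pointer refers to the whole value
  if (!pointer.startsWith("/")) {
    throw new Error(`Invalid JSON Pointer: ${pointer}`);
  }
  let current: unknown = value;
  for (const raw of pointer.slice(1).split("/")) {
    // Unescape per RFC 6901: "~1" becomes "/", then "~0" becomes "~".
    const key = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (current === null || typeof current !== "object") return undefined;
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

// Parse each non-empty line as JSON and keep the string found at the pointer.
function loadJsonLines(text: string, pointer: string): string[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => resolvePointer(JSON.parse(line), pointer))
    .filter((v): v is string => typeof v === "string");
}

// Made-up sample input: two JSON objects, one per line.
const jsonl = [
  '{"html": "This is a sentence."}',
  '{"html": "This is another sentence."}',
].join("\n");

const contents = loadJsonLines(jsonl, "/html");
```

Each entry in `contents` corresponds to one JSON object in the input, mirroring the one-document-per-object behavior described above.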
📄️ Notion markdown export
This example goes over how to load data from your Notion pages exported from the Notion dashboard.
📄️ OpenAI Whisper Audio
Only available on Node.js.
📄️ PDF files
This example goes over how to load data from PDF files. By default, one document will be created for each page in the PDF file. You can change this behavior by setting the splitPages option to false.
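The effect of the splitPages option can be sketched as below. This is an illustration only: `toDocuments`, `PageDoc`, and the `page` metadata field are hypothetical, and the page texts are assumed to have been extracted already; only the option name and its default come from the description above.

```typescript
// Hypothetical document shape for this sketch.
interface PageDoc {
  pageContent: string;
  metadata: { page?: number };
}

// Turn already-extracted page texts into documents.
// splitPages defaults to true: one document per page.
function toDocuments(
  pages: string[],
  opts: { splitPages?: boolean } = {}
): PageDoc[] {
  const splitPages = opts.splitPages ?? true;
  if (splitPages) {
    return pages.map((text, i) => ({
      pageContent: text,
      metadata: { page: i + 1 }, // 1-based page number, an assumption of this sketch
    }));
  }
  // With splitPages set to false, all pages are joined into a single document.
  return [{ pageContent: pages.join("\n\n"), metadata: {} }];
}

const pageTexts = ["Page one text", "Page two text", "Page three text"];
const perPage = toDocuments(pageTexts);
const singleDoc = toDocuments(pageTexts, { splitPages: false });
```

The same pattern applies to the splitChapters option for EPUB files, with chapters in place of pages.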
📄️ PPTX files
This example goes over how to load data from PPTX files. By default, a single document will be created containing all pages in the PPTX file.
📄️ Subtitles
This example goes over how to load data from subtitle files. One document will be created for each subtitles file.
📄️ Text files
This example goes over how to load data from text files.
📄️ Unstructured
This example covers how to use Unstructured.io to load files of many types. Unstructured currently supports loading text files, PowerPoint files, HTML, PDFs, images, and more.