File Loaders

Compatibility

Only available on Node.js.

These loaders are used to load files given a filesystem path or a Blob object.

📄️ Folders with multiple files

This example goes over how to load data from folders with multiple files. The second argument is a map of file extensions to loader factories. Each file will be passed to the matching loader, and the resulting documents will be concatenated together.
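To make the extension-to-loader mapping concrete, here is a minimal, library-independent sketch. It is not the loader's actual implementation: the `Doc`, `Loader`, and `LoaderFactory` types, the `loadAll` helper, and the in-memory `files` map are all hypothetical names introduced for illustration, and file contents are held in memory so the sketch runs without touching the disk. It also assumes that files with no registered loader are silently skipped.

```typescript
// Hypothetical document shape for this sketch (not the library's class).
interface Doc {
  pageContent: string;
  metadata: { source: string };
}

// A loader turns one file into zero or more documents.
interface Loader {
  load(): Doc[];
}

// A factory builds a loader for a given file path.
type LoaderFactory = (path: string) => Loader;

// In-memory stand-in for a folder on disk.
const files = new Map<string, string>([
  ["data/a.txt", "hello"],
  ["data/b.json", '["x", "y"]'],
  ["data/c.bin", "no loader registered for this one"],
]);

// Map of file extensions to loader factories.
const loaders = new Map<string, LoaderFactory>([
  [
    ".txt",
    (path: string): Loader => ({
      // One document holding the whole file.
      load: () => [{ pageContent: files.get(path) ?? "", metadata: { source: path } }],
    }),
  ],
  [
    ".json",
    (path: string): Loader => ({
      // One document per string in a JSON array of strings.
      load: () =>
        (JSON.parse(files.get(path) ?? "[]") as string[]).map((text) => ({
          pageContent: text,
          metadata: { source: path },
        })),
    }),
  ],
]);

// Return the lowercased extension including the leading dot, or "" if there is none.
function extensionOf(path: string): string {
  const base = path.slice(path.lastIndexOf("/") + 1);
  const dot = base.lastIndexOf(".");
  return dot <= 0 ? "" : base.slice(dot).toLowerCase();
}

// Pass each file to the matching loader and concatenate the results.
function loadAll(paths: string[], byExtension: Map<string, LoaderFactory>): Doc[] {
  const result: Doc[] = [];
  for (const path of paths) {
    const factory = byExtension.get(extensionOf(path));
    if (factory === undefined) continue; // assumption: skip files with no loader
    result.push(...factory(path).load());
  }
  return result;
}

const loadedDocs = loadAll(Array.from(files.keys()), loaders);
```

Here `loadedDocs` holds three documents: one from the text file and two from the JSON file. The `.bin` file contributes nothing because no factory is registered for its extension.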

📄️ Multiple individual files

This example goes over how to load data from multiple file paths. The second argument is a map of file extensions to loader factories. Each file will be passed to the matching loader, and the resulting documents will be concatenated together.

📄️ ChatGPT files

This example goes over how to load conversations.json from your ChatGPT data export folder. To receive your data export by email, go to: ChatGPT -> (Profile) -> Settings -> Export data -> Confirm export -> Check email.

📄️ CSV

This notebook provides a quick overview for getting started with

📄️ Docx files

This example goes over how to load data from docx files.

📄️ EPUB files

This example goes over how to load data from EPUB files. By default, one document will be created for each chapter in the EPUB file. You can change this behavior by setting the splitChapters option to false.

📄️ JSON files

The JSON loader uses JSON Pointer to target the keys you want to extract from your JSON files.
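For readers unfamiliar with JSON Pointer, the sketch below resolves a standard RFC 6901 pointer against a parsed JSON value. It is only an illustration of the pointer syntax: `resolvePointer` is a hypothetical helper, and the loader's own matching rules may differ from strict root-relative resolution.

```typescript
// Resolve an RFC 6901 JSON Pointer such as "/messages/0/text" against a parsed
// JSON value. Returns undefined when any segment along the way is missing.
function resolvePointer(root: unknown, pointer: string): unknown {
  if (pointer === "") return root; // the empty pointer refers to the whole document
  if (pointer.charAt(0) !== "/") {
    throw new Error("Invalid JSON Pointer: " + pointer);
  }
  let current: unknown = root;
  for (const raw of pointer.slice(1).split("/")) {
    // Unescape "~1" to "/" first, then "~0" to "~", in the order the RFC requires.
    const key = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (current === null || typeof current !== "object") return undefined;
    // Arrays are indexed with the same lookup, since "0" addresses element 0.
    current = (current as Record<string, unknown>)[key];
  }
  return current;
}

const sample: unknown = JSON.parse(
  '{"messages":[{"text":"hi","meta":{"a/b":1}},{"text":"bye"}]}'
);

const firstText = resolvePointer(sample, "/messages/0/text"); // "hi"
const escapedKey = resolvePointer(sample, "/messages/0/meta/a~1b"); // 1
const missingValue = resolvePointer(sample, "/messages/5/text"); // undefined
```

The escaped pointer shows why `~1` exists: a key that itself contains a slash would otherwise be split into two segments.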

📄️ JSONLines files

This example goes over how to load data from JSONLines or JSONL files. The second argument is a JSONPointer to the property to extract from each JSON object in the file. One document will be created for each JSON object in the file.
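The one-document-per-object behavior can be sketched as follows. This is an illustrative stand-in rather than the loader's implementation: `loadJsonLines` and `LineDoc` are hypothetical names, the pointer is assumed to be a single top-level segment such as `/html`, and lines whose target property is not a string are skipped in this sketch.

```typescript
// Hypothetical document shape for this sketch.
interface LineDoc {
  pageContent: string;
  metadata: { line: number };
}

// Parse JSON Lines text and extract one property from each object.
// `pointer` is assumed to be a single-segment pointer like "/html".
function loadJsonLines(text: string, pointer: string): LineDoc[] {
  const key = pointer.replace(/^\//, "");
  const result: LineDoc[] = [];
  text.split(/\r?\n/).forEach((raw, index) => {
    const trimmed = raw.trim();
    if (trimmed === "") return; // ignore blank lines, including a trailing newline
    const parsed = JSON.parse(trimmed) as Record<string, unknown>;
    const value = parsed[key];
    if (typeof value === "string") {
      // Line numbers are 1-based and count blank lines too.
      result.push({ pageContent: value, metadata: { line: index + 1 } });
    }
  });
  return result;
}

const jsonlText = [
  '{"html": "First item", "id": 1}',
  '{"html": "Second item", "id": 2}',
  "",
].join("\n");

const jsonlDocs = loadJsonLines(jsonlText, "/html");
```

Each non-empty line is parsed independently, which is what distinguishes JSON Lines from a single JSON document: one malformed line does not prevent the lines before it from being read.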

📄️ Notion markdown export

This example goes over how to load data from your Notion pages exported from the Notion dashboard.

📄️ OpenAI Whisper Audio

Only available on Node.js.

📄️ PDF files

This example goes over how to load data from PDF files. By default, one document will be created for each page in the PDF file. You can change this behavior by setting the splitPages option to false.
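The effect of a split option like this can be sketched independently of any PDF parsing. In the example below, `pagesToDocuments` and `PageDoc` are hypothetical names, the page texts are assumed to have been extracted already, and joining pages with a blank line is an assumption of the sketch rather than documented loader behavior.

```typescript
// Hypothetical document shape for this sketch.
interface PageDoc {
  pageContent: string;
  metadata: { source: string; page?: number };
}

// Turn already-extracted page texts into documents.
// splitPages = true  -> one document per page, with a 1-based page number
// splitPages = false -> a single document holding every page
function pagesToDocuments(source: string, pages: string[], splitPages = true): PageDoc[] {
  if (splitPages) {
    return pages.map((text, i) => ({
      pageContent: text,
      metadata: { source, page: i + 1 },
    }));
  }
  // Assumption: pages are joined with a blank line between them.
  return [{ pageContent: pages.join("\n\n"), metadata: { source } }];
}

const pageTexts = ["one", "two", "three"];
const perPage = pagesToDocuments("report.pdf", pageTexts); // 3 documents
const merged = pagesToDocuments("report.pdf", pageTexts, false); // 1 document
```

Splitting per page keeps page numbers available in the metadata, which helps when citing sources. A single merged document avoids cutting sentences that run across a page boundary.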

📄️ PPTX files

This example goes over how to load data from PPTX files. By default, a single document will be created containing all slides in the PPTX file.

📄️ Subtitles

This example goes over how to load data from subtitle files. One document will be created for each subtitle file.
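As a rough illustration of how a subtitle file collapses into a single document, the sketch below strips cue numbers and timing lines from SubRip (.srt) text and joins the remaining dialogue. It assumes the SRT format and uses a hypothetical helper, `srtToText`; it is not the loader's implementation.

```typescript
// Collapse SubRip (.srt) text into one string of dialogue.
// Each cue is: an index line, a timing line containing "-->", then text lines.
function srtToText(srt: string): string {
  return srt
    .trim()
    .split(/\r?\n\s*\r?\n/) // cues are separated by blank lines
    .map((cue) => {
      const lines = cue.split(/\r?\n/).map((line) => line.trim());
      const timing = lines.findIndex((line) => line.indexOf("-->") !== -1);
      // Keep only the lines after the timing line; drop malformed cues.
      return timing === -1 ? "" : lines.slice(timing + 1).join(" ");
    })
    .filter((text) => text !== "")
    .join(" ");
}

const srtSample = [
  "1",
  "00:00:01,000 --> 00:00:02,000",
  "Hello",
  "",
  "2",
  "00:00:02,500 --> 00:00:04,000",
  "world",
  "again",
  "",
].join("\n");

// One document for the whole file.
const subtitleDoc = {
  pageContent: srtToText(srtSample),
  metadata: { source: "sample.srt" },
};
```

Parsing cue by cue, rather than filtering out every numeric line, means a line of dialogue that happens to consist only of digits is still kept.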

📄️ Text files

This example goes over how to load data from text files.

📄️ Unstructured

This example covers how to use Unstructured.io to load files of many types. Unstructured currently supports loading text files, PowerPoint presentations, HTML, PDFs, images, and more.