Document Loaders
Explore how to use document loaders in LangChainGo to extract and convert HTML, text, and CSV documents into manageable chunks for AI processing. Understand different splitting strategies and see practical Go examples that prepare document data for chaining and embedding workflows.
Document loaders extract data from a configured source and convert it into a slice of schema.Document values in langchaingo. Loader implementations are available for HTML, plain text, PDF, CSV, and other formats.
A splitter works alongside a document loader to divide the loaded document into manageable chunks. langchaingo defines several splitting strategies, including:
Token-based: Splits text by tokens.
Recursive: Splits text recursively, trying a series of separator characters in turn.
Markdown: Parses and chunks Markdown files.
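The recursive strategy listed above can be sketched with the standard library alone. This is a simplified illustration, not langchaingo's implementation: splitRecursive, its parameters, and the separator list are assumptions made for this example, chunk size is measured in bytes, and there is no chunk overlap.

```go
package main

import (
	"fmt"
	"strings"
)

// splitRecursive breaks text into chunks of at most chunkSize bytes.
// It splits on the first separator, greedily merges neighbouring pieces
// that still fit, and retries oversized pieces with the remaining,
// finer-grained separators. Sizes are counted in bytes, so this sketch
// assumes ASCII input.
func splitRecursive(text string, chunkSize int, separators []string) []string {
	if chunkSize <= 0 || text == "" {
		return nil
	}
	if len(text) <= chunkSize {
		return []string{text}
	}
	if len(separators) == 0 {
		// No separators left: fall back to fixed-size cuts.
		var chunks []string
		for len(text) > chunkSize {
			chunks = append(chunks, text[:chunkSize])
			text = text[chunkSize:]
		}
		return append(chunks, text)
	}

	sep, rest := separators[0], separators[1:]
	var chunks []string
	current := ""
	for _, piece := range strings.Split(text, sep) {
		if piece == "" {
			continue
		}
		candidate := piece
		if current != "" {
			candidate = current + sep + piece
		}
		if len(candidate) <= chunkSize {
			current = candidate // the piece still fits in the open chunk
			continue
		}
		if current != "" {
			chunks = append(chunks, current) // close the open chunk
			current = ""
		}
		if len(piece) <= chunkSize {
			current = piece
		} else {
			// The piece alone is too large: recurse with finer separators.
			chunks = append(chunks, splitRecursive(piece, chunkSize, rest)...)
		}
	}
	if current != "" {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	text := "alpha beta gamma\n\ndelta epsilon zeta eta theta"
	for i, chunk := range splitRecursive(text, 16, []string{"\n\n", "\n", " "}) {
		fmt.Printf("chunk %d: %q\n", i, chunk)
	}
}
```

Ordering the separators from coarse (paragraph breaks) to fine (spaces) keeps related text together where possible and only cuts inside a paragraph when the paragraph itself exceeds the chunk size.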
Let's take a look at how to use langchaingo to load documents from different sources and split them for further processing.
HTML document loader
The HTML loader in langchaingo makes it possible to load an arbitrary HTML document and prepare it for further processing by other components, such as chains. Let's walk through an ...