Explore Data 360 Content Parsing and Pre-Processing Methods
Learning Objectives
After completing this unit, you’ll be able to:
- Describe Data 360 content parsing and pre-processing methods.
- Explain how selecting the right parsing and pre-processing method grounds AI in trusted, accurate data.
Parsing Content
Parsing transforms unstructured documents, such as PDFs, manuals, and reports, into structured data that AI systems can easily retrieve and analyze. The parsing strategy you select is critical: it determines how well document structure, relationships, and visual context are preserved, which directly affects your system's answer accuracy, overall performance, and operational costs.
Data 360 offers three options for parsing content.
- Default Parser: Extracts text into structured, searchable data with built-in settings.
- LLM-based Parser: Extracts text, images, and other visual elements using an LLM.
- Docling Parser: Extracts text and tables with layout understanding using open-source models. Combine the Docling Parser with LLM-based image processing to handle visuals such as flow charts and images.

Let’s explore the options, so you know which one to choose for your use case.
Default Parser
The Default Parser is a highly scalable, cost-efficient solution designed specifically for text-heavy resources like knowledge base articles, policy documents, developer documentation, and internal wikis. It’s optimized to extract clean, linear text where meaning is primarily conveyed through paragraphs, lists, and headings. This makes it the ideal choice for large-scale ingestion of curated textual knowledge with minimal structural complexity.
LLM-based Parser
For highly complex, multimodal documents, full LLM-based parsing delivers the most comprehensive interpretation available across text, tables, and visuals. Because it interprets all content types together rather than in isolation, it retains context that spans them. This depth of analysis suits scenarios, such as advanced compliance analysis or engineering diagnostics, where maximizing comprehension is the primary objective and outweighs the additional processing costs.
Docling Parser
The Docling Parser is designed for enterprise documents where layout and structural relationships matter—such as financial reports, compliance documents, and operational reports. Unlike standard parsers, it delivers superior structural fidelity by interpreting complex elements like multilevel headers, merged cells, and nested tables. This deep layout awareness preserves the vital context needed to ensure highly accurate downstream retrieval and AI-generated answers.
When dealing with visually complex enterprise documents like system architectures, organizational charts, or governance workflows, standard text extraction falls short. Enabling Image Processing with Docling bridges this gap by intelligently interpreting both textual and visual elements. By using advanced LLMs to selectively analyze directional flows, labeled connections, and structural dependencies, this approach unlocks deeper contextual insights into visually represented processes without sacrificing processing efficiency.
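The guidance for the three parsers can be summarized as a small decision helper. The following is only a sketch of that reasoning in Python, not a Data 360 API: the names DocumentProfile, Recommendation, and recommend_parser, and the three boolean traits, are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Parser(Enum):
    DEFAULT = "Default Parser"
    LLM_BASED = "LLM-based Parser"
    DOCLING = "Docling Parser"


@dataclass
class DocumentProfile:
    # Hypothetical traits used for illustration; these are not Data 360 fields.
    visuals_throughout: bool = False  # images, charts, and tables across the whole document
    complex_layout: bool = False      # multilevel headers, merged cells, nested tables
    has_diagrams: bool = False        # flow charts, org charts, architecture diagrams


@dataclass
class Recommendation:
    parser: Parser
    image_processing: bool = False  # only relevant when the parser is Docling


def recommend_parser(doc: DocumentProfile) -> Recommendation:
    """Map document traits to a parser, following the guidance in this unit."""
    # Rich visual content throughout: parse the whole document with an LLM.
    if doc.visuals_throughout:
        return Recommendation(Parser.LLM_BASED)
    # Layout-sensitive documents: Docling, plus image processing for diagrams.
    if doc.complex_layout or doc.has_diagrams:
        return Recommendation(Parser.DOCLING, image_processing=doc.has_diagrams)
    # Text-heavy content with minimal structural complexity.
    return Recommendation(Parser.DEFAULT)


# Example: a financial report with nested tables and a workflow diagram.
report = DocumentProfile(complex_layout=True, has_diagrams=True)
choice = recommend_parser(report)
print(choice.parser.value, "| image processing:", choice.image_processing)
```

The order of the checks reflects one reasonable reading of the trade-off: the LLM-based Parser is chosen only when visuals run through the entire document, because that option carries the highest processing cost.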
Pre-Processing
You can also select LLM-based Visual Data Pre-Processing with the Default Parser to capture context from visual data using an LLM. LLM-based visual data pre-processing prepares multimodal elements, such as images and tables, for chunking. For example, pre-processing extracts context from images and data from tables while preserving the relationships within the table data.

You can't select both the LLM-based Parser and LLM-based Visual Data Pre-Processing for the same search index. Use the LLM-based Parser for documents that contain rich visual content, such as images, charts, and tables, throughout. In such cases, processing the entire document holistically provides better context and understanding.
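If you track search index settings in your own planning scripts, the mutual-exclusivity rule can be checked before you configure anything. This is a minimal sketch under stated assumptions: ParsingConfig, its field names, the parser keys, and validate_parsing_config are hypothetical and do not correspond to any Data 360 API or metadata format.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical keys for the three parsers described in this unit.
KNOWN_PARSERS = {"default", "llm", "docling"}


@dataclass(frozen=True)
class ParsingConfig:
    # Hypothetical planning record, not a Data 360 object.
    parser: str
    llm_visual_preprocessing: bool = False


def validate_parsing_config(config: ParsingConfig) -> List[str]:
    """Return a list of problems; an empty list means the combination is allowed."""
    problems: List[str] = []
    if config.parser not in KNOWN_PARSERS:
        problems.append(f"Unknown parser: {config.parser!r}")
    # The one rule stated in this unit: LLM-based parsing and LLM-based
    # visual data pre-processing can't both be selected for a search index.
    if config.parser == "llm" and config.llm_visual_preprocessing:
        problems.append(
            "LLM-based Parser and LLM-based Visual Data Pre-Processing "
            "can't both be selected for one search index."
        )
    return problems


# Example: the Default Parser with visual pre-processing passes the check,
# while the LLM-based Parser with visual pre-processing does not.
allowed = validate_parsing_config(ParsingConfig("default", llm_visual_preprocessing=True))
rejected = validate_parsing_config(ParsingConfig("llm", llm_visual_preprocessing=True))
print(allowed)
print(rejected)
```

The check encodes only the restriction described above. It makes no claim about which other combinations Data 360 supports.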
Summary
Parsing and pre-processing transform unstructured documents like PDFs and reports into structured data that AI systems can analyze. Data 360 provides multiple options so that you can balance contextual understanding, scale, and efficiency. The strategy you choose directly impacts your system's accuracy, performance, and costs.
Take the next step to discover search index types in Data 360 and identify the right search index to build for your use case with the Search Index Types in Data 360: Quick Look module.