Core DOM parsing engine that transforms raw HTML into structured, agent-friendly data.
The JavaScript layer is the heart of CeSail’s DOM parsing capabilities. It runs directly in the browser context and provides comprehensive element extraction, analysis, and transformation functionality.
- index.js: Main entry point and public API
- action-extraction.js: Extracts actionable elements and metadata
- filter-elements.js: Filters and groups elements by importance
- scoring.js: Scores elements based on visibility and interactivity
- selector-extraction.js: Generates reliable CSS selectors
- visualizer.js: Visual debugging and element highlighting
- cache-manager.js: Performance optimization and caching
- utility-functions.js: Common utility functions
- constants.js: Configuration constants and weights
- perf.js: Performance monitoring and profiling
The JavaScript layer transforms raw HTML into structured, agent-friendly JSON:
// Raw HTML input
<button class="btn-primary" onclick="submit()">Submit Form</button>
<input type="text" placeholder="Enter email" id="email" />
// CeSail transforms to agent-friendly JSON
{
  "type": "BUTTON",
  "selector": "button.btn-primary",
  "text": "Submit Form",
  "action": "CLICK",
  "importance": 0.9,
  "context": "form submission",
  "metadata": {
    "aria-label": null,
    "disabled": false,
    "visible": true
  }
}
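To illustrate how a consumer on the Python side might use this output, here is a minimal sketch that keeps only visible, enabled elements above an importance threshold. The field names come from the example above; the `pick_actions` helper, the 0.5 threshold, and wrapping the record in a JSON array are assumptions made for illustration, not part of CeSail's API.

```python
import json

# Element record copied from the example above, wrapped in a list
# (assumption: the parser returns a collection of such records).
RAW = """[
  {"type": "BUTTON", "selector": "button.btn-primary", "text": "Submit Form",
   "action": "CLICK", "importance": 0.9, "context": "form submission",
   "metadata": {"aria-label": null, "disabled": false, "visible": true}}
]"""


def pick_actions(elements, min_importance=0.5):
    """Return visible, enabled elements at or above min_importance, highest first.

    Hypothetical helper; the threshold is illustrative, not a CeSail default.
    """
    usable = [
        el for el in elements
        if el.get("importance", 0.0) >= min_importance
        and el.get("metadata", {}).get("visible", False)
        and not el.get("metadata", {}).get("disabled", False)
    ]
    return sorted(usable, key=lambda el: el["importance"], reverse=True)


elements = json.loads(RAW)
for el in pick_actions(elements):
    print(el["action"], el["selector"])  # prints: CLICK button.btn-primary
```

Sorting by importance means an agent can simply take the first few entries when it wants to limit how many candidate actions it considers.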
This layer is automatically injected into web pages by the Python DOM Parser and provides APIs for:
The JavaScript layer operates as a browser-injected script that:
The JavaScript layer must be built before it can be used by the Python DOM Parser.
cd dom_parser/
npm install # Installs Rollup and build dependencies
npm run build
This creates the bundled JavaScript file at dom_parser/dist/dom-parser.js, which contains the modules from src/js/.
The Python DOMParser automatically injects the built JavaScript bundle into every browser page:
# The bundle is automatically loaded from:
bundle_path = Path(__file__).parent.parent / "dist" / "dom-parser.js"
# And injected as an init script:
await self.context.add_init_script(path=str(self.bundle_path))
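Because injection depends on the bundle existing on disk, it can help to fail early with a clear message when the build step was skipped. The sketch below is one way to do that; `resolve_bundle` is a hypothetical helper, not part of the Python DOMParser, and it assumes the `dist/dom-parser.js` layout shown above.

```python
from pathlib import Path


def resolve_bundle(package_root: Path) -> Path:
    """Return the path to dist/dom-parser.js, or raise if it has not been built.

    Hypothetical helper; package_root is assumed to be the dom_parser/ directory.
    """
    bundle = package_root / "dist" / "dom-parser.js"
    if not bundle.is_file():
        raise FileNotFoundError(
            f"{bundle} not found; run 'npm install' and 'npm run build' "
            "in dom_parser/ first"
        )
    return bundle
```

The returned path could then be passed to `add_init_script(path=str(bundle))` in the same way as the snippet above.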
The JavaScript is built using Rollup with the following outputs:
- dist/dom-parser.js - Main bundle for browser injection
- dist/dom-parser.esm.js - For modern module systems
- dist/dom-parser.umd.js - For Node.js compatibility
# Watch mode for development
npm run dev
# Clean and rebuild
npm run clean && npm run build
# Simple build (alternative)
npm run build:simple
Note: Always rebuild the JavaScript bundle after making changes to files in src/js/ before testing the Python layer.
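One way to catch a forgotten rebuild is to compare file modification times before running Python tests. The following is a minimal sketch under the assumption that `src/js/` and `dist/` both sit under the same `dom_parser/` root; `bundle_is_stale` is a hypothetical helper, not something the project provides.

```python
from pathlib import Path


def bundle_is_stale(package_root: Path) -> bool:
    """Return True if dist/dom-parser.js is missing or older than any src/js file.

    Hypothetical helper; assumes src/js/ and dist/ live under package_root.
    """
    bundle = package_root / "dist" / "dom-parser.js"
    if not bundle.is_file():
        return True  # never built
    sources = (package_root / "src" / "js").rglob("*.js")
    # default=0.0 covers an empty source tree, which is never newer than the bundle
    newest_source = max((p.stat().st_mtime for p in sources), default=0.0)
    return newest_source > bundle.stat().st_mtime
```

A test fixture could call this and skip or fail with a "run npm run build" message when it returns True. Modification times are only a heuristic, so a clean rebuild remains the reliable option when in doubt.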