CeSail

DOM Parser Python Layer

Orchestration layer that manages browser interactions and provides high-level APIs for AI-powered web automation.

Overview

The Python layer serves as the orchestration and integration layer for CeSail’s DOM parsing capabilities. It manages browser interactions, provides high-level APIs, and coordinates between the JavaScript parsing engine and external consumers like the MCP server.

Core Components

DOMParser Class

The top-level interface for DOM parsing and web automation. This is the main entry point that orchestrates all functionality including page analysis, action execution, and screenshot capture.

from cesail.dom_parser.src import DOMParser

# Basic initialization
parser = DOMParser()

# With custom configuration
config = {
    "browser": {"headless": True},
    "idle_watcher": {"default_idle_time_ms": 500}
}
parser = DOMParser(config=config)

# Context manager usage (recommended)
async with DOMParser() as parser:
    # Your automation code here
    pass

Supported APIs

Core Methods:

Page Management:

Component Access (Properties):

Configuration & Information:

Context Management:

PageAnalyzer

Analyzes page structure and extracts actionable elements. Provides comprehensive page analysis capabilities including element extraction, selector management, and action generation.

# Get page analysis
parsed_page = await parser.analyze_page()

# Access different components
actions = parsed_page.get_actions()
forms = parsed_page.get_forms()
metadata = parsed_page.get_metadata()
elements = parsed_page.get_important_elements()

# Get selector mapping
selector_map = await parser.page_analyzer.get_selector_mapping()
selector = await parser.page_analyzer.get_selector_by_id("1")

Supported APIs

Core Analysis:

Selector Management:

Processing Pipeline Access:

ActionExecutor

Executes web actions through Playwright using a plugin-based architecture. Supports a comprehensive set of web automation actions organized into categories.

from cesail.dom_parser.src.py.types import Action, ActionType

# Navigation actions
navigate_action = Action(
    type=ActionType.NAVIGATE,
    metadata={"url": "https://example.com"}
)

back_action = Action(type=ActionType.BACK)
forward_action = Action(type=ActionType.FORWARD)

# Interaction actions
click_action = Action(
    type=ActionType.CLICK,
    element_id="button.submit"
)

type_action = Action(
    type=ActionType.TYPE,
    element_id="input#email",
    text_to_type="user@example.com"
)

hover_action = Action(
    type=ActionType.HOVER,
    element_id="button.dropdown"
)

# Scrolling actions
scroll_action = Action(type=ActionType.SCROLL_DOWN_VIEWPORT)

scroll_by_action = Action(
    type=ActionType.SCROLL_BY,
    metadata={"x": 0, "y": 500}
)

# Execute actions
result = await parser.execute_action(action, wait_for_idle=True)

Supported Action Plugins

Navigation Actions (navigation_actions.py):

Interaction Actions (interaction_actions.py):

Input Actions (input_actions.py):

System Actions (system_actions.py):

Supported APIs

Core Execution:

Configuration & Information:

Action Plugin Architecture

All action plugins inherit from BaseAction which provides:

Extending Actions: You can add custom actions by creating new action classes in the actions_plugins/ directory and registering them in the ActionExecutor.

ScreenshotTaker

Handles screenshot capture with configurable dimensions, quality, and coordinate conversion capabilities. The primary use case is to draw bounding boxes on the browser with selector IDs, take a screenshot, and send it to an LLM for visual analysis and action planning.

# Basic screenshot
await parser.take_screenshot("screenshot.png")

# Screenshot with custom dimensions
await parser.take_screenshot(
    filepath="custom_size.png",
    dimensions=(1920, 1080)
)

# High quality JPEG
await parser.take_screenshot(
    filepath="high_quality.jpg",
    format="jpeg",
    quality=95
)

# Full page screenshot
await parser.take_screenshot(
    filepath="full_page.png",
    full_page=True
)

# Base64 screenshot
base64_screenshot = await parser.take_screenshot(
    filepath="screenshot.png",
    return_base64=True
)

Supported APIs

Core Screenshot:

Coordinate Conversion:

Viewport Management:

Features

Configuration

config = {
    "screenshot": {
        "default_format": "png",    # Default image format
        "default_quality": 90       # Default JPEG quality
    }
}

Parameters

IdleWatcher

Monitors page state and waits for stability before proceeding with actions. Uses efficient DOM mutation detection and viewport-aware analysis. The component implements a MutationObserver to watch for DOM changes and waits for DOMContentLoaded events. Note: This component can be buggy and doesn’t always reliably wait for complete page load. For critical applications, consider implementing additional wait conditions or manual timeouts.

from cesail.dom_parser.src.py.idle_watcher import wait_for_page_ready, wait_for_page_quiescence

# Wait for page to be ready (Promise-style)
ready_promise = wait_for_page_ready(page, mutation_timeout_ms=300)
await ready_promise

# Wait for page quiescence with timeout
visible_elements = await wait_for_page_quiescence(
    page, 
    idle_ms=300, 
    timeout_ms=10000
)

Supported APIs

Core Functions:

EfficientIdleWatcher Class:

ViewportAwareIdleWatcher Class:

Features

Configuration

Default Configuration

DEFAULT_CONFIG = {
    "browser": {
        "headless": False,
        "browser_type": "chromium",
        "browser_args": [
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process",
            "--no-sandbox",
            "--enable-logging",
            "--v=1"
        ],
        "context_options": {},
        "extra_http_headers": {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9...",
            "Accept-Language": "en-US,en;q=0.9"
        }
    },
    "action_executor": {
        "enabled_actions": [
            "click", "type", "hover", "select", "check", "clear", "submit",
            "navigate", "back", "forward", "scroll_to", "scroll_by", "scroll_down_viewport",
            "right_click", "double_click", "focus", "blur", "drag_drop",
            "press_key", "key_down", "key_up", "upload_file",
            "alert_accept", "alert_dismiss", "wait", "wait_for_selector", "wait_for_navigation",
            "switch_to_frame", "switch_to_parent_frame", "switch_tab", "close_tab"
        ]
    },
    "idle_watcher": {
        "default_idle_time_ms": 300,
        "mutation_timeout_ms": 5000,
        "network_idle_timeout_ms": 1000,
        "enable_console_logging": True,
        "log_idle_events": False,
        "strict_idle_detection": False
    },
    "page_analyzer": {
        "element_extraction": {
            "extract_forms": True,
            "extract_media": True,
            "extract_links": True,
            "extract_structured_data": True,
            "extract_dynamic_state": True,
            "extract_layout_info": True,
            "extract_pagination_info": True,
            "extract_meta_data": True,
            "extract_document_outline": True,
            "extract_text_content": True,
            "actions": {
                "enable_mapping": True,
                "show_bounding_boxes": True,
                "action_filters": {
                    "include_fields": ["type", "selector", "importantText"],
                    "exclude_fields": [],
                    "important_text_max_length": 250,
                    "trim_text_to_length": 100
                }
            }
        }
    },
    "screenshot": {
        "default_format": "png",
        "default_quality": 90
    },
    "global": {
        "bundle_path": None,
        "enable_console_logging": False, # Prints all JS logs - can be very verbose
        "log_level": "INFO"
    }
}

**For complete configuration options and defaults, see:** `/Users/rachitapradeep/CeSail/dom_parser/src/py/config.py`

Available Action Filter Fields

The action_filters.include_fields configuration controls which fields are included in the extracted action data:

Text Length Limits:

Advanced Usage Examples

Complete Web Automation Workflow

import asyncio
import json
from cesail.dom_parser.src import DOMParser, Action, ActionType

async def complete_workflow():
    async with DOMParser(headless=False) as parser:
        # Navigate to a website
        navigate_action = Action(
            type=ActionType.NAVIGATE,
            metadata={"url": "https://www.pinterest.com/ideas/"}
        )
        await parser._action_executor.execute_action(navigate_action)
        
        # Take initial screenshot
        await parser.take_screenshot("/tmp/01_after_navigation.png")
        
        # Analyze the page
        parsed_page = await parser.analyze_page()
        print(f"URL: {parsed_page.metadata.url}")
        print(f"Elements: {len(parsed_page.important_elements.elements)}")
        print(f"Actions: {len(parsed_page.actions.actions)}")
        
        # Print available actions
        print("Available actions:")
        print(json.dumps(parsed_page.to_json()["actions"], indent=2))
        
        # Test selector functionality
        selector = await parser.page_analyzer.get_selector_by_id("1")
        print(f"Selector for element 1: {selector}")
        
        # Perform scrolling and re-analysis
        for i in range(3):
            scroll_action = Action(type=ActionType.SCROLL_DOWN_VIEWPORT)
            result = await parser.execute_action(scroll_action, wait_for_idle=True)
            
            # Take screenshot after scroll
            await parser.take_screenshot(f"/tmp/scroll_{i+1}.png")
            
            # Re-analyze page
            parsed_page = await parser.analyze_page()
            
            # Test base64 screenshot
            if i == 1:
                screenshot = await parser.take_screenshot(
                    filepath="/tmp/base64_screenshot.png",
                    quality=None,
                    format="png",
                    full_page=False,
                    return_base64=True
                )
                print(f"Base64 screenshot length: {len(screenshot) if screenshot else 0}")

asyncio.run(complete_workflow())

Screenshot and Analysis

async def screenshot_analysis():
    async with DOMParser() as parser:
        # Navigate and analyze
        await parser._action_executor.execute_action(Action(
            type=ActionType.NAVIGATE,
            metadata={"url": "https://example.com"}
        ))
        
        # Take different types of screenshots
        # Regular screenshot
        await parser.take_screenshot("regular.png")
        
        # Full page screenshot
        await parser.take_screenshot(
            filepath="full_page.png",
            full_page=True
        )
        
        # High quality JPEG
        await parser.take_screenshot(
            filepath="high_quality.jpg",
            format="jpeg",
            quality=95
        )
        
        # Base64 screenshot
        base64_screenshot = await parser.take_screenshot(
            return_base64=True,
            format="png"
        )
        
        # Analyze page with different options
        parsed_page = await parser.analyze_page()
        
        # Get specific data
        actions = parsed_page.get_actions()
        forms = parsed_page.get_forms()
        metadata = parsed_page.get_metadata()
        
        # Print results
        print(f"Page title: {metadata.title}")
        print(f"Available actions: {len(actions)}")
        print(f"Forms found: {len(forms)}")