Orchestration layer that manages browser interactions and provides high-level APIs for AI-powered web automation.
The Python layer serves as the orchestration and integration layer for CeSail’s DOM parsing capabilities. It manages browser interactions, provides high-level APIs, and coordinates between the JavaScript parsing engine and external consumers like the MCP server.
DOMParser is the top-level interface for DOM parsing and web automation. It is the main entry point and orchestrates all functionality, including page analysis, action execution, and screenshot capture.
from cesail.dom_parser.src import DOMParser

# Basic initialization
parser = DOMParser()

# With custom configuration
config = {
    "browser": {"headless": True},
    "idle_watcher": {"default_idle_time_ms": 500}
}
parser = DOMParser(config=config)

# Context manager usage (recommended)
async with DOMParser() as parser:
    # Your automation code here
    pass
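The custom configuration above sets only two keys, which suggests that unspecified options fall back to their defaults. The sketch below shows one way such an overlay could work. `merge_config` is a hypothetical helper written for illustration, not part of CeSail, and the recursive merge is an assumption about the behavior rather than a description of the library's implementation.

```python
from copy import deepcopy
from typing import Any, Dict


def merge_config(defaults: Dict[str, Any], overrides: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict with `overrides` layered over `defaults` (hypothetical helper)."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys in nested sections are preserved.
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Abbreviated defaults, using values that appear in DEFAULT_CONFIG in this document.
defaults = {
    "browser": {"headless": False, "browser_type": "chromium"},
    "idle_watcher": {"default_idle_time_ms": 300, "mutation_timeout_ms": 5000},
}
overrides = {
    "browser": {"headless": True},
    "idle_watcher": {"default_idle_time_ms": 500},
}
effective = merge_config(defaults, overrides)
```

Under this assumption, `effective` keeps `browser_type` and `mutation_timeout_ms` from the defaults while taking `headless` and `default_idle_time_ms` from the overrides.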
Core Methods:
- `analyze_page() -> ParsedPage`: Analyze current page and return structured data
- `execute_action(action, wait_for_idle=True, translate_element_id=False) -> ActionResult`: Execute web actions
- `take_screenshot(filepath, dimensions=None, quality=None, format=None, full_page=False, clip=None, omit_background=False, return_base64=False) -> str`: Capture screenshots

Page Management:
- `get_page() -> Page`: Get the current Playwright page instance
- `get_page_content() -> str`: Get current page's HTML content

Component Access (Properties):
- `page_analyzer -> PageAnalyzer`: Access page analysis functionality
- `action_executor -> ActionExecutor`: Access action execution functionality
- `screenshot_taker -> ScreenshotTaker`: Access screenshot functionality

Configuration & Information:
- `get_available_actions() -> Dict[str, Any]`: Get list of all available actions and parameters

Context Management:
- `__aenter__()`: Initialize browser and page (called automatically with `async with`)
- `__aexit__()`: Clean up resources (called automatically with `async with`)

PageAnalyzer analyzes page structure and extracts actionable elements. It provides comprehensive page analysis capabilities including element extraction, selector management, and action generation.
# Get page analysis
parsed_page = await parser.analyze_page()
# Access different components
actions = parsed_page.get_actions()
forms = parsed_page.get_forms()
metadata = parsed_page.get_metadata()
elements = parsed_page.get_important_elements()
# Get selector mapping
selector_map = await parser.page_analyzer.get_selector_mapping()
selector = await parser.page_analyzer.get_selector_by_id("1")
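The selector mapping pairs short IDs such as "1" with the original selector strings. As a rough illustration of that idea, the following sketch keeps a two-way lookup. `SelectorMap` and its methods are hypothetical names, and the real PageAnalyzer may assign and store IDs differently.

```python
from typing import Dict, Optional


class SelectorMap:
    """Two-way lookup between short IDs and selector strings (illustrative only)."""

    def __init__(self) -> None:
        self._by_id: Dict[str, str] = {}
        self._by_selector: Dict[str, str] = {}

    def register(self, selector: str) -> str:
        """Return the ID for `selector`, assigning the next integer ID if it is new."""
        if selector in self._by_selector:
            return self._by_selector[selector]
        selector_id = str(len(self._by_id) + 1)
        self._by_id[selector_id] = selector
        self._by_selector[selector] = selector_id
        return selector_id

    def selector_for(self, selector_id: str) -> Optional[str]:
        """Look up the original selector for an ID, or None if unknown."""
        return self._by_id.get(selector_id)

    def id_for(self, selector: str) -> Optional[str]:
        """Look up the ID for a selector, or None if unknown."""
        return self._by_selector.get(selector)

    def clear(self) -> None:
        """Forget all registered selectors."""
        self._by_id.clear()
        self._by_selector.clear()


mapping = SelectorMap()
first_id = mapping.register("button.submit")
second_id = mapping.register("input#email")
```

Keeping both directions in separate dictionaries makes each lookup a single dictionary access, at the cost of storing every pair twice.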
Core Analysis:
- `analyze_page() -> ParsedPage`: Analyze current page and return comprehensive structured data

Selector Management:
- `get_selector_mapping() -> Dict[str, str]`: Get complete mapping between selector IDs and original selectors
- `get_selector_by_id(selector_id: str) -> Optional[str]`: Get original selector string from selector ID
- `get_selector_id(selector: str) -> Optional[str]`: Get selector ID from original selector string
- `clear_selector_mapping() -> None`: Clear the selector mapping cache

Processing Pipeline Access:
- `get_raw_actions() -> List[Dict[str, Any]]`: Get raw actions from processing pipeline
- `get_grouped_actions() -> List[Dict[str, Any]]`: Get grouped actions from processing pipeline
- `get_scored_actions() -> List[Dict[str, Any]]`: Get scored actions from processing pipeline
- `get_transformed_actions() -> List[Dict[str, Any]]`: Get transformed actions from processing pipeline
- `get_filtered_actions() -> List[Dict[str, Any]]`: Get filtered actions from processing pipeline
- `get_mapped_actions() -> List[Dict[str, Any]]`: Get mapped actions from processing pipeline
- `get_field_filtered_actions() -> List[Dict[str, Any]]`: Get field filtered actions from processing pipeline

ActionExecutor executes web actions through Playwright using a plugin-based architecture. It supports a comprehensive set of web automation actions organized into categories.
from cesail.dom_parser.src.py.types import Action, ActionType

# Navigation actions
navigate_action = Action(
    type=ActionType.NAVIGATE,
    metadata={"url": "https://example.com"}
)
back_action = Action(type=ActionType.BACK)
forward_action = Action(type=ActionType.FORWARD)

# Interaction actions
click_action = Action(
    type=ActionType.CLICK,
    element_id="button.submit"
)
type_action = Action(
    type=ActionType.TYPE,
    element_id="input#email",
    text_to_type="user@example.com"
)
hover_action = Action(
    type=ActionType.HOVER,
    element_id="button.dropdown"
)

# Scrolling actions
scroll_action = Action(type=ActionType.SCROLL_DOWN_VIEWPORT)
scroll_by_action = Action(
    type=ActionType.SCROLL_BY,
    metadata={"x": 0, "y": 500}
)

# Execute one of the actions defined above
result = await parser.execute_action(click_action, wait_for_idle=True)
Navigation Actions (`navigation_actions.py`):
- `NavigateAction`: Navigate to a URL
- `BackAction`: Go back in browser history
- `ForwardAction`: Go forward in browser history
- `SwitchTabAction`: Switch to a different tab
- `CloseTabAction`: Close the current tab
- `SwitchToFrameAction`: Switch to an iframe
- `SwitchToParentFrameAction`: Switch back to parent frame

Interaction Actions (`interaction_actions.py`):
- `ClickAction`: Click on an element
- `RightClickAction`: Right-click on an element
- `DoubleClickAction`: Double-click on an element
- `HoverAction`: Hover over an element
- `FocusAction`: Focus on an element
- `BlurAction`: Remove focus from an element
- `ScrollToAction`: Scroll to a specific element
- `ScrollByAction`: Scroll by specific amount
- `ScrollDownViewportAction`: Scroll down the viewport
- `DragDropAction`: Drag and drop elements

Input Actions (`input_actions.py`):
- `TypeAction`: Type text into an input field
- `CheckAction`: Check/uncheck checkboxes and radio buttons
- `SelectAction`: Select options from dropdowns
- `ClearAction`: Clear input field content
- `PressKeyAction`: Press keyboard keys
- `KeyDownAction`: Hold down a key
- `KeyUpAction`: Release a key
- `UploadFileAction`: Upload files
- `SubmitAction`: Submit forms
- `DatePickAction`: Select dates from date pickers
- `SliderAction`: Adjust slider values

System Actions (`system_actions.py`):
- `AlertAcceptAction`: Accept browser alerts
- `AlertDismissAction`: Dismiss browser alerts
- `WaitAction`: Wait for a specified time
- `WaitForSelectorAction`: Wait for an element to appear
- `WaitForNavigationAction`: Wait for page navigation

Core Execution:
- `execute_action(action: Action) -> Dict[str, Any]`: Execute a single action
- `execute_actions(actions: List[Action]) -> List[Dict[str, Any]]`: Execute multiple actions in sequence
- `execute_action_from_json(action_json: Dict[str, Any]) -> Dict[str, Any]`: Execute action from JSON

Configuration & Information:
- `get_available_actions() -> Dict[str, Any]`: Get comprehensive information about all available action plugins
- `_get_action_plugin(action_type: ActionType) -> Optional[Type[BaseAction]]`: Get plugin class for action type

All action plugins inherit from `BaseAction`, which provides:
- `action_type`: Property defining the action type
- `execute(action)`: Abstract method for action implementation
- `_get_element(element_id)`: Helper to get visible elements
- `_create_success_result()`: Helper for success responses
- `_create_error_result()`: Helper for error responses

Extending Actions: You can add custom actions by creating new action classes in the `actions_plugins/` directory and registering them in the ActionExecutor.
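To make the plugin idea concrete, here is a minimal, self-contained sketch of the pattern: an abstract base class, one custom action, and a lookup by action type. Everything in it is a stand-in. The `BaseAction` below only mimics the members listed above, and `EchoAction`, `REGISTRY`, and `run_action` are hypothetical names; the real CeSail base class, its result format, and its registration mechanism may differ.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any, Dict, Type


class BaseAction(ABC):
    """Stand-in for the plugin base class; the real CeSail class may differ."""

    action_type: str = ""

    @abstractmethod
    async def execute(self, action: Dict[str, Any]) -> Dict[str, Any]:
        """Run the action and return a result dict."""

    def _create_success_result(self, **data: Any) -> Dict[str, Any]:
        return {"success": True, **data}

    def _create_error_result(self, error: str) -> Dict[str, Any]:
        return {"success": False, "error": error}


class EchoAction(BaseAction):
    """Hypothetical custom action that returns the text it was given."""

    action_type = "echo"

    async def execute(self, action: Dict[str, Any]) -> Dict[str, Any]:
        text = action.get("text")
        if text is None:
            return self._create_error_result("missing 'text'")
        return self._create_success_result(echoed=text)


# Hypothetical registry keyed by action type, standing in for registration
# with the ActionExecutor.
REGISTRY: Dict[str, Type[BaseAction]] = {EchoAction.action_type: EchoAction}


async def run_action(action: Dict[str, Any]) -> Dict[str, Any]:
    """Dispatch an action dict to the plugin registered for its type."""
    plugin_cls = REGISTRY.get(action.get("type", ""))
    if plugin_cls is None:
        return {"success": False, "error": f"unknown action type: {action.get('type')}"}
    return await plugin_cls().execute(action)


ok_result = asyncio.run(run_action({"type": "echo", "text": "hello"}))
bad_result = asyncio.run(run_action({"type": "missing"}))
```

Dispatching through a type-keyed registry means a new action only needs a new class and one registry entry, with no changes to the dispatch code.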
ScreenshotTaker handles screenshot capture with configurable dimensions and quality, and provides coordinate conversion. Its primary use case is to draw bounding boxes with selector IDs on the page in the browser, take a screenshot, and send it to an LLM for visual analysis and action planning.
# Basic screenshot
await parser.take_screenshot("screenshot.png")

# Screenshot with custom dimensions
await parser.take_screenshot(
    filepath="custom_size.png",
    dimensions=(1920, 1080)
)

# High quality JPEG
await parser.take_screenshot(
    filepath="high_quality.jpg",
    format="jpeg",
    quality=95
)

# Full page screenshot
await parser.take_screenshot(
    filepath="full_page.png",
    full_page=True
)

# Base64 screenshot
base64_screenshot = await parser.take_screenshot(
    filepath="screenshot.png",
    return_base64=True
)
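When `return_base64=True` is used, the caller receives a string rather than a file to open. The sketch below shows one way to turn such a string back into image bytes and check that it looks like a PNG. It assumes the value is plain base64 (optionally with a data-URI prefix), which this document does not confirm. `decode_screenshot` and `is_png` are hypothetical helpers, and the sample payload is a stand-in, not a real screenshot.

```python
import base64

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def decode_screenshot(encoded: str) -> bytes:
    """Decode a base64 screenshot string, tolerating an optional data-URI prefix."""
    if encoded.startswith("data:"):
        # Drop a prefix such as "data:image/png;base64," if one is present.
        encoded = encoded.split(",", 1)[1]
    return base64.b64decode(encoded)


def is_png(data: bytes) -> bool:
    """Return True if the bytes begin with the PNG signature."""
    return data.startswith(PNG_SIGNATURE)


# Stand-in payload: a PNG signature followed by filler bytes, not a real image.
sample = base64.b64encode(PNG_SIGNATURE + b"\x00" * 8).decode("ascii")
decoded = decode_screenshot(sample)
```

A signature check like this is a cheap sanity test before passing the bytes to an image library or an LLM API.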
Core Screenshot:
- `take_screenshot(filepath, dimensions=None, quality=None, format=None, full_page=False, clip=None, omit_background=False, return_base64=False) -> str`: Take screenshot with configurable parameters

Coordinate Conversion:
- `convert_coordinates(x, y, from_resolution, to_resolution) -> Tuple[float, float]`: Convert coordinates between resolutions
- `convert_coordinates_from_screenshot_to_actual(x, y) -> Tuple[float, float]`: Convert from screenshot to actual page coordinates
- `convert_coordinates_from_actual_to_screenshot(x, y) -> Tuple[float, float]`: Convert from actual page to screenshot coordinates

Viewport Management:
- `get_viewport_info() -> Dict[str, Any]`: Get stored viewport information
- `_store_viewport_info(dimensions=None)`: Store viewport size information

config = {
    "screenshot": {
        "default_format": "png",  # Default image format
        "default_quality": 90     # Default JPEG quality
    }
}
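The coordinate conversion helpers listed above matter when a screenshot is taken at one resolution and a point found in it (for example, by an LLM) has to be mapped back to the page. The following standalone function sketches the arithmetic under the assumption that the conversion is a simple linear scale per axis. It reuses the parameter names of `convert_coordinates` for readability but is not the library's code.

```python
from typing import Tuple


def convert_coordinates(
    x: float,
    y: float,
    from_resolution: Tuple[int, int],
    to_resolution: Tuple[int, int],
) -> Tuple[float, float]:
    """Scale a point from one (width, height) resolution to another."""
    from_w, from_h = from_resolution
    to_w, to_h = to_resolution
    if from_w <= 0 or from_h <= 0:
        raise ValueError("from_resolution must have positive width and height")
    # Each axis is scaled independently by the ratio of target to source size.
    return x * to_w / from_w, y * to_h / from_h


# The centre of a 1920x1080 screenshot maps to the centre of a 1280x720 viewport.
point = convert_coordinates(960, 540, (1920, 1080), (1280, 720))
```

Here `point` evaluates to `(640.0, 360.0)`.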
Parameters of `take_screenshot`:
- `filepath`: Path to save the screenshot
- `dimensions`: Tuple of (width, height) to resize page
- `quality`: JPEG quality (1-100), only for JPEG format
- `format`: Image format ('jpeg', 'png', 'webp')
- `full_page`: Whether to capture entire page
- `clip`: Dict with x, y, width, height for clipping
- `omit_background`: Create transparent PNG
- `return_base64`: Return base64 string instead of filepath

The idle watcher monitors page state and waits for stability before proceeding with actions. It uses efficient DOM mutation detection and viewport-aware analysis: the component implements a MutationObserver to watch for DOM changes and waits for DOMContentLoaded events.

Note: This component can be buggy and doesn't always reliably wait for complete page load. For critical applications, consider implementing additional wait conditions or manual timeouts.
from cesail.dom_parser.src.py.idle_watcher import wait_for_page_ready, wait_for_page_quiescence

# Wait for page to be ready (Promise-style)
ready_promise = wait_for_page_ready(page, mutation_timeout_ms=300)
await ready_promise

# Wait for page quiescence with timeout
visible_elements = await wait_for_page_quiescence(
    page,
    idle_ms=300,
    timeout_ms=10000
)
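Given the reliability caveat above, one option is to bound any wait with an explicit timeout so that a stalled readiness check cannot block the rest of a workflow. The sketch below uses only `asyncio` from the standard library. `settle_or_timeout` is a hypothetical helper, and `asyncio.sleep` stands in for a real page-readiness wait.

```python
import asyncio
from typing import Awaitable, Tuple


async def settle_or_timeout(waiter: Awaitable[object], timeout_s: float) -> bool:
    """Await `waiter`; return True if it finished in time, False if it timed out."""
    try:
        await asyncio.wait_for(waiter, timeout=timeout_s)
        return True
    except asyncio.TimeoutError:
        # wait_for cancels the pending waiter before raising.
        return False


async def demo() -> Tuple[bool, bool]:
    # asyncio.sleep stands in for a page-readiness wait in this sketch.
    fast = await settle_or_timeout(asyncio.sleep(0.01), timeout_s=1.0)
    slow = await settle_or_timeout(asyncio.sleep(5), timeout_s=0.05)
    return fast, slow


fast_done, slow_done = asyncio.run(demo())
```

Returning a boolean instead of raising lets the caller decide whether to retry, fall back to a fixed delay, or continue with whatever has loaded.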
Core Functions:
- `wait_for_page_ready(page, mutation_timeout_ms=300, config=None)`: Create Promise-like object for page readiness
- `wait_for_page_quiescence(page, idle_ms=300, skip_urls=None, timeout_ms=10000, config=None)`: Wait for page stability

EfficientIdleWatcher Class:
- `wait_for_dom_content_loaded()`: Wait for DOMContentLoaded event
- `wait_for_visible_mutations()`: Wait for DOM mutations to settle
- `get_visible_actions()`: Get visible interactive elements
- `get_page_state()`: Get current page state with visible actions
- `stop()`: Clean up resources

ViewportAwareIdleWatcher Class:
- `wait_for_quiescence(timeout_ms=10000)`: Wait for page to be quiescent
- `get_visible_elements()`: Get all visible interactive elements
- `analyze_page()`: Complete page analysis with viewport info

DEFAULT_CONFIG = {
    "browser": {
        "headless": False,
        "browser_type": "chromium",
        "browser_args": [
            "--disable-blink-features=AutomationControlled",
            "--disable-features=IsolateOrigins,site-per-process",
            "--no-sandbox",
            "--enable-logging",
            "--v=1"
        ],
        "context_options": {},
        "extra_http_headers": {
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36...",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9...",
            "Accept-Language": "en-US,en;q=0.9"
        }
    },
    "action_executor": {
        "enabled_actions": [
            "click", "type", "hover", "select", "check", "clear", "submit",
            "navigate", "back", "forward", "scroll_to", "scroll_by", "scroll_down_viewport",
            "right_click", "double_click", "focus", "blur", "drag_drop",
            "press_key", "key_down", "key_up", "upload_file",
            "alert_accept", "alert_dismiss", "wait", "wait_for_selector", "wait_for_navigation",
            "switch_to_frame", "switch_to_parent_frame", "switch_tab", "close_tab"
        ]
    },
    "idle_watcher": {
        "default_idle_time_ms": 300,
        "mutation_timeout_ms": 5000,
        "network_idle_timeout_ms": 1000,
        "enable_console_logging": True,
        "log_idle_events": False,
        "strict_idle_detection": False
    },
    "page_analyzer": {
        "element_extraction": {
            "extract_forms": True,
            "extract_media": True,
            "extract_links": True,
            "extract_structured_data": True,
            "extract_dynamic_state": True,
            "extract_layout_info": True,
            "extract_pagination_info": True,
            "extract_meta_data": True,
            "extract_document_outline": True,
            "extract_text_content": True,
            "actions": {
                "enable_mapping": True,
                "show_bounding_boxes": True,
                "action_filters": {
                    "include_fields": ["type", "selector", "importantText"],
                    "exclude_fields": [],
                    "important_text_max_length": 250,
                    "trim_text_to_length": 100
                }
            }
        }
    },
    "screenshot": {
        "default_format": "png",
        "default_quality": 90
    },
    "global": {
        "bundle_path": None,
        "enable_console_logging": False,  # Prints all JS logs - can be very verbose
        "log_level": "INFO"
    }
}
**For complete configuration options and defaults, see:** `dom_parser/src/py/config.py`
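The `action_filters` block in the configuration above combines a field allow-list with two text length limits. As a rough sketch of how such settings could be applied to a single extracted action, the function below keeps only the requested fields and truncates the two text fields. The semantics are assumptions made for illustration (in particular that `exclude_fields` wins over `include_fields` and that truncation is a plain slice). `filter_action_fields` is a hypothetical helper, not CeSail's implementation.

```python
from typing import Any, Dict, List, Optional


def filter_action_fields(
    action: Dict[str, Any],
    include_fields: List[str],
    exclude_fields: Optional[List[str]] = None,
    important_text_max_length: int = 250,
    trim_text_to_length: int = 100,
) -> Dict[str, Any]:
    """Keep only the requested fields and truncate text fields (assumed semantics)."""
    excluded = set(exclude_fields or [])
    result = {
        key: value
        for key, value in action.items()
        if key in include_fields and key not in excluded
    }
    # Apply the length limits only to fields that survived the filter.
    if isinstance(result.get("importantText"), str):
        result["importantText"] = result["importantText"][:important_text_max_length]
    if isinstance(result.get("text"), str):
        result["text"] = result["text"][:trim_text_to_length]
    return result


# Made-up action record for illustration; the field names follow this document.
raw_action = {
    "type": "BUTTON",
    "selector": "button.submit",
    "importantText": "Submit " * 100,
    "text": "Submit",
    "score": 0.9,
}
filtered = filter_action_fields(raw_action, ["type", "selector", "importantText"])
```

With the default `include_fields`, the `text` and `score` entries are dropped and the 700-character `importantText` is cut to 250 characters.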
The `action_filters.include_fields` configuration controls which fields are included in the extracted action data:
- `"type"`: Element type (BUTTON, LINK, INPUT, SELECT, etc.)
- `"selector"`: CSS selector for the element (e.g., "button.submit", "input#email")
- `"importantText"`: Most important text content including labels, aria-labels, placeholders, and contextual text
- `"text"`: Raw text content of the element
- `"bbox"`: Bounding box coordinates with x, y, width, height (normalized to viewport)
- `"attributes"`: All HTML attributes of the element (class, id, href, etc.)
- `"score"`: Importance score of the element (higher = more important)
- `"object"`: Internal object reference for advanced usage

Text Length Limits:
- `important_text_max_length`: Maximum length for importantText (default: 250)
- `trim_text_to_length`: Maximum length for text field (default: 100)

import asyncio
import json
from cesail.dom_parser.src import DOMParser, Action, ActionType

async def complete_workflow():
    async with DOMParser(headless=False) as parser:
        # Navigate to a website
        navigate_action = Action(
            type=ActionType.NAVIGATE,
            metadata={"url": "https://www.pinterest.com/ideas/"}
        )
        # Use the public action_executor property rather than the private attribute
        await parser.action_executor.execute_action(navigate_action)

        # Take initial screenshot
        await parser.take_screenshot("/tmp/01_after_navigation.png")

        # Analyze the page
        parsed_page = await parser.analyze_page()
        print(f"URL: {parsed_page.metadata.url}")
        print(f"Elements: {len(parsed_page.important_elements.elements)}")
        print(f"Actions: {len(parsed_page.actions.actions)}")

        # Print available actions
        print("Available actions:")
        print(json.dumps(parsed_page.to_json()["actions"], indent=2))

        # Test selector functionality
        selector = await parser.page_analyzer.get_selector_by_id("1")
        print(f"Selector for element 1: {selector}")

        # Perform scrolling and re-analysis
        for i in range(3):
            scroll_action = Action(type=ActionType.SCROLL_DOWN_VIEWPORT)
            result = await parser.execute_action(scroll_action, wait_for_idle=True)

            # Take screenshot after scroll
            await parser.take_screenshot(f"/tmp/scroll_{i+1}.png")

            # Re-analyze page
            parsed_page = await parser.analyze_page()

            # Test base64 screenshot
            if i == 1:
                screenshot = await parser.take_screenshot(
                    filepath="/tmp/base64_screenshot.png",
                    quality=None,
                    format="png",
                    full_page=False,
                    return_base64=True
                )
                print(f"Base64 screenshot length: {len(screenshot) if screenshot else 0}")

asyncio.run(complete_workflow())
async def screenshot_analysis():
    async with DOMParser() as parser:
        # Navigate and analyze
        await parser.action_executor.execute_action(Action(
            type=ActionType.NAVIGATE,
            metadata={"url": "https://example.com"}
        ))

        # Take different types of screenshots
        # Regular screenshot
        await parser.take_screenshot("regular.png")

        # Full page screenshot
        await parser.take_screenshot(
            filepath="full_page.png",
            full_page=True
        )

        # High quality JPEG
        await parser.take_screenshot(
            filepath="high_quality.jpg",
            format="jpeg",
            quality=95
        )

        # Base64 screenshot (filepath is passed as well, as in the earlier examples)
        base64_screenshot = await parser.take_screenshot(
            filepath="screenshot.png",
            return_base64=True,
            format="png"
        )

        # Analyze page with different options
        parsed_page = await parser.analyze_page()

        # Get specific data
        actions = parsed_page.get_actions()
        forms = parsed_page.get_forms()
        metadata = parsed_page.get_metadata()

        # Print results
        print(f"Page title: {metadata.title}")
        print(f"Available actions: {len(actions)}")
        print(f"Forms found: {len(forms)}")