In your project folder, create a new Python script and name it example_script.py.
Step 1: Import Required Libraries
Import the functions and classes you need from the playwright and agentql libraries, along with Python's logging module.
logging provides debug and informational messages, playwright provides the core browser interaction functionality, and agentql adds the main AgentQL functionality.
Step 2: Launch the Browser and Open the Website
Set up logging: logging.basicConfig(level=logging.DEBUG) configures logging to show messages at debug level and above.
Define URL: The URL "https://www.youtube.com/" is the target website for the script.
Start Playwright instance with sync_playwright().
Launch the browser: playwright.chromium.launch(headless=False)
Create a new page in the browser and wrap it to get access to AgentQL's querying API: agentql.wrap(browser.new_page()).
Navigate to the website with page.goto(URL).
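Steps 1 and 2 could be sketched together roughly as follows. This is a minimal sketch rather than the full script: the function name open_youtube is a placeholder, and it assumes sync_playwright is imported from playwright.sync_api. The two library imports sit inside the function only so that the snippet can be loaded on its own; in example_script.py they would normally go at the top of the file.

```python
import logging

# Show debug-level messages (and above) on the console.
logging.basicConfig(level=logging.DEBUG)
log = logging.getLogger(__name__)

URL = "https://www.youtube.com/"


def open_youtube(url=URL):
    """Launch a visible Chromium window and navigate to `url`."""
    # Placed here only to keep this sketch loadable without the libraries;
    # in the real script these imports belong at the top of the file (Step 1).
    import agentql
    from playwright.sync_api import sync_playwright

    with sync_playwright() as playwright:
        browser = playwright.chromium.launch(headless=False)
        # Wrapping the page gives access to AgentQL's querying API.
        page = agentql.wrap(browser.new_page())
        page.goto(url)
        log.debug("Opened %s", url)
        # The later steps (queries, clicks, scrolling) would run here.
        browser.close()
```

Calling open_youtube() opens the browser, loads the page, and closes the browser again, which is enough to confirm that both libraries are installed and working.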
Step 3: Define AgentQL Queries
These queries tell AgentQL which elements on the website to locate and interact with. Every later step depends on them, so make sure they are functional and reliable.
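The query definitions themselves are not shown above, so the block below is only a guess at their shape. The five constant names and search_btn come from this guide; every other field name (search_input, videos, video_link, video_title, play_or_pause_btn, expand_description_btn, description_text, comments, channel_name, comment_text) is an assumption chosen for illustration.

```python
# Search box and search button on the home page.
SEARCH_QUERY = """
{
    search_input
    search_btn
}
"""

# A list of videos on the results page; `[]` marks a list of items.
VIDEO_QUERY = """
{
    videos[] {
        video_link
        video_title
    }
}
"""

# Player controls on the video page (field names are assumed).
VIDEO_CONTROL_QUERY = """
{
    play_or_pause_btn
    expand_description_btn
}
"""

# The expanded video description (field name is assumed).
DESCRIPTION_QUERY = """
{
    description_text
}
"""

# A list of comments under the video (field names are assumed).
COMMENT_QUERY = """
{
    comments[] {
        channel_name
        comment_text
    }
}
"""
```

Descriptive field names matter here, because AgentQL uses them to decide which elements on the page to match.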
Step 4: Try & Except Block
Catching Errors: An AgentQL query either returns an AQLResponseProxy or raises an error during the querying process, so it is good practice to wrap the query logic in a try/except block, which makes debugging easier.
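One way to structure this is sketched below. The helper name run_guarded is hypothetical, and because this guide does not name the specific exception classes AgentQL raises, the sketch catches Exception broadly, logs it, and re-raises.

```python
import logging

log = logging.getLogger(__name__)


def run_guarded(action):
    """Run `action()`; log any error with its traceback, then re-raise it."""
    try:
        return action()
    except Exception:
        # The concrete AgentQL exception types are not named in this guide,
        # so this sketch catches broadly and lets the error propagate.
        log.exception("AgentQL step failed")
        raise
```

In the script, the query and interaction code from the following steps would sit inside the try block. Re-raising keeps the failure visible instead of silently continuing with a missing response.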
Step 5: Execute Search Query and Interact with Search Elements
Search Query: Pass SEARCH_QUERY to the page to locate the search elements on the YouTube page.
Type and Click: The script types "machine learning" into the search input with a delay of 75 ms between keystrokes, then clicks search_btn.
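A sketch of this step, under a few assumptions: page is the wrapped page from Step 2, page.query_elements() is the AgentQL call that returns the response object, and search_input is an assumed field name (only search_btn is named in this guide). The function name search_youtube is a placeholder.

```python
SEARCH_QUERY = """
{
    search_input
    search_btn
}
"""


def search_youtube(page, term="machine learning"):
    """Type `term` into the search box and click the search button."""
    response = page.query_elements(SEARCH_QUERY)
    # `delay` is the pause between key presses, in milliseconds.
    response.search_input.type(term, delay=75)
    response.search_btn.click()
    return response
```

The per-keystroke delay makes the typing look closer to human input, which some pages handle more reliably than text pasted all at once.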
Optional Step: Convert AgentQL response to a Python dict
The raw response is an AgentQL response object, which is not easy to work with directly. You can use the to_data() API to convert it to a Python dict.
The to_data() API converts the AgentQL response to a structured dict in which each response node is replaced by the text content of that node.
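The conversion itself is a single call. The sketch below assumes response is the object returned by the search query in Step 5; the helper name response_as_dict is hypothetical.

```python
import logging

log = logging.getLogger(__name__)


def response_as_dict(response):
    """Convert an AgentQL response to a plain dict and log it."""
    data = response.to_data()
    log.debug("Response as dict: %s", data)
    return data
```

For a query with two fields, the resulting dict would have those two field names as keys and the matched elements' text as values.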
Sample result would be as follows:
Step 6: Execute Video Query and Interact with Video Elements
Video Query: The script runs the VIDEO_QUERY to interact with video elements on the search results page.
Click on Video: It clicks on the first video link in the results.
Logging: A debug message logs the title of the clicked video.
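This step might look like the sketch below. The field names videos, video_link, and video_title are assumptions, as is the function name open_first_video; page is the wrapped page from Step 2. The title is read before the click so that it is captured before the page navigates away.

```python
import logging

log = logging.getLogger(__name__)

VIDEO_QUERY = """
{
    videos[] {
        video_link
        video_title
    }
}
"""


def open_first_video(page):
    """Click the first video in the search results and log its title."""
    response = page.query_elements(VIDEO_QUERY)
    first = response.videos[0]
    # Read the title first: after the click the results page is gone.
    title = first.video_title.text_content()
    first.video_link.click()
    log.debug("Clicked video: %s", title)
    return title
```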
Step 7: Control Video Playback and Show Description
Video Control Query: Run the VIDEO_CONTROL_QUERY to interact with video controls.
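A possible shape for this step is sketched below. The guide does not list the fields of VIDEO_CONTROL_QUERY, so play_or_pause_btn and expand_description_btn are assumed names, and pause_and_expand_description is a placeholder function name.

```python
VIDEO_CONTROL_QUERY = """
{
    play_or_pause_btn
    expand_description_btn
}
"""


def pause_and_expand_description(page):
    """Toggle playback, then expand the video description."""
    response = page.query_elements(VIDEO_CONTROL_QUERY)
    # Toggle playback so the video does not keep running during the script.
    response.play_or_pause_btn.click()
    # Expand the description so its full text is present on the page.
    response.expand_description_btn.click()
    return response
```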
Step 8: Capture and Log Video Description
Instead of converting an intermediate AgentQL response to a dict with the to_data() API, you can call the page.query_data() method to query for structured data in the first place.
Description Query: Executes the DESCRIPTION_QUERY.
Logging Description: Logs the captured description of the video.
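As a sketch, assuming page.query_data() returns a plain dict keyed by the query's field names, and that the query has a single field called description_text (an assumed name):

```python
import logging

log = logging.getLogger(__name__)

DESCRIPTION_QUERY = """
{
    description_text
}
"""


def log_description(page):
    """Query the video description as plain data and log it."""
    # query_data() returns structured data directly, so no to_data() call
    # is needed afterwards.
    data = page.query_data(DESCRIPTION_QUERY)
    description = data["description_text"]
    log.debug("Video description: %s", description)
    return description
```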
Step 9: Scroll Down the Page to Load Comments
To load comments on a YouTube video page, we need to scroll down the page a few times.
Press the PageDown Key: Presses the PageDown key to scroll down the page.
Wait for Page Ready State: Waits for the comments to load with AgentQL's wait_for_page_ready_state() method.
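The scrolling loop could be sketched like this. The number of key presses (three by default here) is an assumption, since the text above only says "a few times", and scroll_to_load_comments is a placeholder name.

```python
def scroll_to_load_comments(page, presses=3):
    """Press PageDown `presses` times, waiting for the page after each press."""
    for _ in range(presses):
        page.keyboard.press("PageDown")
        # Give lazily loaded content (such as comments) time to appear.
        page.wait_for_page_ready_state()
```

If comments still have not appeared, increasing presses is the simplest adjustment.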
Step 10: Capture and Log Comments
Execute Query: Pass COMMENT_QUERY to the page to capture the comment section data.
Count and Log: Here we simply log the number of comments captured.
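One way to do this is sketched below. It uses page.query_data() so that the comments arrive as a plain list; whether the original script uses query_data() or an element query for this step is not stated, and the field names comments, channel_name, and comment_text are assumptions.

```python
import logging

log = logging.getLogger(__name__)

COMMENT_QUERY = """
{
    comments[] {
        channel_name
        comment_text
    }
}
"""


def count_comments(page):
    """Capture the loaded comments and log how many were found."""
    data = page.query_data(COMMENT_QUERY)
    # Treat a missing or empty comments field as zero comments.
    comments = data.get("comments") or []
    log.debug("Captured %d comments", len(comments))
    return len(comments)
```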
Step 11: Stop the Browser
Call browser.close() to release resources and close the browser.
Step 12: Run the Script
Open a terminal in your project's folder and run the script with your Python interpreter, for example: python example_script.py
Putting it all together…
Here is the complete script, if you'd like to copy it directly: