| Feature | Value      |
|---------|------------|
| Screen  | 15 inch    |
| RAM     | 16 GB      |
| Storage | 512 GB SSD |

- **Attribute substrings** like `[class*="price"]` when the obfuscated class still contains a readable keyword
- **Structural selectors** like `article > h2` that rely on tag hierarchy rather than class names

### Forgetting That select() Returns a List

In BeautifulSoup, `select()` always returns a list, even if there is only one match:

```python
# Wrong -- this is a list, not an element
price = soup.select(".price")
print(price.text)  # AttributeError

# Correct -- use select_one() for a single element
price = soup.select_one(".price")
print(price.text)

# Or index into the list
price = soup.select(".price")[0]
print(price.text)
```

### Not Handling Missing Elements

A selector might return `None` (in `select_one()`) or an empty list (in `select()`). Always check:

```python
price_element = soup.select_one(".product-card .price")
if price_element:
    price = price_element.text.strip()
else:
    price = "N/A"
```

In Playwright, using `locator()` with assertions or `count()` prevents errors on missing elements:

```javascript
const priceLocator = page.locator(".product-card .price");
if ((await priceLocator.count()) > 0) {
  const price = await priceLocator.first().textContent();
  console.log(price);
}
```

## Selector Strategy Decision Flow

When you are looking at a page and trying to decide which selector to use, follow this flow:

```mermaid
graph TD
    A[Inspect the target element] --> B{Has a unique ID?}
    B -->|Yes| C["Use #id"]
    B -->|No| D{Has a stable class name?}
    D -->|Yes| E["Use .class-name"]
    D -->|No| F{Has a data attribute?}
    F -->|Yes| G["Use [data-attr]"]
    F -->|No| H{"Class contains<br/>readable keyword?"}
    H -->|Yes| I["Use [class*=keyword]"]
    H -->|No| J{"Inside a semantic<br/>tag like nav or article?"}
    J -->|Yes| K["Use tag-based selector<br/>like article > h2"]
    J -->|No| L["Use structural selector<br/>with :nth-child"]
```

Start from the top and use the first strategy that gives you a reliable, stable selector. IDs and data attributes are the most resilient to site changes. Class names are next, as long as they are human-readable and not auto-generated. Structural selectors are the last resort because they depend on the exact DOM layout.

## Putting It All Together

Here is a complete scraping script that uses multiple selector strategies to extract product data:

```python
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com/products")
soup = BeautifulSoup(response.text, "html.parser")

products = []

# Select all real product cards, excluding ads
for card in soup.select(".product-card:not(.ad):not(.sponsored)"):
    product = {}

    # Use data attribute for ID
    product["id"] = card.get("data-product-id", "unknown")

    # Use class selector for title
    title_el = card.select_one(".product-title")
    product["title"] = title_el.text.strip() if title_el else "N/A"

    # Use attribute substring match for price (handles price, sale-price, etc.)
    price_el = card.select_one("[class*='price']")
    product["price"] = price_el.text.strip() if price_el else "N/A"

    # Use starts-with for product detail links
    link_el = card.select_one("a[href^='/products/']")
    product["url"] = link_el["href"] if link_el else None

    # Use nth-child for spec table inside the card
    specs = {}
    for row in card.select("table.specs tr:nth-child(n+2)"):
        cells = row.select("td")
        if len(cells) == 2:
            specs[cells[0].text.strip()] = cells[1].text.strip()
    product["specs"] = specs

    products.append(product)

for p in products:
    print(f"{p['id']}: {p['title']} - {p['price']}")
```

CSS selectors cover the vast majority of element targeting needs in web scraping. They are readable, well-supported, and fast. For the rare cases where you need to traverse upward in the DOM (selecting a parent based on a child), XPath is the better tool. But for everything else -- extracting products, prices, links, tables, and structured data -- CSS selectors are the right default choice.
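
To make the upward-traversal case concrete, here is a minimal sketch that uses the limited XPath support in Python's standard-library `xml.etree.ElementTree`. The markup and the `sale` class are hypothetical, and the input has to be well-formed XML. For real-world HTML you would more likely reach for a full XPath engine such as lxml; that is an assumption about your toolchain, not something this example requires.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for illustration.
markup = """
<div>
  <div class="product-card"><h2>Laptop</h2><span class="sale">$899</span></div>
  <div class="product-card"><h2>Phone</h2><span class="price">$699</span></div>
</div>
""".strip()

root = ET.fromstring(markup)

# Find every span with class="sale", then step up to its parent with "..".
cards_on_sale = root.findall(".//span[@class='sale']/..")

# Read the title from each parent card that was selected via its child.
titles = [card.findtext("h2") for card in cards_on_sale]
print(titles)
```

The `..` step is what moves from the matched child back up to the card that contains it, which is the part of the job the CSS selectors covered in this article are not designed for.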
