Beautiful Soup is a popular Python package that lets you scrape web content easily. It offers several methods for extracting content, and select() is one of them. The select() method takes a CSS selector as its argument and returns the elements that match it. In this tutorial, you will learn how to use the Beautiful Soup select() method in Python, step by step.
Steps to implement beautifulsoup select()
In this section, you will learn all the steps needed to scrape the content of an HTML document using the select() method.
Step 1: Import the necessary package
In this example I am using only the Beautiful Soup package, so I import it with the import statement.
from bs4 import BeautifulSoup
Step 2: Create a Sample HTML document
For the sake of simplicity, I am creating a small demo HTML document that is easy to follow. Below is the document. You can also use a live URL instead and fetch its content with the requests Python package.
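If you would rather work with a live page, a minimal sketch along the following lines may help. It assumes the requests package is installed. The function name fetch_soup, the timeout value, and the URL in the comment are placeholders of mine and not part of this tutorial.

```python
import requests
from bs4 import BeautifulSoup


def fetch_soup(url, timeout=10):
    """Download a page and return it parsed as a BeautifulSoup object."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return BeautifulSoup(response.text, "html.parser")


# Example call (placeholder URL, left commented out so nothing is fetched here):
# soup = fetch_soup("http://example.com/")
```

The rest of the tutorial works the same way whether the soup object comes from a downloaded page or from the demo string.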
data = """
<html>
<head>
<title>Data Science Learner</title>
</head>
<body>
<p class="title" id="title"><b>Data Science Learner Links</b></p>
<p class="links">Links
<a href="http://example.com/dsl1" class="element" id="link1">1</a>
<a href="http://example.com/dsl2" class="element" id="link2">2</a>
<a href="http://example.com/dsl3" class="avatar" id="link3">3</a>
</p>
<p> line ends</p>
</body>
</html>
"""
Step 3: Parse the HTML
Before extracting content from the document, you have to parse it. To do so, pass the data and the parser name, "html.parser", as arguments to the BeautifulSoup() constructor.
soup = BeautifulSoup(data, "html.parser")
Step 4: Find the content using beautifulsoup select method
Now the last step is to extract the content from the HTML document using the select() method. You pass it a CSS selector, such as a tag name, a class name (for example .element), or an id (for example #link1), and it returns every element that matches.
For example, to get the content of the head tag, I will use the line of code below.
soup.select("head")
Output
[<head>
<title>Data Science Learner</title>
</head>]
In the same way, suppose I want to get the title inside the head tag. Then I will use the code below.
soup.select("head > title")
Output
[<title>Data Science Learner</title>]
The select() method returns a list, so you have to index into it to reach an element and extract its content. Below is the code for that.
soup.select("head > title")[0].text
Output
'Data Science Learner'
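Indexing with [0] raises an IndexError when nothing matches the selector. As a hedged sketch of one way around that, the snippet below guards the list before indexing and also uses select_one(), which returns the first match or None. The shortened html string and the variable names are my own, used only for illustration.

```python
from bs4 import BeautifulSoup

html = "<html><head><title>Data Science Learner</title></head><body></body></html>"
soup = BeautifulSoup(html, "html.parser")

# select() always returns a list, so check it is not empty before indexing
matches = soup.select("head > title")
title_text = matches[0].text if matches else None

# select_one() returns the first matching tag directly, or None when nothing matches
first_title = soup.select_one("head > title")
missing = soup.select_one("head > meta")

print(title_text)
print(first_title.text)
print(missing)
```

Checking for None (or an empty list) before reading .text keeps the script from crashing on pages where the element is absent.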
You can extract the contents of the body in the same way.
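As a rough illustration of that, here is a small sketch that selects body content by class, by id, and by nesting. It uses a trimmed, well-formed copy of the body from the demo document so that it runs on its own. The variable names are mine.

```python
from bs4 import BeautifulSoup

html = """
<body>
<p class="title" id="title"><b>Data Science Learner Links</b></p>
<p class="links">Links
<a href="http://example.com/dsl1" class="element" id="link1">1</a>
<a href="http://example.com/dsl2" class="element" id="link2">2</a>
<a href="http://example.com/dsl3" class="avatar" id="link3">3</a>
</p>
</body>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag plus class: every <a> that carries class="element"
element_links = [a["href"] for a in soup.select("a.element")]

# Id selector: the single tag with id="link3"
third_link = soup.select("#link3")[0]

# Descendant selector: any <b> inside the paragraph with id="title"
heading = soup.select("p#title b")[0].text

print(element_links)
print(third_link["class"], third_link.text)
print(heading)
```

Note that Beautiful Soup returns the class attribute as a list, because an element can carry several classes.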
That’s all for now. Hope you have liked this tutorial on how to implement the select() method. If you have any queries then you can contact us for more information.