Beautifulsoup find_all() Implementation with Example: 4 Steps Only

Beautifulsoup is an open-source Python package for parsing HTML and extracting content from web pages. It has many methods for quickly pulling content out of a single page or a group of URLs, and find_all() is one of them. In this tutorial, you will learn how to use the find_all() method step by step.

Steps to Implement Beautifulsoup find_all()

In this section, you will learn all the steps for implementing the find_all() method. Follow them in order for a better understanding.

Step 1: Import all the necessary libraries

The first step is to import the required libraries. Here I am using only two: requests and, of course, Beautifulsoup. Let’s import both.

import requests
from bs4 import BeautifulSoup

Step 2: Send a Request to the URL

The main job of the requests module is to send a request to the given URL, receive the response, and download the page data. The response also carries a status code, such as 200 (OK) or 404 (Not Found).

In this example, I am using one of our own page URLs for demonstration purposes only.

url = "https://www.datasciencelearner.com/java-engineer-transform-career-java-for-data-science/"
req = requests.get(url)
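
The request above does not check the response code or set a timeout. As a minimal sketch, the check can be wrapped in two small helpers. The names fetch_html and html_or_raise and the 10-second timeout are my own assumptions for illustration, not part of the original example.

```python
import requests


def html_or_raise(response):
    """Return the page HTML, or raise if the server reported an error."""
    # raise_for_status() raises requests.HTTPError for 4xx and 5xx codes
    response.raise_for_status()
    return response.text


def fetch_html(url, timeout=10):
    """Download a page and return its HTML (hypothetical helper)."""
    # A timeout keeps the script from waiting forever on a slow server
    response = requests.get(url, timeout=timeout)
    return html_or_raise(response)


# Usage (needs network access):
# html = fetch_html("https://www.datasciencelearner.com/java-engineer-transform-career-java-for-data-science/")
```

With this approach, a 404 or 500 response stops the script with an exception instead of passing an error page on to the parser.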

Step 3: Parse the HTML Page

In the above step, you downloaded the raw HTML data. Now you have to parse the HTML and retrieve the required data using Beautifulsoup. Add the line of code below.

soup = BeautifulSoup(req.text, 'html.parser')

Here I am passing two arguments to the BeautifulSoup() constructor. The first is the downloaded HTML and the second is the parser type, which is html.parser.
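
To see what the parsed object gives you without making a network call, here is a small sketch that parses an inline HTML string. The HTML is a made-up stand-in for req.text, so the tag contents are assumptions and not the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for req.text, so the snippet runs offline
html = (
    "<html><head><title>Demo page</title></head>"
    "<body><h2>Intro</h2><p>Hello</p></body></html>"
)

# "html.parser" is the parser that ships with Python, so no extra install is needed
soup = BeautifulSoup(html, "html.parser")

print(soup.title.string)  # text of the <title> tag
print(soup.h2.get_text())  # text of the first <h2> tag
```

Once the HTML is parsed, tags can be reached as attributes of the soup object or searched with methods such as find() and find_all().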

To check what was downloaded, you can print the first 200 characters of the response text.

print(req.text[:200])

Step 4: Extract the content from the data

Now the last step is to extract the content from the scraped data you have downloaded. For example, I want to get the text of all the H2 tags from the URL I have used. To do so, I will execute the following lines of code.

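h2_text = []
# Find the content container, then collect every <h2> tag inside it
data = soup.find(class_="entry-content clearfix").find_all("h2")
for tag in data:
    h2_text.append(tag.text)  # tag.text holds the text of the heading
print(h2_text)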

Here I am creating a list, h2_text, that will store the text of all the h2 tags. In the next line, I find all the h2 tags inside the element whose class is passed to soup.find(). After that, I get the text of each tag using the loop and tag.text. When you run the above code, it prints the list of H2 headings found on the page.
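
One thing to keep in mind is that find() returns None when no element matches, and calling find_all() on None raises an AttributeError. Below is a minimal sketch of a safer version. The function name extract_h2_text and the sample HTML are my own assumptions for illustration, and the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for a downloaded page
SAMPLE_HTML = """
<html><body>
  <h2>Sidebar heading</h2>
  <div class="entry-content clearfix">
    <h2> First topic </h2>
    <p>Some text.</p>
    <h2>Second topic</h2>
  </div>
</body></html>
"""


def extract_h2_text(html, container_class="entry-content"):
    """Return the text of every <h2> inside the first element with the given class."""
    soup = BeautifulSoup(html, "html.parser")
    # A single class name matches elements that carry it alongside other classes
    container = soup.find(class_=container_class)
    if container is None:
        # Nothing matched, so return an empty list instead of crashing
        return []
    # get_text(strip=True) removes the surrounding whitespace from each heading
    return [tag.get_text(strip=True) for tag in container.find_all("h2")]


headings = extract_h2_text(SAMPLE_HTML)
print(headings)  # ['First topic', 'Second topic']
```

The heading outside the container is skipped because find_all() is called on the container and not on the whole soup.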

Conclusion

Beautifulsoup is a popular open-source Python package for scraping web content. These are the steps for implementing the find_all() method. Using it, you can easily extract the desired content from a page. I hope you have liked this tutorial. If you have any questions, you can contact us for more help.

Source:

Beautiful Soup Documentation