Get inner div using Beautifulsoup : Stepwise Scraping Implementation


We can get an inner div with BeautifulSoup by using the find() or findAll() function from the library. There are two ways to do this. The first is to search directly for the div by its class name with find(), which returns the first match. The second is findAll(), which takes the same arguments as find() but returns every matching div, so you can iterate over the results and apply your own logic. That flexibility helps with business rules, for example when divs with the same class name appear at different levels of the hierarchy. In this article, we will discuss both approaches in more detail.

Get inner div using Beautifulsoup : ( Implementation ) –

Let’s create some dummy HTML content to make this topic easy to explain. We will extract an inner div from it.

[Image: sample HTML with div]
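
The original sample was shown as an image, so its exact markup is not available here. A minimal stand-in, assuming a few nested divs with ‘class5’ as the innermost one, might look like the following. Every class name other than ‘class5’, and the nesting itself, is an assumption made for illustration.

```python
# Hypothetical sample markup. Only "class5" and its text come from the
# article; the other class names and the nesting depth are assumptions.
html = """
<div class="class1">
  <div class="class2">
    <div class="class3">Outer text</div>
    <div class="class4">
      <div class="class5">Data Science Learner</div>
    </div>
  </div>
</div>
"""
```

Any HTML string that contains a div with class ‘class5’ would work equally well with the solutions in this article.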

Solution 1 : Using find() method –

Use find() when you want to scrape one specific tag from the HTML. In the HTML string above, the div with class ‘class5’ contains the text “Data Science Learner”, which is what we need to extract. The code below uses the BeautifulSoup library and its find() function.


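from bs4 import BeautifulSoup

# html holds the sample HTML string
soup = BeautifulSoup(html, 'html.parser')
# find() returns the first matching <div>, or None if nothing matches
div = soup.find("div", class_="class5")
print(div.text)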

[Image: Get inner div using Beautifulsoup using find()]
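
Because find() returns None when nothing matches, reading .text on the result can raise an AttributeError. The sketch below shows one way to scope the search to an outer div first and guard against a missing match. The helper name inner_div_text, the ‘outer’ class, and the markup are hypothetical and used only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only "class5" and its text come from the article.
html = """
<div class="outer">
  <div class="class5">Data Science Learner</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def inner_div_text(soup, outer_class, inner_class):
    """Return the stripped text of the first inner div, or None if absent."""
    outer = soup.find("div", class_=outer_class)
    if outer is None:
        return None
    # Calling find() on a tag searches only that tag's descendants.
    inner = outer.find("div", class_=inner_class)
    return inner.get_text(strip=True) if inner is not None else None


found = inner_div_text(soup, "outer", "class5")
missing = inner_div_text(soup, "outer", "no-such-class")
print(found)
print(missing)
```

Calling find() on the outer tag rather than on the whole document keeps the search inside that part of the tree, which is usually what "inner div" means in practice.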

How to Use Beautifulsoup to parse html (html.parser)

Solution 2 : Using findAll() method –

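The findAll() function takes the same arguments as find(), but it returns all matching tags as a list instead of only the first one. We can then loop over that list, for example with a for loop, and apply our own business logic for extraction and scraping. In bs4, find_all() is the current name of this method, and findAll() is the older spelling of the same method.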


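from bs4 import BeautifulSoup

# Name the parser explicitly to avoid a parser-selection warning
content = BeautifulSoup(html, 'html.parser')
output = []
# findAll() returns every matching <div>; collect the text of each one
for div in content.findAll('div', attrs={'class': 'class5'}):
    output.append(div.text)
print(output)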

[Image: inner div using Beautifulsoup using findAll()]
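
As an illustration of the business-logic case, where the same class name appears at different places in the hierarchy, the following sketch keeps only the ‘class5’ divs whose nearest enclosing div has a particular class. The ‘sidebar’ and ‘content’ class names and the markup are assumptions, not part of the original sample. It uses find_all(), the current spelling of findAll().

```python
from bs4 import BeautifulSoup

# Hypothetical markup: two divs share the class "class5" but sit under
# different parents. Only "class5" itself comes from the article.
html = """
<div class="sidebar">
  <div class="class5">Advertisement</div>
</div>
<div class="content">
  <div class="class5">Data Science Learner</div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

matches = []
for div in soup.find_all("div", class_="class5"):
    # find_parent() returns the nearest enclosing <div>, or None.
    parent = div.find_parent("div")
    # "class" is multi-valued, so get() returns a list of class names.
    if parent is not None and "content" in parent.get("class", []):
        matches.append(div.get_text(strip=True))

print(matches)
```

The filter inside the loop is where any custom rule would go. Checking the parent's class is only one possible condition.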

To read more about the findAll() implementation, see the article below.

Beautifulsoup findall Implementation with Example : 4 Steps Only

 

Note – If you do not have the beautifulsoup module installed, read –

Pip Install Beautifulsoup : How to Install Beautifulsoup ( Windows & Linux )

Modulenotfounderror: no module named bs4 : Best Solution

Thanks
Data Science Learner Team


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 