Regex in Python: Complete Tutorial for Data Scientists


Regex is one of the basic building blocks of text mining and analytics. Knowing even basic regex can solve major problems, especially when you want to find a certain pattern in text data. This tutorial will help you understand regex in Python.

Regex In Python-

Python has a built-in package, re, for regular expressions. This tutorial covers four of the functions defined in it:

  1. search
  2. split
  3. sub
  4. findall

1.search() –

It returns a match object for the first occurrence of the pattern in the string, or None if there is no match. To understand it in more detail, please refer to the code below:

import re
# Use "text" rather than "str" so the built-in str type is not shadowed
text = "ai is helping doctors to sort out Pain in Operations"
x = re.search("ai", text)
print("Search will only capture the first occurrence", x)

Output –

Search will only capture the first occurrence <re.Match object; span=(0, 2), match='ai'>

Explanations-

Here the pattern "ai" occurs twice in the string (at the start and inside "Pain"), but search() captures only the first occurrence.
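As a small follow-up sketch, the match object returned by search() can be inspected with its group() and span() methods, and search() returns None when the pattern does not occur. The variable names below are illustrative only.

```python
import re

text = "ai is helping doctors to sort out Pain in Operations"

# search() returns a match object for the first occurrence, or None
match = re.search("ai", text)
if match is not None:
    matched_text = match.group()   # the text that matched
    position = match.span()        # (start, end) index pair
    print(matched_text, position)

# A pattern that does not occur in the string gives None
no_match = re.search("xyz", text)
print(no_match)
```

Checking for None before calling group() avoids an AttributeError when nothing matches.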

2.split() –

This function splits the string at each match of the pattern and returns a list. Please refer to the code below:

import re
temp_str = "The rain in Spain"
x = re.split("ai", temp_str)
print(x)

Output –

['The r', 'n in Sp', 'n']

Explanations-

Here the pattern was "ai". The code breaks the string at every place where the pattern occurs, and the matched text itself is dropped from the result.
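The same function also accepts real patterns and an optional maxsplit argument. The following is a minimal sketch reusing the string from above; the variable names words and first_split are made up for illustration.

```python
import re

temp_str = "The rain in Spain"

# Split on any run of whitespace instead of a fixed substring
words = re.split(r"\s+", temp_str)
print(words)

# maxsplit limits how many splits are made; the rest stays in one piece
first_split = re.split("ai", temp_str, maxsplit=1)
print(first_split)
```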

3.sub() –

Use this function when you need to replace every match of a pattern in a string with another string. Please refer to the example of the sub() function below:

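import re
# Use "text" rather than "str" so the built-in str type is not shadowed
text = "I am Interested in AI"
# A raw string (r"...") keeps the backslash in \s from being treated as a string escape
x = re.sub(r"\s", "%%", text)
print(x)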

Output –

I%%am%%Interested%%in%%AI

Explanations-

In the output, you can see that each whitespace character has been replaced by "%%".
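Two common variations are worth a quick sketch: collapsing runs of whitespace with \s+, and limiting the number of replacements with the count argument. The input string with extra spaces is invented for this example.

```python
import re

text = "I am   Interested in    AI"

# \s+ matches one or more whitespace characters, so each run becomes one space
collapsed = re.sub(r"\s+", " ", text)
print(collapsed)

# count limits how many matches are replaced (here only the first one)
only_first = re.sub(r"\s+", "%%", text, count=1)
print(only_first)
```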

4.findall()-

This function returns a list of all non-overlapping matches of the pattern in the string. Please see the code below for an example of findall().

import re
# Use "text" rather than "str" so the built-in str type is not shadowed
text = "I am Data Scientist and AI developer"
x = re.findall("a", text)
print(x)


Output –

['a', 'a', 'a', 'a']

Explanations-

As mentioned in the description, "a" is the pattern, and it occurs four times in the string. The uppercase "A" in "AI" is not counted because matching is case-sensitive by default.
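To illustrate the case sensitivity, here is a short sketch that runs the same search with and without the re.IGNORECASE flag. The variable names are illustrative.

```python
import re

text = "I am Data Scientist and AI developer"

# By default matching is case-sensitive, so the "A" in "AI" is skipped
lower_only = re.findall("a", text)

# re.IGNORECASE also matches the uppercase "A"
any_case = re.findall("a", text, flags=re.IGNORECASE)

print(len(lower_only), len(any_case))
```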

How to generate the pattern –

To build a pattern, there are some special characters you can use in Python:

  1. . (dot) – This matches any single character except a newline.
  2. ^ – This matches the start of the string.
  3. $ – This matches the end of the string.
  4. * – This matches zero or more occurrences of the preceding element (use + for one or more).
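The four symbols above can be tried out with a short sketch like the following. The sample strings and variable names are chosen only for illustration.

```python
import re

text = "The rain in Spain"

# . matches any single character: "r", then any one character, then "in"
dot_match = re.findall(r"r.in", text)

# ^ anchors the pattern to the start of the string
starts_with_the = re.search(r"^The", text) is not None

# $ anchors the pattern to the end of the string
ends_with_spain = re.search(r"Spain$", text) is not None

# * matches zero or more repetitions of the preceding element,
# so "ab*" matches "a" followed by any number of "b" (including none)
star_match = re.findall(r"ab*", "a ab abbb")

print(dot_match, starts_with_the, ends_with_spain, star_match)
```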

Conclusion-

Regex is a matter of practice, but the basic concepts are still necessary. To become an expert in this topic, you need to solve real text mining problems and practice building patterns with these symbols. I hope you liked this article.

If you would like to add some information about regex in Python, we welcome your suggestions. You may comment or email us. If you would like to contribute a complete guest post, you may do that as well.

Thanks

Data Science Learner Team 


Meet Abhishek (Chief Editor), a data scientist with major expertise in NLP and Text Analytics. He has worked on various projects involving text data and has been able to achieve great results. He currently manages Datasciencelearner.com, where he and his team share knowledge and help others learn more about data science.
 