Regex is one of the basic building blocks of text mining and analytics. Knowing even basic regex can solve major problems, especially when you want to find a certain pattern in text data. This tutorial will help you understand regex in Python –
Regex in Python –
Python has a built-in package called re. It defines several functions, including:
- search
- split
- sub
- findall
1. search() –
It returns a match object for the first place in the string where the pattern matches, or None if the pattern does not match anywhere. To understand it in more detail, refer to the code below –
import re
# named text rather than str, so the built-in str type stays usable
text = "ai is helping doctors to sort out Pain in Operations"
x = re.search("ai", text)
print("Search will only capture the first occurrence ", x)
Output –
Search will only capture the first occurrence <_sre.SRE_Match object; span=(0, 2), match='ai'>
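Explanation –
Here the pattern "ai" occurs twice in the string: once at the very start and once inside "Pain". search() captures only the first occurrence, which is why the span is (0, 2). On Python 3.7 and later, the same object prints as <re.Match object; span=(0, 2), match='ai'>.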
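Printing the match object is rarely what you want in practice. As a minimal sketch (the variable names here are only for illustration), this is how you might pull the position and the matched text out of the object, and how a failed search behaves:

```python
import re

text = "ai is helping doctors to sort out Pain in Operations"

match = re.search("ai", text)
if match is not None:
    first_span = match.span()    # (start, end) position of the first occurrence
    first_text = match.group()   # the substring that matched
    print(first_span, first_text)

# When the pattern is not found, search() returns None instead of a match object
missing = re.search("xyz", text)
print(missing)
```

Checking for None before calling span() or group() avoids an AttributeError when the pattern is absent.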
2. split() –
This function splits the string at every match of the pattern and returns a list. Refer to the code below –
import re
temp_str = "The rain in Spain"
x = re.split("ai", temp_str)
print(x)
Output –
['The r', 'n in Sp', 'n']
Explanation –
Here the pattern was "ai". The code breaks the string at every place where the pattern occurs, and the matched text itself is left out of the resulting list.
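Two variations are worth trying once the basic call makes sense. The sketch below assumes you sometimes want to limit the number of splits (the maxsplit argument of re.split) or split on whitespace instead of a fixed string. The variable names are illustrative only.

```python
import re

temp_str = "The rain in Spain"

# maxsplit=1 stops after the first match, so the rest of the string stays intact
parts_once = re.split("ai", temp_str, maxsplit=1)
print(parts_once)

# Splitting on one or more whitespace characters gives the individual words
words = re.split(r"\s+", temp_str)
print(words)
```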
3. sub() –
Use this function when you need to replace every match of a pattern in a string with another string. Refer to the example of the sub() function below –
import re
text = "I am Interested in AI"
# raw string (r"...") so the backslash in \s reaches the regex engine unchanged
x = re.sub(r"\s", "%%", text)
print(x)
Output –
I%%am%%Interested%%in%%AI
Explanation –
In the output, you can see that every whitespace character has been replaced by "%%".
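By default sub() replaces every match. As a small sketch, assuming you only want the first match replaced or want to tidy up repeated spaces, the count argument and the + quantifier can help. The sample strings are made up for illustration.

```python
import re

sentence = "I am Interested in AI"

# count=1 replaces only the first whitespace character
first_only = re.sub(r"\s", "%%", sentence, count=1)
print(first_only)

# \s+ matches a run of whitespace, so each run collapses to a single space
collapsed = re.sub(r"\s+", " ", "I   am    Interested")
print(collapsed)
```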
4. findall() –
We use this function to find every match of the pattern in a string. It returns the matches as a list. Refer to the code below for findall().
import re
text = "I am Data Scientist and AI developer"
# findall() collects every non-overlapping match into a list
x = re.findall("a", text)
print(x)
Output –
['a', 'a', 'a', 'a']
Explanation –
As mentioned in the description, "a" is the pattern, and it occurs four times in the string. Matching is case-sensitive by default, so the uppercase "A" in "AI" is not counted.
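To see the effect of case sensitivity, here is a short sketch that repeats the search with the re.IGNORECASE flag, and also uses findall() with \w+ to pull out whole words. The variable names are illustrative.

```python
import re

text = "I am Data Scientist and AI developer"

# With re.IGNORECASE the uppercase "A" in "AI" is matched as well
any_case = re.findall("a", text, flags=re.IGNORECASE)
print(any_case)

# \w+ matches a run of letters, digits, or underscores, i.e. each word here
all_words = re.findall(r"\w+", text)
print(all_words)
```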
How to generate the pattern –
To build a pattern, you can use some special characters that Python regex understands –
- . (dot) – Matches any single character except a newline.
- ^ – Matches at the start of the string.
- $ – Matches at the end of the string.
- * – Matches zero or more occurrences of the preceding element.
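The four symbols above can be tried out on the same sample string used earlier. This is only a rough sketch, and the result variable names are made up for illustration.

```python
import re

temp_str = "The rain in Spain"

# . matches any single character (except a newline): r, any two characters, n
dot_matches = re.findall("r..n", temp_str)

# ^ anchors the pattern to the start of the string
starts_with_the = re.search("^The", temp_str) is not None
starts_with_rain = re.search("^rain", temp_str) is not None

# $ anchors the pattern to the end of the string
ends_with_spain = re.search("Spain$", temp_str) is not None

# * means zero or more of the preceding character, so "a" alone also matches
star_matches = re.findall("ai*", "a ai aii")

print(dot_matches, starts_with_the, starts_with_rain, ends_with_spain, star_matches)
```

Note that "ai*" matches a plain "a" with no "i" after it, which is the difference between zero or more (*) and one or more (+).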
Conclusion –
Regex is a matter of practice, but the basic concepts are still necessary. To become an expert in this topic, you need to solve real text mining problems and practice building patterns with these symbols. I hope you liked this article – Regex In Python : Complete Tutorial for Data Scientist.
If you want to add some information about regex in Python, we welcome your suggestions. You may comment or email us. If you would like to contribute a complete guest post, you are welcome to do that too.
Thanks
Data Science Learner Team