Regex (regular expressions) is one of the basic building blocks of text mining and analytics. Knowing even basic regex can solve major problems, especially when you want to find a certain pattern in text data. This tutorial will help you understand regex in Python.
Regex In Python-
Python has a built-in package, re, for working with regular expressions. It defines several functions; the ones covered here are search(), split(), sub() and findall().
re.search() returns a match object for the first occurrence of the pattern in the string, or None if there is no match. To understand this in more detail, refer to the code below:
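    import re

    # "text" is used instead of "str" so the built-in str type is not shadowed
    text = "ai is helping doctors to sort out Pain in Operations"
    x = re.search("ai", text)
    print("Search will only capture the first occurrence", x)

Search will only capture the first occurrence <re.Match object; span=(0, 2), match='ai'>
(Python 3.6 and earlier show the object as _sre.SRE_Match instead of re.Match.)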
Here the pattern "ai" occurs twice in the string (at the start and inside "Pain"), but search() captures only the first occurrence.
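To make the match object more concrete, here is a minimal sketch of reading it with group() and span(), and of handling the no-match case. The variable names (first_text, first_span, missing) and the pattern "xyz" are only illustrative.

```python
import re

text = "ai is helping doctors to sort out Pain in Operations"

# search() returns a match object, or None when nothing matches
match = re.search("ai", text)
if match is not None:
    first_text = match.group()  # the text that matched
    first_span = match.span()   # (start, end) positions of the match
    print(first_text, first_span)

# A pattern that does not occur in the string gives None
missing = re.search("xyz", text)
print(missing)
```

Checking for None before calling group() avoids an AttributeError when the pattern is not found.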
re.split() splits the string wherever the pattern matches and returns the pieces as a list. Refer to the code below:
    import re

    temp_str = "The rain in Spain"
    x = re.split("ai", temp_str)
    print(x)

['The r', 'n in Sp', 'n']
Here the pattern was "ai". The code above breaks the string at every place where the pattern occurs, and the matched text itself is dropped from the result.
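As a small extension of the example above, split() also accepts a maxsplit argument that limits how many splits are made, and the pattern can be more than a fixed piece of text. The sketch below assumes the same sample string; the names first_only and words are illustrative.

```python
import re

temp_str = "The rain in Spain"

# Split only at the first match; the rest of the string stays in one piece
first_only = re.split("ai", temp_str, maxsplit=1)
print(first_only)

# Split on runs of whitespace to get the individual words
words = re.split(r"\s+", temp_str)
print(words)
```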
Use re.sub() when you need to replace every match of a pattern in a string with another string. Refer to the example of the sub() function below:
    import re

    text = "I am Interested in AI"
    # r"\s" is a raw string, so the backslash reaches the regex engine unchanged
    x = re.sub(r"\s", "%%", text)
    print(x)

In the output, you can see that every space is replaced by "%%":
I%%am%%Interested%%in%%AI
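If only some of the matches should be replaced, sub() takes a count argument, and the related function re.subn() also reports how many replacements were made. This is a minimal sketch on the same sample sentence; the names first_two, replaced and how_many are illustrative.

```python
import re

text = "I am Interested in AI"

# Replace only the first two whitespace characters
first_two = re.sub(r"\s", "%%", text, count=2)
print(first_two)

# subn() returns the new string together with the number of replacements
replaced, how_many = re.subn(r"\s", "%%", text)
print(replaced, how_many)
```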
re.findall() returns a list of all non-overlapping matches of the pattern in the string. Refer to the code below for findall():
    import re

    text = "I am Data Scientist and AI developer"
    x = re.findall("a", text)
    print(x)

['a', 'a', 'a', 'a']
As described above, "a" is the pattern, and it occurs four times in the string. The match is case-sensitive, so the uppercase "A" in "AI" is not counted.
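To show how letter case affects the result, the sketch below runs findall() on the same sentence with and without the re.IGNORECASE flag. The names lower_only and any_case are illustrative.

```python
import re

text = "I am Data Scientist and AI developer"

# Default matching is case-sensitive: only lowercase "a" is found
lower_only = re.findall("a", text)
print(len(lower_only))

# With re.IGNORECASE the uppercase "A" in "AI" is matched as well
any_case = re.findall("a", text, flags=re.IGNORECASE)
print(any_case)
```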
How to generate the pattern –
To build a pattern, there are some special characters in Python regex that you can use:
- . (dot) – This matches any single character except a newline.
- ^ – This matches at the start of the string.
- $ – This matches at the end of the string.
- * – This matches zero or more occurrences of the preceding character or group.
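The four symbols above can be tried out with a short sketch like the following. The sample strings and variable names are only illustrative, not part of the re package.

```python
import re

text = "The rain in Spain"

# "." stands for any single character: r, any character, i, n
dot_matches = re.findall("r.in", text)

# "^" anchors the pattern to the start of the string
starts_with_the = re.search("^The", text) is not None
starts_with_rain = re.search("^rain", text) is not None

# "$" anchors the pattern to the end of the string
ends_with_spain = re.search("Spain$", text) is not None

# "*" means zero or more of the preceding character, so "ab*" also matches a lone "a"
star_matches = re.findall("ab*", "a ab abbb")

print(dot_matches, starts_with_the, starts_with_rain, ends_with_spain, star_matches)
```

Note that "ab*" matches the single "a" because "*" allows zero occurrences of "b".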
Regex is a matter of practice, but the basic concepts are still necessary. To become an expert in this topic, you need to solve real text mining problems and practice building patterns with these symbols. I hope you liked this article, Regex In Python : Complete Tutorial for Data Scientist.
If you would like to add some information about regex in Python, we welcome your suggestions. You may comment or email us. If you would like to contribute a complete guest post, you may send that as well.
Data Science Learner Team