One way to extract text located between numerical values using regex in Python is to use lookbehind and lookahead assertions. Here is an example:
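import re

# Sample input: two text segments separated by numbers
text = "123 this is some text 456 that we want to extract 789"
# Lazily match whatever lies between "digit + whitespace" and "whitespace + digit"
pattern = r"(?<=\d\s).*?(?=\s\d)"
result = re.findall(pattern, text)
print(result)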
Explanation:
(?<=\d\s)
is a positive lookbehind assertion: it requires that the match be immediately preceded by a digit followed by a whitespace character, without including either of them in the result.
.*?
matches any character (except a newline) zero or more times, but as few times as possible (non-greedy).
(?=\s\d)
is a positive lookahead assertion: it requires that the match be immediately followed by a whitespace character and then a digit, again without including either of them in the result.
The re.findall function returns a list of all non-overlapping matches of the pattern in the text. In this case, the result will be:
['this is some text', 'that we want to extract']
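If the numbers may be separated from the text by more than one whitespace character, or the text spans several lines, the pattern above needs adjusting. Python's re module only accepts fixed-width lookbehinds, so (?<=\d\s+) is rejected with an error. A minimal sketch of one possible workaround is to match the leading number with a capturing group instead of a lookbehind and keep the lookahead for the trailing number. The function name extract_between_numbers and the second sample string are made up for illustration:

```python
import re

def extract_between_numbers(text):
    """Return the text segments that sit between numbers (illustrative helper)."""
    # \d\s+      : last digit of the opening number plus any run of whitespace
    # (.*?)      : the segment we want, captured lazily
    # (?=\s+\d)  : the closing number is only peeked at, so it can open the next segment
    # re.DOTALL  : lets "." match newlines, so a segment may span several lines
    pattern = re.compile(r"\d\s+(.*?)(?=\s+\d)", re.DOTALL)
    # With exactly one capturing group, findall returns the group contents
    return pattern.findall(text)

print(extract_between_numbers("123 this is some text 456 that we want to extract 789"))
print(extract_between_numbers("1   alpha beta   2\ngamma\n3"))
```

Because the closing number is matched by a lookahead rather than consumed, each number can serve as the end of one segment and the start of the next, just as in the original pattern.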
Asked: 2022-03-14 11:00:00 +0000