Python/Re

Extracting substrings from strings using re

Suppose we have a string like "thing2_2017-04-09_05-04-67.csv" and want to extract the tokens from the filename (thing2, 2017, 04, 09, and so on).

To extract particular tokens using a regular expression, we can use re.findall(regular_expression, string), which returns a list of all non-overlapping matches of the pattern in the string. For example, the regular expression [0-9]{4} matches four consecutive digits.

>>> import re
>>> z = "thing2_2017-04-09_05-04-67.csv"
>>> re.findall(r'[0-9]{4}', z)
['2017']
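
As a further sketch of how the pattern controls what re.findall returns, the example below uses the same filename with two other patterns. The variable names digit_runs and date_parts are illustrative only and not part of the original page.

```python
import re

z = "thing2_2017-04-09_05-04-67.csv"

# Every run of one or more digits, including the "2" in "thing2"
digit_runs = re.findall(r'[0-9]+', z)
print(digit_runs)  # ['2', '2017', '04', '09', '05', '04', '67']

# When the pattern contains capturing groups, findall returns
# one tuple of group contents per match instead of the whole match
date_parts = re.findall(r'([0-9]{4})-([0-9]{2})-([0-9]{2})', z)
print(date_parts)  # [('2017', '04', '09')]
```

The second pattern matches only once, because the time portion of the filename (05-04-67) does not begin with a four-digit run.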

Splitting a string at occurrences of a regular expression

To split a string at occurrences of a regular expression, use re.split(regular_expression, string). This splits the string at every place where the pattern matches. The matched text is discarded unless the pattern is enclosed in capturing parentheses, in which case the matched separators are also included in the resulting list.

The regular expression [^a-zA-Z0-9]{1,} matches a run of one or more non-alphanumeric characters ({1,} is equivalent to +), so re.split breaks the string wherever such a run occurs. For example:

>>> z = "thing2_2017-04-09_05-04-67.csv"
>>> re.split(r'[^a-zA-Z0-9]{1,}', z)
['thing2', '2017', '04', '09', '05', '04', '67', 'csv']
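
To illustrate the remark about parentheses, here is a minimal sketch of the same split with the pattern wrapped in a capturing group, so that the separators are kept in the result. The variable name parts is illustrative only.

```python
import re

z = "thing2_2017-04-09_05-04-67.csv"

# Wrapping the pattern in () keeps each matched separator in the output list
parts = re.split(r'([^a-zA-Z0-9]+)', z)
print(parts)
# ['thing2', '_', '2017', '-', '04', '-', '09', '_', '05', '-', '04', '-', '67', '.', 'csv']

# Because nothing was discarded, joining the pieces restores the original string
print(''.join(parts) == z)  # True
```

Keeping the separators is useful when the string needs to be reassembled after some of the tokens have been modified.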


From charlesreid1
