This blog post discusses how to extract digits from a string (column) in Python using regular expressions, which are imported with `import re`. The post introduces a function called `extract_digits`, which takes a string as input and extracts all digits from it using a regular expression pattern. The pattern used in this function is `\d+`, which matches one or more consecutive digits (`\d` matches any digit, and `+` specifies that one or more occurrences should be matched).

The post provides a step-by-step explanation of how the function works. It applies the regular expression pattern to the input string using the `re.findall()` function, which finds all non-overlapping matches of the pattern in the string and returns them as a list of strings. Finally, it joins all the strings in the list into a single string using the `.join()` method, which concatenates them with an empty string as the separator.

Overall, this post provides a clear and concise explanation of how to use regular expressions to extract digits from a string (column) in Python, making it a helpful resource for anyone looking to perform this task.

# How to extract digits from a string (column) using regular expressions

```
# Data manipulation and analysis libraries
import numpy as np
import pandas as pd
# For regular expressions
import re
```

The function `extract_digits` uses regular expressions (imported with `import re`) to extract all digits from a given string `s` and return them as a single string.

- It first defines a regular expression pattern `r'\d+'`, which matches one or more consecutive digits:
  - `\d` matches any digit character, i.e., 0 to 9. (For `str` patterns, Python also matches other Unicode decimal digits unless the `re.ASCII` flag is set.)
  - `+` specifies that one or more occurrences of the preceding pattern should be matched. Here the preceding pattern is `\d`, so `\d+` matches a run of one or more consecutive digits.
- It then applies this pattern to the input string `s` using the `re.findall()` function, which finds all non-overlapping matches of the pattern in the string and returns them as a list of strings.
- Finally, it joins all the strings in the list into a single string using the `.join()` method, which concatenates them with an empty string as the separator.

```
# Define a function to extract only digits from a string
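def extract_digits(s):
    # One or more consecutive digits
    pattern = r'\d+'
    # List of every run of digits found in s
    matches = re.findall(pattern, s)
    # Concatenate the runs with no separator
    return ''.join(matches)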
```
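To make the two steps easier to see, here is a small sketch that prints the intermediate values. The sample string and the variable names (`single`, `runs`, `joined`) are chosen only for illustration; it also contrasts `\d` with `\d+` to show what the `+` changes.

```python
import re

text = "abc123def456ghi789"

# \d alone matches one digit at a time, so every digit is its own match.
single = re.findall(r'\d', text)

# \d+ matches each run of consecutive digits as a single string.
runs = re.findall(r'\d+', text)

# Joining with an empty separator concatenates the runs.
joined = ''.join(runs)

print(single)  # ['1', '2', '3', '4', '5', '6', '7', '8', '9']
print(runs)    # ['123', '456', '789']
print(joined)  # 123456789
```

With an empty separator both patterns give the same final string; the difference only matters once the matches are joined with a visible separator or used as a list.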

```
# Example string
s = "Hi, my name is francis and im 30 years old"
# Extract the digits
digits = extract_digits(s)
digits
```

'30'

```
# Example string
s = "[email protected]"
# Extract the digits
digits = extract_digits(s)
digits
```

'92'

```
# Example string
s = "I bought 1 pen and 2 notebooks"
# Extract the digits
digits = extract_digits(s)
digits
```

'12'

```
# Redefine the function to add space between 2 different digits
def extract_digits(s):
    pattern = r'\d+'
    matches = re.findall(pattern, s)
    # Join the runs of digits with a single space between them
    return ' '.join(matches)
```

```
# Example string
s = "I bought 1 pen and 2 notebooks"
# Extract the digits
digits = extract_digits(s)
digits
```

'1 2'
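Instead of redefining the function whenever the separator changes, one possible variation is to pass the separator as an argument. The name `extract_digits_sep` and the `sep` parameter are hypothetical and not part of the original function; this is only a sketch of the idea.

```python
import re

def extract_digits_sep(s, sep=''):
    """Return every run of digits in s, joined with sep (hypothetical parameter)."""
    return sep.join(re.findall(r'\d+', s))

s = "I bought 1 pen and 2 notebooks"
compact = extract_digits_sep(s)       # no separator between the numbers
spaced = extract_digits_sep(s, ' ')   # one space between the numbers

print(repr(compact))  # '12'
print(repr(spaced))   # '1 2'
```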

## Extract digits from a column

```
# Create a sample DataFrame
df = pd.DataFrame({
    'Text': ['Im 12 years old', 'abc123def456ghi789', 'what is your name', 'I have 4 apples']
})
# Apply the extract_digits() function to the 'Text' column
df['Digits'] = df['Text'].apply(extract_digits)
df
```

|   | Text | Digits |
|---|---|---|
| 0 | Im 12 years old | 12 |
| 1 | abc123def456ghi789 | 123 456 789 |
| 2 | what is your name | |
| 3 | I have 4 apples | 4 |
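Real columns often contain missing values, and `re.findall()` raises a `TypeError` when it is given something that is not a string (such as `None` or `NaN`). The sketch below shows one way to guard against that. The name `extract_digits_safe` and the choice to return an empty string for non-string input are assumptions made for illustration, not part of the original post.

```python
import re

def extract_digits_safe(value, sep=' '):
    """Like extract_digits, but returns '' for non-string input (an assumption)."""
    if not isinstance(value, str):
        return ''
    return sep.join(re.findall(r'\d+', value))

# A plain list stands in for a column that has missing values.
values = ['Im 12 years old', None, float('nan'), 'abc123def456ghi789']
results = [extract_digits_safe(v) for v in values]

print(results)  # ['12', '', '', '123 456 789']
```

It could be passed to `.apply()` in the same way as `extract_digits`, for example `df['Text'].apply(extract_digits_safe)`.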
