Sunday 12 April 2015

PowerShell : Regular Expressions

Greetings people of Earth!

In this article we will discuss Regular Expressions in PowerShell. Those who have watched the movie Interstellar already know that in some situations time becomes as precious as any other resource. We have to save it! And those who have been in a situation where their dear and kind boss shouts "Why is this not done?", "Why is your program taking so long to run?", "What kind of shit code have you written?" know the value of time even better. And that's why I am writing about Regular Expressions.

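The beauty of regular expressions is that a single pattern can replace a lot of hand-written string searching and matching code, and the search itself can be fast. They might seem complicated if you don't understand them, but once you do, they become your best friend. Now you may ask: why can they be fast?
A regex engine doesn't always have to compare every character of the text against the pattern. When a pattern starts with literal text, the engine can use a smarter search algorithm such as Boyer-Moore string search, which compares the pattern from its last character backwards and, on a mismatch, skips several characters at once. Skipping characters in between is what improves the performance. (Regular expressions are not automatically faster than every other method, though: for a plain literal search, a simple method like String.IndexOf is usually just as quick.)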
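To make the skipping idea concrete, here is a small sketch of Horspool's simplified version of Boyer-Moore, written in PowerShell. Find-Horspool is a hypothetical helper for this article: it only searches for plain literal text, it leaves out Boyer-Moore's second (good-suffix) rule, and it is not the code the .NET regex engine actually runs.

```powershell
# Horspool's simplified Boyer-Moore search for a literal string.
# Find-Horspool is a hypothetical helper for illustration, not part of PowerShell or .NET.
# Returns the index of the first occurrence of $Pattern in $Text, or -1 if there is none.
function Find-Horspool {
    param([string] $Text, [string] $Pattern)

    $m = $Pattern.Length
    $n = $Text.Length
    if ($m -eq 0) { return 0 }
    if ($m -gt $n) { return -1 }

    # Shift table: if the text character under the pattern's LAST position is c,
    # the pattern may slide right by $shift[c]. Characters that do not occur in the
    # pattern (ignoring its last character) allow a full jump of $m.
    $shift = [System.Collections.Generic.Dictionary[char,int]]::new()
    for ($i = 0; $i -lt $m - 1; $i++) {
        $shift[$Pattern[$i]] = $m - 1 - $i
    }

    $pos = 0
    while ($pos -le $n - $m) {
        # Compare from the right end of the pattern towards the left (case-sensitive).
        $j = $m - 1
        while ($j -ge 0 -and $Text[$pos + $j] -ceq $Pattern[$j]) { $j-- }
        if ($j -lt 0) { return $pos }   # every character matched

        # Mismatch: skip ahead based on the character under the pattern's last slot.
        $c = $Text[$pos + $m - 1]
        if ($shift.ContainsKey($c)) { $pos += $shift[$c] } else { $pos += $m }
    }
    return -1
}

Find-Horspool -Text "apple is a fruit apple" -Pattern "fruit"   # 11
```

In this example the search only lines the pattern up at positions 0, 5, 10 and 11 before finding "fruit", instead of trying every position in the text.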

Before diving into regex functions, let's take a look at some basic symbols you will come across.

         
    SYMBOL    MEANING
    \         Escapes the next character, telling the regex engine not to treat it as special. Ex- \. matches a literal dot.
    *         Matches the preceding element 0 or more times. Ex- a* matches "", "a" or "aaa".
    +         Matches the preceding element 1 or more times. Ex- a+
    ?         Matches the preceding element 0 or 1 time. Ex- a?
    .         Matches any single character except newline.
    []        Matches any single character from within the brackets. You can also specify a range like [a-z] or [0-9].
    [^]       Matches any single character except those in the brackets. Ex- [^0-9]
    ^         Matches the beginning of the string (or of every line in multiline mode). Ex- ^ab matches 'ab' at the start.
    $         Matches the end of the string (or of every line in multiline mode).
    {n,}      Matches the preceding element n or more times.
    {0,n}     Matches the preceding element at most n times. (PowerShell does not accept {,n}; it treats it as literal text.)
    {n}       Matches the preceding element exactly n times.
    {n,m}     Matches the preceding element at least n and at most m times. Ex- [A-Z]{1,3}
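You can try the symbols from the table straight away with PowerShell's -match operator, which returns True or False for a single string. These one-liners are just illustrative checks: -match is case-insensitive like -replace, and -cmatch is its case-sensitive twin.

```powershell
# Each line prints True or False. -match ignores case; -cmatch does not.
'colour'  -match  '^colou?r$'      # True  : ? makes the 'u' optional
'color'   -match  '^colou?r$'      # True
'ct'      -match  '^ca*t$'         # True  : * allows zero 'a's
'ct'      -match  '^ca+t$'         # False : + needs at least one 'a'
'caaat'   -match  '^ca+t$'         # True
'axb'     -match  '^a.b$'          # True  : . matches any character...
"a`nb"    -match  '^a.b$'          # False : ...except newline
'axb'     -match  '^a\.b$'         # False : \. is a literal dot
'B7'      -match  '^[a-z][0-9]$'   # True  : -match ignores case
'B7'      -cmatch '^[a-z][0-9]$'   # False : -cmatch does not
'x'       -match  '^[^0-9]$'       # True  : anything but a digit
'aaa'     -match  '^a{3}$'         # True  : exactly 3
'a'       -match  '^a{2,}$'        # False : needs 2 or more
'aaaa'    -match  '^a{0,3}$'       # False : at most 3
'aaa'     -match  '^a{,3}$'        # False : {,3} is NOT a quantifier in .NET...
'a{,3}'   -match  '^a{,3}$'        # True  : ...it matches the literal text "a{,3}"
"first`nsecond" -match '^second'      # False : ^ is the start of the whole string
"first`nsecond" -match '(?m)^second'  # True  : (?m) makes ^ match at each line start
```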

These are the symbols you will come across most often while working with regular expressions. Now I will show you some examples. I suggest you first try to form each regex yourself before looking at the pattern I have given.

    - apple is a fruit apple
Suppose you want to remove the second 'apple'. Then your regular expression would be "apple$", since $ only matches at the end of the string.

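$string = "apple is a fruit apple"
# Single quotes keep PowerShell from expanding $ inside the pattern
$string = $string -replace 'apple$', ''
write-host $string   # "apple is a fruit " (the space before the last apple stays)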

    - CN=Andy/OU=ROY/AP=US
Suppose this is a string from which you want to remove the characters before each '=' (along with the '=' itself), so that the resulting string is Andy/ROY/US.

$string = "CN=Andy/OU=ROY/AP=US"
$string = $string -replace "[A-Z]{2}=",""

write-host $string

This expression replaces two letters followed by "=" with an empty string.
NOTE: the -replace operator is case-insensitive, so it treats "cn=" and "CN=" the same.
Suppose our string is "Cn=Andy/OU=ROY/AP=US"
and you don't want to remove a prefix that contains lowercase letters (like "Cn="). Then you can use the case-sensitive -creplace operator instead:

$string = "Cn=Andy/OU=ROY/AP=US"
$string = $string -creplace "[A-Z]{2}=",""
write-host $string

This will give the following result:

Cn=Andy/ROY/US

Suppose the string is:
"C=Andy/OU=ROY/APS=US"
Here the prefixes have different lengths, so a fixed {2} no longer works. Instead you can use the pattern '[A-Z]{1,3}=', which matches at least 1 and at most 3 letters before '='.
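The script for this case did not survive in the post, so here is a minimal version, assuming the same -replace style as the earlier examples. The last line is only there to compare against the two-letter pattern.

```powershell
$string = "C=Andy/OU=ROY/APS=US"
# [A-Z]{1,3}= : one to three letters followed by '='
$string = $string -replace "[A-Z]{1,3}=", ""
write-host $string   # Andy/ROY/US

# For comparison, the two-letter pattern misses 'C=' and only strips 'PS=' out of 'APS=':
"C=Andy/OU=ROY/APS=US" -replace "[A-Z]{2}=", ""   # C=Andy/ROY/AUS
```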
Now suppose you have a string like this :
"<body><p attr="some attribute value">some text that you want to retain</p></html>"
This is a scenario I faced where a SharePoint column was full of junk data like this from the source, and I had to remove all the unwanted characters. I have simplified the string and kept only the HTML tags to make it easy to follow.
You can remove all the HTML tags with this regex: "<.+?>". The ? after + makes it lazy, so each tag is matched on its own instead of everything from the first '<' to the last '>'.
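Here is a minimal sketch of that cleanup, assuming the text is held in a single-quoted PowerShell string (single quotes because the text itself contains double quotes). The greedy version is shown next to the lazy one for comparison.

```powershell
$html = '<body><p attr="some attribute value">some text that you want to retain</p></html>'

# Lazy: .+? stops at the first '>' it can, so each tag is removed separately.
$text = $html -replace '<.+?>', ''
write-host $text    # some text that you want to retain

# Greedy: .+ runs on to the LAST '>' in the string, so everything is removed.
$html -replace '<.+>', ''    # (empty string)
```

A pattern like this is fine for stripping tags from simple junk text, but it is not a full HTML parser: a '>' inside an attribute value, for example, would cut a tag short.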



This is the power of regular expressions. And with a Boyer-Moore style search, a longer literal pattern can even be found faster, because every mismatch lets the search skip further ahead. There are still many types of patterns to discuss, but it's not possible to cover every scenario here. So if you have any doubts regarding a regular expression, feel free to comment below or mail me.
Have a nice day!
