Tabs + Spaces, Regex to the rescue!


I was handed a little project today. As a freelancer you sometimes have to prepare for the unthinkable, even when the job title suggests otherwise. It has been a spacey day, but not a bad one, since I learned something too.

So, you must be wondering what happened. If you are a developer, imagine 3,000+ \r\n sequences invading the HTML source code, plus more \r\n followed by 3, 4, or 6 spaces. Obviously the designer who did the layout likes to annoy others. \r is a carriage return and \n is a newline (line feed); the \r\n pair together is the Windows-style line ending.
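To get a feel for how much of this is in a file, you can count the characters directly. This is only a small sketch: the $sample string is made up for illustration and is not the actual page I was working on.

```php
<?php
// Hypothetical sample: two paragraphs separated by Windows line endings
// and a line that contains nothing but three spaces.
$sample = "<p>one</p>\r\n   \r\n<p>two</p>";

// substr_count() is a built-in that counts non-overlapping occurrences.
$crlfPairs       = substr_count($sample, "\r\n");
$carriageReturns = substr_count($sample, "\r");
$newlines        = substr_count($sample, "\n");

printf("CRLF pairs: %d, CR: %d, LF: %d\n", $crlfPairs, $carriageReturns, $newlines);
```

If the CR and LF counts match the CRLF count, every line break in the file is Windows style.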

What does this do? Nothing visible: you can't see it when the HTML is rendered, but if you check the source code you notice big gaps of spaces and carriage returns.

Solution? My solution is pretty straightforward, yet it lacks efficiency.

In PHP, you first need to get the HTML into a variable.
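One way to do that, assuming the HTML sits in a file on disk and you are on PHP 7 or later, is file_get_contents(). The load_html() helper and the layout.html file name are my own placeholders, not part of the original project.

```php
<?php
// Hypothetical helper: read a whole file into a string,
// failing loudly if the file cannot be read.
function load_html(string $path): string
{
    // @ suppresses the PHP warning so the failure is reported by the exception instead.
    $html = @file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $html;
}
```

Usage would look like $html = load_html('layout.html'); after which $html is ready for the replacements.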

Then, to remove the excess our smart designer added:

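// Spaces are written as [ ] because the x modifier ignores bare spaces in a pattern.
// 1) Remove lines made of one non-">" character, optional padding, and trailing line breaks.
$regex = preg_replace('{^[^>]([\r\n])?([ ][ ][ ]|[ ])?([\r\n]+)$}mx', '', $html);
// 2) Remove runs of spaces together with the two line-break characters that follow them.
$regex = preg_replace('#([ ]+[\r\n][\r\n])#x', '', $regex);
// 3) Collapse runs of four or more line-break characters into a single newline.
$regex = preg_replace('#(([\r\n]{3,}|[\r\n]{4,})([\r\n]+))#x', "\n", $regex);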

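I started by telling the regex engine to work line by line (the m modifier) and to match a line that begins with a character other than >, optionally followed by a return or newline, then optionally by three spaces or a single space, and ending in one or more \r or \n characters. Whatever matches is replaced with nothing.

The second preg_replace looks for one or more spaces followed by two line-break characters (each \r or \n, so typically one \r\n pair) and removes them.

And lastly, the third one looks for a run of three or more line-break characters followed by at least one more, and replaces the whole run with a single newline. The {4,} alternative is redundant, since {3,} already covers it.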
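For comparison, here is a simpler variant of the same idea. It is not what I ran on the project, just a sketch: squeeze_blank_lines() is a made-up name, it assumes PHP 7 or later, and it assumes you are happy to end up with \n line endings everywhere. Be careful with pages that contain pre or textarea blocks, since whitespace matters inside those.

```php
<?php
// Hypothetical helper: normalize line endings, trim trailing
// whitespace, and drop blank lines from a chunk of HTML.
function squeeze_blank_lines(string $html): string
{
    // Turn Windows (\r\n) and lone \r line endings into \n.
    $html = preg_replace('/\r\n?/', "\n", $html);
    // Strip spaces and tabs at the end of every line (m makes $ match per line).
    $html = preg_replace('/[ \t]+$/m', '', $html);
    // Collapse runs of two or more newlines into a single newline.
    return preg_replace('/\n{2,}/', "\n", $html);
}
```

Normalizing the line endings first means the later patterns only have to deal with \n, which keeps each of them short.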

I think I could have done better, but deadlines are not something you can negotiate with. Well, you get the idea. This got it done, and that's all that matters. Luckily, I compared the line counts: the original file had 4312 lines and the modified one ended up with 313.
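If you want to make the same before-and-after comparison from PHP rather than from an editor, something like this would do. count_lines() is a hypothetical helper of mine, written for PHP 7 or later, and it treats \r\n, a lone \r, and a lone \n each as one line break.

```php
<?php
// Hypothetical helper: count lines regardless of line-ending style.
function count_lines(string $text): int
{
    if ($text === '') {
        return 0;
    }
    // Split on \r\n first, then on a lone \r or a lone \n.
    $lines = preg_split('/\r\n|\r|\n/', $text);
    // A trailing line break leaves an empty last element; do not count it.
    if (end($lines) === '') {
        array_pop($lines);
    }
    return count($lines);
}
```

Calling count_lines($html) and count_lines($regex) gives the two numbers to compare.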

Break Brains

Cheers


This page contains a single entry by David published on August 6, 2008 5:27 PM.
