I was handed a little project today. As a freelancer you sometimes have to prepare for the unthinkable, even when the job title suggests otherwise. It's been a spacey day, but not a bad one, since I learned something too.
So, you must be wondering what happened? If you are a developer, imagine 3,000+ \r\n sequences invading the HTML source code, plus more \r\n padded with 3, 4, or 6 spaces. Obviously the designer who did the layout likes to annoy others. \r is a carriage return and \n is a newline (line feed); together, \r\n is the Windows-style line ending.
What does this do? Nothing visible. You can't see it when the HTML is rendered, but if you check the source code you notice big gaps and stray carriage returns.
Solution? Mine is pretty straightforward, though it lacks efficiency.
In PHP, you first need to get the HTML into a variable ($html in the code below).
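In case it helps, here is a minimal sketch of that first step, assuming the layout lives in a file on disk. The load_html() helper is a name I made up for illustration; file_get_contents() is the standard PHP call doing the real work.

```php
<?php
// Hypothetical helper (not from the original project): read a file into a string.
function load_html(string $path): string
{
    // file_get_contents() returns false when the file cannot be read.
    $html = file_get_contents($path);
    if ($html === false) {
        throw new RuntimeException("Could not read $path");
    }
    return $html;
}
```

You would call it as $html = load_html('layout.html'); where layout.html is a placeholder for your own file.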
Then, to remove the excess our smart designer added:
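// Pass 1: at the start of a line, match one character other than ">", optional padding, and the line breaks that follow; remove them.
// The spaces sit inside [ ] because the x flag ignores bare whitespace in the pattern.
$regex = preg_replace('{^[^>]([\r\n])?([ ]{3}|[ ])?([\r\n]+)$}mx', '', $html);
// Pass 2: remove one or more spaces together with the two line-break characters that follow.
$regex = preg_replace('#([ ]+[\r\n][\r\n])#x', '', $regex);
// Pass 3: collapse a run of four or more line-break characters into a single newline.
$regex = preg_replace('#(([\r\n]{3,}|[\r\n]{4,})([\r\n]+))#x', "\n", $regex);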
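I started by telling the regex engine to look, at the start of a line, for a single character that is not > (that is what [^>] means), then an optional \r or \n if one is present, then optionally 3 spaces OR 1 space, and finally one or more \r or \n characters, and to replace the whole match with nothing.
Then, in the second preg_replace, I check for one or more spaces followed by two \r or \n characters and remove them.
And lastly, I search for 3 or more \r or \n characters OR 4 or more, followed by at least one more, and replace the whole run with a single newline. In effect, any run of four or more line-break characters collapses to one.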
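For comparison, here is a simpler take on the same idea. This is not the code I used on the project, just a sketch: squeeze_blank_lines() is a hypothetical name, and it assumes every whitespace-only line is safe to drop, which would not hold for content inside a pre tag.

```php
<?php
// Hypothetical alternative, not the three-pass code above.
function squeeze_blank_lines(string $html): string
{
    // Normalize Windows (\r\n) and lone \r line endings to \n.
    $html = str_replace(["\r\n", "\r"], "\n", $html);
    // Strip trailing spaces and tabs from every line (m flag: $ matches at each line end).
    $html = preg_replace('/[ \t]+$/m', '', $html) ?? $html;
    // Collapse runs of two or more newlines into one, which removes the blank lines.
    return preg_replace('/\n{2,}/', "\n", $html) ?? $html;
}

// Small made-up sample with padded, Windows-style blank lines.
$sample = "<div>\r\n   \r\n\r\n      \r\n<p>Hi</p>   \r\n\r\n</div>\r\n";
echo squeeze_blank_lines($sample);
```

Normalizing the line endings first means the later patterns only have to think about \n, which is what keeps them short.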
I think I could have done better, but deadlines are not something you can negotiate with. Well, you get the idea. This got it done and that's all that matters. Luckily, I compared the line counts: the original file had 4312 lines and the modified one ended up with 313.
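If you want to run the same before-and-after check from PHP, something like this should do. count_lines() is a name I made up, and the two sample strings are placeholders rather than the real files.

```php
<?php
// Hypothetical helper: count lines regardless of the line-ending style.
function count_lines(string $text): int
{
    if ($text === '') {
        return 0;
    }
    // Treat \r\n, \r and \n all as a single line break.
    $normalized = str_replace(["\r\n", "\r"], "\n", $text);
    $lines = substr_count($normalized, "\n");
    // A last line without a trailing line break still counts as a line.
    if (substr($normalized, -1) !== "\n") {
        $lines++;
    }
    return $lines;
}

// Placeholder strings standing in for the original and cleaned HTML.
$original = "<div>\r\n\r\n\r\n<p>Hi</p>\r\n</div>\r\n";
$cleaned  = "<div>\n<p>Hi</p>\n</div>\n";
printf("%d lines before, %d lines after\n", count_lines($original), count_lines($cleaned));
```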
Cheers
