Added: Dec 1, 2008
From: h4ck3rm1k3
Duration: 6:09
http://www.perlfoundation.org/perl6

...steps forward in terms of functionality and steps backwards in terms of readability. At the time, I rationalized it all in the name of backward compatibility, and perhaps that approach was correct for that time and place. It's not correct now, since the Perl 6 approach is to break everything that needs breaking all at once. And unfortunately, there's a lot of regex culture that needs breaking.

Regex culture has gone wrong in a variety of ways, but it's not my intent to assign blame--there's plenty of blame to go around, and plenty of things that have gone wrong that are nobody's fault in particular. For example, it's nobody's fault that you can't realistically complement a character set anymore. It's just an accident of the way Unicode defines combining characters. The whole notion of character classes is mutating, and that will have some bearing on the future of regular expression syntax.

Given all this, I need to warn you that this Apocalypse is going to be somewhat radical. We'll be proposing changes to certain "sacred" features of regex culture, and this is guaranteed to result in future shock for some of our more conservative citizens. Do not be alarmed. We will provide ways for you to continue programming in old-fashioned regular expressions if you desire. But I hope that once you've thought about it a little and worked through some examples, you'll like most of the changes we're proposing here.

So although the RFCs did contribute greatly to my thinking for this Apocalypse, I'm going to present my own vision first for where regex culture should go, and then analyze the RFCs with respect to that vision.

First, let me enumerate some of the things that are wrong with current regex culture.
* Too much history
* Too compact and "cute"
* Poor Huffman coding
* Too much reliance on too few metacharacters
* Different things look too similar
* Poor end-weight design
* Too much reliance on modifiers
* Too many special rules and boobytraps
* Backreferences not useful enough
* Too hard to match a literal string
* Two-level interpretation is problematic
* Too little abstraction
* Little support for named captures
* Difficult to use nested patterns
* Little support for grammars
* Inability to define variants
* Poor integration with "real" language
* Missing backtracking controls
* Difficult to define assertions

I'm sure there are other problems, but that'll do for starters. Let's look at each of these in more detail.

Too much history

Most of the other problems stem from trying to deal with a rich history. Now there's nothing wrong with history per se, but those of us who are doomed to repeat it find that many parts of history are suboptimal and contradictory. Perl has always tried to err on the side of incorporating as much history as possible, and sometimes Perl has succeeded in that endeavor.

Cultural continuity has much to be said for it, but what can you do when the culture you're trying to be continuous with is itself discontinuous? As it says in Ecclesiastes, there's a time to build up, and a time to tear down. The first five versions of Perl mostly built up without tearing down, so now we're trying to redress that omission.

Too compact and "cute"

Regular expressions were invented by computational linguists who love to write examples like "/aa*b*(cd)*ee/". While these are conducive to reasoning about pattern matching in the abstract, they aren't so good for pattern matching in the concrete. In real life, most atoms are longer than "a" or "b". In real life, tokens are more recognizable if they are separated by whitespace. In the abstract, "/a+/" is reducible to "/aa*/". In real life, nobody wants to repeat a 15 character token merely to satisfy somebody's idea of theoretical purity. So we have shortcuts like the "+" quantifier to say "one or more".

Now, you may rightly point out that "+" is something we already have, and we already introduced "/x" to allow whitespace, so why is this bullet point here? Well, there's a lot of inertia in culture, and the problem with "/x" is that it's not the default, so people don't think to turn it on when it would probably do a lot of good. The culture is biased in the wrong direction. Whitespace around tokens should be the norm, not the exception. It should be acceptable to use whitespace to separate tokens that could be confused. It should not be considered acceptable to define new constructs that contain a plethora of punctuation, but we've become accustomed
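
Two of the points in the passage can be checked directly: that "/a+/" and "/aa*/" accept the same strings, and that "/x" lets whitespace separate tokens without changing what the pattern matches. The sketch below is a rough illustration only, and it is written in Python rather than Perl; Python's `re.VERBOSE` flag is used as a stand-in for "/x", and the names `terse`, `spaced`, `samples`, and `same_verdict` are invented for this example.

```python
import re

# The terse pattern quoted in the text, written Python-style.
terse = re.compile(r"aa*b*(cd)*ee")

# The same pattern with whitespace between tokens. re.VERBOSE is
# Python's rough analogue of Perl's /x: unescaped whitespace in the
# pattern is ignored, so it only serves readability.
spaced = re.compile(r"a a* b* (cd)* ee", re.VERBOSE)

# "One or more" written with the + shortcut and spelled out longhand.
plus = re.compile(r"a+")
star = re.compile(r"aa*")

# A small, hand-picked set of test strings (an assumption of this sketch,
# not an exhaustive proof of equivalence).
samples = ["", "a", "aaa", "aee", "abbcdcdee", "bee", "aabcdee", "acdxee"]


def same_verdict(p, q, strings):
    """Return True if p and q accept or reject every string identically."""
    return all(
        (p.fullmatch(s) is None) == (q.fullmatch(s) is None)
        for s in strings
    )


if __name__ == "__main__":
    print(same_verdict(terse, spaced, samples))
    print(same_verdict(plus, star, samples))
```

Agreement on a handful of strings does not prove equivalence, but it shows that the whitespace costs nothing in matching behavior, which is the practical argument for making it the default.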
Channel: Education
Tags: apocalypse larry perl perl6 wall
Views: 15 Comments: 0