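I've run into some strange Perl behavior: using a POSIX character class in a
regexp completely changes the sort order of the resulting strings.
Here is my test program: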
- sub namecmp($a,$b) {
- $a=~/([:alpha:]*)/;
- # $a=~/([a-z]*)/;
- $aword= $1;
- $b=~/([:alpha:]*)/;
- # $b=~/([a-z]*)/;
- $bword= $1;
- return $aword cmp $bword;
- };
- $_= <>;
- @names= sort namecmp split;
- print join(" ",@names),"\n";
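If I switch to the commented-out regexps that use [a-z], I get the normal lexicographic sort order. With the POSIX [:alpha:] character class, however, I get a weird sort order, as shown below (in each run the first line is the input and the second is the output):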
- $test_normal
- aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
- aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
- $test_posix
- aaa aab aac aba abb abc aca acb acc baa bab bac bba bbb bbc bca bcb bcc caa cbb
- baa bab bac bba bbb bbc bca bcb bcc caa cbb aba abb abc aca acb acc aab aac aaa
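My best guess is that the POSIX character class is activating some kind of locale that I've never heard of and didn't ask for. I suppose the logical response to "Doctor, doctor, it hurts when I do this!" is "Well, don't do that, then!"
But can anyone tell me what is going on here, and why? I'm using perl 5.10, but I believe the same happens with perl 5.8.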
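The character class [:alpha:] stands for alphabetic characters in Perl regular expressions, but its square brackets do not do what square brackets normally do in a regexp: they are part of the class name, and the whole thing has to sit inside an ordinary bracketed character class. So you need: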
- $a=~/([[:alpha:]]*)/;
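To see why the single-bracket version reorders the words, it may help to print what each regexp actually captures. The following is only a small sketch: the helper names `key_broken` and `key_fixed` are made up for illustration, and it assumes (consistent with the output in the question) that Perl reads a bare `[:alpha:]` as an ordinary class containing the characters `:`, `a`, `l`, `p` and `h`.

```perl
use strict;
use warnings;

# Hypothetical helper: the capture as written in the question.
# A bare [:alpha:] is parsed as a plain class of ':', 'a', 'l', 'p', 'h'.
sub key_broken {
    my ($word) = @_;
    no warnings 'regexp';    # silence the "belongs inside character classes" warning
    my ($key) = $word =~ /([:alpha:]*)/;
    return $key;
}

# Hypothetical helper: the intended capture, with the POSIX class
# nested inside a bracketed character class.
sub key_fixed {
    my ($word) = @_;
    my ($key) = $word =~ /([[:alpha:]]*)/;
    return $key;
}

for my $word (qw(aaa aab aba baa)) {
    printf "%-3s  broken='%s'  fixed='%s'\n",
        $word, key_broken($word), key_fixed($word);
}
```

With the single brackets, only the leading run of `a` characters is captured, so `aaa` gives `aaa`, `aab` gives `aa`, `aba` gives `a`, and `baa` gives the empty string. Comparing those keys puts the words starting with `b` or `c` first, then those with one leading `a`, then two, then three, which is the order seen in the test_posix run. No locale is involved.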
This is mentioned in perlre:
The POSIX character class syntax
- [:class:]
is also available. Note that the [ and ] brackets are literal; they must always be used within a character class expression.
- # this is correct:
- $string =~ /[[:alpha:]]/;
- # this is not, and will generate a warning:
- $string =~ /[:alpha:]/;
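For completeness, here is one way the comparator from the question might look with the doubled brackets. It is a sketch rather than a drop-in replacement: `alpha_prefix` is a hypothetical helper name, the word list is made up, and the comparison is written as a sort block instead of a named sub.

```perl
use strict;
use warnings;

# Hypothetical helper: leading run of alphabetic characters.
# Note the doubled brackets around the POSIX class.
sub alpha_prefix {
    my ($s) = @_;
    my ($prefix) = $s =~ /([[:alpha:]]*)/;
    return $prefix;
}

# Made-up sample input, deliberately out of order.
my @words  = qw(cbb aab baa aaa acc aba);
my @sorted = sort { alpha_prefix($a) cmp alpha_prefix($b) } @words;
print join(" ", @sorted), "\n";
```

Since every sample word is purely alphabetic, each key is the whole word and the result is the plain lexicographic order, the same as with the [a-z] version.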