I know this question has been asked many times, but the answers differ, and I'm confused.
My row is:
- 1,3.2,BCD,"qwer 47"" ""dfg""",1
Quotes are optional, and double quotes are escaped per the MS Excel convention. (Data: qwer 47" "dfg" is represented as "qwer 47"" ""dfg""".)
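To make the quoting rule concrete, here is a minimal Java sketch of how a value could be encoded under this convention. The names CsvQuote and quote are made up for illustration, and this reflects only my reading of the Excel-style rule (double every quote, then wrap the field in quotes); it is not a complete CSV writer.

```java
final class CsvQuote {
    // Hypothetical helper: doubles each embedded quote and wraps the
    // whole value in quotes, as in the example row above.
    static String quote(String value) {
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    public static void main(String[] args) {
        // The raw value is: qwer 47" "dfg"
        System.out.println(quote("qwer 47\" \"dfg\""));
    }
}
```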
I need a regular expression.
OK, you have seen from the comments that a regex is not the right tool for this. But if you insist, here goes:
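This regex will work in Java and in other implementations that support possessive quantifiers and verbose regexes. (.NET supports verbose mode but, as far as I know, not possessive quantifiers; there each X*+ would have to be written as an atomic group, (?>X*).)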
- ^ # Start of string
- (?: # Match the following:
- (?: # Either match
- [^",\n]*+ # 0 or more characters except comma,quote or newline
- | # or
- " # an opening quote
- (?: # followed by either
- [^"]*+ # 0 or more non-quote characters
- | # or
- "" # an escaped quote ("")
- )* # any number of times
- " # followed by a closing quote
- ) # End of alternation
- , # Match a comma (separating the CSV columns)
- )* # Do this zero or more times.
- (?: # Then match
- (?: # using the same rules as above
- [^",\n]*+ # an unquoted CSV field
- | # or a quoted CSV field
- "(?:[^"]*+|"")*"
- ) # End of alternation
- ) # End of non-capturing group
- $ # End of string
Java code:
- boolean foundMatch = subjectString.matches(
- "(?x)^ # Start of string\n" +
- "(?: # Match the following:\n" +
- " (?: # Either match\n" +
- " [^\",\\n]*+ # 0 or more characters except comma,quote or newline\n" +
- " | # or\n" +
- " \" # an opening quote\n" +
- " (?: # followed by either\n" +
- " [^\"]*+ # 0 or more non-quote characters\n" +
- " | # or\n" +
- " \"\" # an escaped quote (\"\")\n" +
- " )* # any number of times\n" +
- " \" # followed by a closing quote\n" +
- " ) # End of alternation\n" +
- ",# Match a comma (separating the CSV columns)\n" +
- ")* # Do this zero or more times.\n" +
- "(?: # Then match\n" +
- " (?: # using the same rules as above\n" +
- " [^\",\\n]*+ # an unquoted CSV field\n" +
- " | # or a quoted CSV field\n" +
- " \"(?:[^\"]*+|\"\")*\"\n" +
- " ) # End of alternation\n" +
- ") # End of non-capturing group\n" +
- "$ # End of string");
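For repeated use, the same pattern can be compiled once and wrapped in a small helper. The sketch below is a compact, comment-free restatement of the regex above; the names CsvLineCheck, FIELD and isValidCsvLine are only illustrative, and it carries the same limitations as the original pattern.

```java
import java.util.regex.Pattern;

final class CsvLineCheck {
    // One CSV field: either unquoted (no comma, quote or newline)
    // or quoted, with "" standing for a literal quote.
    private static final String FIELD =
        "(?:[^\",\\n]*+|\"(?:[^\"]*+|\"\")*\")";

    // Zero or more "field," groups followed by one final field.
    private static final Pattern CSV_ROW =
        Pattern.compile("^(?:" + FIELD + ",)*" + FIELD + "$");

    static boolean isValidCsvLine(String line) {
        return CSV_ROW.matcher(line).matches();
    }

    public static void main(String[] args) {
        // The sample row from the question.
        String sample = "1,3.2,BCD,\"qwer 47\"\" \"\"dfg\"\"\",1";
        System.out.println(isValidCsvLine(sample));             // true
        // A quoted field that is never closed.
        System.out.println(isValidCsvLine("1,\"unterminated")); // false
    }
}
```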
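Note that you can't assume that every line in a CSV file is a complete row. A CSV row may contain newlines, as long as the column containing the newline is enclosed in quotes. This regex knows that, but it fails if you feed it only a partial row. That is another reason why you really need a CSV parser to validate a CSV file; it is what a parser is for. If you control your input and know that there will never be a newline inside a CSV field, you might get away with it, but only then.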
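To illustrate why a parser copes with embedded newlines where line-by-line matching does not, here is a minimal character-by-character sketch in Java. MiniCsv and parse are hypothetical names. The code is deliberately lenient (for example, it accepts a quote in the middle of an unquoted field), so treat it as an illustration of the idea rather than a validator or a substitute for an established CSV library.

```java
import java.util.ArrayList;
import java.util.List;

final class MiniCsv {
    // Splits CSV text into records. Inside quotes, commas and newlines
    // are ordinary data and "" is a literal quote.
    static List<List<String>> parse(String text) {
        List<List<String>> records = new ArrayList<>();
        List<String> record = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    field.append(c);          // data, including , and \n
                } else if (i + 1 < text.length() && text.charAt(i + 1) == '"') {
                    field.append('"');        // escaped quote
                    i++;
                } else {
                    inQuotes = false;         // closing quote
                }
            } else if (c == '"') {
                inQuotes = true;              // opening quote
            } else if (c == ',') {
                record.add(field.toString()); // end of field
                field.setLength(0);
            } else if (c == '\n') {
                record.add(field.toString()); // end of record
                field.setLength(0);
                records.add(record);
                record = new ArrayList<>();
            } else if (c != '\r') {
                field.append(c);
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated quoted field");
        }
        if (field.length() > 0 || !record.isEmpty()) {
            record.add(field.toString());     // last record without a trailing newline
            records.add(record);
        }
        return records;
    }

    public static void main(String[] args) {
        // Two records; the first has a newline inside its quoted field.
        System.out.println(parse("a,\"line1\nline2\",b\nc,d\n").size());
    }
}
```

Because the whole text is scanned with an explicit in-quotes state, a record that spans several physical lines is handled naturally, whereas reading the file line by line would hand the regex only part of that record.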