

Lookahead and Lookbehind Zero-Length Assertions

Lookahead and lookbehind, collectively called “lookaround”, are zero-length assertions just like the start and end of line, and start and end of word anchors explained earlier in this tutorial. The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not. Lookaround allows you to create regular expressions that are impossible to create without them, or that would get very longwinded without them.

Negative lookahead is indispensable if you want to match something not followed by something else. When explaining character classes, this tutorial explained why you cannot use a negated character class to match a q not followed by a u. Negative lookahead provides the solution: q(?!u). The negative lookahead construct is the pair of parentheses, with the opening parenthesis followed by a question mark and an exclamation point. Inside the lookahead, we have the trivial regex u.

Positive lookahead works the same way. q(?=u) matches a q that is followed by a u, without making the u part of the match. The positive lookahead construct is a pair of parentheses, with the opening parenthesis followed by a question mark and an equals sign.

You can use any valid regular expression inside the lookahead (but not lookbehind, as explained below). If it contains capturing groups then those groups will capture as normal and backreferences to them will work normally, even outside the lookahead. (The only exception is Tcl, which treats all groups inside lookahead as non-capturing.) The lookahead itself is not a capturing group. It is not included in the count towards numbering the backreferences. If you want to store the match of the regex inside a lookahead, you have to put capturing parentheses around the regex inside the lookahead, like this: (?=(regex)). The other way around will not work, because the lookahead will already have discarded the regex match by the time the capturing group is to store its match.

Regex Engine Internals

First, let’s see how the engine applies q(?!u) to the string Iraq. The first token in the regex is the literal q. As we already know, this causes the engine to traverse the string until the q in the string is matched. The position in the string is now the void after the string. The next token is the lookahead. The engine takes note that it is inside a lookahead construct now, and begins matching the regex inside the lookahead. The u inside the lookahead does not match the void after the string. The engine notes that the regex inside the lookahead failed. Because the lookahead is negative, this means that the lookahead has successfully matched at the current position. At this point, the entire regex has matched, and q is returned as the match.

Let’s try applying the same regex to quit. q matches q. The next token is the u inside the lookahead, which matches the u in the string. The engine advances to the next character: i. However, it is done with the regex inside the lookahead. The engine notes success, and discards the regex match. This causes the engine to step back in the string to u. Because the lookahead is negative, the successful match inside it causes the lookahead to fail. Since there are no other permutations of this regex, the engine has to start again at the beginning. Since q cannot match anywhere else, the engine reports failure.

Let’s take one more look inside, to make sure you understand the implications of the lookahead. Consider q(?=u)i applied to quit. The lookahead is now positive and is followed by another token. Again, the match from the lookahead must be discarded, so the engine steps back from i in the string to u. The lookahead was successful, so the engine continues with i. But i cannot match u, so this match attempt fails. All remaining attempts fail as well, because there are no more q’s in the string.

The regex q(?=u)i can never match anything. It tries to match u and i at the same position. If there is a u immediately after the q then the lookahead succeeds but then i fails to match u. If there is anything other than a u immediately after the q then the lookahead fails.

As an example of negative lookbehind, a regex for a b that is not preceded by an a doesn’t match cab, but matches the b (and only the b) in bed or debt. (?
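The engine walkthrough for q(?!u), q(?=u) and q(?=u)i can be checked in practice. The following is a minimal sketch using Python's built-in re module, which is only one of many regex flavors; the helper name spans is hypothetical and exists purely for this illustration.

```python
import re

# q(?!u): a q that is NOT followed by a u (negative lookahead).
negative = re.compile(r"q(?!u)")
# q(?=u): a q that IS followed by a u (positive lookahead).
positive = re.compile(r"q(?=u)")
# q(?=u)i: the lookahead and the literal i are both tested right after the q.
impossible = re.compile(r"q(?=u)i")


def spans(pattern, text):
    """Hypothetical helper: (matched text, start, end) for every match."""
    return [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]


print(spans(negative, "Iraq"))    # [('q', 3, 4)]
print(spans(negative, "quit"))    # []
print(spans(positive, "quit"))    # [('q', 0, 1)]
print(spans(impossible, "quit"))  # []
```

In both successful cases the reported match is one character long: the lookahead checks what follows the q but adds nothing to the match itself.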
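The rules about capturing groups and lookahead can be illustrated in the same way. This sketch again assumes Python's re module; other flavors may differ (Tcl is the exception noted in the text), and the variable names are chosen only for this example.

```python
import re

# The lookahead itself is not a capturing group and is not counted.
assert re.compile(r"q(?=u)").groups == 0
# Capturing parentheses inside the lookahead do count: (?=(regex)).
assert re.compile(r"q(?=(u))").groups == 1

# Group inside the lookahead: the overall match is still just q,
# but group 1 keeps the text that the lookahead saw.
inside = re.search(r"q(?=(u))", "quit")
print(inside.group(0))  # q
print(inside.group(1))  # u

# A backreference to that group works outside the lookahead as well.
backref = re.search(r"q(?=(u))\1", "quit")
print(backref.group(0))  # qu

# The other way around: the group wraps the lookahead, which has already
# given up its match, so the group captures an empty string.
outside = re.search(r"q((?=u))", "quit")
print(repr(outside.group(1)))  # ''
```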
