In case of doubt, use a regex: the weekly challenge 328 task 1

Task 1: Replace all ?:

You are given a string containing only lower case English letters and ?.

Write a script to replace all ? in the given string so that the string doesn’t contain consecutive repeating characters.

This problem reminds me of a logic puzzle, figuring out all the possibilities and coding for it all.

Not actually stated in the problem, but I'm assuming we are replacing each ? with a single character, and that character is a lowercase ASCII letter. Without those assumptions, there are trivial solutions (just remove all the ? characters, or replace single ? characters with 0 and series of ? characters with alternating 0 and 1).

Let's say we want to efficiently replace all the ? characters in a single pass. "Replace all" hints at global regex substitution.

Let's pick a substitute letter a and use that with a regex like (?<!a)\?(?!a). But there's a problem with that: that will replace a sequence of ? all with a instead of just every other ?, because the lookbehind only sees what's in the original string, not what's been substituted so far. (AFAIK this is true across languages and regex engines.) If instead we just loop over the characters in the string, we don't have that problem, but let's try to make a regex substitution work before going that route.

We can enforce replacing only every other one of a sequence of ? by asserting that there are not an odd number of following ?, with:

(?<!a) \? (?! a | \? (\?{2})* (?!\?) )

(after the ? to replace there can't be an a, and can't be an odd number of ?). But that's less than optimal when the sequence of ? is followed by a; then we want instead to match when followed by an odd number number of ?, so with e.g. a???a we get a?a?a instead of no change and with e.g. x??????a we get xa?a?a?a instead of x?a?a??a. That looks like:

(?<!a) \? (?! a | \? (\?{2})* (?![a?]) | (\?{2})* a )

This gives us:

substring results
x?x xax
x?a x?a
a?x a?x
a?a a?a
x??x x?ax
x??a xa?a
a??x a?ax
a??a a??a
x???x xa?ax
x???a x?a?a
a???x a??ax
a???a a?a?a
x????x x?a?ax
x????a xa?a?a
a????x a?a?ax
a????a a??a?a
x?????x xa?a?ax
x?????a x?a?a?a
a?????x a??a?ax
a?????a a?a?a?a
x??????x x?a?a?ax
x??????a xa?a?a?a
a??????x a?a?a?ax
a??????a a??a?a?a
x???????x xa?a?a?ax
x???????a x?a?a?a?a
a???????x a??a?a?ax
a???????a a?a?a?a?a

Now we have only single ? (with an a to the left or right or both) and a??a sequences. So just a couple more substitutions are needed. Replace ? with b wherever there isn't a b before or after, but not the second of a ?? pair:

(?<! b | \? ) \? (?!b)

a??a will now be ab?a; any other remaining ? is now a?b or b?a or at the beginning or end of the string with a b just inward. So we can safely replace any remaining ? with c.

That's three passes though, so we want to combine it all. The b regex needs a change, since we checked if the previous character was a ?, and we will no longer see a difference between a ? that changed to a and a ? that was left unchanged. So we need to assert that the a regex wouldn't have matched that preceding ?:

(?<! b | \? (?<! (?<!a) \? (?! \? (?:\?{2})* (?![a?]) | (?:\?{2})+ a ) ) ) \? (?!b)

Now we can combine them, using named captures[1] to determine which alternate matched:

(?<a> (?<!a) \? (?! a | \? (?:\?{2})* (?![a?]) | (?:\?{2})* a ))
| (?<b> (?<! b | \? (?<! (?<!a) \? (?! \? (?:\?{2})* (?![a?]) | (?:\?{2})+ a ) ) ) \? (?!b))
| (?<c>\?)

In Python:

replace_all_qs = regex.compile(r'''(?x)
    (?<a> (?<!a) \? (?! a | \? (?:\?{2})* (?![a?]) | (?:\?{2})* a ))
    | (?<b> (?<! b | \? (?<! (?<!a) \? (?! \? (?:\?{2})* (?![a?]) | (?:\?{2})+ a ) ) ) \? (?!b))
    | (?<c>\?)
    ''')
new_string = replace_all_qs.sub(lambda m:m.lastgroup, string)

In Perl:

my $replace_all_qs = qr'
    (?<a> (?<!a) \? (?! a | \? (?:\?{2})* (?![a?]) | (?:\?{2})* a ))
    | (?<b> (?<! b | \? (?<! (?<!a) \? (?! \? (?:\?{2})* (?![a?]) | (?:\?{2})+ a ) ) ) \? (?!b))
    | (?<c>\?)
'x;
my $new_string = $string =~ s/$replace_all_qs/@{[keys %+]}/gr;

Other possible approaches:

  • repeatedly substitute random letters
  • loop over character positions

full script, Python
full script, Perl

On to task 2...


  1. for consistency using the Perlish (?<...> which Python's regex but not re supports, not Python re's (?P<...>) ↩︎