Not So Loopy Digits: Weekly Challenge 352 Task 2
Task 2: Binary Prefix:
Given an array of 0s and 1s, return Booleans indicating whether each non-empty prefix of the array (shortest to longest) is divisible by 5 when interpreted as a binary number (highest bit first).
Excellent, a problem involving arrays and math; what a great opportunity to use a regex.
We start with a DFA for binary numbers divisible by 5[1], where each state is the current value modulo 5, and transitions represent adding a new lowest binary digit (so multiplying the state by 2 and adding 0 or 1, modulo 5). S₀ is the initial and final state[2]:
S₀:
0 -> S₀
1 -> S₁
S₁:
0 -> S₂
1 -> S₃
S₂:
0 -> S₄
1 -> S₀
S₃:
0 -> S₁
1 -> S₂
S₄:
0 -> S₃
1 -> S₄
Remove each non-accepting node one by one and replace transitions to it with regular expressions that end where the removed node did. First S₄:
S₂:
01*0 -> S₃
1 -> S₀
Then remove transitions to S₃:
S₁:
0|11 -> S₂
10 -> S₁
S₂:
01*00 -> S₁
01*01 -> S₂
1 -> S₀
Remove self-transition in S₂:
S₂:
(01*01)*01*00 -> S₁
(01*01)*1 -> S₀
Then remove S₁ transition to S₂:
S₁:
(0|11)(01*01)*01*00|10 -> S₁
Remove self-transition in S₁:
S₁:
((0|11)(01*01)*01*00|10)*(0|11)(01*01)*1 -> S₀
And remove S₀ transition to S₁:
S₀:
0|1((0|11)(01*01)*01*00|10)*(0|11)(01*01)*1 -> S₀
Leaving us with this regular expression:
(0|1((0|11)(01*01)*01*00|10)*(0|11)(01*01)*1)+
Removing nodes in a different order will produce different but equivalent regexes.
So our task is to join the digits into a string and match against prefixes of it. But that involves iterating over the possible prefixes or lengths; wouldn't it be nicer to have the regex matching itself do that? The trouble is that finding all matches, with most regular expression implementations, only allows finding non-overlapping matches. You can use lookahead to capture some overlaps, but that still doesn't allow multiple non-empty matches from the same starting point, as we have here.
The answer is to reverse the string and the regex and the matches returned.
Reversing the regex[3] produces:
(1(101*0)*(11|0)(01|001*0(101*0)*(11|0))*1|0)+
Instead of matching prefixes of a string of binary digits ordered from high bits to low bits, we will use this to match suffixes of the reversed string — the substrings starting at each point in the string before the end, giving us a different starting point for each match.
To allow overlap, capture that (with the divisibility-regex parens changed to non-capturing) in a lookahead, followed by a single character match (that last to prevent matching an empty string at the beginning). That gets us all the matches of binary numbers divisible by 5; to include returning false for those not divisible, we will add an empty alternation in the capture and return false where the capture is empty.
In Python:
def binary_prefix(nums: list[int]) -> tuple[bool, ...]:
return tuple(len(m) != 0 for m in re.findall(r'(?=((?:1(?:101*0)*(?:11|0)(?:01|001*0(?:101*0)*(?:11|0))*1|0)+\z|)).', ''.join(str(d) for d in nums[::-1])))[::-1]
In Perl:
sub binary_prefix($nums) {
reverse map !!length, reverse(join '', @$nums) =~ /(?=((?:1(?:101*0)*(?:11|0)(?:01|001*0(?:101*0)*(?:11|0))*1|0)+\z|))./g
}
full script, Python
full script, Perl
Comments greatly appreciated. See you next week.
This approach can be used for generating a DFA and then a regex for numbers in any base, divisible by any number. ↩︎
This is sloppy wording; since the empty string is not a binary number divisible by 5, S₀ cannot be both the initial and final state; really, there's an additional state S that is the initial state, with transitions the same as S₀'s. ↩︎
Including reversing the order of alternations, which isn't actually necessary. ↩︎