Not a Nice Problem to Have: the weekly challenge 329 task 2
Task 2: Nice Strings:
You are given a string made up of lower and upper case English letters only.
Write a script to return the longest substring of the give string which is nice. A string is nice if, for every letter of the alphabet that the string contains, it appears both in uppercase and lowercase.
"[t]his is a very nice day, and we are taking a very nice walk, and you are two very nice young ladies. Oh! it is a very nice word indeed! it does for everything. Originally perhaps it was applied only to express neatness, propriety, delicacy, or refinement—people were nice in their dress, in their sentiments, or their choice. But now every commendation on every subject is comprised in that one word." - Northanger Abbey
As Henry says, there is too much that may be nice. A string of n characters has n(n+1)/2+1
substrings, and checking each substring for niceness is itself O(n), so a brute force approach will be O(n³).
However, checking all substrings with the same starting index together can be done in O(n) time, so overall O(n²) is doable. To do this, track letters found so far that make it not nice (that we've only seen in upper or lower case) and letters that don't (that we've seen in both cases). In pseudocode:
for each index o starting with o₀
get the character c at index o
if c is in neither our nice set or not yet nice set
if c's opposite case is in our not yet nice set,
remove it from the not yet nice set
add both cases of c to our nice set
otherwise, add it to our not yet nice set
the substring at o₀-o is nice if the not yet nice set is empty
remember this substring if it is the longest found so far
We initialize our longest found so far to the empty string and do the above over every starting index from 0
to n-1
(starting with empty nice and not yet nice sets for each), and we have our longest nice substring (the first occurring one of that length). In Python (3.13):
def simple_longest_nice_substring(string: str) -> str:
longest_nice_substring: str = ''
for starting_index in range(0, len(string)):
nice: set[str] = set()
not_yet_nice: set[str] = set()
for ending_index in range(starting_index, len(string)):
c: str = string[ending_index]
if c not in nice and c not in not_yet_nice:
if c.swapcase() in not_yet_nice:
not_yet_nice.remove(c.swapcase())
nice.update(c, c.swapcase())
else:
not_yet_nice.add(c)
if not not_yet_nice and ending_index - starting_index + 1 > len(longest_nice_substring):
longest_nice_substring = string[starting_index:ending_index+1]
return longest_nice_substring
I don't see a way to reduce the complexity further (though I'm hoping to see someone else come up with a beautiful one-pass NFA), but we can short circuit things in several ways to make it efficiently O(n²):
-
We can add a third, never nice set of characters whose opposite case is not in the string, and when we encounter one, continue the next iteration of the starting index loop after it.
-
We can terminate the starting index loop when the remaining string isn't long enough to find a longer nice substring than we have.
-
When we start checking a new starting index, we'll immediately look for the first occurrence after it of the opposite case of the starting character. If it's not found, this starting index is a wash, and we can proceed to the next. If it is found, we can check the substring up through it for niceness, and, after populating our nice and not yet nice sets from the substring, begin our ending index loop just after it (since any shorter substring will not be nice).
-
When we don't find the opposite case of the starting index character, add that character to the never nice set, since we know no additional nice substrings can exist that contain that character.
-
When we get to the end of the string, in the ending index loop, and there were nice strings with this starting index, we know there are no longer nice strings that start at or before the end of the longest of those — if there were, we would have found an even longer nice string. So we can advance our starting index loop to the second character after the last nice string.
In Python:
def longest_nice_substring(string: str) -> str:
# longest nice substring found so far
longest_nice_substring: str = ''
# characters that we know can't form part of a not yet found nice substring
never_nice: set[str] = set(string) - set(string.swapcase())
# loop over starting indexes while it is possible to find a longer nice substring
next_starting_index: int = 0
while len(string) - next_starting_index > len(longest_nice_substring):
starting_index: int = next_starting_index
c: str = string[starting_index]
next_starting_index += 1
# can we rule out any substring starting here?
if c in never_nice:
continue
# shortest substring that may be nice will end with the opposite character
ending_index: int = string.find(c.swapcase(), next_starting_index) + 1
if ending_index == 0:
never_nice.add(c)
continue
# what we know so far about substrings starting at this starting index
substring_prefix: str = string[starting_index:ending_index]
nice: set[str] = set(substring_prefix) & set(substring_prefix.swapcase())
not_yet_nice: set[str] = set(substring_prefix) - nice
# found a nice substring?
if not not_yet_nice:
if len(substring_prefix) > len(longest_nice_substring):
longest_nice_substring = substring_prefix
# any nice strings starting in the middle of or just after this nice
# string will also be nice if started at the beginning of it, so
# this iteration of the outer loop will find them and the next
# iteration can be farther on
next_starting_index = ending_index + 1
# look for longer nice substrings starting at this starting index
for ending_index in range(ending_index, len(string)):
c = string[ending_index]
# character we know there's no match for later?
if c in never_nice:
break
# character not yet seen in this substring?
if c not in nice and c not in not_yet_nice:
# completes an upper/lower pair?
c_opposite: str = c.swapcase()
if c_opposite in not_yet_nice:
not_yet_nice.remove(c_opposite)
nice.update(c, c_opposite)
else:
not_yet_nice.add(c)
# found a nice substring?
if not not_yet_nice:
if ending_index - starting_index >= len(longest_nice_substring):
longest_nice_substring = string[starting_index:ending_index+1]
# any nice strings starting in the middle of or just after this
# nice string will also be nice if started at the beginning of
# it, so this iteration of the outer loop will find them and the
# next iteration can be farther on
next_starting_index = ending_index + 2
return longest_nice_substring
All of that is fairly straightforward. Speed improvements from that will vary greatly depending on the exact input and its length. Testing random strings against the simpler version found some off by one errors, now fixed.
Some things of note:
Keeping a separate variable for the value of the next iteration of the outer loop is a nice pattern when that is conditional in the loop body. I'm not entirely happy with the Python syntax:
next_starting_index: int = 0
while len(string) - next_starting_index > len(longest_nice_substring):
starting_index: int = next_starting_index
next_starting_index += 1
...
$next_starting_index = ...
but didn't come up with anything I was happier with. In Perl, either the C-style for for (initializer; while boolean; iterate)
:
my $next_starting_index;
for (my $starting_index = 0;
length($string) - $starting_index > length($longest_nice_substring);
$starting_index = $next_starting_index
) {
++$next_starting_index;
or the equivalent while
:
my $next_starting_index = my $starting_index = 0;
while (length($string) - $starting_index > length($longest_nice_substring)) {
++$next_starting_index;
...
}
continue {
$starting_index = $next_starting_index;
}
are nicer IMO. I'd love to see continue blocks in Python.
Python's set type, with efficient construction, difference, and intersection, make those parts easy, e.g.:
nice: set[str] = set(substring_prefix) & set(substring_prefix.swapcase())
not_yet_nice: set[str] = set(substring_prefix) - nice
In Perl, hashes are the easiest equivalent:
my %nice = my %not_yet_nice = map(($_ => 1), split //, $substring_prefix);
delete @not_yet_nice{map $_ ^. ' ', keys %not_yet_nice};
delete @nice{keys %not_yet_nice};
though difference (via delete) is easier than intersection, so there's two differences rather than an intersection and a difference.
I chose to use hashes with true values to represent sets; the other alternative is not caring about value. That makes construction easier, replacing the map with a simple empty list assignment:
my (%nice, %not_yet_nice);
@nice{split //, $substring_prefix) = ();
@not_yet_nice{keys %nice} = ();
but then checking for set membership requires exists $set{$element}
rather than just $set{$element}
.
Python's swapcase is handy; with no equivalent builtin in Perl, I used the stringwise bit xor with space (chr(0x20)) to easily change between upper case (chr(0x41)..chr(0x5a)) and lower case (chr(0x61)..chr(0x7a)). For non-ASCII characters, this could be done with a regex[1]:
s/(\p{Ll}*)(\p{Lu}*)/\U$1\E\l$2/g
full script, Python
full script, Perl
Comments greatly appreciated. See you next week.
Because a regex somewhere is mandatory and I hadn't used one yet. (While the first argument to split in Perl is nominally a regex, both
split ' '
andsplit //
are special forms that don't actually use the regex engine.) ↩︎