Not a Nice Problem to Have: the weekly challenge 329 task 2

Task 2: Nice Strings:

You are given a string made up of lower and upper case English letters only.

 
Write a script to return the longest substring of the give string which is nice. A string is nice if, for every letter of the alphabet that the string contains, it appears both in uppercase and lowercase.

"[t]his is a very nice day, and we are taking a very nice walk, and you are two very nice young ladies. Oh! it is a very nice word indeed! it does for everything. Originally perhaps it was applied only to express neatness, propriety, delicacy, or refinement—people were nice in their dress, in their sentiments, or their choice. But now every commendation on every subject is comprised in that one word." - Northanger Abbey

As Henry says, there is too much that may be nice. A string of n characters has n(n+1)/2+1 substrings, and checking each substring for niceness is itself O(n), so a brute force approach will be O(n³). However, checking all substrings with the same starting position can also be O(n), so at least O(n²) is easy. I don't see a way to reduce the complexity further, but at least we can work at making it efficiently O(n²).

To check non-empty substrings at a starting position for niceness, we track letters found so far that make it not nice (that we've only seen in upper or lower case) and letters that don't (that we've seen in both cases). In pseudocode:

for each offset o starting with o₀
    get the character c at offset o
    if c is in neither our nice set or not nice set
        if c's opposite case is in our not nice set,
            remove it from the not nice set
            add both cases of c to our nice set
        otherwise, add it to our not nice set
    the substring at o₀-o is nice if the not nice set is empty
        remember this substring if it is the longest found so far

We just need to initialize our longest found so far to the empty string and do that over every starting position from 0 to n-1, and we have our longest nice substring (or at least the first occurring one of that length).

That's still a fairly brute force approach. Instead, when we start checking a new position, we'll immediately find the first occurrence after it of the opposite case of the starting character (and if there isn't one, this position is a wash, and we can proceed to the next). Then populate our nice and not nice sets with the substring up to that character (and check if this substring is the longest yet if the not nice set is empty), and then advance from there as in the pseudocode above.

Additionally, when we get to the end of the string, we can advance our starting offset loop to the character after the last nice string we found (or stop altogether if there are not enough characters left from that point for a longer nice string than we've found).

As a final refinement, when we don't find the opposite case of the starting character, save that starting character in a third, never nice set; whenever we encounter it, we can stop our end offset loop and advance the starting offset loop to after it.

Actual code forthcoming...

full script, Python
full script, Perl

See you next week.