maximum number of non-overlapping substrings
This was todays daily for the #AlgoBot Daily Question channel in the RC zulip.
I may choose a different channel in the future, there are a few and nobody did this one.
problem #
Given a string s of lowercase letters, you need to find the maximum number of non-empty substrings of s that meet the following conditions:
1. The substrings do not overlap, that is for any two substrings s[i..j] and s[x..y], either j < x or i > y is true.
2. A substring that contains a certain character c must also contain all occurrences of c.
Find the maximum number of substrings that meet the above conditions. If there are multiple solutions with the same number of substrings, return the one with minimum total length. It can be shown that there exists a unique solution of minimum total length.
Notice that you can return the substrings in any order.
Input: s = "adefaddaccc"
Output: ["e","f","ccc"]
Input: s = "abbaccd"
Output: ["d","bb","cc"]
original solution #
- map every char in the string to all its indexes
- "abbbcc" -> {a: [0], b: [1, 2, 3], c: [4, 5]}
- turn the indexes into ranges
- drop all indexes but the start and end
- characters with a single index are expanded to the same index twice
- sort the ranges by their end
- starting with a negative ending time as our "last ending":
- take the next range that doesn't overlap our last ending time
- intuition:
- choosing the interval that ends earliest leaves as much remaining time as possible
# stinky!
from collections import OrderedDict
class Solution:
def maxNumOfSubstrings(self, s: str) -> List[str]:
res = []
# keys are chars, values are a list of all indexes
ranges = {}
# populate indexes
for i, char in enumerate(s):
if char not in ranges:
ranges[char] = []
ranges[char].append(i)
for char in ranges.keys():
# make sure every range has two indexes
if len(ranges[char]) == 1:
ind = ranges[char][0]
ranges[char] = [ind, ind]
# drop everything but the first and last index
elif len(ranges[char]) >= 2:
ranges[char] = [ranges[char][0], ranges[char][-1]]
# sort all the intervals by the end of their range
intervals = OrderedDict(sorted(ranges.items(), key=lambda x: x[1][-1]))
# Greedily select non-overlapping substrings
last_end = -1
for char, (start, end) in intervals.items():
if start > last_end:
res.append(s[start : end + 1])
last_end = end
return res
This "solution" passed the two example test cases, but only passes 81/284 of the real test cases. :(
Input: s = "abab"
Output: ["aba"]
Expected Output: ["abab"]
The first problem is that I only count runs that start and end with the same letter.
I also failed to think about the second constraint.
2. A substring that contains a certain character c must also contain all occurrences of c.
My solution includes one b and drops another.
fixed solution #
- map every char in the string to its start and end index
- "ababbcc" -> {a: [0, 2], b: [1, 4], c: [5, 6]}
str.rindex()
to the rescue
- sort the ranges by their end
- looking at each chars range:
- find every different character in that range
- expand your start and end if those other character's ranges are wider
- "ababbcc" -> {a: [0, 4], b: [0, 4], c: [5, 6]}
- notice how
a
andb
are now intermingled, and their ranges have changed
- notice how
- starting with a negative ending time as our "last ending":
- take the next range that doesn't overlap our last ending time
- intuition:
- choosing the interval that ends earliest leaves as much remaining time as possible
Also this can be a ...lot more complicated
class Solution:
def maxNumOfSubstrings(self, s: str) -> List[str]:
# keys are chars, values are first and last index (inclusive)
ranges = {}
for char in set(s):
ranges[char] = [s.index(char), s.rindex(char)]
# we need to expand ranges to include the index of any other characters
expanded_ranges = []
for char in set(s):
orig_start = ranges[char][0]
orig_end = ranges[char][1]
# bounds haven't really changed, but we need to loop at least once
bounds_changed = True
while bounds_changed:
bounds_changed = False
# note the + 1: python slices are upper bound exclusive
chars_in_range = set(s[orig_start : orig_end + 1])
for char in chars_in_range:
# choose the earliest range start
new_start = min(orig_start, ranges[char][0])
# choose the latest range end
new_end = max(orig_end, ranges[char][1])
# if either change, keep bounds_changed true and keep checking
if new_start != orig_start or new_end != orig_end:
orig_start, orig_end = new_start, new_end
bounds_changed = True
# note the + 1: see above
expanded_ranges.append([orig_start, orig_end + 1])
# greedily select non-overlapping substrings
expanded_ranges.sort(key=lambda x: x[1])
res = []
last_end_index = -1
for start, end in expanded_ranges:
if start >= last_end_index:
res.append(s[start:end])
last_end_index = end
return res
notes #
-
python -c "help()"
- spawns a repl that shows you the docs in
less
for any keyword you enter - the output is the python standard library docs
- my python LSP doesn't show docs for the std lib, so this lets me skip a browser
- spawns a repl that shows you the docs in
-
str.rindex()
- i knew about
list.index()
, but only strings support the rightmost index - apparently "nobody uses this for lists" from the BDFL himself
- i knew about
-
dict.keys()
vsdict.vals()
vsdict.items()
keys
returns keysvals
returns valsitems
returns tuples of(key, val)
- straightforward, the footgun is that
for _ in dict
iterates overkeys
, notitems
-
this is a standard dynamic programming problem called "interview scheduling"
- some slides here
- the greedy part at the end is the standard problem statement
- the twist in this leetcode was the expansion of the ranges, i.e. the part I ignored :^)
-
python slices are upper bound exclusive
foo = [1, 2, 3]
->foo[0:1] == [1]
- common cause of off by 1 errors
- Previous: the hub
- Next: sum of prefix scores of strings