Write an iterator that takes in an iterable and an optional key_func where the key function is responsible for computing the key value for each element. If a key_func is not specified, then the identity function is used.

The iterable is assumed to be sorted by the key_func. Take a look at the following example:

>>> g = GroupBy('AAAABBBCCDEFFF')
>>> for key, value_iterator in g:
...     print(key, list(value_iterator))
A ['A', 'A', 'A', 'A']
B ['B', 'B', 'B']
C ['C', 'C']
D ['D']
E ['E']
F ['F', 'F', 'F']

Remember MapReduce? You don’t need to in order to solve this problem; however, if you remember the phrase before the reduce stage where the inputs are all grouped together by the key…that’s what you’re implementing here. So, after solving this problem, you can say you know how that phase of MapReduce works! Pretty cool!

class GroupBy(object):

    def __init__(self, iterable, key_func=lambda x: x):
        "***YOUR CODE HERE***"


Toggle Solution

class GroupBy(object):

    def __init__(self, iterable, key_func=lambda x: x):
        self.iterable = iter(iterable)
        self.key_func = key_func
        self.current_key = None
        self.target_key = None
        self.current_value = None

    def __iter__(self):
        return self

    def __next__(self):
        while self.current_key == self.target_key:
            self.current_value = next(self.iterable)
            self.current_key = self.key_func(self.current_value)
        self.target_key = self.current_key
        return (self.current_key, self.grouper(self.target_key))

    def grouper(self, target_key):
        while self.current_key == target_key:
            yield self.current_value
            self.current_value = next(self.iterable)
            self.current_key = self.key_func(self.current_value)

This problem is really, really hard so don’t be discouraged if you didn’t get it. We mainly have three variables to keep track of, we have the current_key which is the key that we’re currently on, we have the target_key which is the key that we’re looking for, and we have the current_value which, after finding the matching key, we want to yield out the current values.

In our __next__ method, we’re doing two things. First, we’re searching for the current key that does not match the target key. As soon as we find a current key that’s not the target key, we update the target key so that we can use it in our helper function and then we output the current_key and the grouped up values based on that target_key that we just found.

The grouper method is searching for the target_key in the next keys. Essentially, it is looking through the input to try to find the next key that does not match the key that we’re looking for. As soon as that’s true, it exits the while loop and stops yielding values to the __next__ method. The result is that the grouper function yields all values that match the target_key and then updates the current_key to be the next key that appears in the iterable.

I don't claim to be perfect so if you find an error on this page, please send me an email preferably with a link to this page so that I know what I need to fix!

comments powered by Disqus