Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Transducers for Python (github.com/cognitect-labs)
63 points by espeed on Nov 20, 2015 | hide | past | favorite | 36 comments


The README sounds like it's describing Python's venerable iterator protocol. The example code looks like higher order functions and iterator composers from the standard functools and itertools modules. It should probably explain how this is different from those if the audience is Python programmers.


I concur. I don't really get the name "transducer". It's just map+reduce? So... why do we need a new name and new libraries?


It's not really map+reduce.

Transducers arise out of realizing that transformations between (possibly infinite) lists can be encoded as a "reducer transformations", "transducers". When you work in this encoding you automatically get lazy transformation (since everything goes through a function call) and the avoidance of intermediate data structures (since everything goes through a function call!).

Moreover, there's a neat property of this encoding, a CPS encoding, which is that function composition still "works", it just works "backwards".


Python iterators also give you lazy transformation and no intermediate data structures, no?


Yes, although transducers are more general than iterators. Iterators model lazy series from which you must 'pull' elements. Transducers can also be used with elements that are 'pushed', such as events coming from channels, or whatever. Essentially they decouple the transformation from the mechanisms by which input elements arrive and result elements are received, whilst still allowing separate transformations to be composed into more complex composite transformations.


That's what I was saying : this is from somebody not knowing Python.

Generators have a "send()" method to push data in it.

We have yield from to delegate to sub coroutines.

And generators can be classes for more advanced behaviors.

All this lib does is to make something that already exists in the language, in a heavier and slower way.


> Generators have a "send()" method to push data in it.

What would be the pythonic way of "mapping over" the inputs that are sent into a generator, or performing grouping operations on those inputs?

For example, how to transform a generator that expects to be sent sets of strings into a generator that simply accepts a flat stream of strings?


You could use the python-transducer library <https://github.com/sixty-north/python-transducers> I mentioned in my other comments to do this. It supports using the same transducers with iterables and push-event coroutine generators.


You don't, you chain small generators. Compositions of generators transforming data is the key.


Exactly. This is a lib created by someone that didn't get, at all, the point of iteration in Python. We have lazy coroutine exactly for that. It's embeded in the language. You got itertools, sum, any, all et comprehensions to use them.

Iteration is everything in Python. It's the core philosophy.

We already got tools to do this in better, lighter way.


Yeah, but they compose differently and are stateful. Transducers compose much like functions and are stateless.


Are transducers really stateless? Decoders and transducers that perform grouping operations need to keep some state.


They produce and use state when they run (at least in the Fold transformer format), but that's a very different kind of statefulness than generators.


"Stateful" doesn't mean "bad". I know functional programming is all the rage right now, but being stateless doesn't turn magically something into a better version of something stateful. If the stateful version is better integrated, have a heavier stateless version is not going to make your code better.


A similar, but different, project, with associated learning material:

- https://github.com/abingham/python-transducers

- http://www.slideshare.net/alinadolgikh/austin-bingham-transd...

- http://sixty-north.com/blog/series/understanding-transducers...

The last link is an 8 part blog series discussing transducers from first principles, and discussing their use (still in the context of Python).


Note that the first link has been superseded by https://github.com/sixty-north/python-transducers which is also available on PyPI as https://pypi.python.org/pypi/transducer/


This code by Cognitect is a fairly literal port of Clojure transducers to Python. If you'd like to see a more Pythonic interpretation of transducers, take a look at Sixty-North's alternative https://github.com/sixty-north/python-transducers


My own attempt at implementing something like transducers in Haskell: http://hackage.haskell.org/package/foldl-transduce.

It depends on the useful "foldl" library of reducers.


This looks really useful!

Isn't your Transduction the same as doing premap (or lmap from Profunctor)?


Transformations created with premap are stateless, they depend only on the current input. So one can't build UTF8 decoders with them, for example. That said, if you supply a function to premap, the result typechecks as a "Transduction".


eli5: what can you use these for? what are some examples of when you'd use something like this...


The idea is that you can define a transformation separate from its use and then reuse it in many contexts. Some of the win sits in decoupling what you are doing from when and where you are doing it.

Rich Hickey wrote about them on Cognitect's blog a while back: http://blog.cognitect.com/blog/2014/8/6/transducers-are-comi...

The author refers to the test suite for examples: https://github.com/cognitect-labs/transducers-python/blob/ma...


I always had a hard time figuring out how transducers are different "stream transformation" functions like map and filter.

I know that they are more complicated/powerful than that but I can't quite point that out precisely.


You can transform map or filter into a transducer by decoupling the transformation from the data. When you call map, you usually call it on some data, or some stream. When you create a transducer, and you call for example (map inc), you're saying "whenever this transducer is applied, on whatever it is applied to, increase each unit by one".

You can lug around the transformation itself as if it were an object, almost, separate from the data it's meant to transform.


How is that different from just partially applying/currying the map?


A key component to transducers is to realize that map, filter, take, can all be expressed as special cases of the reduce function.

So a curried map function can be expressed as a reduction operation, for example, in python you could do..

  # map that adds 1 to each element
  # same as curried map(sub1)([1,2,3])
  sub1 = lambda x: x - 1
  map(sub1, [1,2,3])
  
  # same outcome using reduce
  reduce(lambda acc, x: acc + [sub1(x)], [1,2,3], [])
Notice that when writing a map operation using the reduce function, we've put two pieces of logic on each step. (1) we have a transformation called sub1, and (2) we have a step operation (returning an array with the transformed x appended to it).

We can use this same logic to create filters with an if statement..

  lt2 = lambda x: x < 2
  filter(lt2, [1,2,3])
  reduce(lambda acc, x: acc + [x] if lt2(x) else acc, [1,2,3], [])
A critical thing to notice in each of these cases is that if we ran a curried filter and map (with transformation / filtering functions already passed), then they would loop over the data twice. However, if you look at the example on this github repo, the want to

* take 3 elements from the sequence

* perform a transformation (map operation) to those elements

Notice that in this case take is being performed before the map, but you could imagine doing something like

* map items from geometric series to float

* filter items that are greater than .15

* take 3 of the remaining items

How would you do that with map and filter functions without looping over the entire (inexhaustible) sequence?

This is the beauty of transformers and transducers, they provide a composable approach to filtering, transforming, and taking elements from a (possibly infinite) sequence.

Of course, all of this could be done in a for loop, but often it introduces complexity..

  # assume geom_series is the generator from the github example
  take = 3
  for el in geom_series:
      if float(el) < .15:
          result.append(el)
          if len(result) == take: break
However, this for loop is using the same ideas: we have a transformation / if (predicate) operation, and step operation to append the data we want. The problem transducers try to solve is writing a series of these types of operations compactly and cleary as a pipeline of functions (erm, transformers), so those functions can be replaced/moved around quickly, and the pipelines can be combined together simply.

A good article that covers tranducers in javascript (sorry!) is here:

http://simplectic.com/blog/2014/transducers-explained-1/


> How would you do that with map and filter functions without looping over the entire (inexhaustible) sequence?

But isn't this just a matter of defining map and filter over a lazy stream datatype (aka iterators) instead of over lists?

So the workflow would be

   list -> stream -> filter1 -> filter2 -> list
This would let you run the filters "in parallel" without iterating through things twice.


Python's map and filter already run over iterators. If you're talking about creating filters that are composable, so that you could do

  filter1 = filter(predicate1)   # curried filter
  filter2 = filter(predicate2)   # curried filter
  pipeline = compose(filter1, filter2)
  pipeline(<generator of some kind>)
and make it so that the sequence of operations will be

  for el in some_generator:
    if predicate1(el) and predicate2(el): <accumulate value>
Then I would say defining them in this way is useful and important--transducers are exactly one way of doing this! A critical aspect I didn't go into detail on is the idea of a take function. In the github repo, T.take(3) is the portion that allows the transducer to operate of an infinite stream of values.

This is the piece your workflow example would need to take into account. How could I apply a filter followed by something that takes 3 passing values from an infinite sequence? I'm sure you could come up with a way, and it would be worth comparing to the transducer approach :).

(it's worth noting that currying map, filter, etc.. are very complimentary to the transducer approach)


If you are wondering how to make a lazy take() in python, here is one solution:

    def take(num):
        def gen(iterable):
            for i, item in enumerate(iterable):
                if i == num:
                    break
                yield item
        return gen
and then

    list(take(3)(range(100)) == [0, 1, 2]



NB: Here's the Clojure page for transducers, which includes a link to Rich Hickey's introductory blog post and Strange Loop video.

http://clojure.org/transducers


Genuine question: what is the point of this?

it seems like a long-winded and not very clear way to write something you could just do with a function and a list comprehension


Well, in Clojure they solve the problem that the standard functions are inefficient when you chain a bunch of list manipulations together (because they all return lazy lists) and neatly sidestep the fact that Clojure doesn't have generators.


So if Clojure had some form of fusion of composed functions on lists, that would equate out to what transducers are doing?


That's almost literally what happens. (comp (map inc) (filter even?)) when applied takes all even numbers from a collection/channel/what-have-you and adds one to them. Filter and map normally take a collection as their last argument, but when none are supplied, a transducer is returned instead.


How does transducers compare to Gremlin traversal language?




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: