A small problem with Observables, but also JavaScript is broken
RxJSJavaScriptObservableIterable(Okay, neither is "completely broken"... but hear me out...) #
TLDR: #
Teardown your resources as soon as you know you should, before you do anything else!
JavaScript iteration was implemented cleverly, but poorly, and that's why we can't have nice things.
Click here if you're just angry and want to complain about this article on Twitter
How are Observables broken? #
Over the years in RxJS, we've had a few issues come up that usually relate to reentrancy (meaning an observable operation that observes a source, that when it notifies causes it to recieve another notification from the source). The basic thing that happens is that we'll have some operator, like take
, that notifies of complete, and then the complete
handler downstream will next
or error
or the like into a subject or something that is feeding into the take
's source. Here's an example:
ts
import {Subject ,tap ,take } from 'rxjs';constsubject = newSubject <number>();subject .pipe (tap (console .log ),take (1)).subscribe ({next : (n ) =>subject .next (n + 1),complete : () =>subject .next (42),});subject .next (1);// In every version of RxJS// (except the latest RxJS alpha at the time of this writing)// this will log:// 1// 2// 42
This gets even more gnarly if there are errors thrown into the mix... consider the following:
ts
import {Subject ,map ,take ,tap ,catchError ,of } from 'rxjs';constsubject = newSubject <number>();subject .pipe (map ((v ) => {if (v === 42) {throw newError ('No Monty Python references!');}returnv ;}),catchError ((err ) => {console .log ('LOL! an error? Yes: ' +err ?.message );returnof ('wee');}),tap (console .log ),take (1),).subscribe ({next : (n ) =>subject .next (n + 1),complete : () =>subject .next (42),});subject .next (1);// 1// 2// LOL! an error? Yes: No Monty Python references!// wee
Or... really any considerable work... What if that map
had some very expensive calculation in it? It would be executed, even though the downstream subscriber (inside of take
) already told us it's no longer interested!
We've had several fixes around this... most recently in the unreleased RxJS 8 alpha: https://github.com/ReactiveX/rxjs/pull/6396
And all of those fixes involve some code stink in the form of needing to get a handle to the subscriber so you can unsubscribe it from within itself before you notify anything downstream... for example what was done to fix this in take
The problem is that this isn't fixed everywhere, and basically any operator that could even have an error will need to do this, or risk having this same breakage.
Take map
for example:
ts
import {Subject ,map ,tap } from 'rxjs';constsubject = newSubject <number>();subject .pipe (tap (console .log ),map (() => {throw newError ('ROFL');}),).subscribe ({error : (err ) => {console .log (`Error: ${err ?.message }`);subject .next (42);},});subject .next (1);// 1// Error: ROFL// 42
The Cause: Teardown/unsubscription is incorrectly timed #
Fundamentally, the logic for "what happens when a subscriber complete/error is called?" is currently, roughly as follows:
- Flag the subscriber closed
- Notify (complete/error)
- Unsubscribe/Teardown
However, while coming up with the solution to the issues I listed above, I came up with a principal for implementing operators that we adopted (IIRC @cartant was the most active contributor at the time that also agreed with this), and that principle was basically "Teardown your resources as soon as you know you should, before you do anything else". So, if you're in take
, and you take your fill, unsubscribe, then notify your next
and complete
. This principle solved the reentrancy issues we'd been seeing.
It's also a pretty good principle in general: Don't keep resources around any longer than you have to.
However, it doesn't take long mentally exercising this principle in earnest before you realize there are issues with almost every operator. Like map
above. It seems rather smelly that in every operator implementation we'd have to trap a Subscriber
reference, just so it can unsubscribe itself before notifying downstream. Think about this: The subscriber needs to reference itself so it unsubscribes before it emits.
So let's take our principle all the way to the actual Subscriber itself:
"Teardown your resources as soon as you know you should, before you do anything else"
...If that's true, then we should teardown our resources as soon as we know we can/should during the lifecycle of the Subscriber
itself. ...But we don't.
We actually go out of our way to not do that, executing this code in our Subscriber:
ts
class Subscriber<T> {// ...complete(): void {if (this.isStopped) {handleStoppedNotification(COMPLETE_NOTIFICATION, this);} else {this.isStopped = true;this._complete();}}protected _complete(): void {try {this.destination.complete();} finally {this.unsubscribe();}}}
Or simplified to the important bits:
ts
class Subscriber<T> {// ...complete(): void {if (!this.isStopped) {this.isStopped = true;this.destination.complete(); // notify downstreamthis.unsubscribe(); // flag closed, teardown}}}
Even weirder, our Subscriber isn't closed
when it's "stopped". It's only closed
after unsubscribe
was called. So if you examined it while it's notifying downstream it would be like "yeah, I'm not closed" when you clearly cannot notify anything through it anymore — because of this non-public, mysterious to non-implementors "isStopped" bit. ... yet more stank.
The Solution: Follow the principle for all observables! #
We need to make sure we teardown/unsubscribe before we notify of anything. That is to say, the simplified implementation should look like this:
ts
class Subscriber<T> {// ...complete(): void {if (!this.closed) {this.unsubscribe(); // flag closed, teardownthis.destination.complete(); // notify downstream}}}
Because the subscriber itself is the first to know that it no longer needs whatever resources it has had added to it (via add
, part of the Subscription
it inherited along with unsubscribe
), the Subscriber should tear things down before notifying on.
This means we can remove any specialized code that captures the subscriber just so it can unsubscribe it first! Removing the stink.
This would be a breaking change #
Fundamentally, what this means is things like our finalize
operator, (which is "finally" in other implementations) would need to have its handler fired before any downstream "complete" notifications.
Duality and alignment with other types #
The breaking change above, though, makes sense, when you look at Observable
as the "dual" of Iterable
. There's been plenty of coverage of this duality, starting with @headinthebox's paper on this duality here
If we look at a simplified Iterable implementation (written out as a generator function), we can test this behavior:
ts
function*producer () {try {console .log ('producer start');yield 1;yield 2;yield 3;} finally {console .log ('producer teardown');}}functionconsumer () {for (constvalue ofproducer ()) {console .log (`consumer gets: ${value }`);}console .log ('consumer knows producer is done');}consumer ();// Logs:// producer start// consumer gets: 1// consumer gets: 2// consumer gets: 3// producer teardown// consumer knows producer is done
As you can see, the producer "tears down" or finalizes BEFORE the consumer can possibly know that the producer is done. Research has shown me that this is true for every iteration protocol implementation, including IEnumerable from DotNet, which is were observable, and ultimately RxJS was born from.
JavaScript's iteration design is fundamentally flawed! #
Unrelated to any of this, even if we fix this here in the proposed way, any operation which knows it can unsubscribe from the source at the moment before it notifies the consumer of a value still isn't completely fixed. For example take(count)
. Because as soon as take realizes that its count as been met, it must unsubscribe from the source. This means in those scenarios, we'll will need to trap a subscriber instance and tear it down before we next out that last value. It stinks, but there it is.
So what would solve this neatly? Well, if we could somehow call a "next" like method on Subscriber
that not only send the last value, but flagged us closed in the meantime. Let's hypothetically call that... return
?
ts
class Subscriber<T> {// ...return(lastValue: T) {if (!this.closed) {this.unsubscribe();this.destination.next(lastValue);this.destination.complete();}}}
But wait! that destination.next
call needs to know it should teardown first, and that complete
call doesn't make sense, if the new return
can do the whole dance for you!.
Okay, but Ben, "Why would we want two different ways of completing an observable?". WE DON'T! The truth is that we could have ONE complete (aka return
) if calling it with void
(or the complete absence of a value, aka no argument at all) was a signal NOT to next, then we could have one "return" (aka "complete") to rule them all!:
ts
class Subscriber<T> {// ...complete(lastValue?: T) {if (!this.closed) {this.unsubscribe();if (arguments.length > 0) {// Not void!this.destination.next(lastValue);}this.destination.complete();}}}
Therefor a take
implementation could look like this, and everything is solved:
ts
interfaceObserver <T > {next (value :T ): void;error (error : any): void;complete (value :T ): void;}interfaceSubscriber <T > {next (value :T ): void;error (error : any): void;complete (value :T ): void;}classObservable <T > {constructor(init : (subscriber :Subscriber <T >) => void) {}subscribe (observer :Observer <T >): void {}}// -- cut --functiontake <T >(count : number) {return (source :Observable <T >) =>newObservable <T >((subscriber ) => {letseen = 0;source .subscribe ({next : (value ) => {seen ++;if (seen >=count ) {subscriber .complete (value );} else {subscriber .next (value );}},error : (err ) =>subscriber .error (err ),complete : (...args ) =>subscriber .complete (...args ),});});}
So why am I saying JavaScript's design for iterables is flawed?
If I'm showing you that a return (aka complete) in Observable that sometimes gets a value, and other times does not, definitively solves this issue, then certainly that design should have some analog in JavaScript. BUT... what I'm calling complete
above is analogous to return
in JavaScript... and a value passed via return
in JavaScript is "special" and not part of the iteration.
Consider the following:
ts
function*iter () {yield 1;yield 2;return 3;}for (constvalue ofiter ()) {console .log (value );}// 1// 2
WHAT HAPPENED TO THE 3
?!
It was eaten by JavaScript's bad iteration design! (I'm sorry if you worked on this, don't worry I make bad things too, see this library for plenty of examples)
In every language BUT JavaScript, iteration is done in two steps:
- Can I get the next value?
boolean
- Get the next value.
V
.
Leaving only two states you can be in: Either you can get a value and you get one, or you can't get a value.
But the designers of JavaScript's iterator thought they could do better! They can do it all in one step:
- Get an object that gives me a value OR tells me if I'm done.
{ value: V, done: boolean }
But this creates three possible states! You have a value, and can get more; You don't have a value, and can get no more; OR the magical third state, You DO have a value, and can get no more.
But so far, this seems fine, and almost aligns with the hypothetical change above, right? "complete with a value". So far so good.
BUT I demonstrated above that the return
value is not part of the actual iterable values! And there in lies the flaw! You can't possibly implement a take
using this mechanism in JavaScript iteration, because it wouldn't work right:
ts
function*iter () {letn = 0;while (true) {yieldn ++;}}// Ideal, but wouldn't workfunction*takeBetter <T >(source :Iterable <T >,count : number) {letseen = 0;for (constvalue ofsource ) {seen ++;if (seen >=count ) {returnvalue ;}yieldvalue ;}}for (constvalue oftakeBetter (iter (), 3)) {console .log (value );}// It only logs two values :(// 1// 2
ts
function*iter () {letn = 0;while (true) {yieldn ++;}}// -- cut --// Less ideal, but does workfunction*take <T >(source :Iterable <T >,count : number) {letseen = 0;for (constvalue ofsource ) {seen ++;yieldvalue ;if (seen >=count ) {return 'totally ignored value';}}}for (constvalue oftake (iter (), 3)) {console .log (value );}// 0// 1// 2
Okay, but Ben... So what?! It seems just as simple?!
Okay, sure, but it's broken! Before we even yield in that take
, we inherently know that our source iterable can teardown. But JavaScript iteration protocol lacks the mechanism to deal with this appropriately!
ts
function*iter () {try {letn = 0;while (true) {yieldn ++;}} finally {console .log ('cleaned up');}}for (constvalue oftake (iter (), 3)) {console .log (value );}// 0// 1// 2// cleaned up
In an ideal world, "cleaned up" should be logged before "2". In place of console.log(value)
could be something that does a lot of work, or somehow impedes the clean up that happens in that finally block. In language use, there's really no telling what could be happening in there. (This holds true for RxJS's usage as well!). Everything should hold up to the principle "Teardown your resources as soon as you know you should, before you do anything else!"
But why is this broken in this way? Basically, because JavaScript decided to invent a new "single step" iteration mechanism, that made the seemingly obvious choice for the IteratorResult
of return 0
be { done: true, value: 0 }
. Pretty neat, right? Well, no, actually not at all.
You see, iteration protocol in JavaScript can't handle including the { done: true, value: ??? }
in the iteration. And why is that? Because it can't differentiate between { done: true, value: undefined }
from a generator literally executing return undefined
, and the implicit "void" (in practice undefined
) return of all JavaScript functions being sent with that single iteration step:
ts
function*iter1 () {yield 1;yield 2;returnundefined ;}// vsfunction*iter2 () {yield 1;yield 2;}
To accommodate { done: true, value: 'whatever' }
being iterated in a for..of
loop (or anything else that reads iterables, like rest spreads, destructuring, or Array.from()
), you'd then have to deal with an implicit undefined
at the end of EVERY iterable:
ts
declare constgetSomeData : () =>Iterable <{isGood : boolean;value : string }>;// -- cut --function*readGoodData () {constdataRows =getSomeData ();for (constdata ofdataRows ) {if (data .isGood ) {yielddata .value ;}}}for (constvalue ofreadGoodData ()) {// If { done: true, value: ??? } is iterated, we'll get a trailing "undefined" here.console .log (value );}
JavaScript is flawed? JavaScript iteration is. #
There's two things at play here.
- The implicit return of all JavaScript functions being
undefined
rather thanvoid
being an actual thing (as opposed to a pig that TypeScript puts lipstick on withvoid
). - The iteration mechanism being "let's iterate in a single step".
If all functions implicit returns were a void
that was a real thing, meaning "you can't access or read this or use this or you'll get an error, because it's not to be used, it's void". Then we wouldn't be in this mess. But that ship has sailed.
If iteration was done in the "normal" two steps, we'd have this solved as well.
Iterators were introduced as a concept in programming languages by the creator of the CLU (Cluster) programming language (and 2008 Turing Award Recipient), Barbara Liskov at MIT in 1973. From a paper she contributed to, Ownership Types for Object Encapsulation, published 2003:
interface TEnumeration<enumOwner, TOwner> {T<TOwner> getNext();boolean hasMoreElements();}
Or in Java:
public interface Iterator<Item>{Item next();boolean hasNext();}
Or Dot Net:
public interface IEnumerator<T>{T Current;boolean MoveNext();}
Iteration is effectively done by running code to see if you have a next value, then pulling the value in a second call. If iteration was done in two steps, there's no chance anyone can conflate any of this. Consider the following iterable defined with a generator function in JavaScript:
ts
function*iter () {leta = 1;yielda ;a =a + 1;yielda ;a =a + 2;yielda ;console .log ('additional code');}
In JavaScript, since everything is done in one step, this work like:
iterator.next()
executeslet a = 1
, returns{ done: false, value: 1 }
iterator.next()
executesa = a + 1
, returns{ done: false, value: 2 }
iterator.next()
executesa = a + 2
, returns{ done: false, value: 4 }
iterator.next()
executesconsole.log('additional code')
, returns{ done: true, value: undefined }
With a more contemporary iteration design, this would look as follows:
iterator.hasNext()
executeslet a = 1
and returnstrue
.iterator.next()
returns1
.iterator.hasNext()
executesa = a + 1
and returnstrue
.iterator.next()
returns2
.iterator.hasNext()
executesa = a + 2
and returnstrue
.iterator.next()
returns4
.iterator.hasNext()
executesconsole.log('additional code')
and returnsfalse
.
Sure, it's more steps, but we're not allocating an object on every turn, and we can differentiate between whether or not a real value has been returned, or we're just returned. Even in a language like JavaScript where we return undefined
implicitly from functions.
I suppose you could differentiate this in JavaScript's mechanism by seeing the difference between { done: true }
and { done: true, value: undefined }
, but "Yay, JavaScript!" that doesn't really work either if you naively read the value ({ done: true }).value === undefined
. You'd have to check result.done && 'value' in result
, but that's a footgun and a bit inefficient.
So what about an example where we use return
in iteration?
ts
function*iter () {leta = 1;yielda ;a =a + 1;yielda ;a =a + 2;returna ;}
In this hypothetical, the more contemporary iteration design would execute as follows:
iterator.hasNext()
executeslet a = 1
and returnstrue
.iterator.next()
returns1
.iterator.hasNext()
executesa = a + 1
and returnstrue
.iterator.next()
returns2
.iterator.hasNext()
executesa = a + 2
and returnstrue
.iterator.next()
returns4
.iterator.hasNext()
returnsfalse
, with nothing to execute as the function returned.
But then we're hosed if we just wanted to end our iteration early... Because in JavaScript, return;
is observably the same as return undefined;
as far as the consumer is concerned. This is why in C#, return
in a generator function is a compile-time error, and there's yield return
and yield break
.
How does this hold up to our principle above? "Teardown your resources as soon as you know you should, before you do anything else!" Since you're checking to see if you have another value before you're allowing the consumer to consume it, it holds up perfectly. There's enough time to notify teardown any upstream resource with an associated method call. In the case of generators, it's the poorly named return
method, I'd have supplied some sort of finalize()
method, personally, I think. But I'm not a language designer.
Okay, well how do we fix our take
above? #
Now we're left to do some hacky stuff to get this working... In order for anything to "know" that it it's done sending values at or before the moment we yield that last value, we'll have to examine the iterator to see if it's actually a generator, and then invoke it's return()
method.
ts
function*take <T >(source :Iterable <T >,count : number) {letseen = 0;for (constvalue ofsource ) {seen ++;constdone =seen >=count ;if (done && 'return' insource && typeofsource .return === 'function') {// Execute the finally block in the iterator if it// happens to be a generator.source .return ();}yieldvalue ;if (done ) {return;}}}
Wait, are you saying that Observable complete should have a value? #
No. Not in JavaScript. It simply doesn't make sense and there are no use cases to support it. It would add an undue level of complexity. Just as generator.return("this value")
can only echo the value back to the consumer that executed the call, even though it does cause the innards of the generator itself to jump to whatever finally
block its in; Observables should not be able to push a value through it's "return" (complete
) to be observed by it's consumer. Being the "dual" the roles are reversed. If a iterables consumer can't pass a value to the producer through return(value)
, an observable's producer shouldn't be able to pass a value to its consumer through complete(value)
(it's return). It doesn't make any sense, and like I said, there are no use cases to support it anyhow.
There are probably some thinking-types out there that might say "well why not pass { done: boolean, value: T }
to next
in Observables?" That's no good either! The problem is that next
is reserved for observed values from the set of events. { done: true, value: T }
is NOT "observed" during iteration of an iterable, why would it be observed while subscribed to an observable? That doesn't fit the duality of the type.
Summary #
- RxJS Observable, and all Observable implementations will need to start tearing down as soon as they're able to. That means that teardown and other finalization needs to happen before the consumer is notified of
complete
orerror
. - JavaScript's iterable design was an avante garde idea, but it has some rough edges related to the fact that it tries to do too much in one call; and they probably should have used an existing design, rather than invent a new one.
- I ramble a lot, and if you made it this far. Thank you for putting up with my nonsense. <3
- Previous: forEach is a code smell