Hello everyone! This is the thirteenth post in the node.js modules you should know about article series.
The first post was about dnode - the freestyle rpc library for node, the second was about optimist - the lightweight options parser for node, the third was about lazy - lazy lists for node, the fourth was about request - the swiss army knife of HTTP streaming, the fifth was about hashish - a hash combinators library, the sixth was about read - easy reading from stdin, the seventh was about ntwitter - a twitter api for node, the eighth was about socket.io - which makes websockets and realtime possible in all browsers, the ninth was about redis - the best redis client API library for node, the tenth was about express - an insanely small and fast web framework for node, the eleventh was about semver - a node module that takes care of versioning, and the twelfth was about cradle - a high-level, caching, CouchDB client for node.
This time I'll introduce you to a very awesome module called JSONStream. JSONStream is written by Dominic Tarr and it parses streaming JSON.
Here is an example. Suppose you have a CouchDB view like this:
{"total_rows":129,"offset":0,"rows":[
{ "id":"change1_0.6995461115147918"
, "key":"change1_0.6995461115147918"
, "value":{"rev":"1-e240bae28c7bb3667f02760f6398d508"}
, "doc":{
"_id": "change1_0.6995461115147918"
, "_rev": "1-e240bae28c7bb3667f02760f6398d508","hello":1}
},
{ "id":"change2_0.6995461115147918"
, "key":"change2_0.6995461115147918"
, "value":{"rev":"1-13677d36b98c0c075145bb8975105153"}
, "doc":{
"_id":"change2_0.6995461115147918"
, "_rev":"1-13677d36b98c0c075145bb8975105153"
, "hello":2
}
},
...
]}
And you want to extract only the doc values from the rows. You can do it easily with JSONStream this way:
var JSONStream = require('JSONStream');
var parser = JSONStream.parse(['rows', /./, 'doc']);
This creates a stream that parses out rows.*.doc.
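To make the path pattern concrete, here is a minimal, non-streaming sketch of the matching rule: a string element has to equal the key exactly, and a regular expression element has to match it. The selectByPath helper is hypothetical and is not part of JSONStream, and the ids in the sample data are shortened for readability. JSONStream applies this kind of rule incrementally while the JSON is still arriving, whereas the sketch works on an object that has already been parsed.

```javascript
// Hypothetical helper, NOT part of JSONStream: walks an already-parsed value
// and collects every value whose path matches the pattern.
function selectByPath(value, pattern) {
  // An empty pattern means the whole path matched, so keep this value.
  if (pattern.length === 0) return [value];
  // Primitives have no keys, so a longer pattern cannot match inside them.
  if (value === null || typeof value !== 'object') return [];

  var head = pattern[0];
  var rest = pattern.slice(1);
  var results = [];

  // For arrays, Object.keys yields the indices as strings ('0', '1', ...).
  Object.keys(value).forEach(function (key) {
    var matches = head instanceof RegExp ? head.test(key) : head === key;
    if (matches) {
      results = results.concat(selectByPath(value[key], rest));
    }
  });
  return results;
}

// Sample data in the shape of the CouchDB view (ids shortened, assumed values).
var view = {
  total_rows: 129,
  offset: 0,
  rows: [
    { id: 'change1', doc: { _id: 'change1', hello: 1 } },
    { id: 'change2', doc: { _id: 'change2', hello: 2 } }
  ]
};

var docs = selectByPath(view, ['rows', /./, 'doc']);
console.log(docs);
```

The /./ element matches any non-empty key, which is why it selects every index of the rows array.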
Since it's a stream you have to feed it data and then have it output the data somewhere. You can do it very nicely and idiomatically in node this way:
req.pipe(parser).pipe(process.stdout);
Here is the output:
{ _id: 'change1_0.6995461115147918', _rev: '1-e240bae28c7bb3667f02760f6398d508', hello: 1 }
{ _id: 'change2_0.6995461115147918', _rev: '1-13677d36b98c0c075145bb8975105153', hello: 2 }
Here req is the request to the CouchDB view and parser is the JSONStream parser, and it all gets piped to process.stdout. The output, as you can see, is only rows.*.doc. That was a really easy way to parse a JSON stream without reading the whole JSON into memory.
You can install JSONStream through npm as always:
npm install JSONStream
JSONStream on GitHub: https://github.com/dominictarr/JSONStream.