In an aggregation pipeline in MongoDB, $out is how you get your results out of the pipeline and into another collection (maybe for further analysis).
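As a rough illustration, a pipeline ending in $out might look like the sketch below when driven from Python through a pymongo-style collection object. The field names (status, customer_id, amount) and the destination name are hypothetical and only there to make the example concrete.

```python
def build_pipeline(dest_name):
    """Return an aggregation pipeline whose last stage writes to dest_name."""
    return [
        # Hypothetical filter and grouping stages, for illustration only.
        {"$match": {"status": "complete"}},
        {"$group": {"_id": "$customer_id", "total": {"$sum": "$amount"}}},
        # $out must be the final stage; it writes the results to dest_name.
        {"$out": dest_name},
    ]


def run_with_out(source, dest_name):
    """Run the pipeline on a pymongo-style collection (anything with .aggregate)."""
    # With $out the results land in the destination collection, so the
    # returned cursor has nothing useful to iterate over.
    source.aggregate(build_pipeline(dest_name))
```

Calling run_with_out(db.orders, "order_totals") would, under these assumptions, leave the grouped results in a collection named order_totals.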
$out is supported in CosmosDB, but there’s a shortcoming that can hamper some use cases.
In CosmosDB, inserts have a certain cost based on the size of the document you are inserting. Each insert costs a
certain amount of RU, and based on the insert rate, you will incur a certain RU/s. If your
collection’s allocated RU/s is low, it’s possible to insert documents too quickly and get back a
429 response, which means that you need to rate limit yourself or perform less costly operations.
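To make the relationship between insert rate and RU/s concrete, here is a small back-of-the-envelope sketch. The numbers are hypothetical; the real RU cost of an insert depends on the document and has to be measured.

```python
def max_inserts_per_second(allocated_ru_per_s, ru_per_insert):
    """Highest sustained insert rate that stays within the RU/s budget."""
    return allocated_ru_per_s / ru_per_insert


def consumed_ru_per_s(inserts_per_second, ru_per_insert):
    """RU/s incurred by inserting at the given rate."""
    return inserts_per_second * ru_per_insert


# Hypothetical numbers: 1000 RU/s allocated, 10 RU per insert.
print(max_inserts_per_second(1000, 10))  # 100.0
print(consumed_ru_per_s(250, 10))        # 2500
```

Under these assumed numbers, anything beyond 100 inserts per second exceeds the budget, and 250 inserts per second would need 2500 RU/s.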
This gets interesting with the $out operator, since it appears that $out in CosmosDB tells
the database to insert all of the results of an aggregation pipeline into a collection. If the collection does not
have high enough RU/s allocated to process the inserts, then the entire aggregation operation will fail with
429. One possible workaround for this would be to create the collection beforehand and then raise the
allocated RU/s to the maximum allowed so that $out will not insert faster than the collection can take
the documents. Unfortunately, the destination collection must be created by $out, and it appears that
all collections in CosmosDB are created with a default of 1000 RU/s. I was not able to find a way to change that default.
What this effectively means is that if you have a $out stage in any aggregation pipeline that inserts
documents into a destination collection at a rate of more than 1000 RU/s, that aggregation cannot succeed.
The only known workaround at this time is to stream the results of the aggregation pipeline to the client
and then batch insert documents back into CosmosDB (being careful not to insert too quickly).
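A minimal sketch of that workaround is shown below, assuming pymongo-style collection objects (aggregate returning an iterable of documents, insert_many accepting a list). The function name, the batch size, and the fixed pause between batches are my own illustrative choices, not something CosmosDB prescribes; a real implementation would likely also retry when a 429 comes back.

```python
import time


def copy_with_throttle(source, dest, pipeline, batch_size=100,
                       pause_seconds=1.0, sleep=time.sleep):
    """Stream aggregation results to the client and re-insert them in paced batches.

    source and dest are pymongo-style collections. The pipeline passed in
    should NOT end in $out, since the client does the writing here.
    Returns the number of documents inserted.
    """
    batch = []
    inserted = 0
    for doc in source.aggregate(pipeline):
        batch.append(doc)
        if len(batch) >= batch_size:
            dest.insert_many(batch)
            inserted += len(batch)
            batch = []
            # Crude pacing: wait before building the next batch so the
            # destination collection is not written to faster than intended.
            sleep(pause_seconds)
    if batch:
        # Flush the final partial batch.
        dest.insert_many(batch)
        inserted += len(batch)
    return inserted
```

The idea is to pick batch_size and pause_seconds so that the RU cost of one batch, spread over the pause, stays under the destination collection's allocated RU/s.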
The CosmosDB team is aware of this use case and I’m hopeful that they can provide a solution to make this workaround unnecessary.