CosmosDB with MongoDB API and $out

Feb 17, 2018 09:31 · CosmosDB

In a MongoDB aggregation pipeline, the $out stage is how you get your results out of the pipeline and into another collection (for example, for further analysis). CosmosDB supports $out, but it has a shortcoming that can hamper some use cases.
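For reference, here is what such a pipeline might look like, written as the list of stage documents that Python's pymongo driver passes to `Collection.aggregate`. The collection and field names (`orders`, `status`, `customerId`, `total`, `order_totals`) are hypothetical; the only point is that `$out` is the final stage and names the destination collection.

```python
# Hypothetical pipeline: all collection and field names are illustrative only.
def build_pipeline(dest_collection: str) -> list:
    """Return an aggregation pipeline whose final $out stage writes to dest_collection."""
    return [
        {"$match": {"status": "shipped"}},
        {"$group": {"_id": "$customerId", "total": {"$sum": "$total"}}},
        # In MongoDB, $out must be the last stage of the pipeline.
        {"$out": dest_collection},
    ]


pipeline = build_pipeline("order_totals")
# With pymongo this would be run as: db.orders.aggregate(pipeline)
print(pipeline[-1])  # {'$out': 'order_totals'}
```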

In CosmosDB, every operation has a cost measured in Request Units (RUs), and the cost of an insert depends on the size of the document you are writing. Because each insert consumes a certain number of RUs, your insert rate determines how many RU/s you consume. If your collection's provisioned RU/s is low, it's possible to insert documents too quickly and get back a 429 (Too Many Requests), indicating that you need to rate limit yourself or perform less costly operations.
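To make the rate arithmetic concrete, here is a small Python sketch. The cost of 10 RU per insert and the rate of 250 inserts per second are hypothetical numbers chosen for illustration, not measured CosmosDB figures; real per-insert costs vary with document size and indexing.

```python
def consumed_ru_per_second(ru_per_insert: float, inserts_per_second: float) -> float:
    """RU/s consumed by a steady stream of inserts that each cost about the same."""
    return ru_per_insert * inserts_per_second


def max_inserts_per_second(provisioned_ru_per_second: float, ru_per_insert: float) -> float:
    """Highest steady insert rate that stays within the provisioned throughput."""
    return provisioned_ru_per_second / ru_per_insert


# Hypothetical numbers: inserts costing 10 RU each against a 1000 RU/s collection.
print(max_inserts_per_second(1000, 10))  # 100.0
print(consumed_ru_per_second(10, 250))   # 2500, well above 1000 RU/s, so expect 429s
```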

This gets interesting with the $out operator, because $out in CosmosDB appears to tell the database to insert all of the results of the aggregation pipeline into the destination collection. If that collection does not have enough RU/s provisioned to absorb the inserts, the entire aggregation fails with a 429. One possible workaround would be to create the collection beforehand and raise its provisioned RU/s to the maximum allowed, so that $out cannot insert faster than the collection can accept documents. Unfortunately, the destination collection must be created by $out itself, and it appears that all collections in CosmosDB are created with a default of 1000 RU/s. I was not able to find a way to change this setting.

In effect, any aggregation pipeline whose $out stage would insert documents into the destination collection at more than 1000 RU/s cannot succeed. The only known workaround at this time is to run the pipeline without the $out stage, stream its results to the client, and batch insert the documents back into CosmosDB yourself (being careful not to insert too quickly).
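A minimal sketch of that client-side workaround in Python follows. It assumes the results arrive as an iterable of documents (for example, the cursor returned by running the pipeline without its $out stage) and that `insert_many` is any callable that writes one batch, such as pymongo's `Collection.insert_many`. `RateLimitedError`, `copy_throttled`, and the default batch size, pause, and retry count are hypothetical and illustrative, not CosmosDB settings; you would need to translate whatever error your driver raises for a 429 into `RateLimitedError`.

```python
import time
from typing import Callable, Iterable, Iterator, List


class RateLimitedError(Exception):
    """Hypothetical stand-in for the driver's 429 (request rate too large) error."""


def batched(docs: Iterable[dict], batch_size: int) -> Iterator[List[dict]]:
    """Group a stream of documents into lists of at most batch_size."""
    batch: List[dict] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def copy_throttled(
    docs: Iterable[dict],
    insert_many: Callable[[List[dict]], None],
    batch_size: int = 100,
    pause_s: float = 0.5,
    max_retries: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Insert docs in batches, pausing between batches and backing off on 429s.

    Returns the number of documents inserted.
    """
    inserted = 0
    for batch in batched(docs, batch_size):
        delay = pause_s
        for attempt in range(max_retries + 1):
            try:
                insert_many(batch)
                break
            except RateLimitedError:
                if attempt == max_retries:
                    raise  # give up after max_retries backoffs
                sleep(delay)
                delay *= 2  # exponential backoff before retrying the same batch
        inserted += len(batch)
        sleep(pause_s)  # fixed pause to keep the steady insert rate under the RU/s budget
    return inserted


# Example wiring with pymongo (collection and variable names are hypothetical):
#   cursor = db.orders.aggregate(pipeline_without_out)
#   copy_throttled(cursor, db.order_totals.insert_many)
```

One caveat: if a batch fails partway through, some of its documents may already have been written, so retrying the whole batch can produce duplicate-key errors. A production version would need to handle that, for example by inserting with `ordered=False` and tolerating duplicates.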

The CosmosDB team is aware of this use case and I’m hopeful that they can provide a solution to make this workaround unnecessary.