Streaming Data From The Database In Elixir

Say we have a search feature and an index_article/1 function that knows what to do with a given article: it fetches all related data, concatenates it into a tsvector field, and saves the result in the search cache table. With it we can index any single article. So far so good.

Now what if we want to rebuild the whole search index? Loading every entry from the database and passing each one to the existing function would work, but it requires holding the entire table in RAM, which will break down as the table grows.

Usually we would create some kind of task that loops as long as there are unprocessed entries in the database: it takes a batch of those entries, processes each of them, adjusts the offset, and so on.
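To make the manual approach concrete, here is a minimal sketch of such a loop. The module name BatchLoop and the fetch_batch and process arguments are hypothetical and only for illustration. In a real application fetch_batch would most likely run an ordered query with a limit and an offset, and process would be index_article/1.

```elixir
defmodule BatchLoop do
  # Hypothetical helper, not part of any library.
  #
  # fetch_batch: fn offset, limit -> list of entries (an empty list when done)
  # process:     fn entry -> any, called once per entry
  def run(fetch_batch, process, batch_size \\ 500, offset \\ 0) do
    case fetch_batch.(offset, batch_size) do
      [] ->
        # Nothing left to process, stop looping.
        :ok

      entries ->
        Enum.each(entries, process)
        # Move the offset past the entries we have just handled.
        run(fetch_batch, process, batch_size, offset + length(entries))
    end
  end
end
```

Even in this small form we have to manage the batch size, the offset, and the stop condition ourselves, and offset-based paging only stays correct if the rows keep a stable order while the loop runs.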

Well, Elixir can do better. Using Ecto's Repo.stream/2 we can create a stream that works with the Stream module, which greatly simplifies the above logic.
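As a rough illustration of the idea, the sketch below wraps Repo.stream/2 in a small helper. The module name SearchReindex, the function reindex_all/4, and its arguments are assumptions made for this example, not the function from the application itself. The repo is passed in as an argument so that the sketch does not depend on a particular application module. Note that with the PostgreSQL and MySQL adapters the stream has to be consumed inside a transaction.

```elixir
defmodule SearchReindex do
  # Hypothetical helper, shown only to illustrate the shape of the solution.
  #
  # repo:      an Ecto repo module, e.g. MyApp.Repo (assumed name)
  # queryable: any Ecto queryable, e.g. a schema or a query
  # index_fun: function called for every row, e.g. &index_article/1
  def reindex_all(repo, queryable, index_fun, opts \\ []) do
    # Number of rows fetched from the database per round trip.
    max_rows = Keyword.get(opts, :max_rows, 500)

    repo.transaction(
      fn ->
        queryable
        |> repo.stream(max_rows: max_rows)
        # Stream.each/2 is lazy: nothing happens until the stream is run.
        |> Stream.each(index_fun)
        # Stream.run/1 forces the stream and discards the results.
        |> Stream.run()
      end,
      # A full reindex can easily take longer than the default timeout.
      timeout: :infinity
    )
  end
end
```

Under these assumptions a call could look like SearchReindex.reindex_all(MyApp.Repo, MyApp.Article, &index_article/1). Only max_rows rows are fetched at a time, so memory use stays roughly constant regardless of the table size.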

Here is the whole function that reindexes all articles of the given type: