PostgreSQL Queries on JSONB Fields with Ecto

The Promotions team at theScore is responsible for enabling dynamic marketing campaigns throughout thescore.bet application. To determine which users are eligible for a campaign, we need to execute dynamic queries in Elixir's Ecto library against data stored in a PostgreSQL JSONB-type field.

đź’ˇ
This blog post was originally published on techblog.thescore.com while I was an employee at theScore.

https://techblog.thescore.com/2022/06/23/embedded-schema-queries-with-ecto/

The Promotions team at theScore is responsible for enabling dynamic marketing campaigns within thescore.bet application. These promotions allow us to specify an audience of users who will be available to participate in a promotion, which is determined based on a combination of past actions that a user has taken.

For example, when creating a new promotion, we can specify that the audience for that promotion can only include patrons who have previously deposited a specific amount of money, or placed a specific quantity of bets.

Disclaimer: The scenario (and format of the data shown in the examples throughout) are contrived entirely for the purposes of this article.

An overview of the PostgreSQL schema

As mentioned in the introduction, the Promotions service is dependent on the ability to read past events placed by a patron. These events are sent asynchronously via Kafka from other services, and stored internally in the Promotions database as a generic “user action”. These actions are used for a variety of purposes as they related to Promotions, but in this case, we are exploring looking at the past actions of a given user to see if they fulfill audience criteria. The table which they are stored was created with a migration that resembled the code sample below.

defmodule Promotions.Repo.Migrations.CreateUserActionsTable do
  use Ecto.Migration

  def change do
    create table(:user_actions) do
      add :action_id, :string, null: false
      add :action_type, :action_type, null: false
      add :occurred_at, :utc_datetime_usec, null: false
      add :region, :region, null: false
      add :user_id, :binary_id, null: false
      add :attributes, :jsonb, null: false

      timestamps()
    end
  end
end

The corresponding Ecto schema for the “user_actions” table looks like the following.

defmodule Promotions.UserAction do
  use Ecto.Schema
  
  alias Promotions.Schemas.UserAction.CasinoWagerAttributes
  alias Promotions.Schemas.UserAction.DepositAttributes
  alias Promotions.Schemas.UserAction.RegistrationAttributes
  alias Promotions.Schemas.UserAction.SportsbookBetAttributes
  
  schema "user_actions" do
    field :action_id, :string
    
    field :action_type, Ecto.Enum,
      values: [:registration, :deposit, :sportsbook_bet, :casino_wager]

    field :occurred_at, :utc_datetime_usec

    field :region, Ecto.Enum,
      values: [:new_jersey, :colorado, :indiana, :iowa, :ontario]

    field :user_id, Ecto.UUID
    
    embeds_one :casino_wager_attributes, CasinoWagerAttributes,
      source: :attributes

    embeds_one :deposit_attributes, DepositAttributes,
      source: :attributes

    embeds_one :registration_attributes, RegistrationAttributes,
      source: :attributes
      
    embeds_one :sportsbook_bet_attributes, SportsbookBetAttributes,
      source: :attributes
    
    timestamps()
  end
end

Each of the various embedded schemas to populate the attributes field is represented in a separate Ecto schema of its own. For the purposes of this article, only a truncated sample of the SportsbookBetAttributes embedded schema will be shown below as well.

defmodule Promotions.UserAction.SportsbookBetAttributes do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :amount_cents_wagered, :integer
    field :amount_cents_payout, :integer
    field :bet_type, Ecto.Enum, values: [:straight, :parlay]
    field :odds, :integer

    embeds_many :legs, SportsbookBetLegAttributes
  end
end

defmodule Promotions.UserAction.SportsbookBetLegAttributes do
  use Ecto.Schema

  @primary_key false
  embedded_schema do
    field :event_id, :string
    field :event_type, :string
    field :market_id, :string
    field :market_selection_id, :string
    field :market_type, :string
    field :in_play, :boolean
    field :odds, :integer
  end
end

Example Criteria to Turn into Database Queries

Our marketing team needs to be empowered to create promotional campaigns which include a variety of criteria. The audience criteria, mentioned earlier, will determine which users are permitted to see the marketing content for a promotion, and subsequently participate within it.

The audience criteria can be based on any type of action that we record (ie. deposit, sportsbook bet, etc) and the various attributes of the action. The following are examples of the such an audience rule.

  • The patron registered their account in January 2022
  • The patron has placed at least 3 deposits from within New Jersey
  • The patron placed at least one sportsbook bet on the last years Super Bowl

For the first two examples, the data is stored within more fields that are of a more primitive data type within the user_actions table of the database, and are very standard to query. However for the last example, it requires querying the JSONB attributes field to determine if any of the “legs” of a any of the users previous sportsbook bets contain the event_id associated with last years Super Bowl.

Similarly, the audience criteria also needs to support the scenario such as “The patron has placed a sportsbook bet that was not on last years Super Bowl” as well. This would require looking for a sportsbook bet that does not have an event_id corresponding to the Super Bowl for the previous year in any of its legs.

The queries we need to write need to work for both “at least one leg of a sportsbook contains” and “none of the legs of a sportsbook bet can contain” scenarios. To do so, we needed to investigate how to make precise queries on a set of data within a JSONB field in PostgreSQL, and subsequently how we can turn those queries into composable Ecto queries within our Elixir application as well.

Notably, we want to be able to support operators of “in” and “not in” when querying for data, as opposed to operators such as “equals”, “not equals”, “greater than or equals”, etc.

Query of a JSONB Field

The first results our team came across while looking into this was using PostgreSQL native functions and operators (https://www.postgresql.org/docs/current/functions-json.html) however we quickly realized that these would not fit all of our use-cases. It would be challenging to turn it into a reusable Ecto query that could be used for a variety of fields and an inelegant solution for deeply nested data.

The next option we came across was to use the function jsonb_to_recordset to create a queryable schema of records from an array within a JSONB field (ie. the embeds_many :legs section of our bet attributes).

For the example, we’ll look at in this article, we’ll try to find if at least one bet was placed with the following attributes to see if a specific user has placed a bet while in Ontario, of at least $15, on one of 3 specific event’s (by their event id).

user_id: "fcca980b-8edc-4848-91cc-4e00ff47019c"
region: "ontario"
amount_cents_wagered:
  operator: greater_than_or_equal_to
  value: $15
event_id (not in any of the following):
  - "e8a95b17-972c-4421-9090-8bf66c0804a9"
  - "5fbef652-e772-4bae-84f5-31255fe2fadd"
  - "dd92d86c-fe6d-43a7-8765-a3e1026145db"

Using jsonb_to_recordset with SQL

The first step to transforming this into a usable query within our Elixir app was to try to get it working with a query made directly against the database to validate that it is the right tool for the job.

After some initial research, we composed the following SQL query:

SELECT
  public.user_actions AS action,
  jsonb_to_recordset(action.attributes->'legs') AS leg("event_id" varchar)
WHERE
  action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
  action.region = 'ontario' AND
  (action.attributes->'amount_cents_wagered')::integer >= 1500 AND
  leg.event_id NOT IN (
    'e8a95b17-972c-4421-9090-8bf66c0804a9',
    '5fbef652-e772-4bae-84f5-31255fe2fadd',
    'dd92d86c-fe6d-43a7-8765-a3e1026145db'
  );

While this allows us to use the operators we want (ie. >= and NOT IN) on data within the JSONB attributes field, and gets us exactly the records we want, it would be a challenge to try to make it fit into an Ecto query since Ecto queries only allow for one field to be in the from clause. If there is a way around that, we didn’t happen to find a solution.

The next option we looked to was modifying the first query by utilizing a lateral-join to try to make a “user action” such that it has a one-to-many relationship with its own “legs” nested within the JSONB “attributes” field. By doing so, we eliminate the need to use multiple clauses in the FROM section of the query, which helps gets us closer to a valid Ecto query translation. The query then looks like this:

SELECT
    action.*
FROM
    user_actions AS action
INNER JOIN LATERAL (
    SELECT *
    FROM   jsonb_to_recordset(action.attributes->'legs')
    AS     leg("event_id" varchar)
) AS leg ON true
WHERE
    action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
    action.region = 'ontario' AND
    (action.attributes->'amount_cents_wagered')::integer >= 1500 AND
    leg.event_id NOT IN (
        'e8a95b17-972c-4421-9090-8bf66c0804a9',
        '5fbef652-e772-4bae-84f5-31255fe2fadd',
        'dd92d86c-fe6d-43a7-8765-a3e1026145db'
    )

The problem we quickly saw with this query is that will look to see if any of the “legs” of a specific bet adhere to the condition. Instead, we want to make sure that the all of the legs within the bet cannot have any of the given “event id” values. As such, we found that putting the inner join in a subquery helped to achieve the goal, wrapping it in a “NOT EXISTS” condition.

SELECT
    action.*
FROM
    user_actions AS action
WHERE
    action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
    action.region = 'ontario' AND
    (action.attributes->'amount_cents_wagered')::integer >= 1500 AND
    NOT EXISTS (
        SELECT
            *
        FROM
            user_actions AS lateral_action
        INNER JOIN LATERAL (
            SELECT *
            FROM jsonb_to_recordset(lateral_action.attributes->'legs')
            AS leg("event_id" varchar)
        ) AS leg ON true
        WHERE
            action.id = lateral_action.id AND
            leg.event_id IN (
                'e8a95b17-972c-4421-9090-8bf66c0804a9',
                '5fbef652-e772-4bae-84f5-31255fe2fadd',
                'dd92d86c-fe6d-43a7-8765-a3e1026145db'
            )
    );

This query seems to meet all of the goals that we were looking to achieve, so the next step was to take it and turn it into a composable query within Ecto.

Translating SQL query to an Ecto Query

Many parts of our query are pretty quick and easy to make into an Ecto.Query structure. We can begin with everything before the subquery. For this example, we can assume all of the values were passed in and the operators for each part of the expression are known:

def query(user_id, region, amount_cents, event_ids)
  from action in Promotions.UserAction,
    as: :user_action,
    where: action.user_id == ^user_id,
    where: action.region == ^region,
    where: fragment(
      "(?->?)::integer",
      action.sportsbook_bet_attributes,
      "amount_cents_wagered"
    ) >= ^amount_cents
end

We also need to build a query around the jsonb_to_recordset data that we created in the lateral join seen earlier.

This was a bit challenging, as the jsonb_to_recordset is specific to PostgreSQL and as such does not have any utility functions built into Ecto. It also requires us to define an AS ... ON ... set of clauses to use with it. Since Ecto will create its own AS alias with each “from” or “join”, that is problematic as well.

After some research on this issue on the elixir forums, we were pointed in the right direction to wrap the subquery in parentheses, so that we could use our own AS within it, and it would not conflict with the one Ecto adds on.

def sportsbook_bet_legs_query do
  from action in Promotions.UserAction,
    as: :leg_user_action,
    inner_lateral_join: legs in fragment(
      "(
        SELECT *
        FROM jsonb_to_recordset(?->'legs')
        AS record(event_id varchar)
      )",
      field(action, :sportsbook_bet_attributes)
    )
    on: true
    as: :sportsbook_bet_legs
end

To utilize both queries together, we need to include the sportsbook_bet_legs_query function as a subquery within the first, and wrap it in a where not exists clause of the outer query.

Since the subquery query uses a lateral join, we need to remember to join it back to the outer query using Ecto’s parent_as/1 function. Another example of this can be found in Ecto’s documentation for exists/1 as well (https://hexdocs.pm/ecto/Ecto.Query.API.html#exists/1)

def sportsbook_bet_query(user_id, region, amount_cents_wagered, event_ids) do
  from action in Promotions.UserAction,
    as: :action,
    where: action.user_id == ^user_id,
    where: action.region == ^region,
    where:
      fragment(
        "(?->?)::integer",
        action.sportsbook_bet_attributes,
        "amount_cents_wagered"
      ) >= ^amount_cents_wagered,
    where:
      not exists(
        from [
               leg_user_action: action_subquery,
               sportsbook_bet_legs: sportsbook_bet_legs
             ] in event_ids_query(),
         where: parent_as(:action).id == action_subquery.id,
         where: sportsbook_bet_legs.event_id in ^event_ids
      )
end

With the function now complete, it can be called by simply taking the earlier example and passing in the arguments to the reusable function:

user_id = "fcca980b-8edc-4848-91cc-4e00ff47019c"
region = :can_ontario
amount_cents_wagered = 1_500
event_ids = [
  "e8a95b17-972c-4421-9090-8bf66c0804a9",
  "5fbef652-e772-4bae-84f5-31255fe2fadd",
  "dd92d86c-fe6d-43a7-8765-a3e1026145db"
]

sportsbook_bet_query(user_id, region, amount_cents_wagered, event_id)

As a reminder, one of the goals of our function was to find a “user action” record that did not contain any of the values of the list of event_id’s from the attributes of any of its array of “leg” data points. One benefit of structuring the query in the way that has been shown here is that if we want to flip it to find if one (or more) legs does include a particular event_id, we can simply remove the not from the not exists condition without any other modifications.

The queries that we need in our application are more robust, including changing up the operators that we use instead of statically choosing “equals”, “greater than or equals” and “not in” as outlined in our example, but from this template we were able to build out a library of robust and composable queries to scope results based on attributes both in more primitive data types within our database schema, as well as JSONB fields too.