PostgreSQL Queries on JSONB Fields with Ecto
The Promotions team at theScore is responsible for enabling dynamic marketing campaigns throughout thescore.bet application. To determine which users are eligible for a campaign, we need to execute dynamic queries in Elixir's Ecto library against data stored in a PostgreSQL JSONB-type field.
https://techblog.thescore.com/2022/06/23/embedded-schema-queries-with-ecto/
The Promotions team at theScore is responsible for enabling dynamic marketing campaigns within thescore.bet application. These promotions allow us to specify an audience of users who will be available to participate in a promotion, which is determined based on a combination of past actions that a user has taken.
For example, when creating a new promotion, we can specify that the audience for that promotion can only include patrons who have previously deposited a specific amount of money, or placed a specific quantity of bets.
Disclaimer: The scenario (and format of the data shown in the examples throughout) are contrived entirely for the purposes of this article.
An overview of the PostgreSQL schema
As mentioned in the introduction, the Promotions service is dependent on the ability to read past events placed by a patron. These events are sent asynchronously via Kafka from other services, and stored internally in the Promotions database as a generic “user action”. These actions are used for a variety of purposes as they related to Promotions, but in this case, we are exploring looking at the past actions of a given user to see if they fulfill audience criteria. The table which they are stored was created with a migration that resembled the code sample below.
defmodule Promotions.Repo.Migrations.CreateUserActionsTable do
use Ecto.Migration
def change do
create table(:user_actions) do
add :action_id, :string, null: false
add :action_type, :action_type, null: false
add :occurred_at, :utc_datetime_usec, null: false
add :region, :region, null: false
add :user_id, :binary_id, null: false
add :attributes, :jsonb, null: false
timestamps()
end
end
end
The corresponding Ecto schema for the “user_actions” table looks like the following.
defmodule Promotions.UserAction do
use Ecto.Schema
alias Promotions.Schemas.UserAction.CasinoWagerAttributes
alias Promotions.Schemas.UserAction.DepositAttributes
alias Promotions.Schemas.UserAction.RegistrationAttributes
alias Promotions.Schemas.UserAction.SportsbookBetAttributes
schema "user_actions" do
field :action_id, :string
field :action_type, Ecto.Enum,
values: [:registration, :deposit, :sportsbook_bet, :casino_wager]
field :occurred_at, :utc_datetime_usec
field :region, Ecto.Enum,
values: [:new_jersey, :colorado, :indiana, :iowa, :ontario]
field :user_id, Ecto.UUID
embeds_one :casino_wager_attributes, CasinoWagerAttributes,
source: :attributes
embeds_one :deposit_attributes, DepositAttributes,
source: :attributes
embeds_one :registration_attributes, RegistrationAttributes,
source: :attributes
embeds_one :sportsbook_bet_attributes, SportsbookBetAttributes,
source: :attributes
timestamps()
end
end
Each of the various embedded schemas to populate the attributes
field is represented in a separate Ecto schema of its own. For the purposes of this article, only a truncated sample of the SportsbookBetAttributes
embedded schema will be shown below as well.
defmodule Promotions.UserAction.SportsbookBetAttributes do
use Ecto.Schema
@primary_key false
embedded_schema do
field :amount_cents_wagered, :integer
field :amount_cents_payout, :integer
field :bet_type, Ecto.Enum, values: [:straight, :parlay]
field :odds, :integer
embeds_many :legs, SportsbookBetLegAttributes
end
end
defmodule Promotions.UserAction.SportsbookBetLegAttributes do
use Ecto.Schema
@primary_key false
embedded_schema do
field :event_id, :string
field :event_type, :string
field :market_id, :string
field :market_selection_id, :string
field :market_type, :string
field :in_play, :boolean
field :odds, :integer
end
end
Example Criteria to Turn into Database Queries
Our marketing team needs to be empowered to create promotional campaigns which include a variety of criteria. The audience criteria, mentioned earlier, will determine which users are permitted to see the marketing content for a promotion, and subsequently participate within it.
The audience criteria can be based on any type of action that we record (ie. deposit, sportsbook bet, etc) and the various attributes of the action. The following are examples of the such an audience rule.
- The patron registered their account in January 2022
- The patron has placed at least 3 deposits from within New Jersey
- The patron placed at least one sportsbook bet on the last years Super Bowl
For the first two examples, the data is stored within more fields that are of a more primitive data type within the user_actions
table of the database, and are very standard to query. However for the last example, it requires querying the JSONB attributes
field to determine if any of the “legs” of a any of the users previous sportsbook bets contain the event_id
associated with last years Super Bowl.
Similarly, the audience criteria also needs to support the scenario such as “The patron has placed a sportsbook bet that was not on last years Super Bowl” as well. This would require looking for a sportsbook bet that does not have an event_id
corresponding to the Super Bowl for the previous year in any of its legs.
The queries we need to write need to work for both “at least one leg of a sportsbook contains” and “none of the legs of a sportsbook bet can contain” scenarios. To do so, we needed to investigate how to make precise queries on a set of data within a JSONB field in PostgreSQL, and subsequently how we can turn those queries into composable Ecto queries within our Elixir application as well.
Notably, we want to be able to support operators of “in” and “not in” when querying for data, as opposed to operators such as “equals”, “not equals”, “greater than or equals”, etc.
Query of a JSONB Field
The first results our team came across while looking into this was using PostgreSQL native functions and operators (https://www.postgresql.org/docs/current/functions-json.html) however we quickly realized that these would not fit all of our use-cases. It would be challenging to turn it into a reusable Ecto query that could be used for a variety of fields and an inelegant solution for deeply nested data.
The next option we came across was to use the function jsonb_to_recordset
to create a queryable schema of records from an array within a JSONB field (ie. the embeds_many :legs
section of our bet attributes).
For the example, we’ll look at in this article, we’ll try to find if at least one bet was placed with the following attributes to see if a specific user has placed a bet while in Ontario, of at least $15, on one of 3 specific event’s (by their event id).
user_id: "fcca980b-8edc-4848-91cc-4e00ff47019c"
region: "ontario"
amount_cents_wagered:
operator: greater_than_or_equal_to
value: $15
event_id (not in any of the following):
- "e8a95b17-972c-4421-9090-8bf66c0804a9"
- "5fbef652-e772-4bae-84f5-31255fe2fadd"
- "dd92d86c-fe6d-43a7-8765-a3e1026145db"
Using jsonb_to_recordset
with SQL
The first step to transforming this into a usable query within our Elixir app was to try to get it working with a query made directly against the database to validate that it is the right tool for the job.
After some initial research, we composed the following SQL query:
SELECT
public.user_actions AS action,
jsonb_to_recordset(action.attributes->'legs') AS leg("event_id" varchar)
WHERE
action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
action.region = 'ontario' AND
(action.attributes->'amount_cents_wagered')::integer >= 1500 AND
leg.event_id NOT IN (
'e8a95b17-972c-4421-9090-8bf66c0804a9',
'5fbef652-e772-4bae-84f5-31255fe2fadd',
'dd92d86c-fe6d-43a7-8765-a3e1026145db'
);
While this allows us to use the operators we want (ie. >=
and NOT IN
) on data within the JSONB attributes field, and gets us exactly the records we want, it would be a challenge to try to make it fit into an Ecto query since Ecto queries only allow for one field to be in the from
clause. If there is a way around that, we didn’t happen to find a solution.
The next option we looked to was modifying the first query by utilizing a lateral-join to try to make a “user action” such that it has a one-to-many relationship with its own “legs” nested within the JSONB “attributes” field. By doing so, we eliminate the need to use multiple clauses in the FROM
section of the query, which helps gets us closer to a valid Ecto query translation. The query then looks like this:
SELECT
action.*
FROM
user_actions AS action
INNER JOIN LATERAL (
SELECT *
FROM jsonb_to_recordset(action.attributes->'legs')
AS leg("event_id" varchar)
) AS leg ON true
WHERE
action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
action.region = 'ontario' AND
(action.attributes->'amount_cents_wagered')::integer >= 1500 AND
leg.event_id NOT IN (
'e8a95b17-972c-4421-9090-8bf66c0804a9',
'5fbef652-e772-4bae-84f5-31255fe2fadd',
'dd92d86c-fe6d-43a7-8765-a3e1026145db'
)
The problem we quickly saw with this query is that will look to see if any of the “legs” of a specific bet adhere to the condition. Instead, we want to make sure that the all of the legs within the bet cannot have any of the given “event id” values. As such, we found that putting the inner join in a subquery helped to achieve the goal, wrapping it in a “NOT EXISTS” condition.
SELECT
action.*
FROM
user_actions AS action
WHERE
action.user_id = 'f0bb75d3-2cce-4328-9177-02349fa99de6' AND
action.region = 'ontario' AND
(action.attributes->'amount_cents_wagered')::integer >= 1500 AND
NOT EXISTS (
SELECT
*
FROM
user_actions AS lateral_action
INNER JOIN LATERAL (
SELECT *
FROM jsonb_to_recordset(lateral_action.attributes->'legs')
AS leg("event_id" varchar)
) AS leg ON true
WHERE
action.id = lateral_action.id AND
leg.event_id IN (
'e8a95b17-972c-4421-9090-8bf66c0804a9',
'5fbef652-e772-4bae-84f5-31255fe2fadd',
'dd92d86c-fe6d-43a7-8765-a3e1026145db'
)
);
This query seems to meet all of the goals that we were looking to achieve, so the next step was to take it and turn it into a composable query within Ecto.
Translating SQL query to an Ecto Query
Many parts of our query are pretty quick and easy to make into an Ecto.Query
structure. We can begin with everything before the subquery. For this example, we can assume all of the values were passed in and the operators for each part of the expression are known:
def query(user_id, region, amount_cents, event_ids)
from action in Promotions.UserAction,
as: :user_action,
where: action.user_id == ^user_id,
where: action.region == ^region,
where: fragment(
"(?->?)::integer",
action.sportsbook_bet_attributes,
"amount_cents_wagered"
) >= ^amount_cents
end
We also need to build a query around the jsonb_to_recordset
data that we created in the lateral join seen earlier.
This was a bit challenging, as the jsonb_to_recordset
is specific to PostgreSQL and as such does not have any utility functions built into Ecto. It also requires us to define an AS ... ON ...
set of clauses to use with it. Since Ecto will create its own AS
alias with each “from” or “join”, that is problematic as well.
After some research on this issue on the elixir forums, we were pointed in the right direction to wrap the subquery in parentheses, so that we could use our own AS
within it, and it would not conflict with the one Ecto adds on.
def sportsbook_bet_legs_query do
from action in Promotions.UserAction,
as: :leg_user_action,
inner_lateral_join: legs in fragment(
"(
SELECT *
FROM jsonb_to_recordset(?->'legs')
AS record(event_id varchar)
)",
field(action, :sportsbook_bet_attributes)
)
on: true
as: :sportsbook_bet_legs
end
To utilize both queries together, we need to include the sportsbook_bet_legs_query
function as a subquery within the first, and wrap it in a where not exists
clause of the outer query.
Since the subquery query uses a lateral join, we need to remember to join it back to the outer query using Ecto’s parent_as/1
function. Another example of this can be found in Ecto’s documentation for exists/1
as well (https://hexdocs.pm/ecto/Ecto.Query.API.html#exists/1)
def sportsbook_bet_query(user_id, region, amount_cents_wagered, event_ids) do
from action in Promotions.UserAction,
as: :action,
where: action.user_id == ^user_id,
where: action.region == ^region,
where:
fragment(
"(?->?)::integer",
action.sportsbook_bet_attributes,
"amount_cents_wagered"
) >= ^amount_cents_wagered,
where:
not exists(
from [
leg_user_action: action_subquery,
sportsbook_bet_legs: sportsbook_bet_legs
] in event_ids_query(),
where: parent_as(:action).id == action_subquery.id,
where: sportsbook_bet_legs.event_id in ^event_ids
)
end
With the function now complete, it can be called by simply taking the earlier example and passing in the arguments to the reusable function:
user_id = "fcca980b-8edc-4848-91cc-4e00ff47019c"
region = :can_ontario
amount_cents_wagered = 1_500
event_ids = [
"e8a95b17-972c-4421-9090-8bf66c0804a9",
"5fbef652-e772-4bae-84f5-31255fe2fadd",
"dd92d86c-fe6d-43a7-8765-a3e1026145db"
]
sportsbook_bet_query(user_id, region, amount_cents_wagered, event_id)
As a reminder, one of the goals of our function was to find a “user action” record that did not contain any of the values of the list of event_id
’s from the attributes of any of its array of “leg” data points. One benefit of structuring the query in the way that has been shown here is that if we want to flip it to find if one (or more) legs does include a particular event_id
, we can simply remove the not
from the not exists
condition without any other modifications.
The queries that we need in our application are more robust, including changing up the operators that we use instead of statically choosing “equals”, “greater than or equals” and “not in” as outlined in our example, but from this template we were able to build out a library of robust and composable queries to scope results based on attributes both in more primitive data types within our database schema, as well as JSONB fields too.