Elixir Today: Remove duplicate rows in PostgreSQL using Ecto

Intro

In this tutorial, I want to share with you how to remove duplicate rows in your schema using Ecto.

Preparation

  • In your schema migration, let say you have this schema setup.
CREATE TABLE(:student_teacher, primary_key: false) do

     add :id, :uuid, primary_key: true
     add :student_id, references(:students, type: :uuid)
     add :teacher_id, references(:teachers, type: :uuid)

end
  • And then let's insert a couple of duplicate record. (Supposed you are using Ex.Machina for your data mock ups )
student = insert(:student)
teacher = insert(:teacher)
insert_list(100, :student_teacher, student: student, teacher: teacher)

Check for Duplicate Rows using IEx

Run iex -S mix in your console and check for duplicate rows.

iex) query = SELECT COUNT(*), student_id, teacher_id FROM 
student_teacher group by student_id, teacher_id having count(*) > 1;

iex) Ecto.Adapter.SQL.query!(Repo, query)

iex) %Postgrex.Result{
  columns: ["count", "student_id", "teacher_id"],
  command: :select
  num_rows: 1,
  rows: [
    [
      100,
      <<student_id>>,
      <<teacher_id>>
    ],

  ]
}

Run Query to delete Duplicate Rows

Repo.transaction(
      fn ->
        query = """
        DELETE FROM student_teacher s1
        USING student_teacher s2
        where s1.id < s2.id
        AND s1.student_id = s2.student_id
        AND s1.teacher_id = s2.teacher_id
        """

        Ecto.Adapters.SQL.query!(Repo, query)
      end,
      timeout: :infinity
    )

Happy Coding!

ย