
Performing the TPC-H benchmark in PostgreSQL


TPC-H is a decision support benchmark consisting of a suite of business-oriented ad-hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance, and the benchmark reflects multiple aspects of a system's ability to process queries. TPC-H includes database-agnostic data generation specifications. DBT-3 is a fair-use implementation of the TPC-H benchmark for several RDBMSs, including Postgres.

Hardware specification

A VM with the following specs was rented in Azure for this benchmarking.

| Hardware          | Configuration            |
|-------------------|--------------------------|
| VM cloud provider | Azure                    |
| VM series         | F-Series                 |
| Instance          | Standard F16s v2         |
| CPU               | Intel Xeon Platinum 82   |
| vCPUs             | 16                       |
| Memory            | 32 GiB                   |
| Disk              | Premium SSD LRS - 30 GiB |

Software specification

The following is the software specification for the performed benchmarking.

| Software | Configuration     |
|----------|-------------------|
| OS       | Ubuntu 20.04 LTS  |
| Kernel   | 5.15.0-1050-azure |
| Database | Postgres 16       |
| TPC-H    | 3.0.1             |
| DBT-3    | 3.0.1             |

Downloading TPC-H tools file

The TPC tools are available free of charge, but all users must register and agree to the licensing terms before use. After filling out the form, you'll receive an email with a link to download the tools. Once you've downloaded the zip file, extract it:

unzip *.zip   # replace with the actual filename

Installing dbt-3

The following are the prerequisites for installing dbt-3 (an example install command follows the list):

  • C compiler
  • CMake
  • make
  • patch
  • bc
  • TPC-H tools
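On Ubuntu 20.04, the OS used here, everything except the TPC-H tools can be installed in one step; a minimal sketch, assuming the stock apt repositories (build-essential provides the C compiler and make):

sudo apt update
sudo apt install -y build-essential cmake patch bc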

First clone the dbt-3 repository and change the current working directory to the repository.

git clone https://github.com/osdldbt/dbt3.git && cd ./dbt3

Next, create a release build using the following command.

make -f Makefile.cmake release

Now change the working directory to builds/release and install using cmake. Installing under /usr/local typically requires root privileges, so you may need to run the install command with sudo.

cd builds/release
cmake --install . --prefix /usr/local
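As a quick sanity check that the scripts landed on your PATH (the command names here are the ones used in the steps below):

command -v dbt3-build-dbgen dbt3-run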

Running the benchmark

Now that we have installed dbt3, we can proceed with running the benchmark. Change the current working directory to the location where you unzipped the TPC-H tools.

First, create a database.

psql -c 'CREATE DATABASE dbt3;'
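To confirm the database was created and see which Postgres version is answering (an optional check, not required by dbt3):

psql -d dbt3 -c 'SELECT version();'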

Now, we have to build the TPC-H tools for Postgres. This is done by executing the following command.

dbt3-build-dbgen --patch-dir=<path to dbt3 repo>/patches --query-dir=<path to dbt3 repo>/queries pgsql "TPC-H V3.0.1/"   # replace the TPC-H version accordingly
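For example, if the dbt3 repository was cloned into your home directory and the TPC-H zip was extracted there as well (both paths are assumptions; adjust them to your layout):

dbt3-build-dbgen --patch-dir="$HOME/dbt3/patches" --query-dir="$HOME/dbt3/queries" pgsql "$HOME/TPC-H V3.0.1/"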

Once the tools are built, a complete test can be run with the following command.

dbt3-run --tpchtools="TPC-H V3.0.1" pgsql /tmp/results

This will generate the data files for a 1 GB scale factor database, run the load, power, and throughput tests (with 2 streams) against PostgreSQL, and save the results of the test in /tmp/results.
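Once the run completes, the output can be inspected; the exact directory layout depends on the dbt3 version:

ls /tmp/results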

The dbt3-run script can be used to run any combination of a load test, power test, and throughput test. A load test must be run first to create the database before a power or throughput test can be run individually.

Results

Power Test

  • Seed: 1125170539

Duration: 7573.62 seconds

| Phase | Start Time             | End Time               |
|-------|------------------------|------------------------|
| Query | 2023-11-25 17:05:46.01 | 2023-11-25 19:11:54.56 |
| RF1   | 2023-11-25 17:05:41.15 | 2023-11-25 17:05:45.98 |
| RF2   | 2023-11-25 19:11:54.60 | 2023-11-25 19:11:54.70 |
| Query | Response Time (seconds) |
|-------|-------------------------|
| Q1    | 2.32                    |
| Q2    | 0.38                    |
| Q3    | 0.48                    |
| Q4    | 0.18                    |
| Q5    | 0.49                    |
| Q6    | 0.41                    |
| Q7    | 0.56                    |
| Q8    | 0.70                    |
| Q9    | 1.49                    |
| Q10   | 0.65                    |
| Q11   | 0.23                    |
| Q12   | 0.64                    |
| Q13   | 0.61                    |
| Q14   | 0.69                    |
| Q15   | 0.79                    |
| Q16   | 0.35                    |
| Q17   | 2497.60                 |
| Q18   | 3.40                    |
| Q19   | 0.59                    |
| Q20   | 5054.63                 |
| Q21   | 0.36                    |
| Q22   | 0.13                    |
| RF1   | 4.83                    |
| RF2   | 0.10                    |

Throughput Test

Stream execution summary:

| Stream | Seed       | Duration (seconds) | Query Start Time       | Query End Time         | RF1 Start Time         | RF1 End Time           | RF2 Start Time         | RF2 End Time           |
|--------|------------|--------------------|------------------------|------------------------|------------------------|------------------------|------------------------|------------------------|
| 1      | 1125170540 | 8338.32            | 2023-11-25 19:11:55.80 | 2023-11-25 21:30:54.05 | 2023-11-25 19:11:54.84 | 2023-11-25 19:11:59.69 | 2023-11-25 19:11:59.73 | 2023-11-25 19:11:59.83 |
| 2      | 1125170541 | 8783.61            | 2023-11-25 19:11:55.80 | 2023-11-25 21:38:19.34 | 2023-11-25 19:11:59.94 | 2023-11-25 19:12:04.73 | 2023-11-25 19:12:04.77 | 2023-11-25 19:12:04.88 |

Query execution duration (seconds):

| Stream | Q1   | Q2   | Q3   | Q4   | Q5   | Q6   | Q7   |
|--------|------|------|------|------|------|------|------|
| 1      | 2.35 | 0.35 | 0.48 | 0.18 | 0.48 | 0.41 | 0.56 |
| 2      | 2.35 | 0.35 | 0.48 | 0.18 | 0.48 | 0.41 | 0.57 |
| Min    | 2.35 | 0.35 | 0.48 | 0.18 | 0.48 | 0.41 | 0.57 |
| Max    | 2.35 | 0.35 | 0.48 | 0.18 | 0.48 | 0.41 | 0.57 |
| Avg    | 1.17 | 0.17 | 0.24 | 0.09 | 0.24 | 0.20 | 0.28 |

| Stream | Q8   | Q9   | Q10  | Q11  | Q12  | Q13  | Q14  |
|--------|------|------|------|------|------|------|------|
| 1      | 0.70 | 1.43 | 0.65 | 0.21 | 0.64 | 0.60 | 0.42 |
| 2      | 0.69 | 1.50 | 0.65 | 0.22 | 0.64 | 0.61 | 0.41 |
| Min    | 0.69 | 1.50 | 0.65 | 0.22 | 0.64 | 0.61 | 0.41 |
| Max    | 0.69 | 1.50 | 0.65 | 0.22 | 0.64 | 0.61 | 0.41 |
| Avg    | 0.34 | 0.75 | 0.33 | 0.11 | 0.32 | 0.30 | 0.21 |

| Stream | Q15  | Q16  | Q17     | Q18  | Q19  | Q20     | Q21  |
|--------|------|------|---------|------|------|---------|------|
| 1      | 0.79 | 0.35 | 2364.51 | 3.37 | 0.59 | 5957.87 | 0.36 |
| 2      | 0.80 | 0.35 | 2738.75 | 3.37 | 0.59 | 6028.83 | 0.36 |
| Min    | 0.80 | 0.35 | 2738.75 | 3.37 | 0.59 | 6028.83 | 0.36 |
| Max    | 0.80 | 0.35 | 2738.75 | 3.37 | 0.59 | 6028.83 | 0.36 |
| Avg    | 0.40 | 0.18 | 1369.37 | 1.69 | 0.30 | 3014.42 | 0.18 |

| Stream | Q22  | RF1  | RF2  |
|--------|------|------|------|
| 1      | 0.13 | 4.85 | 0.10 |
| 2      | 0.13 | 4.79 | 0.11 |
| Min    | 0.13 | 4.79 | 0.10 |
| Max    | 0.13 | 4.85 | 0.11 |
| Avg    | 0.07 | 4.82 | 0.11 |
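For context, the headline TPC-H metrics are derived from these timings. Per the TPC-H specification, with SF the scale factor, QI(i,0) and RI(j,0) the power-test query and refresh times in seconds, S the number of streams, and T_s the throughput-test duration in seconds:

\text{Power@Size} = \frac{3600 \cdot SF}{\left( \prod_{i=1}^{22} QI(i,0) \cdot \prod_{j=1}^{2} RI(j,0) \right)^{1/24}}

\text{Throughput@Size} = \frac{S \cdot 22 \cdot 3600}{T_s} \cdot SF

\text{QphH@Size} = \sqrt{\text{Power@Size} \cdot \text{Throughput@Size}}

Because Power@Size uses a geometric mean, the very slow Q17 and Q20 drag the score down without dominating it the way they would under an arithmetic mean.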

How do I interpret the results? In general, we can work out why some queries, like Q17 and Q20, take so long by looking at their structure and at how Postgres processes these queries (or these types of queries in general). The benchmark can also be used to check whether a major change to the general Postgres architecture has a significant performance impact. Say we implement the Yannakakis algorithm in Postgres: comparing benchmark runs of the two variants could provide meaningful results as to the performance.
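For example, Q17 pairs a scan of part with a correlated subquery over lineitem, which is a classic source of long runtimes. A minimal sketch of how one might inspect the plan, assuming the dbt3-loaded schema and the example substitution parameters from the TPC-H specification:

psql -d dbt3 <<'SQL'
-- TPC-H Q17 (small-quantity-order revenue), with example parameters
EXPLAIN (ANALYZE, BUFFERS)
SELECT sum(l_extendedprice) / 7.0 AS avg_yearly
FROM lineitem, part
WHERE p_partkey = l_partkey
  AND p_brand = 'Brand#23'
  AND p_container = 'MED BOX'
  AND l_quantity < (SELECT 0.2 * avg(l_quantity)
                    FROM lineitem
                    WHERE l_partkey = p_partkey);
SQL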
