After several days of testing, here are the results.
Platform in Short
Host
Ubuntu Linux 16.04 LTS (System 76) Intel(R) Core(TM) i7-4710MQ CPU @ 2.50GHz
16GB RAM
Samsung 850 SSD 500GB (formatted ext4)
Virtual Guest
3 CPUs on VirtualBox 5.2.04
CentOS 7.2.1511
4096 MB
Fixed Size Virtual Drive formatted xfs
Benchmarking Software: www.hammerdb.com v2.21 running "remotely" on the Host
DBMSs: PostgreSQL 9.2 and MySQL community edition 5.75
Motivation
Primarily I wanted to explore benchmarking methods to compare DBMSs. Secondarily, it is a very good practice to benchmark your database throughout the lifecycle of your system or product. Sometimes a DBA does not have access to the app layer for security reasons, thus a benchmark that runs outside of the app layers provides a work-around. It also is immune to application changes. Benchmarking without regard to the application helps detect performance regressions in other layers other than the application.
Results
There are many disclaimers here. One of which, this is on a Virtual Box. Not bare metal. A second one is there are multiple innodb options to experiment with. Also, HammerDB is relying on stored procedures/functions to create the workload. It may be that PG is superior here. (PG does cache query plans in stored functions, but MySQL also has a stored program cache: https://dev.mysql.com/doc/refman/5.7/en/server-system-variables.html#sysvar_stored_program_cache)
For HammerDB I use 10 warehouses 3 virtual users. At 3, there is enough CPU to allocate processes (in PostgreSQL's case) and threads (in MySQL's case) to separate cores/"chips".
With PostgreSQL 9.2, I set checkpoint_segments=35 to make the warnings go away and set shared_buffers=2016MB. We we get Orders Per Minute (OPM) of three runs:
Postgresql 9.2 OPM
21,148
21,868
21,165
(We have vm.swappiness at 10 BTW, but all of the database fits in RAM and there's low memory pressure.)
MySQL 5.75 with Performance Schema on
9,427
9,361
8,955
The difference is so stark at this point, I'm questioning my chops. Query cache is off (query_cache_size = 0, query_cache_type = 0). Innodb_buffer_pool is 1300M. That's smaller than shared_buffers used in Postgres but they are not the same thing. The database size is 2,613 MB on Postgres, and 961 MB on MySQL. Remember, MySQL also benefits from Linux's filesystem cache just like Postgres, so we shouldn't have this big of a difference even if innodb_buffer_pool is not completely perfect. (I later set the innodb_buffer_pool to 2,613MB just for grins, and of course there was no significant difference). I wonder why the database size is so much smaller on MySQL than PG, but that should help MySQL perform better at the I/O level. And on MySQL we never see CPU pegged (each CPU barely ever goes above 50%), so any compression isn't being bottlenecked on CPU.
Let's get MySQL up to speed to Postgres
MySQL has lots more knobs that can alter it's performance. But let's first turn off the Performance Schema (PS).
11,369 OPM
Better. PS does not have a counterpart in PG, so it's a fairer comparison, but it's still nowhere near PG in performance for this test.
innodb_flush_method = IO_DIRECT
11,064 OPM
Not much of a difference, the decrease could be due to some of the non-deterministic nature of the whole stack. So I wouldn't say this setting and result made a difference (more runs required), and it doesn't get us near PG.
numactl interleave=all
11,179 OPM
See https://blog.jcole.us/2010/09/28/mysql-swap-insanity-and-the-numa-architecture/ for details. But with 3 virtual CPUs, it all depends on the hypervisor's biases. Still, not a significant difference.
set global innodb_spin_wait_delay=0;
11,793 OPM
I was seeing huge spin lock wait counts with show engine innodb status. So this setting seemed to bump up performance just a nudge. I will point out that top didn't show CPU getting pegged. There was no definitive bottleneck.
set global innodb_rand_read_ahead=ON;
11,453 OPM
HammerDB uses a random seed for its operations, so for a long-shot I turned this on. The setting is there because the developers believe some workloads will benefit, but in this case I would say it's a wash.
set global innodb_io_capacity=1000;
10,916 OPM
I turned off the innodb random read ahead and then was really grasping for straws. This setting on my SSD system made performance nudge down.
many trials and days later...
I got a big jump by setting the paravirtualization interface to KVM on VirtualBox. Setting query cache on, as expected, caused a degradation.
After many more trials, the final settings I landed on are below, giving 15,651 OPM
I changed the number of CPUs on VBox from 3 to 2 which sped things up too.
innodb_file_per_table = 1
performance_schema = OFF
innodb_flush_method = O_DSYNC #faster than O_DIRECT
query_cache_size = 0
query_cache_type = 0
innodb_buffer_pool_instances = 1
innodb_log_file_size = 5142M
innodb_doublewrite = OFF
innodb_spin_wait_delay = 0
innodb_write_io_threads = 6
innodb_flush_neighbors = 0
What stands out in all trials is a high number of context switches at around 6,500/sec. I also see spin rounds per wait of over 8,000 in the last run. i/o wait is low and disk is not saturated. The binary log is off.
Conclusion
This benchmark workload is based on http://www.tpc.org/tpcc/ with the same tables and use-cases. It is not a typical workload of a web application, where you will see select * from products where category=... for listing products for sale, and much use of limit and offset for paging through a list of items. Web apps are really a workload unique unto themselves versus OLAP and OLTP.
This workload is more like 2 applications opening up a waterhose of CRUD transactions into the database.
PostgreSQL is clearly the winner here even after careful benchmarking and applied best practices.
Hopefully someone can assist and point out an obvious tweak. MySQL is thread per connection and PG is process per connection. And this is all virtualized. The keeper of HammerDB pointed out past tests between the two DBs were not significantly different. But I suspect more virtual users were used and on real hardware.
I'm a big fan of both platforms. And I know one workload is not a definitive comparison of overall performance. It could very well be that PG is fit perfectly for this test but degrades on all other workloads in comparison to MySQL.
The next step would be to dive into the internals to find the bottleneck....
I hope this helps somebody.