pouët.net

sql optimize me beautiful

category: general [glöplog]
mysql> SELECT distinct prods.id as which,count(prods.id) as count,prods.name,prods.type,prods.platform,prods.group1,prods.group2,prods.group3,groups1.name as groupname1, groups2.name as groupname2,groups3.name as groupname3 FROM prods JOIN users LEFT JOIN groups as groups1 ON groups1.id=prods.group1 LEFT JOIN groups as groups2 ON groups2.id=prods.group2 LEFT JOIN groups as groups3 ON groups3.id=prods.group3 WHERE (users.cdc=prods.id OR users.cdc2=prods.id OR users.cdc3=prods.id OR users.cdc4=prods.id OR users.cdc5=prods.id) group by prods.id order by count desc;

*snipped results*
254 rows in set (2 min 44.00 sec)

mysql> EXPLAIN SELECT distinct prods.id as which,count(prods.id) as count,prods.name,prods.type,prods.platform,prods.group1,prods.group2,prods.group3,groups1.name as groupname1, groups2.name as groupname2,groups3.name as groupname3 FROM prods JOIN users LEFT JOIN groups as groups1 ON groups1.id=prods.group1 LEFT JOIN groups as groups2 ON groups2.id=prods.group2 LEFT JOIN groups as groups3 ON groups3.id=prods.group3 WHERE (users.cdc=prods.id OR users.cdc2=prods.id OR users.cdc3=prods.id OR users.cdc4=prods.id OR users.cdc5=prods.id) group by prods.id order by count desc;
+----+-------------+---------+--------+-------------------------+---------+---------+--------------------+-------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+----+-------------+---------+--------+-------------------------+---------+---------+--------------------+-------+---------------------------------+
| 1 | SIMPLE | users | ALL | cdc,cdc2,cdc3,cdc4,cdc5 | | | | 8862 | Using temporary; Using filesort |
| 1 | SIMPLE | prods | ALL | PRIMARY,id,id_2 | | | | 24224 | Using where |
| 1 | SIMPLE | groups1 | eq_ref | PRIMARY | PRIMARY | 4 | pouet.prods.group1 | 1 | |
| 1 | SIMPLE | groups2 | eq_ref | PRIMARY | PRIMARY | 4 | pouet.prods.group2 | 1 | |
| 1 | SIMPLE | groups3 | eq_ref | PRIMARY | PRIMARY | 4 | pouet.prods.group3 | 1 | |
+----+-------------+---------+--------+-------------------------+---------+---------+--------------------+-------+---------------------------------+

i tried adding index's to the cdc's, but it didnt help much..
is the only solution to break it into smaller queries?
added on the 2006-05-06 15:56:26 by psenough psenough
that's one evil sql query you've got there. generally it's a bad idea to have many joins in the same query.
Yeah that just smacks of 'please rethink the way you're using this data.'
do you index group1,2,3 ? (and obviously the oridnary groups.id...)
added on the 2006-05-06 16:24:37 by Hatikvah Hatikvah
rasmus: for fuck sake. any joins exept cross/outer should be *FREE*!
added on the 2006-05-06 16:25:08 by Hatikvah Hatikvah
ps:
youll have to index everything that you use for binding relations, so everything you use the where x=y.. and JOIN x2 ON y2 ..

all theese x,y,x2,y2 has to be indexed to get speed.

thats why your cdc<n> and group<n> really sucks ;p

added on the 2006-05-06 16:26:32 by Hatikvah Hatikvah
questions:
- why aren't the CDC in a separate table ?
- don't you have a cached version of the common tables ( i.e. groups, platforms, parties, compo, ... ) in the form of PHP arrays ? it'd greatly simplify your queries and would slow down the generation of pages by only a few ms.
added on the 2006-05-06 16:43:43 by p01 p01
iam also a bit skeptical on the FROM prods JOIN users ..

shouldnt you use FROM "users" and then inner join "prods" on the cdc* ?

this might be interpretaded the same way by the sql server, its just i wouldnt personally write so ever, so iam not sure what will happen, if youre not 100% sure, try it out aswell :)
added on the 2006-05-06 16:46:27 by Hatikvah Hatikvah
po1, agreed..
it should be something like

CDC
id <index>
id_prod <index>
id_user <index>

but then again, this is NOT a sane DB.. as the query shows ;-)
added on the 2006-05-06 16:48:45 by Hatikvah Hatikvah
about the seperate table for cdc's, yes, its been in the plans for quite a while now.. i just been too lazy to clean the hack into proper way. :/
added on the 2006-05-06 17:19:41 by psenough psenough
p01: we have some caches but not the whole tables no.
added on the 2006-05-06 17:20:40 by psenough psenough
alter table prods add index(group1)
alter table prods add index(group2)
alter table prods add index(group3)
alter table groups add index(id)
+all users cdc
still slow as fuck

"users JOIN prods" or "prods JOIN users" same performance

is there something i need to active to make mysql actually use the indexes when resolving queries or something?
added on the 2006-05-06 17:44:39 by psenough psenough
you should generally use the "relational" part of a relational database more ;)

i'm no expert on db design by any means, but you want something like stefan suggested for other fields too:

PROD_GROUP
id <index> primary key
id_prod <index>
id_group <index>

and then potentially have multiple entries for the same prod, instead of that group1..groupN maintenance nightmare.

similarly for filenames for multiple disks, and so on... don't make tables with 1000 columns for special-case usage, move everything that's potentially needed multiple times to its own table and join on prod id or whatever else comes in handy.
added on the 2006-05-06 17:46:42 by ryg ryg
ps:
Does Mysql BUILD indexes automagically when you create them?

Does the operation *consume time*? (A good way to see if it acctually does anything).

I usually set indexes directly so i really dont know how mysql handles this ;)
added on the 2006-05-06 17:50:24 by Hatikvah Hatikvah
43 seconds each added index..
i would reckon its doing something yes..
added on the 2006-05-06 17:53:04 by psenough psenough
they show on "show create table prods" atleast.. so they're there.. just not sure if they're beeing used O_o since i dont see anything going any faster at all anywhere..
added on the 2006-05-06 17:53:50 by psenough psenough
ryg: yeh, prodgroup table should also be created. like i said, im lazy.. only bothered redoing it for prods download links so far.. :/
added on the 2006-05-06 17:54:38 by psenough psenough
maybe your "old" query is somehow cached...

if you run on test db ( i assume you do ) just try to reboot it.
added on the 2006-05-06 17:57:01 by Hatikvah Hatikvah
i you assume wrong and i cant reboot it (only jeffry has access to restart mysql)
added on the 2006-05-06 17:59:04 by psenough psenough
you could perhaps just try make a new session and see if that works. (log off server, log back in and try run the query again).
added on the 2006-05-06 18:04:28 by Hatikvah Hatikvah
what if you add a dummy variable or change some variable's alias in your SELECT ? will it fool mysql's cache ?
added on the 2006-05-06 18:06:04 by p01 p01
I'd recommend running that EXPLAIN query again and comparing the output to the one you posted at the top. Ideally it should show that it's using the new indices.
added on the 2006-05-06 21:18:08 by bartman bartman
Heres what I know:

- When you do ALTER TABLE <table> ADD INDEX in MySQL, it _does_ create the indexes.
- I remember reading that cache entries are removed if you alter the underlying data. Be warned that you _must_ change a value to achieve this, if you just set the same value, MySQL ignores it.

It is no secret that you are paying the price for bad database design decisions made a long time ago (and probably not by you). I can only speculate as to how you can make this any better.

I believe the worst problem is the 'JOIN users' and the 'WHERE .... OR ... OR ... OR' clauses. Because the foreign keys are used conditionally, it is possible that MySQL does not optimize/use the keys on the cdc fields. You could isolate the problem by adding/removing clauses to your query until it becomes slow.

Another crazy idea is to remove the 'WHERE ... OR ... OR ... OR' clause altogether, do the same functionality with normal JOINS and select the resulting rows you want. _Maybe_ that way MySQL will optimise better. Yes, its crazy. But so is the database schema :-)

In any case, it's very difficult to draw conclusions without having actual data to run the query against. Have you considered taking a backup of the specific tables, remove/blank sensitive information (usernames/passwords) and giving it here for everyone to optimise?
added on the 2006-05-06 21:50:56 by moT moT
maybe try a OPTIMIZE TABLE on involved tables, especially if they have been modified a lot.


btw about optimization beyond SQL i'm curious, is pouet/scene.org using a PHP caching like Alternative PHP Cache, or any other caching ?
added on the 2006-05-07 00:49:46 by Zest Zest
Problem is prolly not to update the SQL DB..

Recreating new tables isnt a big problems, restoring data into them is not really a biggy either..


But i seriously wonder: HOW MANY PHP's and their queries will fail? And when the queries fail, how many lines of PHP will there be left to "restore" ? Me (and many others) have alreay reached the conclussion via irc (#pouet.net,#breakpoint) that pouet *needs* a FRESH rewrite with modern concerns about db design.

However, we also reached the conclussion that NOBODY has the time to do this alone, and that an opensource project would prolly stall the week after its creation.

Ideas?
added on the 2006-05-07 02:00:27 by Hatikvah Hatikvah

login