
Help with performance issue

pastvne

Bruce Decker
Hi All:
I've been dealing with performance issues. I have a largish database of about 6GB, consisting of about 600K records. I can't say the query time was bad at first, but as the database has grown, users are now waiting about 30 seconds for a list to populate with prefilters.

I tried to throw hardware at the issue. My first attempt was to go to a scalable cloud server so that I could easily throw more cores and memory at it. This server also had solid-state disk. But in the end, I didn't see a significant performance improvement over my regular old cheapo shared hosting plan. Also, the cloud server was plagued by many quirks that caused issues with Fabrik, so I had to abandon that.

Next, I set up an inexpensive account at a2 hosting and moved the app over there. I was amazed at how fast it was compared to my former shared hosting provider, and much faster than my so-called 'scalable cloud server' even when I tuned it up with more processors and memory.

So, now I'm relatively happy, but ideally I want to see response times in the 3-6 second range as opposed to the 15 seconds I'm seeing now. The underlying table has 600K rows, and with prefiltering that goes down to about 8,700 rows.

MySQL advisor is telling me that there are joins based on unindexed columns, but I've looked that over and I can't see any unindexed columns.

I messed around with prefilter queries to see if I could make the prefiltering smarter, but my attempts at being clever only proved how much I misunderstand the internals of MySQL. It seemed everything I tried had the opposite impact. The best performance came from leaving Fabrik to do as it intended.

I've struck upon an idea to create two copies of my main table (the one with 600K+ rows). We'll call this list 'control' and its copy 'control_fast'.

Each night, I'd run a script that would:

1) Drop table control_fast
2) Copy table control to control_fast
3) Purge control_fast to remove all but the most recent records (say last 90 days for example). This would result in a table size of less than 180K rows.
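As a sketch, those three steps could be a single nightly SQL batch (table and column names here are assumptions; 'trans_date' stands in for whatever date field drives the purge):

```sql
-- Hypothetical nightly refresh (names are illustrative):
DROP TABLE IF EXISTS control_fast;

-- LIKE copies the column definitions and indexes of the source table
CREATE TABLE control_fast LIKE control;

-- Keep only the most recent 90 days of records
INSERT INTO control_fast
SELECT * FROM control
WHERE trans_date >= CURDATE() - INTERVAL 90 DAY;
```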

Then, I'd create new menu options so that users could navigate to 'archive (slower)' or 'current (fast)'. The archive menus would go to the large table, and users would be advised that queries might take longer as a result. But most users would use the purged 'control_fast' table which, according to my tests, would be running in the 2-3 second range (snappy).

Now, I'd prefer to have one database and do what is necessary to get snappy performance by clever use of indexes or other techniques. But I'm afraid I'm going to need some help.

Is this the type of design/troubleshooting help I can request...

Thanks,

Bruce Decker
Parker, CO 80134
 
To add to this discussion, I've been reading the MySQL reference and came upon a MySQL feature called partitioning: https://dev.mysql.com/doc/refman/5.5/en/partitioning-overview.html. Specifically, this paragraph stood out, as it seems to describe a technique that roughly approximates my proposal above to have a 'control' and a 'control_fast' table, with records separated by the value of a date-type field called trans_date.

Some queries can be greatly optimized by virtue of the fact that data satisfying a given WHERE clause can be stored only on one or more partitions, which automatically excludes any remaining partitions from the search. Because partitions can be altered after a partitioned table has been created, you can reorganize your data to enhance frequent queries that may not have been often used when the partitioning scheme was first set up. This ability to exclude non-matching partitions (and thus any rows they contain) is often referred to as partition pruning. For more information, see Section 18.4, "Partition Pruning".
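For example, the recent/archive split could be expressed as date-based RANGE partitions on trans_date instead of two tables. A sketch only (the cutoff date is arbitrary, and MySQL requires the partitioning column to be part of every unique key on the table, including the primary key):

```sql
-- Sketch: one table, partitioned by date, instead of control/control_fast.
-- A WHERE clause on trans_date can then prune to the matching partition(s).
ALTER TABLE control
PARTITION BY RANGE (TO_DAYS(trans_date)) (
    PARTITION p_archive VALUES LESS THAN (TO_DAYS('2014-01-01')),
    PARTITION p_recent  VALUES LESS THAN MAXVALUE
);
```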

My questions will be whether the segregation technique I describe above is advisable, whether MySQL partitioning might be a better approach, and whether it can work with Fabrik.
 
I'll keep adding bits of my research to this thread. I found this tutorial on MySQL 5.1 partitions which I thought was pretty good. I believe partitioning has been improved quite a bit since 5.1. The thing I'm wrestling with is that they seem to be saying you can't have primary keys on tables where partitioning is applied. That seems really odd to me, so I'm going to read more. If I can get a bit better understanding, my plan is to make a copy site, ALTER the table, and add date-based partitions.

It seems that partitions are hard-coded at the database level. So if I want to create a partition for 'recent' records and one for 'archive' records, then I'd need to create some sort of management routine to set a field in the record that dictates whether the record falls into the 'recent' or 'archive' partition. That, or I'll need some process that watches my tables and adds a partition as we roll into a new month or quarter (or whatever granularity keeps the recent record set reasonably small).
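That monthly roll-over housekeeping might look something like this (hypothetical partition names and dates, assuming a RANGE-by-date scheme):

```sql
-- Split the open-ended partition when a new month starts, so recent
-- months each live in their own partition and older ones become archive.
ALTER TABLE control
REORGANIZE PARTITION p_recent INTO (
    PARTITION p2014_06 VALUES LESS THAN (TO_DAYS('2014-07-01')),
    PARTITION p_recent VALUES LESS THAN MAXVALUE
);
```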

Right now, my concern is focused on what I'm reading about PKs and compatibility with partitioning and Fabrik. If anyone has given any thought or research on this topic, I'll be grateful if you share.

Regards,
Bruce Decker
 
Friendly bump. The more I read about MySQL partitions, the more interested I become. I could really use some input from the developers before I get myself all lathered up for no reason. Thanks!
 
First, the good news: I have now implemented partitions on the underlying table and with raw queries, they help.

The bad news:
Because I do a complex pre-filter and several joins, it appears that the query may be traversing the entire table (all partitions) before it culls the list of ids down to its candidates. I've been trying to use the query prefilter feature in list properties to force the remaining joins and prefilters to ONLY process against a list of keys from the prefilter query, but when I try to add a first prefilter of id IN {$q->id}, it appears that only the first id returned from my prefilter query is shown in the list of IDs.

I think I have some fundamental misunderstandings and I need to understand more about:
1) How to properly use prefilter queries
2) How to reference prefilter query results in prefilters to prevent joining and fetching of ids outside of the keylist generated by the prefilter queries
3) The importance of the order of defined JOINS
4) The importance of the order of defined PREFILTERS
5) Getting my head around 'grouping' in prefilters (yes/no)

I would think that if a coding technique in Fabrik can be used that would allow queries to benefit from MySQL partitioning, it would be a huge +++ for many people who run larger databases and need snappy performance.

I've refrained from posting schemas and fabrikdebug info because of the volume of data. However, if one of the developers is willing to help with this, I'll gather and post that information. Alternatively, I'm happy to provide access to the server.

Thanks,
Bruce Decker
 
Continuing to document my experimentation. Progress is slow due to my lack of deep MySQL knowledge. Adding to this thread in case we happen upon success so that the steps are documented.

Some important observations about MySQL partitioning (may be refined as I progress):
1) You can partition a Fabrik table and it appears to do no harm
2) Depending on their design, prefilter subqueries can cause MySQL to target all partitions even if the result set is stored in a subset of partitions
3) You can partition using KEY(column) where column is a varchar column. MySQL applies some sort of hash even to non-integer values and still seems to properly distribute records across partitions.
4) The column that you are using for the partitioning value needs to be part of the primary key. I added a PRIMARY KEY on my target column (cust_id) at the MySQL level.
5) You can't have fulltext indexes when using partitioning (at least as far as I've been able to tell). I removed fulltext indexes for this experiment
6) You need to be on a later release of MySQL (5.5 preferred) to be able to partition based on varchar or char values
7) In MySQL, the EXPLAIN PARTITIONS query prefix is invaluable in understanding whether a query is restricting itself to a subset of partitions (as desired).
8) It appears that in MySQL, use of a subquery such as `column_for_partitioning_value` IN (SELECT column FROM othertable) will cause MySQL to target all partitions instead of a subset as desired
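For item 7, the check looks like this (table and column names are placeholders; the partitions column of the output lists which partitions the query will touch):

```sql
EXPLAIN PARTITIONS
SELECT id
FROM mytable
WHERE myVarCharColumn = 'ABC123';
-- With KEY partitioning and an equality match on the partitioning
-- column, the partitions column should show a single partition.
```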

The syntax I used for adding partitions to my table:
Code:
ALTER TABLE mytable PARTITION BY KEY(myVarCharColumn) PARTITIONS 101;

* Note that 101 is a value I chose; it indicates the number of partitions you desire for this table. A row will be written to a partition named p0, p1, p2... and so on up to the last one (p100 with my 101 partitions). It doesn't say so in the MySQL manual, but for the purposes of hashing, it's generally a good idea to make this number prime to help with even record distribution.
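One way to check how evenly the rows actually landed is to query MySQL's data dictionary (note TABLE_ROWS is an estimate for InnoDB tables):

```sql
-- Row count per partition, in partition order (p0 .. p100)
SELECT PARTITION_NAME, TABLE_ROWS
FROM INFORMATION_SCHEMA.PARTITIONS
WHERE TABLE_NAME = 'mytable'
ORDER BY PARTITION_ORDINAL_POSITION;
```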

NEXT:
I will document native query results using hard-coded ids in an IN clause vs. a subquery. I suspect that hard-coded values will use the partitioning and subqueries will not (at least the way I'm doing subqueries). Then, if hard-coded IN values work, I'll investigate how to create them using Fabrik's EVAL type.
 
Think I'm onto the culprit. I created a small test using my table with 101 partitions based on the varchar cust_id.
In test one, I performed this query using a subquery that brings back a list of 8 customer_ids:

Code:
SELECT id FROM `bpi_edoc_document_control` WHERE  `bpi_edoc_document_control`.`cust_id` IN (SELECT DISTINCT `customer_id` FROM `bpi_edoc_subscriptions` WHERE `subscriber_id` = '840' AND `status` = 'approved')

When I run an EXPLAIN PARTITIONS against that query, it shows that all partitions and all rows are targeted.

If I run the subquery by itself:
Code:
SELECT DISTINCT `customer_id` FROM `bpi_edoc_subscriptions` WHERE `subscriber_id` = '840' AND `status` = 'approved'

it returns 8 unique customer_ids which are:
Code:
'1010','35325','52408','uas','0000','12345','3778','2708'

Now, if I modify the original query and replace the subquery with a simple list of customer ids like this:

Code:
SELECT *
FROM `bpi_edoc_document_control`
WHERE `bpi_edoc_document_control`.`cust_id`
IN (
'1010', '35325', '52408', 'uas', '0000', '12345', '3778', '2708'
)

Now MySQL properly shows that only a subset of partitions (the relevant ones) are targeted by the query. Presumably, this improves performance. The issue now is how to 'trick' MySQL into fetching only from the partitions for the customer_ids I'm targeting, as shown in this last query above.

[edit] While researching #1 below, I ran into the bug report for MySQL which seems to exactly match my discovery: http://bugs.mysql.com/bug.php?id=22765



NEXT:
1) Research whether there is something that can be injected into the query that will cause the Subquery to not invalidate the partition targets. [edit] Yes, there is a known bug in MySQL as documented here: http://bugs.mysql.com/bug.php?id=22765
2) Research a way to have Fabrik return the explicit values through EVAL code and replace the subquery with that result set to see if this will allow the F! query to avoid MySQL's invalidation of partitioning targets due to this proven subquery limitation.

I realize that my bumping new notes into this post is going to reduce its appeal for the support developers, but since I'm on a bit of a roll, I thought I'd just journal things here and perhaps create a wiki from it if the end result is successful.
 
As best I can tell, the MySQL bug, which they refer to as a 'semijoin' issue, has been fixed in MySQL 6. Unfortunately, I'm on 5.5, so my alternative is to pursue creating an explicit list of customer_id values for use within the IN clause. I'll now research how to perform this within a Fabrik prefilter using an 'EVAL' or other mechanism.
 
I've been able to clip the Fabrik query out of the fabrikdebug window and paste it into MySQL to test. I've been able to prove that when I construct the query properly (using explicit ids rather than a subquery), EXPLAIN PARTITIONS indeed verifies that the query only references the partitions where my records live.

Doing a SELECT * on the main partitioned table without its joins is very snappy, but it is also snappy on my table copy that does not have partitions, so I don't think we're looking at a partitioning issue with this test.

When I add the joins and the column specifiers, I see the performance drop. My result set is about 450 rows of the 200K rows, and the record size is not insignificant. When profiling the query from F!, I see that 80%+ of the time is spent in the 'Copying to tmp table' stage. I tried moving my joined tables to the MEMORY engine to prevent use of tmp space, but I'm still seeing a large amount of data moving through tmp. I think this is where I'm bottlenecking at present.
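For anyone wanting to reproduce that per-stage breakdown, MySQL's profiler (available in 5.5) can be used along these lines:

```sql
SET profiling = 1;
-- ...run the slow list query captured from fabrikdebug here...
SHOW PROFILES;              -- lists recent statements with their query ids
SHOW PROFILE FOR QUERY 1;   -- stage timings, e.g. "Copying to tmp table"
```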

I think perhaps my database size is not nearly large enough to warrant the move to partitions and therefore, partitioning is not going to really help at a 200K row count. It might help as the database increases in size but I can't know that without testing.

Right now, I'm having to pull the query apart bit by bit trying to see where the major bottlenecks are adding up. Perhaps 7-10 seconds to return 400 rows of 200K is reasonable and about what I should expect. But I'm really hoping that I can get queries down to 2-3 seconds through some sort of refactoring....

From the lack of commentary on this thread, I suspect that I may be way off in the weeds on a topic that few people have experienced.

-BD
 
Apologies, I've been on the road for the last 3 days, attending a family funeral about 750 miles drive away.

I've read through the thread, and can't really make any intelligent comment without getting my hands on the query. If you can fill in your phpMyAdmin details, I'll take a look. And I'll obviously need to know which list I need to look at, so I can grab the query we are issuing with fabrik debug.

Ideally, I'd like to work on a copy of the original data / tables, without partitions. We have clients with tables with many millions of rows, doing complex pre-filtering, and have never had to resort to partitioning to get decent performance.

-- hugh
 
Hi Hugh:
My sincere apologies and condolences.
I believe you are right. The journey learning partitions was interesting but I've also concluded that this is not where my bottleneck lies. I believe it has to do with joins in list view and the use of temp tables by MySQL. I'll work on putting together credentials for my test site and documenting the query.

I really appreciate the help. It is very encouraging to hear of million row databases with performance.

-BD
 
Hi Hugh:
I apologize. Other projects overtook me and I was unable to reply to this until now. I am still extremely interested in taking you up on the kind offer to look at my setup. I suspect that by filling in a few gaps in my knowledge about prefilters, we can lick this thing in short order.

So, I have added a new site under my profile. The site name in question begins with smy... There you should find all of the fields populated. I also set up a login for you in the profile that gives you a configured front-end user and back-end access, along with cPanel/phpMyAdmin access in case it is needed. If you want to check that I properly reset all of the database pointers away from the production database, know that the database name should contain the substring 'test'.

This site is a copy and is yours to play with. If it gets messed up, I can set it back up in about an hour so no worries.

The table that exhibits the issue is my_documents. It contains some complex joins and prefilters.

On the front-end, there is a menu-link to this list under My Personal > My Documents > Invoices

You'll see in the list definition that it joins several tables and then applies pre-filter rules. The idea of this system is that users create access requests in a table called subscriptions. This table contains the user's id, a document type id, a customer id, and a status (approved, declined, pending). The intent is that a normal user should only be able to view documents in the main documents database based on that user's approved subscriptions, and the rule is basically: only documents of a document type and customer ID for which this Joomla user has a valid, approved subscription.

The issue is that with 700,000 documents currently in the database, it is taking upwards of 30 seconds to return a prefiltered list, and this number is rising rapidly along with customer complaints.

I know that I'm missing a vital trick, and that with a simple prefilter technique (like properly understanding and using $q->) it can be fixed easily (or so I hope).

I would be so very grateful if you or someone in support could look over my creation and help me figure out how to unlock the magic of Fabrik to solve this performance issue.

I thank you immensely!

Regards,

Bruce Decker
Colorado
 
Update: Another thought came to me: I think I've been going about this filtering backwards, and that's probably why the query is taking so long. I had been trying to start from the documents table and then join the subscriptions table. Now I'm thinking that's backwards: when a user needs to view only their 'approved and subscribed' documents, I should be showing a view that starts with the subscriptions table and joins the documents table from it. Then I can filter more efficiently and only join the document records that meet the join/prefilter criteria.

I've created a new list called subscriptions_my_docs, modeled the joins and prefilters this way, and so far, unless I'm being fooled by cached record sets from the database, the performance seems much better (4 seconds vs. 30 seconds). I'll do some more testing in the coming days, but I think I may be on to the issue, and if so, it's not Fabrik... it's me.
 
Unfortunately, the performance over a number of tests has not shown improvement. I'm at a loss and could really use another set of eyes on this issue. I've placed all of my contact/site information in profile. Any assistance greatly appreciated.

Bruce Decker
Colorado
 
At the risk of making this thread look very tired... I'll keep documenting my thoughts and experiments. At this point, I've tried:
1) Checking my indexes
2) Trying MySQL partitions to see if they help
3) Re-organizing my list so that rather than joining my subscriptions table from my documents table, I tried joining my documents table from my subscriptions table. Same performance either way
4) Pre-query variants using {$q->} in my prefilters

So far, nothing.

Out of 700K records, prefilters narrow it down to between 3K and 5K records. I think the fact that I have 4 joins on this table is contributing to the performance issue, and I have some questions:

My current ideas:

Concatenated element for new index
I have two elements in my documents table, docType and customerId. My subscriptions table contains information that controls the user's access to certain docTypes for certain customerIds. Right now, I think my prefilter has to produce a huge joined result set because I want to apply prefilter rules so that the user can see only documents for certain docType and customerId combinations. So, what if I created an element in my documents table that concatenated my docType value with my customerId value and then indexed it? In theory, I could construct a pre-query from my subscriptions table that produced the same docType . customerId concatenation, and simplify the pre-filter to return all records with this concatenated value. That way we don't have to perform so many joins, and this should reduce the result set.
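At the SQL level, that concatenated-element idea might be sketched as follows (column names and the separator are assumptions; on MySQL 5.5 the combined column would have to be kept up to date by the application or a trigger, since generated columns only arrived in 5.7):

```sql
-- Add a combined docType+customerId column and index it
ALTER TABLE documents ADD COLUMN doctype_cust VARCHAR(128);

-- Backfill it from the two existing columns
UPDATE documents
SET doctype_cust = CONCAT(docType, '|', customerId);

CREATE INDEX idx_doctype_cust ON documents (doctype_cust);

-- A prefilter could then match the same concatenation built from
-- the subscriptions table, e.g.:
--   doctype_cust IN ('invoice|1010', 'invoice|35325', ...)
```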

Order of Joins and Prefilters
I don't have a good handle on whether the order of prefilters and order of joins can have any impact on performance. I have depended on the database to optimize the query placing less expensive WHEREs first in the processing queue. Is this a bad assumption?

Pre Query
I've been able to pick up some basic understanding of prequery, and there might be a solution there, but so far my experiments with the table that has all of the joins haven't really helped as far as I can tell. I can't tell if the prequery can limit the main pre-filters to process only a subset of records in the main table. I tried experimenting with prequeries that reduce the target record set in the main table and then allow the normal joins and prefilters to perform the access control. I would think that if my prequery could take the target rowset from 700K rows down to 3,500 rows, I'd see a noticeable increase in performance. But perhaps I just don't understand how to properly utilize prequery. As an experiment, I did something like (SELECT id FROM documents WHERE custId IN (SELECT cust_id FROM subscriptions WHERE subscriberId = {$my->id})). So then I'd have a rowset of only the ids relevant to this user... but then how do I limit the main query to only target those rows? It seems that even if I do a prefilter of id IN {$q->id}, we're back to re-processing the whole table with its joins. Even if I forget about prequery and move that into a normal filter rule, it still seems like I'm creating huge result sets with all of the join overhead. I think there are some key concepts evading me, and this is where I could really use some help.
 
I responded on your other thread.

I've been looking at the 'EXPLAIN' output from phpMyAdmin, I think some of the issues are to do with the #__users table, and how the PK usage is affected by the ORDER BY.

-- hugh
 
That's a good thought. One of the ideas I proposed was to create an F! list that removed the join to J!_users. It's not necessary for the list display and is mainly used when feeding elements where it's handy to have the user info displayed. What if I removed the users join from a 'copy' of the table, used that for the initial query, and custom-linked the view/edit forms as needed? Before I head there, I wanted to get some validation that I might have heavy joins. The only ordering I should need on this list view is descending date order.

-BD
 