Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add indexes to database tables #41

Open
nekromoff opened this issue Apr 9, 2017 · 5 comments
Open

Add indexes to database tables #41

nekromoff opened this issue Apr 9, 2017 · 5 comments

Comments

@nekromoff
Copy link

Specially important for handling large amounts of rows in search_analytics.

Definitely index for date, maybe another combined for date, domain and search_engine for queries such as (inc/code/dataCapture.php):

$query = "SELECT COUNT( DISTINCT date ) AS record, date FROM ".MySQL::DB_TABLE_SEARCH_ANALYTICS." WHERE search_engine = 'google' AND domain LIKE '".$website."'";

etc.

@PromInc
Copy link
Owner

PromInc commented Apr 10, 2017

Thank you for the suggestion @nekromoff. I love the idea of this and recognize it's need.

That said, I'll admit I'm not overly familiar with how index work and how to best configure them.

Would you be willing to commit a pull request with the required changes? Do you know of any good resources to help me better understand indexes, how to set them up, and how they operate?

Thank you.

@nekromoff
Copy link
Author

nekromoff commented Apr 10, 2017

Those times when I used to optimize MySQL performance are gone for me, but few basic rules:

  • Indexes slow down inserting
  • Indexes speed up reading / searching (using where clause)
  • Combined indexes must be used left to right, e.g. index on columns date, domain, search_engine mean it will be used only in query using date, domain, search_engine in this order, e.g. where date=... and domain=... will work, index for query where domain=... and search_engine=... won't be used (missing the utmost left parameter)

Also: prepend any SELECT query with EXPLAIN and it will tell you, if any indexes are used.

More on the topic:
https://stackoverflow.com/questions/3049283/mysql-indexes-what-are-the-best-practices
https://www.slideshare.net/billkarwin/how-to-design-indexes-really

I have currently only tried to add these indexes for search_analytics table, but without testing their impact on performance, yet,:

INDEX	date
INDEX	domain
INDEX	domain, date, search_engine

@nekromoff
Copy link
Author

BTW, yes, I can try to add indexes to the .sql file to create database. But maybe you know better what SELECT queries and in what WHERE order are used.

@PromInc
Copy link
Owner

PromInc commented Apr 10, 2017

This is the primary select query that is used on all of the reports.
Report Select

Variables for the query get populated in the Reports class.

Rather than try an muddle through the code, I'd suggest adding echo $reportQuery; on this line and then run a report. You'll see the query used for that report displayed.

@galbaras
Copy link

galbaras commented Jun 7, 2017

search_analytics seems like the only table worth indexing, but search_engine isn't used in reports, and is therefore not required.

Since all of the queries use domain and a range of dates, I've changed the primary key to "domain, date, id" and converted the id index to unique. This way, that table is physically sorted by domain and date, ready for most queries.

Sure enough, this index is used whenever the query and/or page match is set to "broad", and speed seems to be better than before, regardless of the query type. On 344MB of data, the index size is 60MB, which is roughly a 17% overhead.

Adding indexes on domain, date and query or domain, date and page doesn't help, but when indexing on substrings, e.g. domain(50), date and query(255) OR page(255), they are used for the exact match queries, but not for broad matches. Therefore, the benefits are smaller.

I've also converted the table to InnoDB storage, because I think it's faster and because it doesn't need to be maintained (analysed/repaired).

Suggested SQL for this:

CREATE TABLE `search_analytics` (
`id` int(11) NOT NULL,
`domain` varchar(256) NOT NULL,
`date` date NOT NULL,
`search_engine` varchar(50) NOT NULL,
`search_type` varchar(24) DEFAULT NULL,
`device_type` varchar(24) DEFAULT NULL,
`country` varchar(10) DEFAULT NULL,
`query` varchar(500) DEFAULT NULL,
`page` varchar(500) DEFAULT NULL,
`impressions` int(11) NOT NULL,
`clicks` int(11) NOT NULL,
`ctr` float NOT NULL,
`avg_position` float NOT NULL,
`avg_position_click` float DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `search_analytics`
ADD PRIMARY KEY (`domain`(255),`date`,`id`) USING BTREE,
ADD UNIQUE KEY `id_index` (`id`) USING BTREE;

ALTER TABLE `search_analytics`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants