-
Notifications
You must be signed in to change notification settings - Fork 95
/
README
160 lines (115 loc) · 5.07 KB
/
README
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
Partitioned
===========
Partitioned adds assistance to ActiveRecord for manipulating (reading,
creating, updating) an activerecord model that represents data that
may be in one of many database tables (determined by the Models data).
It also has features that support the creation and deleting of child
tables and partitioning support infrastructure.
It supports Postgres partitioning and has specific features to
overcome basic failings of Postgres's implementation of partitioning.
Basics:
A parent table can be inherited by many child tables that inherit most
of the attributes of the parent table including its columns. child
tables typically (and for the uses of this plugin must) have a unique
check constraint the defines which data should be located in that
specific child table.
Such a constraint allows for the SQL planner to ignore most child
tables and target the (hopefully) one child table that contains the
records interested. This splits data, and meta-data (indexes) which
provides streamlined targeted access to the desired data.
Support for bulk inserts and bulk updates is also provided via
Partitioned::Base.create_many and Partitioned::Base.update_many.
Example
=======
Given the following models:
class Company < ActiveRecord::Base
end
class ByCompanyId < Partitioned::ByForeignKey
self.abstract_class = true
belongs_to :company
def self.partition_foreign_key
return :company_id
end
partitioned do |partition|
partition.index :id, :unique => true
end
end
class Employee < ByCompanyId
end
and the following tables:
-- this is the referenced table
create table companies
(
id serial not null primary key,
created_at timestamp not null default now(),
updated_at timestamp,
name text null
);
-- add some companies
insert into table companies (name) values
('company 1'),('company 2'),('company 2');
-- this is the parent table
create table employees
(
id serial not null primary key,
created_at timestamp not null default now(),
updated_at timestamp,
name text null,
company_id integer not null references companies
);
We now need to create some infrastructure for partitioned tables,
in particular, we create a schema to hold the child partition
tables of employees.
Employee.create_infrastructure
Which creates the employees_partitions schema using the following SQL:
create schema employees_partitions;
NOTE: We also install protections on the employees table so it isn't
used as a data table (this SQL is not presented for simplicity but is
apart of the create_infrastructure call).
To add child tables we use the create_new_partitions_tables method:
company_ids = Company.all.map(&:id)
Employee.create_new_partition_tables(company_ids)
which results in the following SQL:
create table employees_partitions.p1
( CHECK ( company_id = 1 ) ) INHERITS (employees);
create table employees_partitions.p2
( CHECK ( company_id = 2 ) ) INHERITS (employees);
create table employees_partitions.p3
( CHECK ( company_id = 3 ) ) INHERITS (employees);
NOTE: Some other SQL is generated in the above example, specifically
the reference to the companies table needs to be explicitly created
for postgres child tables AND the unique index on 'id' is created.
These are not shown for simplicity.
Now we can do operations involving the child partitions.
Since database records exist in a specific child table dependant on
the field "company_id" we need to have creates that turn into database
inserts of the EMPLOYEES table redirect the record insert into the
specific child table determined by the value of COMPANY_ID
eg:
employee = Employee.create(:name => 'Keith', :company_id => 1)
this would normally produce the following:
INSERT INTO employees ('name', company_id) values ('Keith', 1);
but with Partitioned we see:
INSERT INTO employees_partitions.p1 ('name', company_id) values ('Keith', 1);
reads of such a table need some assistance to find the specific child
table the record exists in.
Since we are partitioned by company_id the programmer needs to provide
that information when fetching data, or the database will need to
search all child table for the specific record we are looking for.
This is no longer valid (well, doesn't perform well):
employee = Employee.find(1)
instead, do one of the following:
employee = Employee.from_partition(1).find(1)
employee = Employee.find(:first,
:conditions => {:name => 'Keith', :company_id => 1})
employee = Employee.find(:first,
:conditions => {:id => 1, :company_id => 1})
an update (employee.save where the record already exists in the
database) will take advantage of knowing which child table the record
exists in so it can do some optimization.
so, the following works as expected:
employee.name = "Not Keith"
employee.save
turns into the following SQL:
update employees_partitions.p1 set name = 'Not Keith' where id = 1;
Copyright 2010-2012 fiksu.com, inc, all rights reserved