Cherry pick "ORCA support ext stats, Fix EPQ..." #855

Merged: 35 commits, Jan 16, 2025

Commits
486e641
Add a GUC to discard redistribute hashjoin for Orca (#14642)
gpopt Dec 21, 2022
639825b
Fix locking clause on foreign table missing when ORCA is enabled
Jan 4, 2023
2713e9a
Update scripts to use python3
Shakarada Dec 21, 2022
d79d82d
LLVM bitcode generation for gpopt/gporca/gpcloud disabled
Tao-T Jan 10, 2023
cb94d1a
[ORCA] Use extended stats to estimate correlated cardinality (#14674)
dgkimura Jan 12, 2023
db93b96
Enable direct dispatch if distribution column is of 'varchar' type
Nov 23, 2022
4543b3c
Remove unused num_leaf_partitions attribute in Orca (#14777)
chrishajas Jan 17, 2023
97714cd
Fix EPQ for DML operations (#14304)
HustonMmmavr Jan 26, 2023
4c26774
FIXME: Rewrite IndexOpProperties API
Jan 20, 2023
2c631e7
[ORCA] Add support for multi-variant n-distinct correlated stats (#14…
dgkimura Jan 26, 2023
06017c7
Add GUC optimizer_enable_foreign_table (#14844)
chrishajas Jan 31, 2023
648ccb2
Fall back in Orca for queries with RTE of type TableFunc (#14898)
chrishajas Feb 6, 2023
1b192dd
Remove unused Orca partitioning code in Orca
chrishajas Feb 3, 2023
26bc992
Resolve Orca FIXME for FValidPartEnforcers
chrishajas Feb 3, 2023
019a6aa
Remove unused mdpart_constraint from indexes in Orca
chrishajas Feb 6, 2023
4b314cb
Address CTE translation FIXMEs
chrishajas Feb 6, 2023
353ab51
Address combining partition selectors stats FIXME in Orca
chrishajas Feb 7, 2023
4deb6ee
Address Orca FIXME: remove test
chrishajas Feb 7, 2023
04b8167
Address FIXME for Orca constraint assertion
chrishajas Feb 7, 2023
a619fe6
Fix bug that nestloop join fails to materialize the inner child for s…
gpopt Feb 9, 2023
fc8c27b
FIXME remove gp_enable_sort_distinct and noduplicates optimizing (#14…
yaowangm Sep 20, 2022
fb0246a
Fix unused variable compile warnings
chrishajas Feb 9, 2023
4e56c1a
Fix bug that Orca fails to decorrelate subqueries order by outer refe…
gpopt Feb 14, 2023
09f7f42
[ORCA] Allow push down of filter with BETWEEN predicate (#14872)
dgkimura Feb 14, 2023
f9c08c6
Support Direct Dispatch for a randomly distributed table, when filter…
Jan 3, 2023
2628c1d
Orca FIXME: Improve stats calculation during static partition selecti…
chrishajas Feb 16, 2023
9439e3c
Fix incorrect result from hash join on char column
Jan 31, 2023
29703ed
Orca FIXME: skip dropped columns
chrishajas Feb 8, 2023
d627319
Remove Orca FIXME in PrunePartitions
chrishajas Feb 8, 2023
6f4f1ac
Remove renaming orca fixme
chrishajas Feb 13, 2023
5adc3b8
Address a couple of Orca fixmes
chrishajas Feb 23, 2023
13bc6b0
Orca FIXME: Remove references to RelIsPartitioned
chrishajas Feb 25, 2023
a1d6deb
Fix core dump generated by "ORCA support ext stats, Fix EPQ..."
jiaqizho Jan 8, 2025
f17cfce
ORCA ignores empty or unsupported ext stats
jiaqizho Jan 10, 2025
2586ea8
Fix icw test cases generated from "ORCA support ext stats, Fix EPQ..."
jiaqizho Jan 9, 2025
8 changes: 4 additions & 4 deletions concourse/scripts/builds/GpBuild.py
@@ -52,7 +52,7 @@ def _run_gpdb_command(self, command, stdout=None, stderr=None, source_env_cmd=''
cmd = source_env_cmd
runcmd = "runuser gpadmin -c \"{0} && {1} \"".format(cmd, command)
if print_command:
print "Executing {}".format(runcmd)
print("Executing {}".format(runcmd))
return subprocess.call([runcmd], shell=True, stdout=stdout, stderr=stderr)

def run_explain_test_suite(self, dbexists):
@@ -74,7 +74,7 @@ def run_explain_test_suite(self, dbexists):
status = self._run_gpdb_command("psql -q -f stats.sql", stdout=f)
if status:
with open("load_stats.txt", "r") as f:
print f.read()
print(f.read())
fail_on_error(status)

# set gucs if any were specified
@@ -95,10 +95,10 @@
if fsql.endswith('.sql') and fsql not in ['stats.sql', 'schema.sql']:
output_fname = 'out/{}'.format(fsql.replace('.sql', '.out'))
with open(output_fname, 'w') as fout:
print "Running query: " + fsql
print("Running query: " + fsql)
current_status = self._run_gpdb_command("psql -a -f sql/{}".format(fsql), stdout=fout, stderr=fout, source_env_cmd=source_env_cmd, print_command=False)
if current_status != 0:
print "ERROR: {0}".format(current_status)
print("ERROR: {0}".format(current_status))
status = status if status != 0 else current_status

return status
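
The GpBuild.py hunks above are a mechanical Python 2 to 3 conversion of print statements. A minimal standalone sketch of the helper pattern, assuming illustrative names and defaults rather than the exact GpBuild values:

import subprocess

def run_as_gpadmin(command, source_env_cmd="true", print_command=True):
    # Run the command as the gpadmin user, chaining the environment setup first.
    runcmd = 'runuser gpadmin -c "{0} && {1}"'.format(source_env_cmd, command)
    if print_command:
        # Python 3 print() function, replacing the old print statement.
        print("Executing {}".format(runcmd))
    return subprocess.call([runcmd], shell=True)
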
16 changes: 8 additions & 8 deletions concourse/scripts/perfsummary.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3
#
# Extracts summary info from a log file that compiles and executes test suite
# queries. Creates a CSV (comma-separated values) summary.
@@ -209,20 +209,20 @@ def comparePlans(self, base, queryId):
# print header for CSV file
def printHeader(self, numFiles):
if (numFiles == 1):
print 'Query id, planning time, execution time, comment'
print('Query id, planning time, execution time, comment')
else:
print 'Query id, base planning time, planning time, base execution time, execution time, plan changes, base comment, comment'
print('Query id, base planning time, planning time, base execution time, execution time, plan changes, base comment, comment')

# print result for all recorded queries in CSV format for a single log file
def printme(self):
for q in self.query_id_list:
print "%s, %s, %s, %s" % (q, self.query_explain_time_map[q], self.query_exe_time_map[q], self.query_comment_map[q])
print("%s, %s, %s, %s" % (q, self.query_explain_time_map[q], self.query_exe_time_map[q], self.query_comment_map[q]))

# print a CSV file with a comparison between a base file and a test file
def printComparison(self, base, diffDir, diffThreshold, diffLevel):
for q in self.query_id_list:
planDiffs = self.comparePlans(base, q)
print "%s, %s, %s, %s, %s, %s, %s, %s" % (q, base.query_explain_time_map[q], self.query_explain_time_map[q], base.query_exe_time_map[q], self.query_exe_time_map[q], planDiffText[planDiffs], base.query_comment_map[q], self.query_comment_map[q])
print("%s, %s, %s, %s, %s, %s, %s, %s" % (q, base.query_explain_time_map[q], self.query_explain_time_map[q], base.query_exe_time_map[q], self.query_exe_time_map[q], planDiffText[planDiffs], base.query_comment_map[q], self.query_comment_map[q]))
if int(diffLevel) <= int(planDiffs):
baseTime = float(base.query_exe_time_map[q])
testTime = float(self.query_exe_time_map[q])
@@ -267,7 +267,7 @@ def main():
os.mkdir(diffDir + "/base")
os.mkdir(diffDir + "/test")
except:
print "Unable to create diff directory %s" % diffDir
print("Unable to create diff directory %s" % diffDir)
exit(1)
if args.diffThreshold is not None:
diffThreshold = args.diffThreshold
@@ -278,11 +278,11 @@
diffLevel = args.diffLevel
else:
if (args.diffThreshold is not None or args.diffLevel is not None):
print "Please specify the --diffDir option with a directory name to request diff files\n"
print("Please specify the --diffDir option with a directory name to request diff files\n")
exit(1)

if inputfile is None:
print "Expected the name of a log file with test suite queries\n"
print("Expected the name of a log file with test suite queries\n")
exit(1)

if basefile is not None:
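
The converted print() calls keep perfsummary.py's hand-rolled CSV output. As an aside rather than part of this change, csv.writer would handle quoting automatically if a comment ever contained a comma; a minimal sketch with made-up values:

import csv
import sys

writer = csv.writer(sys.stdout)
writer.writerow(["Query id", "planning time", "execution time", "comment"])
writer.writerow(["q1", 12.3, 456.7, "baseline, cold cache"])  # embedded comma gets quoted
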
4 changes: 2 additions & 2 deletions concourse/scripts/regression_tests_gpcloud.bash
@@ -37,11 +37,11 @@ function setup_gpadmin_user() {
}

function make_cluster() {
PYTHONPATH=${SCRIPT_DIR}:${PYTHONPATH} python2 -c "from builds.GpBuild import GpBuild; GpBuild(\"planner\").create_demo_cluster(install_dir='$GPDB_INSTALL_DIR')"
PYTHONPATH=${SCRIPT_DIR}:${PYTHONPATH} python3 -c "from builds.GpBuild import GpBuild; GpBuild(\"planner\").create_demo_cluster(install_dir='$GPDB_INSTALL_DIR')"
}

function configure_with_planner() {
PYTHONPATH=${SCRIPT_DIR}:${PYTHONPATH} python2 -c "from builds.GpBuild import GpBuild; GpBuild(\"planner\").configure()"
PYTHONPATH=${SCRIPT_DIR}:${PYTHONPATH} python3 -c "from builds.GpBuild import GpBuild; GpBuild(\"planner\").configure()"
}

function copy_gpdb_bits_to_gphome() {
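
For readability, the python3 -c one-liners above amount to roughly the following plain Python (combining the two functions onto one instance for brevity; it assumes the concourse scripts directory is on PYTHONPATH, and the install_dir value is a placeholder for $GPDB_INSTALL_DIR):

from builds.GpBuild import GpBuild

build = GpBuild("planner")
build.configure()                                         # configure_with_planner
build.create_demo_cluster(install_dir="/usr/local/gpdb")  # make_cluster
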
2 changes: 1 addition & 1 deletion concourse/scripts/unit_tests_gporca.bash
@@ -8,7 +8,7 @@ function build_xerces
OUTPUT_DIR="gpdb_src/gpAux/ext/${BLD_ARCH}"
mkdir -p xerces_patch/concourse
cp -r gpdb_src/src/backend/gporca/concourse/xerces-c xerces_patch/concourse
/usr/bin/python xerces_patch/concourse/xerces-c/build_xerces.py --output_dir=${OUTPUT_DIR}
/usr/bin/python3 xerces_patch/concourse/xerces-c/build_xerces.py --output_dir=${OUTPUT_DIR}
rm -rf build
}

2 changes: 2 additions & 0 deletions contrib/file_fdw/output/file_fdw_optimizer.source
@@ -216,6 +216,8 @@ DETAIL: Feature not supported: Deletes with foreign tables
ERROR: cannot delete from foreign table "agg_csv"
-- but this should be allowed
SELECT * FROM agg_csv FOR UPDATE;
INFO: GPORCA failed to produce a plan, falling back to planner
DETAIL: Feature not supported: Locking clause on foreign table
a | b
-----+---------
100 | 99.097
6 changes: 3 additions & 3 deletions gpMgmt/bin/gppylib/test/unit/test_unit_gpcheckcat.py
@@ -320,7 +320,7 @@ def test_skip_one_test(self, mock_ver, mock_run, mock1, mock2):
testargs = ['gpcheckcat', '-port 1', '-s test2']
with patch.object(sys, 'argv', testargs):
self.subject.main()
mock_run.assert_has_calls(call(['test1', 'test3']))
mock_run.assert_has_calls([call(['test1', 'test3'])])

@patch('gpcheckcat.GPCatalog', return_value=Mock())
@patch('sys.exit')
@@ -336,7 +336,7 @@ def test_skip_multiple_test(self, mock_ver, mock_run, mock1, mock2):
testargs = ['gpcheckcat', '-port 1', '-s', "test1, test2"]
with patch.object(sys, 'argv', testargs):
self.subject.main()
mock_run.assert_has_calls(call(['test3']))
mock_run.assert_has_calls([call(['test3'])])

@patch('gpcheckcat.GPCatalog', return_value=Mock())
@patch('sys.exit')
@@ -352,7 +352,7 @@ def test_skip_test_warning(self, mock_ver, mock_run, mock1, mock2):
testargs = ['gpcheckcat', '-port 1', '-s', "test_invalid, test2"]
with patch.object(sys, 'argv', testargs):
self.subject.main()
mock_run.assert_has_calls(call(['test1', 'test3']))
mock_run.assert_has_calls([call(['test1', 'test3'])])
expected_message = "'test_invalid' is not a valid test"
log_messages = [args[0][1] for args in self.subject.logger.log.call_args_list]
self.assertIn(expected_message, log_messages)
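
The three assertion fixes above follow from Mock.assert_has_calls expecting a list of call objects, which is why the single expected call is now wrapped in a list. A minimal standalone sketch (the mock name is illustrative, not the actual gpcheckcat runner):

from unittest.mock import Mock, call

run_tests = Mock()
run_tests(['test1', 'test3'])

# assert_has_calls takes a list of expected calls, checked in order.
run_tests.assert_has_calls([call(['test1', 'test3'])])
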
3 changes: 2 additions & 1 deletion gpMgmt/requirements-dev.txt
@@ -1,4 +1,4 @@
gsutil<=4.47
gsutil
behave~=1.2.6
coverage~=4.5
more-itertools<8.1
@@ -37,3 +37,4 @@ pathlib2<=2.3.5
cffi<=1.13.2
scandir<=1.10.0
pycparser<=2.19
mock<=5.0.0
3 changes: 3 additions & 0 deletions gpcontrib/gpcloud/Makefile
@@ -16,6 +16,9 @@ endif
MODULE_big = gpcloud
OBJS = src/gpcloud.o lib/http_parser.o lib/ini.o $(addprefix src/,$(COMMON_OBJS))

# Avoid building LLVM Bitcode for gpcloud module.
with_llvm = no

# Launch
ifdef USE_PGXS
PGXS := $(shell pg_config --pgxs)
30 changes: 15 additions & 15 deletions gpcontrib/gpcloud/bin/dummyHTTPServer.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python
#!/usr/bin/env python3

"""
Very simple HTTP server in python.
@@ -12,8 +12,8 @@
curl -d "foo=bar&bin=baz" http://localhost
"""

from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
import SocketServer
from http.server import BaseHTTPRequestHandler, HTTPServer
import socketserver
import getopt
import sys
import os
@@ -46,26 +46,26 @@ def do_HEAD(self):

def do_PUT(self):
# Just bounce the request back
print "----- SOMETHING WAS PUT ------"
print self.headers
print("----- SOMETHING WAS PUT ------")
print(self.headers)
length = int(self.headers['Content-Length'])
content = self.rfile.read(length)
self._set_headers(length)
self.wfile.write(content)

def do_POST(self):
# Just bounce the request back
print "----- SOMETHING WAS POST ------"
print self.headers
print("----- SOMETHING WAS POST ------")
print(self.headers)
length = int(self.headers['Content-Length'])
content = self.rfile.read(length)
self._set_headers(length)
self.wfile.write(content)

def do_DELETE(self):
# Just bounce the request back
print "----- SOMETHING WAS DELETED ------"
print self.headers
print("----- SOMETHING WAS DELETED ------")
print(self.headers)
length = int(self.headers['Content-Length'])
# content = self.rfile.read(length)
self._set_headers(length)
@@ -85,8 +85,8 @@ def _getcontent(self):
with open(filename, 'r') as f:
content = f.read()
except Exception:
print "Can not open file:%s" % filename
return content
print("Can not open file:%s" % filename)
return content.encode('utf8')

def do_GET(self):
try:
@@ -97,7 +97,7 @@ def do_GET(self):
else:
self._set_headers(0)
except KeyError:
print "Header missing S3_Param_Req"
print("Header missing S3_Param_Req")
self._set_headers(0)

def run(server_class=HTTPServer, handler_class=S, port=8553, https=False):
@@ -117,7 +117,7 @@ def run(server_class=HTTPServer, handler_class=S, port=8553, https=False):
httpd.socket = ssl.wrap_socket (httpd.socket,
keyfile=keyfile,
certfile=certfile, server_side=True)
print 'Starting http server...'
print('Starting http server...')
httpd.serve_forever()

if __name__ == "__main__":
@@ -128,11 +128,11 @@ def run(server_class=HTTPServer, handler_class=S, port=8553, https=False):
try:
opts, args = getopt.getopt(argv[1:],"hsp:f:t:",["--port=","--filename=", "--type="])
except getopt.GetoptError:
print help_msg
print(help_msg)
sys.exit(2)
for opt, arg in opts:
if opt == '-h':
print help_msg
print(help_msg)
sys.exit(0)
elif opt == '-s':
use_ssl = True
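
Beyond the renamed imports (http.server, socketserver), the key Python 3 detail above is that the handler's wfile is a binary stream, hence content.encode('utf8') in _getcontent. A minimal standalone sketch of that pattern, using an illustrative echo handler rather than the dummy S3 server itself:

from http.server import BaseHTTPRequestHandler, HTTPServer

class EchoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = "hello\n".encode("utf8")  # str -> bytes before writing
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # 8553 matches the dummy server's default port; any free port works.
    HTTPServer(("localhost", 8553), EchoHandler).serve_forever()
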
2 changes: 1 addition & 1 deletion src/Makefile.global.in
@@ -203,7 +203,7 @@ with_krb_srvnam = @with_krb_srvnam@
with_ldap = @with_ldap@
with_libxml = @with_libxml@
with_libxslt = @with_libxslt@
with_llvm = @with_llvm@
with_llvm ?= @with_llvm@
with_system_tzdata = @with_system_tzdata@
with_uuid = @with_uuid@
with_zlib = @with_zlib@
8 changes: 7 additions & 1 deletion src/backend/Makefile
@@ -258,8 +258,14 @@ endif
ifeq ($(with_llvm), yes)
install-bin: install-postgres-bitcode

# GPDB: Bitcode generation in certain subdirs can be avoided by setting `with_llvm = no`
# there. But the install step is not aware of that setting and would fail unless we
# manually tell it about the absence of *.bc files under those directories.
# As we configure ORCA to generate no bitcode, the ORCA directories are excluded here.
bitcode_ignored_subdirs = $(top_builddir)/src/timezone gporca gpopt

install-postgres-bitcode: $(OBJS) all
$(call install_llvm_module,postgres,$(call expand_subsys, $(filter-out $(top_builddir)/src/timezone/objfiles.txt, $(SUBDIROBJS))))
$(call install_llvm_module,postgres,$(call expand_subsys, $(filter-out $(bitcode_ignored_subdirs:%=%/objfiles.txt), $(SUBDIROBJS))))
endif

install-bin: postgres $(POSTGRES_IMP) installdirs
5 changes: 1 addition & 4 deletions src/backend/commands/explain.c
@@ -3012,10 +3012,7 @@ show_sort_keys(SortState *sortstate, List *ancestors, ExplainState *es)
Sort *plan = (Sort *) sortstate->ss.ps.plan;
const char *SortKeystr;

if (sortstate->noduplicates)
SortKeystr = "Sort Key (Distinct)";
else
SortKeystr = "Sort Key";
SortKeystr = "Sort Key";

show_sort_group_keys((PlanState *) sortstate, SortKeystr,
plan->numCols, 0, plan->sortColIdx,
5 changes: 0 additions & 5 deletions src/backend/executor/nodeSort.c
@@ -248,11 +248,6 @@ ExecInitSort(Sort *node, EState *estate, int eflags)
*/
ExecAssignExprContext(estate, &sortstate->ss.ps);

/* CDB */ /* evaluate a limit as part of the sort */
{
sortstate->noduplicates = node->noduplicates;
}

/*
* Miscellaneous initialization
*
6 changes: 6 additions & 0 deletions src/backend/gpopt/config/CConfigParamMapping.cpp
@@ -211,6 +211,12 @@ CConfigParamMapping::SConfigMappingElem CConfigParamMapping::m_elements[] = {
GPOS_WSZ_LIT(
"Enable plan alternatives where NLJ's outer child is replicated")},

{EopttraceDiscardRedistributeHashJoin,
&optimizer_discard_redistribute_hashjoin,
false, // m_negate_param
GPOS_WSZ_LIT(
"Discard plan alternatives where hash join has a redistribute motion child")},

{EopttraceMotionHazardHandling, &optimizer_enable_streaming_material,
false, // m_fNegate
GPOS_WSZ_LIT(