Issue with column name in PostgresSQL #71

Xiidref1 · 2022-01-24T21:56:39Z

Hello, I'm writing this to report an issue that I've encountered while computing utility score with the framework.
It concerns this function :

code/gdascore/gdaAttack.py

Lines 701 to 769 in e131c71

    
               def getTableCharacteristics(self, tableName=''): 
        
                   """Returns the full contents of the table characteristics 
        
                      Return value is a dict indexed by column name: <br/> 
        
                          { '<colName>': 
        
                              { 
        
                                  'av_rows_per_vals': 3.93149, 
        
                                  'av_uids_per_val': 0.468698, 
        
                                  'column_label': 'continuous', 
        
                                  'column_name': 'dropoff_latitude', 
        
                                  'column_type': 'real', 
        
                                  'max': '898.29382000000000', 
        
                                  'min': '-0.56333297000000', 
        
                                  'num_distinct_vals': 24216, 
        
                                  'num_rows': 95205, 
        
                                  'num_uids': 11350, 
        
                                  'std_rows_per_val': 10.8547, 
        
                                  'std_uids_per_val': 4.09688}, 
        
                              } 
        
                          } 
        
                   """ 
        
                   if len(tableName) == 0: 
        
                       # caller didn't supply a table name, so get it from the 
        
                       # class init 
        
                       tableName = self._p['table'] 
        
                   # Modify table name to the default for the characteristics table 
        
                   tableName += '_char' 
        
                   # Establish connection to database 
        
                   db = getDatabaseInfo(self._p['rawDb']) 
        
                   connStr = str( 
        
                       f"host={db['host']} port={db['port']} dbname={db['dbname']} user={db['user']} password={db['password']}") 
        
                   conn = psycopg2.connect(connStr) 
        
                   cur = conn.cursor() 
        
                   # Set up return dict 
        
                   ret = {} 
        
                   # Query it for column names 
        
                   sql = str(f"""select column_name, data_type  
        
                             from information_schema.columns where 
        
                             table_name='{tableName}'""") 
        
                   try: 
        
                       cur.execute(sql) 
        
                   except psycopg2.Error as e: 
        
                       print(f"Error: getTableCharacteristics() query: '{e}'") 
        
                       self.cleanUp(cleanUpCache=False, doExit=True) 
        
                   cols = cur.fetchall() 
        
                   # Make index for column name (should be 0, but just to be sure) 
        
                   for colNameIndex in range(len(cols)): 
        
                       if cols[colNameIndex][0] == 'column_name': 
        
                           break 
        
                   # Query it for table contents 
        
                   sql = str(f"SELECT * FROM {tableName}") 
        
                   try: 
        
                       cur.execute(sql) 
        
                   except psycopg2.Error as e: 
        
                       print(f"Error: getTableCharacteristics() query: '{e}'") 
        
                       self.cleanUp(cleanUpCache=False, doExit=True) 
        
                   ans = cur.fetchall() 
        
                   for row in ans: 
        
                       colName = row[colNameIndex] 
        
                       ret[colName] = {} 
        
                       for i in range(len(row)): 
        
                           ret[colName][cols[i][0]] = row[i] 
        
                   conn.close() 
        
                   return ret

The issue is that when using the information_schema table to get the column_name column index, you have no warranty that it will output it in the same order as it will be when doing the select * from tablename. Most of the time the orders are the sames, but it's dependent on how PostgreSQL choose to handle its memory internally, so you can sometime swap the columns in the ret variable.

To prevent this, it would be better to make use of the cur.description variable, which contains all the columns from the last request in the same order they are in the result.

The following is an example of a fix using this method :

     def getTableCharacteristics(self, tableName=''): 
         """Returns the full contents of the table characteristics 
  
            Return value is a dict indexed by column name: <br/> 
  
                { '<colName>': 
                    { 
                        'av_rows_per_vals': 3.93149, 
                        'av_uids_per_val': 0.468698, 
                        'column_label': 'continuous', 
                        'column_name': 'dropoff_latitude', 
                        'column_type': 'real', 
                        'max': '898.29382000000000', 
                        'min': '-0.56333297000000', 
                        'num_distinct_vals': 24216, 
                        'num_rows': 95205, 
                        'num_uids': 11350, 
                        'std_rows_per_val': 10.8547, 
                        'std_uids_per_val': 4.09688}, 
                    } 
                } 
  
         """ 
         if len(tableName) == 0: 
             # caller didn't supply a table name, so get it from the 
             # class init 
             tableName = self._p['table'] 
  
         # Modify table name to the default for the characteristics table 
         tableName += '_char' 
  
         # Establish connection to database 
         db = getDatabaseInfo(self._p['rawDb']) 
         connStr = str( 
             f"host={db['host']} port={db['port']} dbname={db['dbname']} user={db['user']} password={db['password']}") 
         conn = psycopg2.connect(connStr) 
         cur = conn.cursor() 
         
         # Query it for table contents 
         sql = str(f"SELECT * FROM {tableName}") 
         try: 
             cur.execute(sql) 
         except psycopg2.Error as e: 
             print(f"Error: getTableCharacteristics() query: '{e}'") 
             self.cleanUp(cleanUpCache=False, doExit=True) 
         ans = cur.fetchall() 
         columns = [desc[0] for desc in cur.description] # Get a list of the columns in the same order they are in ans
         key_value_ans = [dict(zip(columns, values)) for values in ans] # Create the association between column name and values [{col1:val1, col2:val2, ...}, {col1:val1, col2:val2, ...}, ...]
         return {line["column_name"]:line for line in key_value_ans} # Get the same format as we previously had

As a side note, I've found this issue here, but I'm unsure if it's something that can be found in other places in the framework, so maybe there are other places with a similar issue in the code.

(Also feel free to ask for more details if required)

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Issue with column name in PostgresSQL #71

Issue with column name in PostgresSQL #71

Xiidref1 commented Jan 24, 2022 •

edited

Loading

Issue with column name in PostgresSQL #71

Issue with column name in PostgresSQL #71

Comments

Xiidref1 commented Jan 24, 2022 • edited Loading

Xiidref1 commented Jan 24, 2022 •

edited

Loading