trovares_connector.ODBCConnector

class trovares_connector.ODBCConnector(xgt_server: Connection, odbc_driver: SQLODBCDriver | MongoODBCDriver | OracleODBCDriver | SAPODBCDriver | SnowflakeODBCDriver)
__init__(xgt_server: Connection, odbc_driver: SQLODBCDriver | MongoODBCDriver | OracleODBCDriver | SAPODBCDriver | SnowflakeODBCDriver)

Initializes the connector class.

Parameters:
  • xgt_server (xgt.Connection) – Connection object to xGT.

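  • odbc_driver (SQLODBCDriver, MongoODBCDriver, OracleODBCDriver, SAPODBCDriver or SnowflakeODBCDriver) – ODBC driver object used to connect to the ODBC application.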

Methods

__init__(xgt_server, odbc_driver)

Initializes the connector class.

copy_data_to_xgt(xgt_schemas[, batch_size, ...])

Copies data from the ODBC application to the requested table, vertex and/or edge frames in Trovares xGT.

create_xgt_schemas(xgt_schemas[, append, ...])

Creates table, vertex and/or edge frames in Trovares xGT.

get_xgt_schemas([tables, max_text_size, ...])

Retrieves a dictionary containing the schema information for all of the requested tables and their mappings.

transfer_query_to_xgt([query, mapping, ...])

Copies the results of a SQL query from the ODBC application to Trovares xGT.

transfer_to_odbc([vertices, edges, tables, ...])

Copies data from Trovares xGT to an ODBC application.

transfer_to_xgt([tables, append, force, ...])

Copies the requested tables from the ODBC application to Trovares xGT.

copy_data_to_xgt(xgt_schemas: Mapping, batch_size: int = 10000, transaction_size: int = 0, max_text_size: int = None, max_binary_size: int = None, column_mapping: Mapping[str, str | int] | None = None, suppress_errors: bool = False, row_filter: str = None, on_duplicate_keys: str = 'error') → None

Copies data from the ODBC application to the requested table, vertex and/or edge frames in Trovares xGT.

This function copies data from the ODBC application to xGT for all of the tables, vertices and edges, one frame at a time.

Parameters:
  • xgt_schemas (dict) – Dictionary containing schema information for the table, vertex and edge frames in xGT that receive the data. This dictionary can be the value returned from the get_xgt_schemas() method.

  • batch_size (int) – Number of rows to transfer at once. Defaults to 10000.

  • transaction_size (int) – Number of rows to treat as a single transaction to xGT. Should be a multiple of batch_size and greater than batch_size. Defaults to 0, which treats all rows as a single transaction.

  • max_text_size (int) – Upper limit on the buffers used when transferring ODBC variable-length text fields. If a VARCHAR column has no declared length limit (such as the 255 in VARCHAR(255)), the database may report its maximum possible string size to ODBC as the size of each entry. For instance, a Snowflake string can be up to 16MB long, so a single batch of such a column would need a buffer of 16MB multiplied by batch_size. This parameter caps the length of each string during the transfer. The default is determined by the database.

  • max_binary_size (int) – Upper limit on the buffers used when transferring ODBC variable-length binary fields. If a VARBINARY column has no declared length limit (such as the 255 in VARBINARY(255)), the database may report its maximum possible binary size to ODBC as the size of each entry. This parameter caps the length of each binary value during the transfer. The default is determined by the database.

  • column_mapping (dictionary) – Maps the frame column names to SQL columns for the ingest. The key of each element is a frame column name. The value is either the name of the SQL column (from the table) or the table column index.

  • suppress_errors (bool) – If True, continues inserting data when an ingest error is encountered and places the first 1000 errors in the job history. If False, stops at the first error and raises an exception. Defaults to False.

  • row_filter (str) – TQL fragment used to filter, modify and parameterize the raw data from the input to produce the row data fed to the frame.

  • on_duplicate_keys ({'error', 'skip', 'skip_same'}, default 'error') – Specifies what to do upon encountering a duplicate vertex key. Applies only to vertex frames and is ignored for table and edge frames. Allowed values are: 'error', raise an exception when a duplicate key is found; 'skip', skip duplicate keys without raising; 'skip_same', skip duplicate keys without raising if the row is exactly the same.

Return type:

None
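The rule for transaction_size can be checked before calling copy_data_to_xgt. The sketch below is a hypothetical helper (batches_per_transaction is not part of trovares_connector) that restates the documented constraint and reports how many batches make up one transaction; it does not reflect how the connector itself validates the value.

```python
from typing import Optional


def batches_per_transaction(batch_size: int, transaction_size: int) -> Optional[int]:
    """Return how many batches form one xGT transaction (hypothetical helper).

    Mirrors the documented rule: transaction_size is 0 (all rows in a single
    transaction) or a multiple of batch_size that is greater than batch_size.
    Returns None when all rows form a single transaction.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if transaction_size == 0:
        # 0 means: treat all rows as a single transaction.
        return None
    if transaction_size <= batch_size or transaction_size % batch_size != 0:
        raise ValueError(
            f"transaction_size={transaction_size} should be 0 or a multiple of "
            f"batch_size={batch_size} that is greater than it"
        )
    return transaction_size // batch_size
```

For example, batch_size=10000 with transaction_size=50000 groups every 5 batches into one transaction, while transaction_size=15000 is rejected because it is not a multiple of the batch size.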

create_xgt_schemas(xgt_schemas: Mapping, append: bool = False, force: bool = False, easy_edges: bool = False) → None

Creates table, vertex and/or edge frames in Trovares xGT.

This function first infers the schemas for all of the needed frames in xGT to store the requested data. Then those frames are created in xGT.

Parameters:
  • xgt_schemas (dict) – Dictionary containing schema information for the table, vertex and edge frames to create in xGT. This dictionary can be the value returned from the get_xgt_schemas() method.

  • append (boolean) – Set to true when the xGT frames are already created and holding data that should be appended to. Set to false when the xGT frames are to be newly created (removing any existing frames with the same names prior to creation).

  • force (boolean) – Set to true to force xGT to drop edges when a vertex frame has dependencies.

  • easy_edges (boolean) – Set to true to create a basic vertex frame with a key column for any edges without corresponding vertex frames.

Return type:

None
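Since create_xgt_schemas and copy_data_to_xgt both accept the dictionary returned by get_xgt_schemas(), the three methods can be chained when the schemas need to be inspected or edited between steps. The sketch below uses only the signatures documented on this page; the helper name stage_tables is hypothetical, connector is assumed to be an already constructed ODBCConnector, and treating this sequence as equivalent to transfer_to_xgt is an assumption based on that method's description.

```python
from typing import Any, Iterable


def stage_tables(connector: Any, tables: Iterable[str],
                 max_text_size: int = 1024) -> dict:
    """Fetch schemas, create frames, then copy rows (hypothetical helper).

    `connector` is assumed to be a trovares_connector.ODBCConnector.
    """
    # 1. Retrieve schema information for the requested tables, capping text buffers.
    schemas = connector.get_xgt_schemas(tables=list(tables),
                                        max_text_size=max_text_size)
    # The schemas dictionary could be inspected or edited here.
    # 2. Create new frames in xGT (append=False replaces frames with the same names).
    connector.create_xgt_schemas(schemas, append=False)
    # 3. Copy the data into those frames, one frame at a time.
    connector.copy_data_to_xgt(schemas, batch_size=10000,
                               max_text_size=max_text_size)
    return schemas
```

The sketch passes the same max_text_size to get_xgt_schemas and copy_data_to_xgt so that schema retrieval and the copy use the same cap.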

get_xgt_schemas(tables: Iterable[str] = None, max_text_size: int = None, max_binary_size: int = None) → dict

Retrieves a dictionary containing the schema information for all of the requested tables and their mappings.

Parameters:
  • tables (iterable) – List of requested tables.

  • max_text_size (int) – Upper limit on the buffers used when transferring ODBC variable-length text fields. If a VARCHAR column has no declared length limit (such as the 255 in VARCHAR(255)), the database may report its maximum possible string size to ODBC as the size of each entry. For instance, a Snowflake string can be up to 16MB long, so a single batch of such a column would need a buffer of 16MB multiplied by batch_size. This parameter caps the length of each string during the transfer. The default is determined by the database.

  • max_binary_size (int) – Upper limit on the buffers used when transferring ODBC variable-length binary fields. If a VARBINARY column has no declared length limit (such as the 255 in VARBINARY(255)), the database may report its maximum possible binary size to ODBC as the size of each entry. This parameter caps the length of each binary value during the transfer. The default is determined by the database.

Returns:

Dictionary containing the schema information of the tables, vertices, and edges requested.

Return type:

dict
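The buffer arithmetic described for max_text_size and max_binary_size can be worked through directly. The sketch below uses a simplified model taken from the parameter descriptions (per-value size multiplied by batch_size, for one column); column_buffer_bytes is a hypothetical name, and real drivers may add terminator bytes, wider character encodings or other overhead. It takes 16MB as 16 × 1024 × 1024 bytes.

```python
from typing import Optional

# 16MB upper limit for a Snowflake string, as cited in the description.
SNOWFLAKE_MAX_STRING = 16 * 1024 * 1024


def column_buffer_bytes(reported_size: int, batch_size: int = 10000,
                        max_size: Optional[int] = None) -> int:
    """Approximate bytes buffered for one variable-length column per batch.

    reported_size is the per-value size the database reports to ODBC;
    max_size plays the role of max_text_size / max_binary_size and caps it.
    Ignores terminators, encoding width and driver overhead.
    """
    per_value = reported_size if max_size is None else min(reported_size, max_size)
    return per_value * batch_size
```

With the default batch_size of 10000, an uncapped 16MB Snowflake string column needs 167,772,160,000 bytes (about 156 GiB) per batch, whereas a cap of 1024 brings that down to 10,240,000 bytes (under 10 MiB).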

transfer_query_to_xgt(query: str = None, mapping: Mapping | tuple = None, append: bool = False, force: bool = False, easy_edges: bool = False, batch_size: int = 10000, transaction_size: int = 0, max_text_size: int = None, max_binary_size: int = None, column_mapping: Mapping[str, str | int] | None = None, suppress_errors: bool = False, row_filter: str = None, on_duplicate_keys: str = 'error') → None

Copies the results of a SQL query from the ODBC application to Trovares xGT.

This function first infers the schemas for the query results. Then it maps them to the xGT types specified in mapping. Finally, the data is copied from the ODBC application to xGT.

Parameters:
  • query (string) – SQL query to execute; its results are inserted into xGT. The syntax depends on the SQL dialect of the database you are connecting to.

  • mapping (Mapping or tuple) – Mapping of the query results to xGT types; may be given as a tuple. See documentation.

  • append (boolean) – Set to true when the xGT frames are already created and holding data that should be appended to. Set to false when the xGT frames are to be newly created (removing any existing frames with the same names prior to creation).

  • force (boolean) – Set to true to force xGT to drop edges when a vertex frame has dependencies.

  • easy_edges (boolean) – Set to true to create a basic vertex frame with a key column for any edges without corresponding vertex frames.

  • batch_size (int) – Number of rows to transfer at once. Defaults to 10000.

  • transaction_size (int) – Number of rows to treat as a single transaction to xGT. Should be a multiple of batch_size and greater than batch_size. Defaults to 0, which treats all rows as a single transaction.

  • max_text_size (int) – Upper limit on the buffers used when transferring ODBC variable-length text fields. If a VARCHAR column has no declared length limit (such as the 255 in VARCHAR(255)), the database may report its maximum possible string size to ODBC as the size of each entry. For instance, a Snowflake string can be up to 16MB long, so a single batch of such a column would need a buffer of 16MB multiplied by batch_size. This parameter caps the length of each string during the transfer. The default is determined by the database.

  • max_binary_size (int) – Upper limit on the buffers used when transferring ODBC variable-length binary fields. If a VARBINARY column has no declared length limit (such as the 255 in VARBINARY(255)), the database may report its maximum possible binary size to ODBC as the size of each entry. This parameter caps the length of each binary value during the transfer. The default is determined by the database.

  • column_mapping (dictionary) – Maps the frame column names to SQL columns for the ingest. The key of each element is a frame column name. The value is either the name of the SQL column (from the table) or the table column index.

  • suppress_errors (bool) – If True, continues inserting data when an ingest error is encountered and places the first 1000 errors in the job history. If False, stops at the first error and raises an exception. Defaults to False.

  • row_filter (str) – TQL fragment used to filter, modify and parameterize the raw data from the input to produce the row data fed to the frame.

  • on_duplicate_keys ({'error', 'skip', 'skip_same'}, default 'error') – Specifies what to do upon encountering a duplicate vertex key. Applies only to vertex frames and is ignored for table and edge frames. Allowed values are: 'error', raise an exception when a duplicate key is found; 'skip', skip duplicate keys without raising; 'skip_same', skip duplicate keys without raising if the row is exactly the same.

Return type:

None
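column_mapping accepts either a SQL column name or a column index for each frame column. The hypothetical helper below (resolve_column_mapping is not part of trovares_connector) shows how such a mapping resolves against the column names of a query result; treating indexes as 0-based is an assumption.

```python
from typing import Dict, Mapping, Sequence, Union


def resolve_column_mapping(sql_columns: Sequence[str],
                           column_mapping: Mapping[str, Union[str, int]]) -> Dict[str, int]:
    """Map each frame column to a position in the SQL result (hypothetical).

    Keys are frame column names; values are a SQL column name or an
    (assumed 0-based) column index.
    """
    names = list(sql_columns)
    resolved: Dict[str, int] = {}
    for frame_col, source in column_mapping.items():
        if isinstance(source, int):
            # An integer is taken as a column index into the SQL result.
            if not 0 <= source < len(names):
                raise IndexError(f"index {source} for {frame_col!r} is out of range")
            resolved[frame_col] = source
        elif source in names:
            # A string is looked up by SQL column name.
            resolved[frame_col] = names.index(source)
        else:
            raise KeyError(f"no SQL column named {source!r} for {frame_col!r}")
    return resolved
```

For result columns ["id", "name", "age"], the mapping {"key": "id", "years": 2} resolves to {"key": 0, "years": 2}.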

transfer_to_odbc(vertices: Iterable[str] = None, edges: Iterable[str] = None, tables: Iterable[str] = None, namespace: str = None, batch_size: int = 10000) → None

Copies data from Trovares xGT to an ODBC application.

Parameters:
  • vertices (iterable) – List of requested vertex frame names. Each entry may instead be a tuple (xgt_frame_name, database_table_name).

  • edges (iterable) – List of requested edge frame names. Each entry may instead be a tuple (xgt_frame_name, database_table_name).

  • tables (iterable) – List of requested table frame names. Each entry may instead be a tuple (xgt_frame_name, database_table_name).

  • namespace (str) – Namespace for the selected frames. If None, the default namespace is used.

  • batch_size (int) – Number of rows to transfer at once. Defaults to 10000.

Return type:

None
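Entries in vertices, edges and tables can be plain frame names or (xgt_frame_name, database_table_name) tuples. The sketch below normalizes both forms into pairs; normalize_frame_specs is hypothetical, and the assumption that a plain name targets a database table of the same name is not stated on this page.

```python
from typing import Iterable, List, Tuple, Union

FrameSpec = Union[str, Tuple[str, str]]


def normalize_frame_specs(specs: Iterable[FrameSpec]) -> List[Tuple[str, str]]:
    """Expand entries into (xgt_frame_name, database_table_name) pairs.

    Hypothetical helper. Assumption: a plain string means the database
    table has the same name as the xGT frame.
    """
    pairs: List[Tuple[str, str]] = []
    for spec in specs:
        if isinstance(spec, str):
            pairs.append((spec, spec))
        elif isinstance(spec, tuple) and len(spec) == 2:
            pairs.append((spec[0], spec[1]))
        else:
            raise TypeError(
                f"expected a name or (xgt_frame_name, database_table_name), got {spec!r}"
            )
    return pairs
```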

transfer_to_xgt(tables: Iterable = None, append: bool = False, force: bool = False, easy_edges: bool = False, batch_size: int = 10000, transaction_size: int = 0, max_text_size: int = None, max_binary_size: int = None, column_mapping: Mapping[str, str | int] | None = None, suppress_errors: bool = False, row_filter: str = None, on_duplicate_keys: str = 'error') → None

Copies the requested tables from the ODBC application to Trovares xGT.

This function first infers the schemas for all of the frames needed in xGT to store the requested data. Then those frames are created in xGT. Finally, all of the tables, vertices and edges are copied, one frame at a time, from the ODBC application to xGT.

Parameters:
  • tables (Iterable) – List of requested table names. An entry may be a tuple specifying a mapping to xGT types. See documentation.

  • append (boolean) – Set to true when the xGT frames are already created and holding data that should be appended to. Set to false when the xGT frames are to be newly created (removing any existing frames with the same names prior to creation).

  • force (boolean) – Set to true to force xGT to drop edges when a vertex frame has dependencies.

  • easy_edges (boolean) – Set to true to create a basic vertex frame with a key column for any edges without corresponding vertex frames.

  • batch_size (int) – Number of rows to transfer at once. Defaults to 10000.

  • transaction_size (int) – Number of rows to treat as a single transaction to xGT. Should be a multiple of batch_size and greater than batch_size. Defaults to 0, which treats all rows as a single transaction.

  • max_text_size (int) – Upper limit on the buffers used when transferring ODBC variable-length text fields. If a VARCHAR column has no declared length limit (such as the 255 in VARCHAR(255)), the database may report its maximum possible string size to ODBC as the size of each entry. For instance, a Snowflake string can be up to 16MB long, so a single batch of such a column would need a buffer of 16MB multiplied by batch_size. This parameter caps the length of each string during the transfer. The default is determined by the database.

  • max_binary_size (int) – Upper limit on the buffers used when transferring ODBC variable-length binary fields. If a VARBINARY column has no declared length limit (such as the 255 in VARBINARY(255)), the database may report its maximum possible binary size to ODBC as the size of each entry. This parameter caps the length of each binary value during the transfer. The default is determined by the database.

  • column_mapping (dictionary) – Maps the frame column names to SQL columns for the ingest. The key of each element is a frame column name. The value is either the name of the SQL column (from the table) or the table column index.

  • suppress_errors (bool) – If True, continues inserting data when an ingest error is encountered and places the first 1000 errors in the job history. If False, stops at the first error and raises an exception. Defaults to False.

  • row_filter (str) – TQL fragment used to filter, modify and parameterize the raw data from the input to produce the row data fed to the frame.

  • on_duplicate_keys ({'error', 'skip', 'skip_same'}, default 'error') – Specifies what to do upon encountering a duplicate vertex key. Applies only to vertex frames and is ignored for table and edge frames. Allowed values are: 'error', raise an exception when a duplicate key is found; 'skip', skip duplicate keys without raising; 'skip_same', skip duplicate keys without raising if the row is exactly the same.

Return type:

None
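The three on_duplicate_keys policies can be illustrated with a toy, in-memory vertex ingest. This is not xGT's implementation: ingest_vertices is hypothetical, the first element of each row is assumed to be the vertex key, KeyError stands in for whatever exception xGT raises, and raising under 'skip_same' when the duplicate row differs is an assumption.

```python
from typing import Any, Dict, Hashable, Iterable, Tuple

Row = Tuple[Any, ...]  # a vertex row; element 0 is assumed to be the key


def ingest_vertices(rows: Iterable[Row], on_duplicate_keys: str = "error") -> Dict[Hashable, Row]:
    """Toy model of the documented on_duplicate_keys policies (hypothetical).

    Assumption: with 'skip_same', a duplicate key whose row differs from the
    stored row still raises.
    """
    if on_duplicate_keys not in ("error", "skip", "skip_same"):
        raise ValueError(f"unknown on_duplicate_keys: {on_duplicate_keys!r}")
    frame: Dict[Hashable, Row] = {}
    for row in rows:
        key = row[0]
        if key not in frame:
            frame[key] = row
        elif on_duplicate_keys == "skip":
            continue  # drop any duplicate key without raising
        elif on_duplicate_keys == "skip_same" and frame[key] == row:
            continue  # identical duplicate row: drop without raising
        else:
            raise KeyError(f"duplicate vertex key: {key!r}")
    return frame
```

With rows [(1, "a"), (2, "b"), (1, "a")], both 'skip' and 'skip_same' keep two vertices, while 'error' raises on the repeated key 1.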