client¶
-
bigquery.client.get_client(project_id, credentials=None, service_url=None, service_account=None, private_key=None, private_key_file=None, json_key=None, json_key_file=None, readonly=True, swallow_results=True)¶
Return a singleton instance of BigQueryClient. Either AssertionCredentials or a service account and private key combination need to be provided in order to authenticate requests to BigQuery.
Parameters: project_id : str
The BigQuery project id
credentials : oauth2client.client.SignedJwtAssertionCredentials, optional
AssertionCredentials instance used to authenticate requests to BigQuery. If not provided, you must supply service_account and (private_key or private_key_file), or (json_key or json_key_file).
service_url : str, optional
A URI string template pointing to the location of Google’s API discovery service. Requires two parameters {api} and {apiVersion} that when filled in produce an absolute URI to the discovery document for that service. If not set then the default googleapiclient discovery URI is used. See credentials
service_account : str, optional
The Google API service account name. See credentials
private_key : str, optional
The private key associated with the service account in PKCS12 or PEM format. See credentials
private_key_file : str, optional
The name of the file containing the private key associated with the service account in PKCS12 or PEM format. See credentials
json_key : dict, optional
The JSON key associated with the service account. See credentials
json_key_file : str, optional
The name of the JSON key file associated with the service account. See credentials.
readonly : bool
Bool indicating if BigQuery access is read-only. Has no effect if credentials are provided. Default True.
swallow_results : bool
If set to False, then return the actual response value instead of converting to boolean. Default True.
Returns: BigQueryClient
An instance of the BigQuery client.
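For example, a client can be constructed from a service account JSON key file (a minimal sketch; the project id and key file path are placeholders):

from bigquery import get_client

# Authenticate with a service account JSON key file (placeholder path).
client = get_client('my-project-id', json_key_file='service-key.json',
                    readonly=True)

With readonly=True the client is restricted to read-only access; pass readonly=False for operations that create datasets, load data, or stream rows.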
BigQueryClient Class¶
-
class bigquery.client.BigQueryClient(bq_service, project_id, swallow_results=True)¶

Methods

check_dataset(dataset_id) - Check to see if a dataset exists.
check_job(job_id) - Return the state and number of results of a query by job id.
check_table(dataset, table) - Check to see if a table exists.
create_dataset(dataset_id[, friendly_name, ...]) - Create a new BigQuery dataset.
create_table(dataset, table, schema[, ...]) - Create a new table in the dataset.
create_view(dataset, view, query) - Create a new view in the dataset.
dataset_resource(ref_id[, friendly_name, ...]) - See https://developers.google.com/bigquery/docs/reference/v2/datasets#resource
delete_dataset(dataset_id[, delete_contents]) - Delete a BigQuery dataset.
delete_table(dataset, table) - Delete a table from the dataset.
export_data_to_uris(destination_uris, ...[, ...]) - Export data from a BigQuery table to cloud storage.
get_dataset(dataset_id) - Retrieve a dataset if it exists, otherwise return an empty dict.
get_datasets() - List all datasets in the project.
get_query_results(job_id[, offset, limit, ...]) - Execute the query job indicated by the given job id.
get_query_rows(job_id[, offset, limit, timeout]) - Retrieve a list of rows from a query table by job id.
get_query_schema(job_id) - Retrieve the schema of a query by job id.
get_table(dataset, table) - Retrieve a table if it exists, otherwise return an empty dict.
get_table_schema(dataset, table) - Return the table schema.
get_tables(dataset_id, app_id, start_time, ...) - Retrieve a list of tables that are related to the given app id and are inside the range of start and end times.
import_data_from_uris(source_uris, dataset, ...) - Imports data into a BigQuery table from cloud storage.
patch_dataset(dataset_id[, friendly_name, ...]) - Updates information in an existing dataset.
patch_table(dataset, table, schema) - Patch an existing table in the dataset.
push_rows(dataset, table, rows[, insert_id_key]) - Upload rows to BigQuery table.
query(query[, max_results, timeout, dry_run]) - Submit a query to BigQuery.
schema_from_record(record) - Given a dict representing a record instance to be inserted into BigQuery, calculate the schema.
update_dataset(dataset_id[, friendly_name, ...]) - Updates information in an existing dataset.
update_table(dataset, table, schema) - Update an existing table in the dataset.
wait_for_job(job[, interval, timeout]) - Waits until the job indicated by job_resource is done or has failed.
write_to_table(query[, dataset, table, ...]) - Write query result to table.
-
check_dataset(dataset_id)¶
Check to see if a dataset exists.
Parameters: dataset_id : str
Dataset unique id
Returns: bool
True if a dataset with dataset_id exists, else False
-
check_job(job_id)¶
Return the state and number of results of a query by job id.
Parameters: job_id : str
The job id of the query to check.
Returns: tuple
(bool, int) indicating whether or not the query has completed, and the total number of rows included in the query table if it has completed (else 0)
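A simple polling sketch built on check_job; the query text and poll interval are illustrative, and client is a BigQueryClient obtained via get_client above:

import time

# Submit a query without waiting for it to finish (timeout defaults to 0).
job_id, _ = client.query('SELECT word FROM publicdata:samples.shakespeare LIMIT 10')

# Poll until BigQuery reports the job as complete.
complete, row_count = client.check_job(job_id)
while not complete:
    time.sleep(5)
    complete, row_count = client.check_job(job_id)

print('query finished with %d rows' % row_count)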
-
check_table(dataset, table)¶
Check to see if a table exists.
Parameters: dataset : str
The dataset to check
table : str
The name of the table
Returns: bool
True if table exists, else False
-
create_dataset(dataset_id, friendly_name=None, description=None, access=None)¶
Create a new BigQuery dataset.
Parameters: dataset_id : str
Unique str identifying the dataset with the project (the referenceId of the dataset, not the integer id of the dataset)
friendly_name : str, optional
A human readable name
description: str, optional
Longer string providing a description
access : list, optional
Indicating access permissions (see https://developers.google.com/bigquery/docs/reference/v2/datasets#resource)
Returns: Union[bool, dict]
bool indicating if the dataset was created or not, or response from BigQuery if swallow_results is set to False
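A minimal sketch, assuming client is a non-readonly BigQueryClient from get_client; the dataset id and descriptions are placeholders:

created = client.create_dataset('my_dataset',
                                friendly_name='My dataset',
                                description='Dataset created for testing')
# With the default swallow_results=True this is a plain bool.
if not created:
    print('dataset was not created')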
-
create_table(dataset, table, schema, expiration_time=None)¶
Create a new table in the dataset.
Parameters: dataset : str
The dataset to create the table in
table : str
The name of the table to create
schema : dict
The table schema
expiration_time : float, optional
The expiry time in milliseconds since the epoch.
Returns: Union[bool, dict]
bool indicating if the table was successfully created or not, or response from BigQuery if swallow_results is set to False
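A sketch of creating a table; the dataset and table names are placeholders. The schema is given here as a list of field descriptors (name/type/mode), which is an assumption about the expected shape; if your version expects the full {'fields': [...]} resource, wrap it accordingly:

schema = [
    {'name': 'id', 'type': 'INTEGER', 'mode': 'REQUIRED'},
    {'name': 'name', 'type': 'STRING', 'mode': 'NULLABLE'},
    {'name': 'created_at', 'type': 'TIMESTAMP', 'mode': 'NULLABLE'},
]

created = client.create_table('my_dataset', 'my_table', schema)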
-
create_view(dataset, view, query)¶
Create a new view in the dataset.
Parameters: dataset : str
The dataset to create the view in
view : str
The name of the view to create
query : str
A query that BigQuery executes when the view is referenced.
Returns: Union[bool, dict]
bool indicating if the view was successfully created or not, or response from BigQuery if swallow_results is set to False.
-
dataset_resource(ref_id, friendly_name=None, description=None, access=None)¶
See https://developers.google.com/bigquery/docs/reference/v2/datasets#resource
Parameters: ref_id : str
Dataset id (the reference id, not the integer id)
friendly_name : str, optional
An optional descriptive name for the dataset
description : str, optional
An optional description for the dataset
access : list, optional
Indicating access permissions
Returns: dict
Representing BigQuery dataset resource
-
delete_dataset(dataset_id, delete_contents=False)¶
Delete a BigQuery dataset.
Parameters: dataset_id : str
Unique str identifying the dataset with the project (the referenceId of the dataset)
delete_contents : bool, optional
If True, forces the deletion of the dataset even when the dataset contains data (Default = False)
Returns: Union[bool, dict]
bool indicating if the delete was successful or not, or response from BigQuery if swallow_results is set to False
Raises: HttpError
404 when dataset with dataset_id does not exist
-
delete_table(dataset, table)¶
Delete a table from the dataset.
Parameters: dataset : str
The dataset to delete the table from.
table : str
The name of the table to delete
Returns: Union[bool, dict]
bool indicating if the table was successfully deleted or not, or response from BigQuery if swallow_results is set to False.
-
export_data_to_uris(destination_uris, dataset, table, job=None, compression=None, destination_format=None, print_header=None, field_delimiter=None)¶
Export data from a BigQuery table to cloud storage. Optional arguments that are not specified are determined by BigQuery as described: https://developers.google.com/bigquery/docs/reference/v2/jobs
Parameters: destination_uris : Union[str, list]
str or list of str objects representing the URIs on cloud storage of the form: gs://bucket/filename
dataset : str
String id of the dataset
table : str
String id of the table
job : str, optional
String identifying the job (a unique jobid is automatically generated if not provided)
compression : str, optional
One of the JOB_COMPRESSION_* constants
destination_format : str, optional
One of the JOB_DESTINATION_FORMAT_* constants
print_header : bool, optional
Whether or not to print the header
field_delimiter : str, optional
Character separating fields in delimited file
Returns: dict
A BigQuery job resource
Raises: JobInsertException
On http/auth failures or error in result
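A minimal export sketch; the bucket, dataset, and table names are placeholders, and the returned job resource is then polled with wait_for_job (documented below):

job = client.export_data_to_uris(['gs://my-bucket/export-*.csv'],
                                 'my_dataset', 'my_table',
                                 print_header=True)
job_resource = client.wait_for_job(job, interval=10, timeout=300)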
-
get_dataset(dataset_id)¶
Retrieve a dataset if it exists, otherwise return an empty dict.
Parameters: dataset_id : str
Dataset unique id
Returns: dict
Contains dataset object if it exists, else empty
-
get_datasets()¶
List all datasets in the project.
Returns: list
Dataset resources
-
get_query_results(job_id, offset=None, limit=None, page_token=None, timeout=0)¶
Execute the query job indicated by the given job id. This is a direct mapping to the BigQuery API: https://cloud.google.com/bigquery/docs/reference/v2/jobs/getQueryResults
Parameters: job_id : str
The job id of the query to check
offset : int, optional
The index the result set should start at.
limit : int, optional
The maximum number of results to retrieve.
page_token : str, optional
Page token, returned by previous call, to request the next page of results.
timeout : float, optional
Timeout in seconds
Returns: dict
The query reply
-
get_query_rows(job_id, offset=None, limit=None, timeout=0)¶
Retrieve a list of rows from a query table by job id. This method will append results from multiple pages together. If you want to manually page through results, you can use the get_query_results method directly.
Parameters: job_id : str
The job id that references a BigQuery query.
offset : int, optional
The offset of the rows to pull from BigQuery
limit : int, optional
The number of rows to retrieve from a query table.
timeout : float, optional
Timeout in seconds.
Returns: list
A list of dict objects that represent table rows.
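For example, to pull a window of rows from a finished query job (job_id comes from an earlier query call; the offset, limit, and timeout are illustrative):

# First 100 rows of the result set; each row is a dict keyed by column name.
rows = client.get_query_rows(job_id, offset=0, limit=100, timeout=60)
for row in rows:
    print(row)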
-
get_query_schema(job_id)¶
Retrieve the schema of a query by job id.
Parameters: job_id : str
The job_id that references a BigQuery query
Returns: list
A list of dict objects that represent the schema.
-
get_table(dataset, table)¶
Retrieve a table if it exists, otherwise return an empty dict.
Parameters: dataset : str
The dataset that the table is in
table : str
The name of the table
Returns: dict
Containing the table object if it exists, else empty
-
get_table_schema(dataset, table)¶
Return the table schema.
Parameters: dataset : str
The dataset containing the table.
table : str
The table to get the schema for
Returns: list
A list of dict objects that represent the table schema. If the table doesn't exist, None is returned.
-
get_tables(dataset_id, app_id, start_time, end_time)¶
Retrieve a list of tables that are related to the given app id and are inside the range of start and end times.
Parameters: dataset_id : str
The BigQuery dataset id to consider.
app_id : str
The appspot name
start_time : Union[datetime, int]
The datetime or unix time after which records will be fetched.
end_time : Union[datetime, int]
The datetime or unix time up to which records will be fetched.
Returns: list
A list of table names.
-
import_data_from_uris(source_uris, dataset, table, schema=None, job=None, source_format=None, create_disposition=None, write_disposition=None, encoding=None, ignore_unknown_values=None, max_bad_records=None, allow_jagged_rows=None, allow_quoted_newlines=None, field_delimiter=None, quote=None, skip_leading_rows=None)¶
Imports data into a BigQuery table from cloud storage. Optional arguments that are not specified are determined by BigQuery as described: https://developers.google.com/bigquery/docs/reference/v2/jobs
Parameters: source_uris : list
A list of str objects representing the URIs on cloud storage of the form: gs://bucket/filename
dataset : str
String id of the dataset
table : str
String id of the table
job : str, optional
Identifies the job (a unique job id is automatically generated if not provided)
schema : list, optional
Represents the BigQuery schema
source_format : str, optional
One of the JOB_SOURCE_FORMAT_* constants
create_disposition : str, optional
One of the JOB_CREATE_* constants
write_disposition : str, optional
One of the JOB_WRITE_* constants
encoding : str, optional
One of the JOB_ENCODING_* constants
ignore_unknown_values : bool, optional
Whether or not to ignore unknown values
max_bad_records : int, optional
Maximum number of bad records
allow_jagged_rows : bool, optional
For csv only
allow_quoted_newlines : bool, optional
For csv only
field_delimiter : str, optional
For csv only
quote : str, optional
Quote character for csv only
skip_leading_rows : int, optional
For csv only
Returns: dict
A BigQuery job response
Raises: JobInsertException
on http/auth failures or error in result
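A CSV load-job sketch; the URI, dataset, and table names are placeholders, schema is a field list as in the create_table example above, and the JOB_* constant names are assumed to be exported by bigquery.client:

from bigquery.client import JOB_SOURCE_FORMAT_CSV, JOB_WRITE_TRUNCATE

job = client.import_data_from_uris(
    ['gs://my-bucket/data.csv'],          # placeholder source URI
    'my_dataset',
    'my_table',
    schema=schema,
    source_format=JOB_SOURCE_FORMAT_CSV,
    write_disposition=JOB_WRITE_TRUNCATE,
    skip_leading_rows=1)

# Block until the load job finishes (or raise on failure/timeout).
job_resource = client.wait_for_job(job, interval=10, timeout=300)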
-
patch_dataset(dataset_id, friendly_name=None, description=None, access=None)¶
Updates information in an existing dataset. The update method replaces the entire dataset resource, whereas the patch method only replaces fields that are provided in the submitted dataset resource.
Parameters: dataset_id : str
Unique string identifying the dataset with the project (the referenceId of the dataset)
friendly_name : str, optional
An optional descriptive name for the dataset.
description : str, optional
An optional description of the dataset.
access : list, optional
Indicating access permissions.
Returns: Union[bool, dict]
bool indicating if the patch was successful or not, or response from BigQuery if swallow_results is set to False.
-
patch_table(dataset, table, schema)¶
Patch an existing table in the dataset.
Parameters: dataset : str
The dataset to patch the table in
table : str
The name of the table to patch
schema : dict
The table schema
Returns: Union[bool, dict]
Bool indicating if the table was successfully patched or not, or response from BigQuery if swallow_results is set to False
-
push_rows(dataset, table, rows, insert_id_key=None)¶
Upload rows to BigQuery table.
Parameters: dataset : str
The dataset to upload to
table : str
The name of the table to insert rows into
rows : list
A list of rows (dict objects) to add to the table
insert_id_key : str, optional
Key for insertId in row
Returns: Union[bool, dict]
bool indicating if the insert succeeded or not, or response from BigQuery if swallow_results is set to False.
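A streaming-insert sketch; the dataset, table, and row contents are placeholders, and insert_id_key names a field BigQuery can use for best-effort deduplication:

rows = [
    {'id': 1, 'name': 'alice'},
    {'id': 2, 'name': 'bob'},
]

inserted = client.push_rows('my_dataset', 'my_table', rows, insert_id_key='id')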
-
query(query, max_results=None, timeout=0, dry_run=False)¶
Submit a query to BigQuery.
Parameters: query : str
BigQuery query string
max_results : int, optional
The maximum number of rows to return per page of results.
timeout : float, optional
How long to wait for the query to complete, in seconds, before the request times out and returns.
dry_run : bool, optional
If True, the query isn’t actually run. A valid query will return an empty response, while an invalid one will return the same error message it would if it wasn’t a dry run.
Returns: tuple
(job id, query results) if the query completed. If dry_run is True, job id will be None and results will be empty if the query is valid, or a dict containing the response if invalid.
Raises: BigQueryTimeoutException
On timeout
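A minimal blocking-query sketch; the query text and timeout are illustrative, and the exception import path assumes the package's errors module:

from bigquery.errors import BigQueryTimeoutException

try:
    job_id, results = client.query(
        'SELECT word, word_count FROM publicdata:samples.shakespeare LIMIT 5',
        timeout=30)
except BigQueryTimeoutException:
    # The query did not finish within 30 seconds.
    results = []

for row in results:
    print(row)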
-
classmethod schema_from_record(record)¶
Given a dict representing a record instance to be inserted into BigQuery, calculate the schema.
Parameters: record : dict
A dict representing a record to be inserted into BigQuery, where all keys are str objects (representing column names in the record) and all values are of type int, str, unicode, float, bool, datetime, or dict. A dict value represents a record, and must conform to the same restrictions as record.
Returns: list
BigQuery schema
Notes
Results are undefined if a different value type is provided for a repeated field: E.g.
>>> { rfield: [ { x: 1}, {x: "a string"} ] } # undefined!
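A quick illustration with a flat record (the field names and values are placeholders; client is the BigQueryClient from get_client, and schema_from_record is a classmethod so it can also be called on the class):

record = {'id': 123, 'name': 'alice', 'score': 1.5}
schema = client.schema_from_record(record)
# schema is a list with one field descriptor (name/type/mode) per key.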
-
update_dataset(dataset_id, friendly_name=None, description=None, access=None)¶
Updates information in an existing dataset. The update method replaces the entire dataset resource, whereas the patch method only replaces fields that are provided in the submitted dataset resource.
Parameters: dataset_id : str
Unique str identifying the dataset with the project (the referenceId of the dataset)
friendly_name : str, optional
An optional descriptive name for the dataset.
description : str, optional
An optional description of the dataset.
access : list, optional
Indicating access permissions
Returns: Union[bool, dict]
bool indicating if the update was successful or not, or response from BigQuery if swallow_results is set to False.
-
update_table(dataset, table, schema)¶
Update an existing table in the dataset.
Parameters: dataset : str
The dataset to update the table in
table : str
The name of the table to update
schema : dict
Table schema
Returns: Union[bool, dict]
bool indicating if the table was successfully updated or not, or response from BigQuery if swallow_results is set to False.
-
wait_for_job(job, interval=5, timeout=60)¶
Waits until the job indicated by job_resource is done or has failed.
Parameters: job : Union[dict, str]
A dict representing a BigQuery job resource, or a str representing the BigQuery job id
interval : float, optional
Polling interval in seconds, default = 5
timeout : float, optional
Timeout in seconds, default = 60
Returns: dict
Final state of the job resource, as described here: https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/python/latest/bigquery_v2.jobs.html#get
Raises: Union[JobExecutingException, BigQueryTimeoutException]
On http/auth failures or timeout
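A polling sketch; job is either the job resource returned by an asynchronous call (write_to_table, import_data_from_uris, export_data_to_uris) or a job id string, the interval and timeout values are illustrative, and the exception import path assumes the package's errors module:

from bigquery.errors import BigQueryTimeoutException

try:
    job_resource = client.wait_for_job(job, interval=5, timeout=120)
except BigQueryTimeoutException:
    # The job did not reach the DONE state within 120 seconds.
    job_resource = None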
-
write_to_table(query, dataset=None, table=None, external_udf_uris=[], allow_large_results=None, use_query_cache=None, priority=None, create_disposition=None, write_disposition=None)¶
Write query result to table. If dataset or table is not provided, BigQuery will write the result to a temporary table. Optional arguments that are not specified are determined by BigQuery as described: https://developers.google.com/bigquery/docs/reference/v2/jobs
Parameters: query : str
BigQuery query string
dataset : str, optional
String id of the dataset
table : str, optional
String id of the table
external_udf_uris : list, optional
Contains external UDF URIs. If given, URIs must be Google Cloud Storage and have .js extensions.
allow_large_results : bool, optional
Whether or not to allow large results
use_query_cache : bool, optional
Whether or not to use query cache
priority : str, optional
One of the JOB_PRIORITY_* constants
create_disposition : str, optional
One of the JOB_CREATE_* constants
write_disposition : str, optional
One of the JOB_WRITE_* constants
Returns: dict
A BigQuery job resource
Raises: JobInsertException
On http/auth failures or error in result
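A sketch that materializes a query result into a destination table and waits for completion; the names, query, and timeout are placeholders, and the JOB_* constant names are assumed to be exported by bigquery.client:

from bigquery.client import JOB_CREATE_IF_NEEDED, JOB_WRITE_TRUNCATE

job = client.write_to_table(
    'SELECT word, word_count FROM publicdata:samples.shakespeare',
    dataset='my_dataset',
    table='shakespeare_copy',
    create_disposition=JOB_CREATE_IF_NEEDED,
    write_disposition=JOB_WRITE_TRUNCATE)

# Poll the returned job resource until the write completes.
job_resource = client.wait_for_job(job, interval=10, timeout=300)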