Description
ODPS Python SDK
===============

Elegant way to access ODPS API.
`Documentation <http://pyodps.readthedocs.org/>`__

Installation
------------

The quick way:

::

    pip install pyodps[full]

If you don't need to use Jupyter, just type

::

    pip install pyodps

The dependencies will be installed automatically.

Or from source code (not recommended for production use):

.. code:: shell

    $ virtualenv pyodps_env
    $ source pyodps_env/bin/activate
    $ pip install git+https://github.com/aliyun/aliyun-odps-python-sdk.git

Dependencies
------------

-  Python (>=2.7), including Python 3+ and pypy; Python 3.7 recommended
-  setuptools (>=3.0)

Run Tests
---------

-  install pytest
-  copy conf/test.conf.template to odps/tests/test.conf, and fill it
   with your account
-  run ``pytest odps``

Usage
-----

.. code:: python

    >>> import os
    >>> from odps import ODPS
    >>> # Make sure the environment variable CLOUD_ACCESS_KEY_ID is set to the user's Access Key ID
    >>> # and the environment variable CLOUD_ACCESS_KEY_SECRET is set to the user's Access Key Secret.
    >>> # Hardcoding the Access Key ID or Access Key Secret in your code is not recommended.
    >>> o = ODPS(
    ...     os.getenv('CLOUD_ACCESS_KEY_ID'),
    ...     os.getenv('CLOUD_ACCESS_KEY_SECRET'),
    ...     project='**your-project**',
    ...     endpoint='**your-endpoint**',
    ... )
    >>> dual = o.get_table('dual')
    >>> dual.name
    'dual'
    >>> dual.table_schema
    odps.Schema {
      c_int_a                 bigint
      c_int_b                 bigint
      c_double_a              double
      c_double_b              double
      c_string_a              string
      c_string_b              string
      c_bool_a                boolean
      c_bool_b                boolean
      c_datetime_a            datetime
      c_datetime_b            datetime
    }
    >>> dual.creation_time
    datetime.datetime(2014, 6, 6, 13, 28, 24)
    >>> dual.is_virtual_view
    False
    >>> dual.size
    448
    >>> dual.table_schema.columns
    [<column c_int_a, type bigint>,
     <column c_int_b, type bigint>,
     <column c_double_a, type double>,
     <column c_double_b, type double>,
     <column c_string_a, type string>,
     <column c_string_b, type string>,
     <column c_bool_a, type boolean>,
     <column c_bool_b, type boolean>,
     <column c_datetime_a, type datetime>,
     <column c_datetime_b, type datetime>]

Command-line and IPython enhancement
------------------------------------

::

    In [1]: %load_ext odps

    In [2]: %enter
    Out[2]: <odps.inter.Room at 0x10fe0e450>

    In [3]: %sql select * from pyodps_iris limit 5
    |==========================================|   1 /  1  (100.00%)         2s
    Out[3]:
       sepallength  sepalwidth  petallength  petalwidth         name
    0          5.1         3.5          1.4         0.2  Iris-setosa
    1          4.9         3.0          1.4         0.2  Iris-setosa
    2          4.7         3.2          1.3         0.2  Iris-setosa
    3          4.6         3.1          1.5         0.2  Iris-setosa
    4          5.0         3.6          1.4         0.2  Iris-setosa

Python UDF Debugging Tool
-------------------------

.. code:: python

    #file: plus.py
    from odps.udf import annotate

    @annotate('bigint,bigint->bigint')
    class Plus(object):
        def evaluate(self, a, b):
            return a + b

::

    $ cat plus.input
    1,1
    3,2
    $ pyou plus.Plus < plus.input
    2
    5

Contributing
------------

For a development install, clone the repository and then install from
source:

::

    git clone https://github.com/aliyun/aliyun-odps-python-sdk.git
    cd aliyun-odps-python-sdk
    pip install -r requirements.txt -e .

License
-------

Licensed under the `Apache License
2.0 <https://www.apache.org/licenses/LICENSE-2.0.html>`__

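
As a complement to the Python UDF Debugging Tool section above, the sketch below shows one way to exercise the ``Plus`` UDF locally without the ``pyou`` command. It is not how ``pyou`` is implemented. The helper ``run_udf_on_csv`` is a hypothetical name introduced here, the fallback ``annotate`` is a no-op stand-in used only when pyodps is not installed, and parsing every field as an integer is an assumption that matches the ``bigint,bigint->bigint`` signature.

```python
import csv
import io

try:
    # Real decorator, available when pyodps is installed.
    from odps.udf import annotate
except ImportError:
    # Assumption: a no-op stand-in so this sketch runs without pyodps.
    def annotate(signature):
        def decorator(cls):
            return cls
        return decorator


@annotate('bigint,bigint->bigint')
class Plus(object):
    def evaluate(self, a, b):
        return a + b


def run_udf_on_csv(udf_cls, text):
    """Hypothetical helper: call evaluate() once per CSV row, fields as ints."""
    udf = udf_cls()
    results = []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            # Skip blank lines in the input.
            continue
        results.append(udf.evaluate(*[int(value) for value in row]))
    return results


# Same rows as plus.input in the section above.
print(run_udf_on_csv(Plus, "1,1\n3,2\n"))  # prints [2, 5]
```

Because the helper only calls ``evaluate`` directly, it checks the arithmetic of the UDF but not the type handling that the ODPS runtime applies; ``pyou`` remains the tool to use for that.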
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 0.12.6 | Imported from PyPI (0.12.6) | Low | 4/21/2026 |
| v0.12.6 | # Features - Made pyodpswrapper available externally. When images of DataWorks are upgraded, users can upgrade pyodpswrapper features through `pip install -U pyodps` - Added Python 3.13 and 3.14 wheel support (without free-threaded support yet) - Added support for multi-thread read and write operations - Added `EnhanceWriteCheck` parameter to the CreateWriteSession interface in the storage API - Added `partition_spec` argument support for `open_reader` and `open_writer` methods # Enhan | Medium | 3/25/2026 |
| v0.12.5.1 | ## Bugfixes * Fix pickling error of CredentialProviderAccount. | Low | 12/9/2025 |
| v0.12.5 | ## Enhancements - Allow using ODPS_REGION_NAME to pass region name in envs. - (Experimental) Add support for catalog API and model read methods. - Support hash list of data for upsert. - Add support for BufferedRecordReader to reduce connection period. - Add on_exception handler to control retry. - Add namespace param to tunnel_rest. - (Experimental) Add stub for blob type. Reading and writing for the type not implemented yet. ## Bugfixes - Sync tzlocal workaround for linux tz. - Fix | Low | 9/17/2025 |
| v0.12.4.1 | # Bugfixes * Lock when updating and retrieving access ids & keys in certain accounts. * Add schema name when retrieving temp table in `write_sql_result_to_table`. * Allow SQL starts with WITH to be select stmt when calling MCQA. * Pass inst_kw argument in more table functions invoking SQL. | Low | 7/30/2025 |
| v0.12.4 | # Enhancements * Add an option to allow casting to arrow dtypes on arrow tunnel. * Allow creating table with complex types with `write_table`. * Add an option `infer_type_with_arrow` to allow inferring data type with pandas with `write_table`. * Allow selecting record tunnel when calling `to_pandas` methods. * Add support for binary types when using SQLAlchemy. * Remove newsgroup data example due to security concerns about `tarfile` module. * Raise an error when running SQL interactively w | Low | 7/3/2025 |
| v0.12.3 | # Features * Add support for resources for VolumeFile and VolumeArchive for external volume files. * Implements method to write SQL result to table. * [Experimental] Add support for auto-partition table. # Enhancements * Ignore cases for column names in schema and record. * Remove decimal precision & scale check to allow large decimal scale. * Add config for logview latency and print final progress. * Enhance DDL generation for ROW FORMAT SERDE clause. * Add global compress options fo | Low | 5/9/2025 |
| v0.12.2.2 | # Bugfixes * Fixes potential corruption of default global settings. | Low | 4/18/2025 |
| v0.12.2.1 | # Enhancements * Add session refresh option for storage API. * Enable timeout when using asyncmode for table tunnel. * Enhance DDL generation for ROW FORMAT SERDE clause. # Documentation * Add docs for basic types, tunnel and table functions. # Bugfixes * Fix error when loading ODPS engine spec of superset. | Low | 3/20/2025 |
| v0.12.2 | ## Features * (Experimental) Add support for MCQAv2 for sqlalchemy. * Add table alteration utility functions * Add support of job insight instead of logviews. Can be turned on by configuring with `options.use_legacy_logview = False`. ## Enhancements * Make SuperSet support compatible with SuperSet 4.1.0 and later. * Print usage when command not correct for pyodps-pack. * Add support for timestamp_ntz for arrow tunnel. * Add checks for potential None header values before request. * Ad | Low | 1/3/2025 |
| v0.12.1.1 | ## Bugfixes * Add an import to requests in `odps.lib` to resolve compatibility issue of legacy codes. | Low | 12/5/2024 |
| v0.12.1 | ## Features * (Experimental) Add metrics interface for tunnel. * (Experimental) Add support for MCQAv2. * Add support for upsert writer for table object. ## Enhancements * Support hashing of decimal types for primary keys. * Shift CSV field size limit to table field size limit when reading with legacy result interface. * Add cythonized decimal, array, map and struct validators to accelerate reading and writing of arrays. * Add `allow_schema_mismatch` option and CDC info on tables and p | Low | 11/22/2024 |
| v0.12.0 | # Features * Implements `write_table` with pandas to facilitate creating tables or partitions with pandas DataFrames, and `to_pandas` methods to facilitate converting from and to pandas DataFrames. * Add support for converting table data and instance results to pandas DataFrames with `to_pandas` and `iter_pandas` methods. * Add separate delete methods for views and materialized views. * Add support for table freeze command. * Add support for using computational quotas. * Add params to allow | Low | 10/3/2024 |
| v0.11.6.5 | # Enhancements * Switch TableTunnel.create_download_session to async_mode by default. * Support schema version on stream upload session. * Allow creating STS account from env and force reload on expiration. | Low | 8/26/2024 |
| v0.11.6.4 | # Bugfixes * Fix error when uploading multiple batches with BufferedArrowWriter. | Low | 8/16/2024 |
| v0.11.6.3 | # Bugfixes * Fix CRC computation of arrow tunnel interfaces * Fix completeness of upload retry of buffered writers # Enhancements * Allow record and reuse MCQA session with local file # Tests * Fix test failure of storage API | Low | 7/31/2024 |
| v0.11.6.2 | ## Bugfixes * Fix types support of json and timestamp_ntz in sqlalchemy * Fix odps.merge.txn.table.compact argument of merge compact command * Make compatibility for Numpy 2.0 ## Enhancements * Warn when running pyodps-pack with sudo under macOS * Allow reading envs from ODPS.__init__ ## Documentation * Add more docs for tunnel APIs | Low | 7/24/2024 |
| v0.10.1.1 | ## Enhancements * Add support for StsAccount. | Low | 6/3/2024 |
| v0.8.6 | ## Enhancements * Add support for StsAccount. | Low | 6/3/2024 |
| v0.11.6.1 | ## Enhancements * Allow opening resources with full resource path and temp hint * Add MaxFrameTask to models | Low | 5/13/2024 |
| v0.11.6 | Features ======== * Add support for cluster info and views in tables and table DDL output. * Add support for easier threaded writing and writing in multiple processes for TableWriter. Enhancements ============ * Use monotonic time to calculate timeout. * Add support for http+unix socket connection. * Optimize RequestsIO by introducing buffering and simplify threaded sync. * Revoke embedded requests and use buffered writer for table API by default. * Add cython converter for legacy de | Low | 4/17/2024 |
| v0.11.5.post0 | # Bugfix 1. Fix attribute errors for table preview and storage API. | Low | 1/24/2024 |
| v0.11.5 | # Features * Add support for arrow table preview reader * Enhance support for Apache Superset * Add support for storage tier on tables and partitions * (Experimental) Add support for tunnel upsert * (Experimental) Add image argument for DataFrame # Bugfixes * Fill partition value for tunnel records * Use PERCENTILE_APPROX for doubles under ODPS 2.0 * Convert all requirement files to UNIX format for pyodps-pack * Fix error when reloading volume tunnel session * Fix logview settin | Low | 1/5/2024 |
| v0.11.5b2 | Bugfixes ======= * Stop copying and caching for `DataFrame(pd).persist` if possible to reduce memory usage. | Low | 11/20/2023 |
| v0.11.5beta1 | # Features * Add support for arrow table preview reader * Enhance support for Apache Superset * Add support for storage tier on tables and partitions * (Experimental) Add support for tunnel upsert # Bugfixes * Fill partition value for tunnel records * Use PERCENTILE_APPROX for doubles under ODPS 2.0 * Convert all requirement files to UNIX format for pyodps-pack * Fix error when reloading volume tunnel session * Fix logview setting not working in options * Dump SQL statement when | Low | 11/10/2023 |
| v0.11.4.1 | Enhancements ========= * Reuse UDFs when code is same and without closures * Add function to show versions of dependencies * Make stream tunnel to write in blocks * Add quota_name params for various tunnel sessions * Refine MCQA execution API and fallback behavior * Supports JSON column type * Use TABLESAMPLE clause to implement sampling with frac or rows * Allow packing dynamic libraries with pyodps-pack * Auto resolve source dependencies in no docker mode in pyodps-pack Bug fixes | Low | 7/19/2023 |
| v0.11.4.post0 | # Deployment * Restrict urllib3 version to 1.x. | Low | 5/19/2023 |
| v0.11.4 | Features ========= * Add API-by-API implementation for storage API * Add retry for table read API * Add automatic submission for table write API Bugfixes ========== * Fix OSError caused by BPO-29097 under certain Python versions * Show composite error message when failed to parse data type Enhancements ============== * Drop support for Python 2.6 * Add more options of pip into pyodps-pack * Show more information when command not found on pyodps-pack * Refine creating ODPS insta | Low | 5/18/2023 |
| v0.11.3.1 | Enhancements ============== * Add support for non-Docker mode for ``pyodps-pack``. It now supports limited scenarios when Docker is not available. * Reduce maximum memory cost of ``to_pandas()`` on tunnels by converting to pandas in batches * Supports complex types when calling ``to_pandas()`` on tunnels * Use default schema when ``odps.namespace.schema`` enabled on tenants, or ``options.always_enable_schema`` set to True * Make sure merging small files is available under schemas * (Experim | Low | 4/10/2023 |
| v0.11.3 | # Features * Add new command line tool `pyodps-pack` to pack third-party libraries, recommended as standard packing mechanism * (Experimental) Add preliminary support for custom DataFrame functions with Python 3.8 / 3.9 / 3.10 * Supports DataFrame column join methods * Support configuring instance settings via connection strings with SQLAlchemy * (Experimental) Supports external volume * Supports ``run_sql_interactive_with_fallback`` interface in pyodps * Supports ``get_max_partition`` | Low | 3/10/2023 |
| v0.11.2.4 | # Bugfixes 1. Add retry on tunnel meta conflicts. 1. Fix forward compatibility for v0.11.3 on table schema arguments. | Low | 2/3/2023 |
| v0.11.3beta1 | # Features * Add new command line tool `pyodps-pack` to pack third-party libraries, recommended as standard packing mechanism * (Experimental) Add preliminary support for custom DataFrame functions with Python 3.8 / 3.9 / 3.10 * Supports DataFrame column join methods * Support configuring instance settings via connection strings with SQLAlchemy * (Experimental) Supports external volume * Supports ``run_sql_interactive_with_fallback`` interface in pyodps * Supports ``get_max_partition`` | Low | 1/17/2023 |
| v0.11.2.3 | * Supports interactive query with retry * Bug fixes | Low | 12/23/2022 |
| v0.11.2.2 | * Supports security queries returning instances * Bug fixes | Low | 9/15/2022 |
| v0.11.2.1 | * Bug and doc fixes | Low | 8/22/2022 |
| v0.11.2 | * Make public API for arrow tunnels * Add support for skew join * Bug & docs fixes | Low | 8/17/2022 |
| v0.11.1 | * Upgrade Mars support to v0.9.0 and switch to different bases for different Python versions and archs * Accelerate tunnel read (Thanks @torshie ) * Fix tests & docs. | Low | 7/5/2022 |
| v0.11.0 | * Add support for Mars 0.8.0 * Fix compatibility for IPython>=0.8.0 * Use cibuildwheel to publish aarch64 wheels as well as Python 3.10 wheels. | Low | 3/17/2022 |
| v0.10.7.1 | - Add sts account support. - Fix `to_pandas` error on Windows. | Low | 12/13/2021 |
| v0.10.7 | - Add more sqa features to support superset - Add docs for downloading with multiprocessing | Low | 4/8/2021 |
| v0.10.6 | - Use split meta to estimate chunk size - Support specifying a predicate when reading partitions - Allow suspending ipywidgets by options - Fix persist to existed partition | Low | 3/8/2021 |
| v0.10.5 | - Apply head optimization for read table - Support reading whole partitioned table - Implement arrow tunnel | Low | 2/23/2021 |
| v0.10.4 | - Config UDF python version if has UDF in query - add biz_id meta to xflow instance - Fix errors when import tensorflow in Mars - Some enhancements | Low | 1/4/2021 |
| v0.10.3 | - doc enhancements - Fix reduce kwargs in pandas backend | Low | 12/21/2020 |
| v0.10.2 | - Fixes for SQLAlchemy. - Add volume filesystem support. - Switch deployment to Github Actions. - Add reserved slot in partition schema. - Optimized error handling logic and interface for session. | Low | 11/5/2020 |
| v0.10.1 | - Upgrade Mars to 0.5.2 - Support rescale workers - Fix memory issues | Low | 10/30/2020 |
| v0.10.0 | - Add sqlalchemy support - Add refresh mechanism for bearer token - Fix behavior of persist Mars dataframe | Low | 9/16/2020 |
| v0.9.5 | - Improvements and bug fixes for Mars integration - Add `stored_as` property for `Table` - Fix some compatibility issues - Documentation improvements | Low | 9/6/2020 |
| v0.9.3.2 | - Add archive table support. - Enhancements in Mars integration. | Low | 8/22/2020 |
| v0.9.3.1 | - Add run_script.py | Low | 7/6/2020 |
| v0.9.3 | - Enhancements in Mars integration. - Documentation enhancement. - Bugfixes. | Low | 7/6/2020 |
