pyodps

ODPS Python SDK and data analysis framework

Description

ODPS Python SDK
===============

An elegant way to access the ODPS API. `Documentation <http://pyodps.readthedocs.org/>`__

Installation
------------

The quick way:

::

    pip install pyodps[full]

If you don’t need to use Jupyter, just type:

::

    pip install pyodps

The dependencies will be installed automatically.

Or install from source code (not recommended for production use):

.. code:: shell

    $ virtualenv pyodps_env
    $ source pyodps_env/bin/activate
    $ pip install git+https://github.com/aliyun/aliyun-odps-python-sdk.git

Dependencies
------------

- Python (>=2.7), including Python 3+ and PyPy; Python 3.7 recommended
- setuptools (>=3.0)

Run Tests
---------

- Install pytest.
- Copy ``conf/test.conf.template`` to ``odps/tests/test.conf`` and fill it in with your account information.
- Run ``pytest odps``.

Usage
-----

.. code:: python

    >>> import os
    >>> from odps import ODPS
    >>> # Make sure the environment variable CLOUD_ACCESS_KEY_ID is set to the user's Access Key ID
    >>> # and CLOUD_ACCESS_KEY_SECRET is set to the user's Access Key Secret.
    >>> # Hardcoding the Access Key ID or Access Key Secret in your code is not recommended.
    >>> o = ODPS(
    ...     os.getenv('CLOUD_ACCESS_KEY_ID'),
    ...     os.getenv('CLOUD_ACCESS_KEY_SECRET'),
    ...     project='**your-project**',
    ...     endpoint='**your-endpoint**',
    ... )
    >>> dual = o.get_table('dual')
    >>> dual.name
    'dual'
    >>> dual.table_schema
    odps.Schema {
      c_int_a                 bigint
      c_int_b                 bigint
      c_double_a              double
      c_double_b              double
      c_string_a              string
      c_string_b              string
      c_bool_a                boolean
      c_bool_b                boolean
      c_datetime_a            datetime
      c_datetime_b            datetime
    }
    >>> dual.creation_time
    datetime.datetime(2014, 6, 6, 13, 28, 24)
    >>> dual.is_virtual_view
    False
    >>> dual.size
    448
    >>> dual.table_schema.columns
    [<column c_int_a, type bigint>,
     <column c_int_b, type bigint>,
     <column c_double_a, type double>,
     <column c_double_b, type double>,
     <column c_string_a, type string>,
     <column c_string_b, type string>,
     <column c_bool_a, type boolean>,
     <column c_bool_b, type boolean>,
     <column c_datetime_a, type datetime>,
     <column c_datetime_b, type datetime>]

Command-line and IPython enhancement
------------------------------------

::

    In [1]: %load_ext odps

    In [2]: %enter
    Out[2]: <odps.inter.Room at 0x10fe0e450>

    In [3]: %sql select * from pyodps_iris limit 5
    |==========================================| 1 / 1 (100.00%) 2s
    Out[3]:
       sepallength  sepalwidth  petallength  petalwidth         name
    0          5.1         3.5          1.4         0.2  Iris-setosa
    1          4.9         3.0          1.4         0.2  Iris-setosa
    2          4.7         3.2          1.3         0.2  Iris-setosa
    3          4.6         3.1          1.5         0.2  Iris-setosa
    4          5.0         3.6          1.4         0.2  Iris-setosa

Python UDF Debugging Tool
-------------------------

.. code:: python

    # file: plus.py
    from odps.udf import annotate

    @annotate('bigint,bigint->bigint')
    class Plus(object):
        def evaluate(self, a, b):
            return a + b

::

    $ cat plus.input
    1,1
    3,2
    $ pyou plus.Plus < plus.input
    2
    5

Contributing
------------

For a development install, clone the repository and then install from source:

::

    git clone https://github.com/aliyun/aliyun-odps-python-sdk.git
    cd aliyun-odps-python-sdk
    pip install -r requirements.txt -e .

License
-------

Licensed under the `Apache License 2.0 <https://www.apache.org/licenses/LICENSE-2.0.html>`__
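
The Usage example reads the Access Key pair with ``os.getenv``, which returns ``None`` when a variable is unset, so a missing or misspelled variable is easy to overlook. As a minimal sketch, assuming you prefer to fail early, the hypothetical helper ``load_odps_credentials`` below (not part of the pyodps API) checks ``CLOUD_ACCESS_KEY_ID`` and ``CLOUD_ACCESS_KEY_SECRET`` using only the standard library and raises an error naming whichever variable is missing or empty.

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable names taken from the Usage example.
ACCESS_ID_VAR = "CLOUD_ACCESS_KEY_ID"
SECRET_VAR = "CLOUD_ACCESS_KEY_SECRET"


def load_odps_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Hypothetical helper: return (access_id, secret) or raise if either is unset or empty.

    ``env`` defaults to ``os.environ``; passing a plain dict makes the helper easy to test.
    """
    source = os.environ if env is None else env
    missing = [name for name in (ACCESS_ID_VAR, SECRET_VAR) if not source.get(name)]
    if missing:
        # Report only the variable names, never their values.
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return source[ACCESS_ID_VAR], source[SECRET_VAR]
```

The returned pair can then replace the two ``os.getenv(...)`` arguments of the ``ODPS(...)`` call; keeping the values in environment variables still avoids hardcoding them.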

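The ``pyou`` command in the README runs a UDF class locally, feeding it one comma-separated input line per call. The sketch below only illustrates that idea and is not how ``pyou`` is implemented. The hypothetical function ``run_udf_locally`` converts each field according to the argument types in the ``@annotate`` signature (only ``bigint``, ``double``, ``string`` and ``boolean`` are handled, an assumption made for brevity), calls ``evaluate`` and collects the results. The fallback ``annotate`` decorator is also an assumption, so that the example runs without pyodps installed.

```python
try:
    from odps.udf import annotate  # real decorator when pyodps is installed
except ImportError:
    def annotate(signature):
        """Fallback no-op decorator (assumption) so this sketch runs without pyodps."""
        def decorator(cls):
            return cls
        return decorator


@annotate('bigint,bigint->bigint')
class Plus(object):
    def evaluate(self, a, b):
        return a + b


# Hypothetical converters from text fields to Python values, keyed by ODPS type name.
_CONVERTERS = {
    "bigint": int,
    "double": float,
    "string": str,
    "boolean": lambda s: s.strip().lower() == "true",
}


def run_udf_locally(udf_cls, signature, lines):
    """Hypothetical stand-in for ``pyou``: call udf_cls().evaluate once per CSV line."""
    arg_types = signature.split("->")[0].split(",")
    converters = [_CONVERTERS[t.strip()] for t in arg_types]
    udf = udf_cls()
    results = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split(",")
        if len(fields) != len(converters):
            raise ValueError("expected %d fields, got %r" % (len(converters), line))
        args = [convert(field) for convert, field in zip(converters, fields)]
        results.append(udf.evaluate(*args))
    return results
```

Calling ``run_udf_locally(Plus, 'bigint,bigint->bigint', ['1,1', '3,2'])`` returns ``[2, 5]``, matching the ``pyou`` output shown for ``plus.input``. The signature is passed explicitly so the sketch does not depend on how the real decorator stores it.
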
Release History

.. list-table::
   :header-rows: 1

   * - Version
     - Changes
     - Urgency
     - Date
   * - 0.12.6
     - Imported from PyPI (0.12.6)
     - Low
     - 4/21/2026
   * - v0.12.6
     - **Features**

       - Made pyodpswrapper available externally. When DataWorks images are upgraded, users can upgrade pyodpswrapper features through ``pip install -U pyodps``
       - Added Python 3.13 and 3.14 wheel support (without free-threaded support yet)
       - Added support for multi-thread read and write operations
       - Added ``EnhanceWriteCheck`` parameter to the CreateWriteSession interface in the storage API
       - Added ``partition_spec`` argument support for ``open_reader`` and ``open_writer`` methods

       **Enhan…**
     - Medium
     - 3/25/2026
   * - v0.12.5.1
     - **Bugfixes**

       - Fix pickling error of CredentialProviderAccount.
     - Low
     - 12/9/2025
   * - v0.12.5
     - **Enhancements**

       - Allow using ODPS_REGION_NAME to pass region name in envs.
       - (Experimental) Add support for catalog API and model read methods.
       - Support hash list of data for upsert.
       - Add support for BufferedRecordReader to reduce connection period.
       - Add on_exception handler to control retry.
       - Add namespace param to tunnel_rest.
       - (Experimental) Add stub for blob type. Reading and writing for the type are not implemented yet.

       **Bugfixes**

       - Sync tzlocal workaround for Linux tz.
       - Fix…
     - Low
     - 9/17/2025
   * - v0.12.4.1
     - **Bugfixes**

       - Lock when updating and retrieving access IDs & keys in certain accounts.
       - Add schema name when retrieving temp table in ``write_sql_result_to_table``.
       - Allow SQL starting with WITH to be treated as a SELECT statement when calling MCQA.
       - Pass inst_kw argument in more table functions invoking SQL.
     - Low
     - 7/30/2025
   * - v0.12.4
     - **Enhancements**

       - Add an option to allow casting to arrow dtypes on arrow tunnel.
       - Allow creating table with complex types with ``write_table``.
       - Add an option ``infer_type_with_arrow`` to allow inferring data type with pandas with ``write_table``.
       - Allow selecting record tunnel when calling ``to_pandas`` methods.
       - Add support for binary types when using SQLAlchemy.
       - Remove newsgroup data example due to security concerns about ``tarfile`` module.
       - Raise an error when running SQL interactively w…
     - Low
     - 7/3/2025
   * - v0.12.3
     - **Features**

       - Add support for resources for VolumeFile and VolumeArchive for external volume files.
       - Implement a method to write SQL result to table.
       - [Experimental] Add support for auto-partition table.

       **Enhancements**

       - Ignore case of column names in schema and record.
       - Remove decimal precision & scale check to allow large decimal scale.
       - Add config for logview latency and print final progress.
       - Enhance DDL generation for ROW FORMAT SERDE clause.
       - Add global compress options fo…
     - Low
     - 5/9/2025
   * - v0.12.2.2
     - **Bugfixes**

       - Fix potential corruption of default global settings.
     - Low
     - 4/18/2025
   * - v0.12.2.1
     - **Enhancements**

       - Add session refresh option for storage API.
       - Enable timeout when using async mode for table tunnel.
       - Enhance DDL generation for ROW FORMAT SERDE clause.

       **Documentation**

       - Add docs for basic types, tunnel and table functions.

       **Bugfixes**

       - Fix error when loading ODPS engine spec of Superset.
     - Low
     - 3/20/2025
   * - v0.12.2
     - **Features**

       - (Experimental) Add support for MCQAv2 for SQLAlchemy.
       - Add table alteration utility functions.
       - Add support of job insight instead of logviews. It can be turned on by setting ``options.use_legacy_logview = False``.

       **Enhancements**

       - Make Superset support compatible with Superset 4.1.0 and later.
       - Print usage when a command is incorrect for pyodps-pack.
       - Add support for timestamp_ntz for arrow tunnel.
       - Add checks for potential None header values before request.
       - Ad…
     - Low
     - 1/3/2025
   * - v0.12.1.1
     - **Bugfixes**

       - Add an import of requests in ``odps.lib`` to resolve compatibility issues with legacy code.
     - Low
     - 12/5/2024
   * - v0.12.1
     - **Features**

       - (Experimental) Add metrics interface for tunnel.
       - (Experimental) Add support for MCQAv2.
       - Add support for upsert writer for table object.

       **Enhancements**

       - Support hashing of decimal types for primary keys.
       - Shift CSV field size limit to table field size limit when reading with legacy result interface.
       - Add cythonized decimal, array, map and struct validators to accelerate reading and writing of arrays.
       - Add ``allow_schema_mismatch`` option and CDC info on tables and p…
     - Low
     - 11/22/2024
   * - v0.12.0
     - **Features**

       - Implement ``write_table`` with pandas to facilitate creating tables or partitions with pandas DataFrames, and ``to_pandas`` methods to facilitate converting from and to pandas DataFrames.
       - Add support for converting table data and instance results to pandas DataFrames with ``to_pandas`` and ``iter_pandas`` methods.
       - Add separate delete methods for views and materialized views.
       - Add support for table freeze command.
       - Add support for using computational quotas.
       - Add params to allow…
     - Low
     - 10/3/2024
   * - v0.11.6.5
     - **Enhancements**

       - Switch ``TableTunnel.create_download_session`` to ``async_mode`` by default.
       - Support schema version on stream upload session.
       - Allow creating STS account from env and force reload on expiration.
     - Low
     - 8/26/2024
   * - v0.11.6.4
     - **Bugfixes**

       - Fix error when uploading multiple batches with BufferedArrowWriter.
     - Low
     - 8/16/2024
   * - v0.11.6.3
     - **Bugfixes**

       - Fix CRC computation of arrow tunnel interfaces
       - Fix completeness of upload retry of buffered writers

       **Enhancements**

       - Allow recording and reusing MCQA sessions with a local file

       **Tests**

       - Fix test failure of storage API
     - Low
     - 7/31/2024
   * - v0.11.6.2
     - **Bugfixes**

       - Fix type support of JSON and timestamp_ntz in SQLAlchemy
       - Fix ``odps.merge.txn.table.compact`` argument of merge compact command
       - Make compatible with NumPy 2.0

       **Enhancements**

       - Warn when running pyodps-pack with sudo under macOS
       - Allow reading envs from ``ODPS.__init__``

       **Documentation**

       - Add more docs for tunnel APIs
     - Low
     - 7/24/2024
   * - v0.10.1.1
     - **Enhancements**

       - Add support for StsAccount.
     - Low
     - 6/3/2024
   * - v0.8.6
     - **Enhancements**

       - Add support for StsAccount.
     - Low
     - 6/3/2024
   * - v0.11.6.1
     - **Enhancements**

       - Allow opening resources with full resource path and temp hint
       - Add MaxFrameTask to models
     - Low
     - 5/13/2024
   * - v0.11.6
     - **Features**

       - Add support for cluster info and views in tables and table DDL output.
       - Add support for easier threaded writing and writing in multiple processes for TableWriter.

       **Enhancements**

       - Use monotonic time to calculate timeout.
       - Add support for http+unix socket connection.
       - Optimize RequestsIO by introducing buffering and simplify threaded sync.
       - Revoke embedded requests and use buffered writer for table API by default.
       - Add cython converter for legacy de…
     - Low
     - 4/17/2024
   * - v0.11.5.post0
     - **Bugfix**

       1. Fix attribute errors for table preview and storage API.
     - Low
     - 1/24/2024
   * - v0.11.5
     - **Features**

       - Add support for arrow table preview reader
       - Enhance support for Apache Superset
       - Add support for storage tier on tables and partitions
       - (Experimental) Add support for tunnel upsert
       - (Experimental) Add image argument for DataFrame

       **Bugfixes**

       - Fill partition value for tunnel records
       - Use PERCENTILE_APPROX for doubles under ODPS 2.0
       - Convert all requirement files to UNIX format for pyodps-pack
       - Fix error when reloading volume tunnel session
       - Fix logview settin…
     - Low
     - 1/5/2024
   * - v0.11.5b2
     - **Bugfixes**

       - Stop copying and caching for ``DataFrame(pd).persist`` if possible to reduce memory usage.
     - Low
     - 11/20/2023
   * - v0.11.5beta1
     - **Features**

       - Add support for arrow table preview reader
       - Enhance support for Apache Superset
       - Add support for storage tier on tables and partitions
       - (Experimental) Add support for tunnel upsert

       **Bugfixes**

       - Fill partition value for tunnel records
       - Use PERCENTILE_APPROX for doubles under ODPS 2.0
       - Convert all requirement files to UNIX format for pyodps-pack
       - Fix error when reloading volume tunnel session
       - Fix logview setting not working in options
       - Dump SQL statement when…
     - Low
     - 11/10/2023
   * - v0.11.4.1
     - **Enhancements**

       - Reuse UDFs when the code is the same and has no closures
       - Add function to show versions of dependencies
       - Make stream tunnel write in blocks
       - Add quota_name params for various tunnel sessions
       - Refine MCQA execution API and fallback behavior
       - Support JSON column type
       - Use TABLESAMPLE clause to implement sampling with frac or rows
       - Allow packing dynamic libraries with pyodps-pack
       - Auto-resolve source dependencies in no-Docker mode in pyodps-pack

       **Bug fixes** …
     - Low
     - 7/19/2023
   * - v0.11.4.post0
     - **Deployment**

       - Restrict urllib3 version to 1.x.
     - Low
     - 5/19/2023
   * - v0.11.4
     - **Features**

       - Add API-by-API implementation for storage API
       - Add retry for table read API
       - Add automatic submission for table write API

       **Bugfixes**

       - Fix OSError caused by BPO-29097 under certain Python versions
       - Show composite error message when parsing a data type fails

       **Enhancements**

       - Drop support for Python 2.6
       - Add more options of pip into pyodps-pack
       - Show more information when command not found on pyodps-pack
       - Refine creating ODPS insta…
     - Low
     - 5/18/2023
   * - v0.11.3.1
     - **Enhancements**

       - Add support for non-Docker mode for ``pyodps-pack``. It now supports limited scenarios when Docker is not available.
       - Reduce maximum memory cost of ``to_pandas()`` on tunnels by converting to pandas in batches
       - Support complex types when calling ``to_pandas()`` on tunnels
       - Use default schema when ``odps.namespace.schema`` is enabled on tenants, or ``options.always_enable_schema`` is set to True
       - Make sure merging small files is available under schemas
       - (Experim…
     - Low
     - 4/10/2023
   * - v0.11.3
     - **Features**

       - Add new command line tool ``pyodps-pack`` to pack third-party libraries, recommended as the standard packing mechanism
       - (Experimental) Add preliminary support for custom DataFrame functions with Python 3.8 / 3.9 / 3.10
       - Support DataFrame column join methods
       - Support configuring instance settings via connection strings with SQLAlchemy
       - (Experimental) Support external volume
       - Support ``run_sql_interactive_with_fallback`` interface in pyodps
       - Support ``get_max_partition`` …
     - Low
     - 3/10/2023
   * - v0.11.2.4
     - **Bugfixes**

       1. Add retry on tunnel meta conflicts.
       2. Fix forward compatibility for v0.11.3 on table schema arguments.
     - Low
     - 2/3/2023
   * - v0.11.3beta1
     - **Features**

       - Add new command line tool ``pyodps-pack`` to pack third-party libraries, recommended as the standard packing mechanism
       - (Experimental) Add preliminary support for custom DataFrame functions with Python 3.8 / 3.9 / 3.10
       - Support DataFrame column join methods
       - Support configuring instance settings via connection strings with SQLAlchemy
       - (Experimental) Support external volume
       - Support ``run_sql_interactive_with_fallback`` interface in pyodps
       - Support ``get_max_partition`` …
     - Low
     - 1/17/2023
   * - v0.11.2.3
     - - Support interactive query with retry
       - Bug fixes
     - Low
     - 12/23/2022
   * - v0.11.2.2
     - - Support security queries returning instances
       - Bug fixes
     - Low
     - 9/15/2022
   * - v0.11.2.1
     - - Bug and doc fixes
     - Low
     - 8/22/2022
   * - v0.11.2
     - - Make public API for arrow tunnels
       - Add support for skew join
       - Bug & docs fixes
     - Low
     - 8/17/2022
   * - v0.11.1
     - - Upgrade Mars support to v0.9.0 and switch to different bases for different Python versions and archs
       - Accelerate tunnel read (thanks @torshie)
       - Fix tests & docs.
     - Low
     - 7/5/2022
   * - v0.11.0
     - - Add support for Mars 0.8.0
       - Fix compatibility for IPython>=0.8.0
       - Use cibuildwheel to publish aarch64 wheels as well as Python 3.10 wheels.
     - Low
     - 3/17/2022
   * - v0.10.7.1
     - - Add STS account support.
       - Fix ``to_pandas`` error on Windows.
     - Low
     - 12/13/2021
   * - v0.10.7
     - - Add more sqa features to support Superset
       - Add docs for downloading with multiprocessing
     - Low
     - 4/8/2021
   * - v0.10.6
     - - Use split meta to estimate chunk size
       - Support specifying a predicate when reading partitions
       - Allow suspending ipywidgets by options
       - Fix persist to existing partition
     - Low
     - 3/8/2021
   * - v0.10.5
     - - Apply head optimization for read table
       - Support reading whole partitioned table
       - Implement arrow tunnel
     - Low
     - 2/23/2021
   * - v0.10.4
     - - Configure UDF Python version if the query has UDFs
       - Add biz_id meta to xflow instance
       - Fix errors when importing TensorFlow in Mars
       - Some enhancements
     - Low
     - 1/4/2021
   * - v0.10.3
     - - Doc enhancements
       - Fix reduce kwargs in pandas backend
     - Low
     - 12/21/2020
   * - v0.10.2
     - - Fixes for SQLAlchemy.
       - Add volume filesystem support.
       - Switch deployment to GitHub Actions.
       - Add reserved slot in partition schema.
       - Optimize error handling logic and interface for session.
     - Low
     - 11/5/2020
   * - v0.10.1
     - - Upgrade Mars to 0.5.2
       - Support rescaling workers
       - Fix memory issues
     - Low
     - 10/30/2020
   * - v0.10.0
     - - Add SQLAlchemy support
       - Add refresh mechanism for bearer token
       - Fix behavior of persisting Mars DataFrames
     - Low
     - 9/16/2020
   * - v0.9.5
     - - Improvements and bug fixes for Mars integration
       - Add ``stored_as`` property for ``Table``
       - Fix some compatibility issues
       - Documentation improvements
     - Low
     - 9/6/2020
   * - v0.9.3.2
     - - Add archive table support.
       - Enhancements in Mars integration.
     - Low
     - 8/22/2020
   * - v0.9.3.1
     - - Add run_script.py
     - Low
     - 7/6/2020
   * - v0.9.3
     - - Enhancements in Mars integration.
       - Documentation enhancement.
       - Bugfixes.
     - Low
     - 7/6/2020

Similar Packages

- pre-commit (v4.6.0): A framework for managing and maintaining multi-language pre-commit hooks.
- azure-core-tracing-opentelemetry (azure-template_0.1.0b6187637): Microsoft Azure Azure Core OpenTelemetry plugin Library for Python
- spdx-tools (0.8.5): SPDX parser and tools.
- laces (0.1.2): Django components that know how to render themselves.
- django-tasks (0.12.0): A backport of Django's built in Tasks framework