freshcrate
awswrangler

Pandas on AWS.

Description

# AWS SDK for pandas (awswrangler)

*Pandas on AWS*

Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

![AWS SDK for pandas](https://github.com/aws/aws-sdk-pandas/blob/main/docs/source/_static/logo2.png?raw=true "AWS SDK for pandas")
![tracker](https://d3tiqpr4kkkomd.cloudfront.net/img/pixel.png?asset=GVOYN2BOOQ573LTVIHEW)

> An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com

[![PyPi](https://img.shields.io/pypi/v/awswrangler)](https://pypi.org/project/awswrangler/)
[![Conda](https://img.shields.io/conda/vn/conda-forge/awswrangler)](https://anaconda.org/conda-forge/awswrangler)
[![Python Version](https://img.shields.io/pypi/pyversions/awswrangler.svg)](https://pypi.org/project/awswrangler/)
[![Code style: ruff](https://img.shields.io/badge/code%20style-ruff-000000.svg)](https://github.com/astral-sh/ruff)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](https://opensource.org/licenses/Apache-2.0)

[![Checked with mypy](http://www.mypy-lang.org/static/mypy_badge.svg)](http://mypy-lang.org/)
![Static Checking](https://github.com/aws/aws-sdk-pandas/workflows/Static%20Checking/badge.svg?branch=main)
[![Documentation Status](https://readthedocs.org/projects/aws-sdk-pandas/badge/?version=latest)](https://aws-sdk-pandas.readthedocs.io/?badge=latest)

| Source | Downloads | Installation Command |
|--------|-----------|----------------------|
| **[PyPi](https://pypi.org/project/awswrangler/)** | [![PyPI Downloads](https://img.shields.io/pypi/dm/awswrangler)](https://pypi.org/project/awswrangler/) | `pip install awswrangler` |
| **[Conda](https://anaconda.org/conda-forge/awswrangler)** | [![Conda Downloads](https://img.shields.io/conda/dn/conda-forge/awswrangler.svg)](https://anaconda.org/conda-forge/awswrangler) | `conda install -c conda-forge awswrangler` |

> ⚠️ **Starting with version 3.0, optional modules must be installed explicitly:**<br>
➡️ `pip install 'awswrangler[redshift]'`

## Table of contents

- [Quick Start](#quick-start)
- [At Scale](#at-scale)
- [Read The Docs](#read-the-docs)
- [Getting Help](#getting-help)
- [Logging](#logging)

## Quick Start

Installation command: `pip install awswrangler`

> ⚠️ **Starting with version 3.0, optional modules must be installed explicitly:**<br>
➡️ `pip install 'awswrangler[redshift]'`

```py3
import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get a Redshift connection from Glue Catalog and retrieve data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
con.close()

# Amazon Timestream Write
df = pd.DataFrame({
    "time": [datetime.now(), datetime.now()],
    "my_dimension": ["foo", "boo"],
    "measure": [1.0, 1.1],
})
rejected_records = wr.timestream.write(
    df,
    database="sampleDB",
    table="sampleTable",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["my_dimension"],
)

# Amazon Timestream Query
wr.timestream.query("""
SELECT time, measure_value::double, my_dimension
FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
""")
```

## At scale

AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.

## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html)
  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#pypi-pip)
  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#conda)
  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#aws-lambda-layer)
  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#aws-glue-python-shell-jobs)
  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#aws-glue-pyspark-jobs)
  - [Amazon SageMa
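
Since optional modules have to be installed explicitly from version 3.0 onward, a script can check up front whether the driver behind an extra is importable before it calls into that module. The following is a minimal sketch using only the standard library. The helper name `missing_extras` is hypothetical, and the mapping from extra name to import name (`redshift` to `redshift_connector`, `mysql` to `pymysql`, `postgres` to `pg8000`) is an assumption that should be checked against the install documentation.

```python
from importlib.util import find_spec

# Assumed mapping from an awswrangler extra to the top-level module it
# installs. These names are an assumption; verify them against the docs.
EXTRA_TO_MODULE = {
    "redshift": "redshift_connector",
    "mysql": "pymysql",
    "postgres": "pg8000",
}


def missing_extras(extras, mapping=EXTRA_TO_MODULE):
    """Return the extras whose backing module cannot be imported."""
    missing = []
    for extra in extras:
        # Fall back to the extra name itself when no mapping is known.
        module = mapping.get(extra, extra)
        if find_spec(module) is None:
            missing.append(extra)
    return missing


# Print the install command for each extra that is not available.
for extra in missing_extras(["redshift", "mysql", "postgres"]):
    print(f"pip install 'awswrangler[{extra}]'")
```

Checking with `find_spec` avoids importing the driver itself, so the check stays cheap and has no side effects.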

Release History

| Version | Changes | Urgency | Date |
|---------|---------|---------|------|
| 3.16.0 | Imported from PyPI (3.16.0) | Low | 4/21/2026 |
| 3.15.1 | ### Security / Dependency Updates 🛡️ * fix: upgrade setuptools due to CVE-2026-23949 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3261 * chore: pyasn1, wheel, filelock security fixes by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3262 * chore: wheel security fix #3262 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3263 * chore: Update dependencies by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3268 ### Housekeeping 🧹 * chore(dep | Low | 2/5/2026 |
| 3.15.0 | ## Notable Changes ⚠️ * fix: upgrade aiohttp due to CVE-2025-69223 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3250 * chore: Build Python 3.14 layers by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3251 * chore: Drop Python 3.9 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3257 ### Features / Enhancements 🚀 * feat(s3): add to_deltalake_streaming for single-commit Delta writes by @skoschik in https://github.com/aws/aws-sdk-pandas/pull/3231 * f | Low | 1/13/2026 |
| 3.14.0 | ## Notable Changes ⚠️ * chore: upgrade pg8000 due to CVE-2025-61385 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3225 ### Features / Enhancements 🚀 * feat: support redshift `CLEANPATH` by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3211 * feat: add result reuse configuration to query execution functions by @DavidKatz-il in https://github.com/aws/aws-sdk-pandas/pull/3212 ### Bugfixes 🐛 * fix: Add `s3_output` parameter to `_start_query_execution` call in " | Low | 10/30/2025 |
| 3.13.0 | ## Notable Changes ⚠️ * updated `aiohttp==3.12.15` to fix CVE-2025-53643 (LOW) by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3197 ### Features / Enhancements 🚀 * feat: ray 2.49.0 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3194 * feat: add support for aurora-mysql and aurora-postgresql engines by @senorcinco in https://github.com/aws/aws-sdk-pandas/pull/3188 ### Bugfixes 🐛 * fix: opensearch session by @kukushking in https://github.com/aws/aws-sdk-pandas | Low | 9/10/2025 |
| 3.12.1 | ## Notable Changes ⚠️ * Moved to [uv package manager](https://github.com/astral-sh/uv) 🔥 🔥 🔥 ### Features / Enhancements 🚀 * feat: uv by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3132 ### Security / Dependency Updates 🛡️ * chore(deps): bump the production-dependencies group with 4 updates by @dependabot in https://github.com/aws/aws-sdk-pandas/pull/3159 * chore(deps): bump the production-dependencies group with 4 updates by @dependabot in https://github.com/aws/ | Low | 6/18/2025 |
| 3.12.0 | ## Notable Changes ⚠️ * AWS Lambda Layers: **pyarrow** was upgraded to 20.0.0 ### Features / Enhancements 🚀 * feat: add pyarrow_additional_kwargs to athena.to_iceberg by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/3094 * feat: add dtype argument to delete_from_iceberg by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/3099 * feat: add redshift and rds data api query params by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3111 * chore: ray 2.45 by @kukus | Low | 5/29/2025 |
| 3.11.0 | ## Notable Changes ⚠️ * AWS SDK for pandas now supports Python 3.13! 🎉 * Python 3.8 is no longer supported (reached [end-of-life](https://devguide.python.org/versions/) Oct 7 2024) 🚫 * AWS Lambda Layers: **pyarrow** was upgraded to 18.1.0 * AWS Lambda Layers: **numpy** was upgraded to 2.2.1 ### Features / Enhancements 🚀 * add support for Python 3.13 & deprecate Python 3.8 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3045 * return opensearch aggregation top hits by @ku | Low | 1/10/2025 |
| 3.10.1 | ## Bug fixes 🐛 * fix: update references in introduction notebook by @emmanuel-ferdman in https://github.com/aws/aws-sdk-pandas/pull/3009 * fix: read parquet file in chunked mode per row group by @FredericKayser in https://github.com/aws/aws-sdk-pandas/pull/3016 * fix: add missing raise statement in RS Data API by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/3025 ## Documentation 📚 * chore: Prepare 3.10.1 release by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/30 | Low | 12/4/2024 |
| 3.10.0 | ## Features * feat: Support numpy 2.0 by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2944 * feat(redshift): Automatically add new DataFrame columns to Redshift tables during write operation by @jack-dell in https://github.com/aws/aws-sdk-pandas/pull/2948 * feat: modify_refresh_interval flag in opensearch index_documents by @AvihaiSam in https://github.com/aws/aws-sdk-pandas/pull/2980 * feat: support postgresql array types by @kukushking in https://github.com/aws/aws-sdk-p | Low | 10/31/2024 |
| 3.9.1 | ## Bug fixes 🐛 * bucketing error with newer version of Modin (0.31.0) by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2896 * `athena.read_sql_query` failing for time columns by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2895 * add an argument to control handling nulls in merge criteria by @brendan-cook-87 in https://github.com/aws/aws-sdk-pandas/pull/2892 * address Ray deprecation warnings by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas | Low | 8/19/2024 |
| 3.9.0 | ## Enhancements 🎉 * Support ORC and CSV in `redshift.copy_from_files` function by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2849 * Support different merge conditions in `athena.to_iceberg` function by @aldder in https://github.com/aws/aws-sdk-pandas/pull/2861 * Manage `NULL` values in `athena.to_iceberg` merge statement by @aldder in https://github.com/aws/aws-sdk-pandas/pull/2872 * Upgrade Ray to 2.30 by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2870 | Low | 7/8/2024 |
| 3.8.0 | ## Enhancements 🎉 * support client-side parameter resolution in athena.create_ctas_table by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2797 * add commit_transaction to postgres.to_sql by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2795 * add columns parameters support by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2814 * add overwrite_method to `postgresql.to_sql` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2820 * add u | Low | 6/5/2024 |
| 3.7.3 | ## Bug fixes 🐛 - Iceberg schema evolution fails for map, array and struct types by @LeonLuttenberger in #2755 - trickle down `s3_output` in `athena.to_iceberg` by @jaidisido in #2767 - respect order of columns in `to_iceberg` by @jaidisido in #2768 - add PyArrow `fixed_size_binary` dtype support by @jaidisido in #2775 - Opensearch serverless vector search collections - remove default `_id` by @kukushking in #2784 - missing keys in `list_to_arrow_table` by @kukushking in #2778 - prevent ` | Low | 4/22/2024 |
| 3.7.2 | ## Features/Enhancements 🚀 - Add support for DeltaLake's DynamoDB lock mechanism by @LeonLuttenberger in #2705 ## Bug fixes 🐛 - `wr.athena.to_iceberg` - Insert query has mismatched column types #2678 by @GalvFionic in #2715 - allow `s3_output` in `athena.to_iceberg` by @jaidisido in #2727 - replace deprecated `np.split_array` by @jaidisido in #2735 - Athena `to_iceberg` fails with non-lowercase column names by @LeonLuttenberger in #2736 - Support Ray 2.10 by @kukushking in #2741 ## | Low | 3/27/2024 |
| 3.7.1 | ## Bug fixes 🐛 * fix breaking change in `_create_table` by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2711 * pin pyarrow to version 8 and above by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2709 ## Documentation 📚 * fix `redshift.to_sql` doc indentation error by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2706 **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.7.0...3.7.1 | Low | 3/7/2024 |
| 3.7.0 | ## Breaking changes 💥 Lake Formation Governed tables are being phased out and we are dropping support (#2692). ## Features/Enhancements 🚀 * support parquet client encryption (#2642) by @Marwen94 in https://github.com/aws/aws-sdk-pandas/pull/2674 ## Bug fixes 🐛 * Index columns removed on s3.to_parquet by @robert-schmidtke in https://github.com/aws/aws-sdk-pandas/pull/2655 * Missing timezone metadata by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2682 * remove enforced | Low | 3/5/2024 |
| 3.6.0 | ## Features/Enhancements 🚀 * Enable Iceberg row deletion & add `mode` parameter to `to_iceberg` by @LeonLuttenberger in #2632 * Add support for pyarrow type `large_string` by @joakibo in #2663 * Add `max_results` to `athena.list_query_executions` by @LeonLuttenberger in #2665 ## Bug fixes 🐛 * Pyarrow 15 imports & remove unused code by @kukushking in #2649 ## New Contributors * @joakibo made their first contribution in https://github.com/aws/aws-sdk-pandas/pull/2663 **Full Changel | Low | 2/14/2024 |
| 3.5.2 | ## Bug fixes 🐛 * DynamoDB key & filter expressions attribute overwrite by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2615 * Allow PostgreSQL reserved keywords as column names by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2619 * Add `to_iceberg` support for filling missing columns in the DataFrame with None by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2616 * Forward `ignore_nulls` for container types by @raaidarshad in #2636 ## Doc | Low | 1/25/2024 |
| 3.5.1 | ## Bug fixes 🐛 * Deserialization error when reading from DynamoDB using `KeyConditionExpression` by @LeonLuttenberger in #2607 * Reading of chunked parquet when columns parameter is specified by @rchromik in #2599 ## Documentation 📚 * Add `show_create_table` to Athena API page by @MikeSchriefer in #2610 ## Other 🤖 * chore: Replace `bump2version` with `bump-my-version` by @LeonLuttenberger in #2608 * chore(deps-dev): bump jinja2 from 3.1.2 to 3.1.3 by @dependabot in #2609 * chore(d | Low | 1/12/2024 |
| 3.5.0 | ## Breaking changes 💥 Due to [CVEs](https://www.anyscale.com/blog/update-on-ray-cves-cve-2023-6019-cve-2023-6020-cve-2023-6021-cve-2023-48022-cve-2023-48023), Ray is capped to patched version 2.9.x. As a result, the latest version of the library cannot be used on the Glue for Ray runtime. We have raised the CVEs issue to the Glue team ## Features/Enhancements 🚀 * Add `spark_properties` to athena spark by @rajagurunath in https://github.com/aws/aws-sdk-pandas/pull/2508 * Add `MERGE INTO` | Low | 1/11/2024 |
| 3.4.2 | ## Features/Enhancements 🚀 * Update pyarrow to 14.0.1 to fix [arbitrary code execution security vulnerability](https://github.com/aws/aws-sdk-pandas/security/dependabot/35) **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.4.1...3.4.2 | Low | 11/13/2023 |
| 3.4.1 | ## Features/Enhancements 🚀 * feat: Add schema evolution to `athena.to_iceberg` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2465 * feat: Athena - add `client_request_token` by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2474 * feat: Redshift data api - allow all auth combinations by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2475 * feat: add columns comments to iceberg by @frenchytheasian in https://github.com/aws/aws-sdk-pandas/pull/2482 | Low | 10/24/2023 |
| 3.4.0 | ## Features/Enhancements 🚀 * Geospatial - parse Athena geospatial types via geopandas by @kukushking in #2346 * Allow group identifiers to be used in `wr.cloudwatch` queries by @LeonLuttenberger in #2430 * Add ignore null store parquet metadata by @raaidarshad in #2450 ## Bug fixes 🐛 * Add missing boto3 session in `athena.to_iceberg` wait_query by @jaidisido in #2428 * Add catalog ID in `athena.to_iceberg` by @jaidisido in #2446 * Return None for missing column and partition key comm | Low | 9/11/2023 |
| 3.3.0 | ## Features/Enhancements 🚀 * Support Athena query prepared statements & Athena parameterized queries by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2344 * Add dtype parameter in to_iceberg function by @paulobrunheroto in https://github.com/aws/aws-sdk-pandas/pull/2359 * Add CleanRooms read module by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2366 * Escape and validate table identifiers and literals in PostgreSQL by @kukushking in https://github.com/aws/aws- | Low | 8/1/2023 |
| 3.2.1 | ## Fixes 🛠️ * Fix error where library could not be imported on Windows due to `No module named 'pyarrow._orc'` by @LeonLuttenberger in #2341 #2337 * Lower `packaging` version requirement by @LeonLuttenberger in #2340 * Allow Ray 2.5 & downgrade tox by @kukushking in #2338 **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.2.0...3.2.1 | Low | 6/14/2023 |
| 3.2.0 | ### Features/Enhancements 🚀 * Add `s3.read_orc` and `s3.to_orc` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2312 🔥 * Apache Spark on Amazon Athena - `wr.athena.create_spark_session` & `wr.athena.run_spark_calculation` by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2314 🚀 * EMR Serverless by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2304 🔥 * Add `to_sql` for RDS Data API by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pu | Low | 6/13/2023 |
| 3.1.1 | ## What's Changed * fix: Add missing `packaging` dependency by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2281 **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.1.0...3.1.1 | Low | 5/16/2023 |
| 3.1.0 | ### Features/Enhancements 🚀 * Add `neptune.bulk_load` for bulk loading data into Neptune by @LeonLuttenberger in #2238 #2267 * Add `s3.to_deltalake` function by @LeonLuttenberger in #2228 * Add Timestream Batch Load support by @jaidisido in #2214 * Add Iceberg insert by @kukushking in #2233 * Support upsert mode for OracleDB by @LeonLuttenberger in #2265 * Add `chunked` parameter to DynamoDB read functions by @LeonLuttenberger in #2227 * Upgrade Modin to 0.20.1 & allow Ray 2.4 by @kuku | Low | 5/15/2023 |
| 3.0.0 | ### Breaking changes 💥 * Move dependencies to optional by @jaidisido in #1992 🔓 * Dependencies required by the following modules have been moved to optional: redshift, mysql, postgres, sqlserver, oracle, gremlin, sparql, deltalake * The required dependencies can be easily installed with `pip install awswrangler[<MODULE_NAME>]`, for example `pip install awswrangler[redshift]` * Change SQL formatters for Athena and LakeFormation so that they properly format types by @Taragolis and @Le | Low | 4/13/2023 |
| 2.20.1 | ## What's Changed * (fix) Timestream - ignore None, NaN, and NaT measure values by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2072 * (docs) Minor - update opensearch api docs by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2085 * Correct documentation for `chunksize=True` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2087 * fix: timestream empty batches by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2098 * enhancement: Add times | Low | 3/21/2023 |
| 3.0.0rc3 | ## What's Changed ### Breaking changes: * breaking change: Move dependencies to optional by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1992 * breaking change: Use ExecuteStatement instead of Scan for DynamoDB read_partiql by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1964 ### Features/Enhancements: * enhancement: Refactor engine switching when Ray is installed by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1792 * logging: Enable user to | Low | 3/9/2023 |
| 2.20.0 | ### Breaking changes - `dynamodb.read_partiql` no longer performs a Scan operation under the hood. Instead the `ExecuteStatement` API is used. It means that the `PartiQL*` IAM permission is required instead of `Scan` ### Noteworthy * (feat): opensearch serverless by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1922. See the [tutorial](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/035%20-%20OpenSearch%20Serverless.ipynb) 🔥 * (breaking change): Use `ExecuteStateme | Low | 3/1/2023 |
| 2.19.0 | ## Noteworthy * Glue Data Quality now supported, check out the [tutorial](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/034%20-%20Glue%20Data%20Quality.ipynb) 🔥 * Delta lake support by @fvaleye * New DynamoDB `read_items` method by @a-slice-of-py ## Features & enhancements * feat: add read_items to dynamodb module by @a-slice-of-py in https://github.com/aws/aws-sdk-pandas/pull/1877 * Add deltalake support in AWS S3 with Pandas by @fvaleye in https://github.com/aws/aws-sdk-pa | Low | 1/9/2023 |
| 2.18.0 | ## Noteworthy - Pyarrow 10 support 🔥 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1731 - Lambda layers now available in `af-south-1` (Cape Town) 🌍 by @malachi-constant ## Features & enhancements - Add unload_approach to athena.read_sql_table by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1634 - Pass additional partition projection params to wr.s3.to_parquet & cat… by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1627 - Regenerate poetry.lock wi | Low | 12/2/2022 |
| 3.0.0rc2 | ## What's Changed * (enhancement): Enable missing unit tests and Redshift, Athena, LF load tests by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1736 * (enhancement): configure scheduling options, remove dependencies on internal ray impl by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1734 * (testing): Enable Athena and Redshift tests, and address errors by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1721 * (feat): Make tqdm progress reporting opt | Low | 11/23/2022 |
| 3.0.0rc1 | ## What's Changed * (enhancement): Move RayLogger out of non-distributed modules by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1686 * (perf): Distribute data types inference by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1692 * (docs): Update config tutorial to include new configuration values by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1696 * (fix): partition block overwriting by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1695 | Low | 10/27/2022 |
| 3.0.0b3 | ## What's Changed * (feat): Add partitioning on block level by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1653 * (refactor): Make room for additional distributed engines by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1646 * (feat): Distribute s3 write text by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1631 * (docs): Add "Introduction to Ray" Tutorial by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1661 * (fix): Return addre | Low | 10/12/2022 |
| 3.0.0b2 | ## What's Changed * (feat) Update to Ray 2.0 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1635 * (feat) Ray logging by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1623 * (enhancement): Reduce LOC in S3 write methods create_table by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1626 * (docs) Tutorial: Run SDK for pandas job on ray cluster by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1616 **Full Changelog**: https://github | Low | 9/30/2022 |
| 3.0.0b1 | ## What's Changed * (test) Consolidate unit and load tests by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1525 * (feat) Distribute S3 read text by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1567 * (feat) Distribute s3 wait_objects by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1539 * (test) Ray Load Tests CDK Stack and Instructions for Load Testing by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1583 * (fix) Fix S3 rea | Low | 9/22/2022 |
| 2.17.0 | ## New Functionalities - RedshiftDataAPI serverless support 🔥 #1530 - Check out the [tutorial](https://aws-sdk-pandas.readthedocs.io/en/latest/tutorials/030%20-%20Data%20Api.html) - Add `get_query_results` to the Athena module #1496 - Check out the [function documentation](https://aws-sdk-pandas.readthedocs.io/en/latest/stubs/awswrangler.athena.get_query_results.html#awswrangler.athena.get_query_results) - Add `generate_create_query` to the Athena module #1514 - Check out the | Low | 9/20/2022 |
| 3.0.0a2 | This is a pre-release for the Wrangler@Scale project ## What's Changed * (feat): Add directory for Distributed Wrangler Load Tests by @malachi-constant in https://github.com/awslabs/aws-data-wrangler/pull/1464 * (CI): Distribute tests in tox config by @malachi-constant in https://github.com/awslabs/aws-data-wrangler/pull/1469 * (feat): Distribute s3 delete objects by @malachi-constant in https://github.com/awslabs/aws-data-wrangler/pull/1474 * (CI): Enable new CI pipeline for standard & d | Low | 8/17/2022 |
| 3.0.0a1 | This is a pre-release for the Wrangler@Scale project ## What's Changed * (feat): Add distributed config flag and initialise method by @jaidisido in https://github.com/awslabs/aws-data-wrangler/pull/1389 * (feat): Add distributed Lake Formation read by @jaidisido in https://github.com/awslabs/aws-data-wrangler/pull/1397 * (feat): Distribute S3 select over multiple paths and scan ranges by @jaidisido in https://github.com/awslabs/aws-data-wrangler/pull/1445 * (refactor): Refactor threading/ | Low | 8/17/2022 |
| 2.16.1 | ### Noteworthy > 🐛 Fixed issue introduced by `2.16.0` to method `s3.read_parquet()` ### Patch - Fix bug: pq_file.schema.names(): TypeError: 'list' object is not callable `s3.read_parquet()` #1412 --- ***P.S.*** The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. [Just upload it and run](https://aws-data-wrangler.readthedocs.io/en/stable/install.html) or [use](https://aws-data-wrangler.readthedocs.io/en/2.16.1/install.html#public-artifacts) them fro | Low | 6/28/2022 |
| 2.16.0 | ### Noteworthy > ⚠️ **For platforms without PyArrow 7 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### New Functionalities - Add support for Oracle Database 🔥 #1259 Check out the [tutorial](https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/007%20-%20Redshift%2C%20M | Low | 6/22/2022 |
| 2.15.1 | ### Noteworthy > ⚠️ Dropped Python 3.6 support > ⚠️ **For platforms without PyArrow 7 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### Patch - Add `sparql` extra & make `SPARQLWrapper` dependency optional #1252 --- ***P.S.*** The AWS Lambda Layer file (.zip) and the | Low | 4/11/2022 |
| 2.15.0 | ### Noteworthy > ⚠️ Dropped Python 3.6 support > ⚠️ **For platforms without PyArrow 7 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### New Functionalities - Amazon Neptune module 🚀 #1084 Check out the [tutorial](https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/ | Low | 3/28/2022 |
| 2.14.0 | ### Caveats > ⚠️ **For platforms without PyArrow 6 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### New Functionalities - Support Athena Unload 🚀 #1038 ### Enhancements - Add the `ExcludeColumnSchema=True` argument to the glue.get_partitions call to reduce response si | Low | 1/28/2022 |
| 2.13.0 | ### Caveats > ⚠️ **For platforms without PyArrow 6 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### Breaking changes - Fix sanitize methods to align with Glue/Hive naming conventions #579 ### New Functionalities - AWS Lake Formation Governed Tables 🚀 #570 - Support for | Low | 12/3/2021 |
| 2.12.1 | ### Caveats > ⚠️ **For platforms without PyArrow 5 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### Patch - Removing unnecessary dev dependencies from main #961 --- ***P.S.*** The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. [Just uploa | Low | 10/18/2021 |
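
The 2.20.0 entry notes that `dynamodb.read_partiql` now goes through `ExecuteStatement`, so the caller needs a `PartiQL*` IAM permission instead of `Scan`. A minimal sketch of a matching policy document, built as a plain Python dictionary, might look like the following. The helper name `partiql_read_policy` and the table ARN are hypothetical, and narrowing the permission to `dynamodb:PartiQLSelect` assumes the workload only issues SELECT statements.

```python
import json


def partiql_read_policy(table_arn):
    """Build an IAM policy document allowing PartiQL SELECT on one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # SELECT statements sent through ExecuteStatement are
                # authorized by dynamodb:PartiQLSelect, not dynamodb:Scan.
                "Action": ["dynamodb:PartiQLSelect"],
                "Resource": [table_arn],
            }
        ],
    }


# Hypothetical table ARN, for illustration only.
policy = partiql_read_policy(
    "arn:aws:dynamodb:us-east-1:123456789012:table/my_table"
)
print(json.dumps(policy, indent=2))
```

Roles that also write through PartiQL would need the corresponding insert, update or delete actions added to the `Action` list.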

Similar Packages

- sagemaker-studio: Python library to interact with Amazon SageMaker Unified Studio (1.1.13)
- azure-storage-blob: Microsoft Azure Blob Storage Client Library for Python (azure-template_0.1.0b6187637)
- azure-storage-file-share: Microsoft Azure File Share Storage Client Library for Python (azure-template_0.1.0b6187637)
- mirakuru: Process executor (not only) for tests. (3.0.2)
- opentelemetry-instrumentation-qdrant: OpenTelemetry Qdrant instrumentation (0.60.0)