Description
# AWS SDK for pandas (awswrangler)

*Pandas on AWS*

Easy integration with Athena, Glue, Redshift, Timestream, OpenSearch, Neptune, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).

> An [AWS Professional Service](https://aws.amazon.com/professional-services/) open source initiative | aws-proserve-opensource@amazon.com

| Source | Installation Command |
|--------|----------------------|
| **[PyPi](https://pypi.org/project/awswrangler/)** | `pip install awswrangler` |
| **[Conda](https://anaconda.org/conda-forge/awswrangler)** | `conda install -c conda-forge awswrangler` |

> ⚠️ **Starting version 3.0, optional modules must be installed explicitly:**<br>
➡️`pip install 'awswrangler[redshift]'`

## Table of contents

- [Quick Start](#quick-start)
- [At Scale](#at-scale)
- [Read The Docs](#read-the-docs)
- [Getting Help](#getting-help)
- [Logging](#logging)

## Quick Start

Installation command: `pip install awswrangler`

```py3
import awswrangler as wr
import pandas as pd
from datetime import datetime

df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})

# Storing data on Data Lake
wr.s3.to_parquet(
    df=df,
    path="s3://bucket/dataset/",
    dataset=True,
    database="my_db",
    table="my_table"
)

# Retrieving the data directly from Amazon S3
df = wr.s3.read_parquet("s3://bucket/dataset/", dataset=True)

# Retrieving the data from Amazon Athena
df = wr.athena.read_sql_query("SELECT * FROM my_table", database="my_db")

# Get a Redshift connection from Glue Catalog and retrieving data from Redshift Spectrum
con = wr.redshift.connect("my-glue-connection")
df = wr.redshift.read_sql_query("SELECT * FROM external_schema.my_table", con=con)
con.close()

# Amazon Timestream Write
df = pd.DataFrame({
    "time": [datetime.now(), datetime.now()],
    "my_dimension": ["foo", "boo"],
    "measure": [1.0, 1.1],
})
rejected_records = wr.timestream.write(df,
    database="sampleDB",
    table="sampleTable",
    time_col="time",
    measure_col="measure",
    dimensions_cols=["my_dimension"],
)

# Amazon Timestream Query
wr.timestream.query("""
SELECT time, measure_value::double, my_dimension
FROM "sampleDB"."sampleTable" ORDER BY time DESC LIMIT 3
""")
```

## At scale

AWS SDK for pandas can also run your workflows at scale by leveraging [Modin](https://modin.readthedocs.io/en/stable/) and [Ray](https://www.ray.io/). Both projects aim to speed up data workloads by distributing processing over a cluster of workers.

Read our [docs](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/scale.html) or head to our latest [tutorials](https://github.com/aws/aws-sdk-pandas/tree/main/tutorials) to learn more.

## [Read The Docs](https://aws-sdk-pandas.readthedocs.io/)

- [**What is AWS SDK for pandas?**](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/about.html)
- [**Install**](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html)
  - [PyPi (pip)](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#pypi-pip)
  - [Conda](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#conda)
  - [AWS Lambda Layer](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#aws-lambda-layer)
  - [AWS Glue Python Shell Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#aws-glue-python-shell-jobs)
  - [AWS Glue PySpark Jobs](https://aws-sdk-pandas.readthedocs.io/en/3.16.0/install.html#aws-glue-pyspark-jobs)
  - [Amazon SageMa
Release History
| Version | Changes | Urgency | Date |
|---|---|---|---|
| 3.16.0 | Imported from PyPI (3.16.0) | Low | 4/21/2026 |
| 3.15.1 | ### Security / Dependency Updates 🛡️ * fix: upgrade setuptools due to CVE-2026-23949 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3261 * chore: pyasn1, wheel, filelock security fixes by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3262 * chore: wheel security fix #3262 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3263 * chore: Update dependencies by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3268 ### Housekeeping 🧹 * chore(dep | Low | 2/5/2026 |
| 3.15.0 | ## Notable Changes ⚠️ * fix: upgrade aiohttp due to CVE-2025-69223 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3250 * chore: Build Python 3.14 layers by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3251 * chore: Drop Python 3.9 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3257 ### Features / Enhancements * feat(s3): add to_deltalake_streaming for single-commit Delta writes by @skoschik in https://github.com/aws/aws-sdk-pandas/pull/3231 * f | Low | 1/13/2026 |
| 3.14.0 | ## Notable Changes ⚠️ * chore: upgrade pg8000 due to CVE-2025-61385 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3225 ### Features / Enhancements * feat: support redshift `CLEANPATH` by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3211 * feat: add result reuse configuration to query execution functions by @DavidKatz-il in https://github.com/aws/aws-sdk-pandas/pull/3212 ### Bugfixes * fix: Add `s3_output` parameter to `_start_query_execution` call in " | Low | 10/30/2025 |
| 3.13.0 | ## Notable Changes ⚠️ * updated `aiohttp==3.12.15` to fix CVE-2025-53643 (LOW) by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3197 ### Features / Enhancements * feat: ray 2.49.0 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3194 * feat: add support for aurora-mysql and aurora-postgresql engines by @senorcinco in https://github.com/aws/aws-sdk-pandas/pull/3188 ### Bugfixes * fix: opensearch session by @kukushking in https://github.com/aws/aws-sdk-pandas | Low | 9/10/2025 |
| 3.12.1 | ## Notable Changes ⚠️ * Moved to [uv package manager](https://github.com/astral-sh/uv) ### Features / Enhancements * feat: uv by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3132 ### Security / Dependency Updates 🛡️ * chore(deps): bump the production-dependencies group with 4 updates by @dependabot in https://github.com/aws/aws-sdk-pandas/pull/3159 * chore(deps): bump the production-dependencies group with 4 updates by @dependabot in https://github.com/aws/ | Low | 6/18/2025 |
| 3.12.0 | ## Notable Changes ⚠️ * AWS Lambda Layers: **pyarrow** was upgraded to 20.0.0 ### Features / Enhancements * feat: add pyarrow_additional_kwargs to athena.to_iceberg by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/3094 * feat: add dtype argument to delete_from_iceberg by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/3099 * feat: add redshift and rds data api query params by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3111 * chore: ray 2.45 by @kukus | Low | 5/29/2025 |
| 3.11.0 | ## Notable Changes ⚠️ * AWS SDK for pandas now supports Python 3.13! * Python 3.8 is no longer supported (reached [end-of-life](https://devguide.python.org/versions/) Oct 7 2024) 🚫 * AWS Lambda Layers: **pyarrow** was upgraded to 18.1.0 * AWS Lambda Layers: **numpy** was upgraded to 2.2.1 ### Features / Enhancements * add support for Python 3.13 & deprecate Python 3.8 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/3045 * return opensearch aggregation top hits by @ku | Low | 1/10/2025 |
| 3.10.1 | ## Bug fixes * fix: update references in introduction notebook by @emmanuel-ferdman in https://github.com/aws/aws-sdk-pandas/pull/3009 * fix: read parquet file in chunked mode per row group by @FredericKayser in https://github.com/aws/aws-sdk-pandas/pull/3016 * fix: add missing raise statement in RS Data API by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/3025 ## Documentation * chore: Prepare 3.10.1 release by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/30 | Low | 12/4/2024 |
| 3.10.0 | ## Features * feat: Support numpy 2.0 by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2944 * feat(redshift): Automatically add new DataFrame columns to Redshift tables during write operation by @jack-dell in https://github.com/aws/aws-sdk-pandas/pull/2948 * feat: modify_refresh_interval flag in opensearch index_documents by @AvihaiSam in https://github.com/aws/aws-sdk-pandas/pull/2980 * feat: support postgresql array types by @kukushking in https://github.com/aws/aws-sdk-p | Low | 10/31/2024 |
| 3.9.1 | ## Bug fixes * bucketing error with newer version of Modin (0.31.0) by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2896 * `athena.read_sql_query` failing for time columns by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2895 * add an argument to control handling nulls in merge criteria by @brendan-cook-87 in https://github.com/aws/aws-sdk-pandas/pull/2892 * address Ray deprecation warnings by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas | Low | 8/19/2024 |
| 3.9.0 | ## Enhancements * Support ORC and CSV in `redshift.copy_from_files` function by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2849 * Support different merge conditions in `athena.to_iceberg` function by @aldder in https://github.com/aws/aws-sdk-pandas/pull/2861 * Manage `NULL` values in `athena.to_iceberg` merge statement by @aldder in https://github.com/aws/aws-sdk-pandas/pull/2872 * Upgrade Ray to 2.30 by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2870 | Low | 7/8/2024 |
| 3.8.0 | ## Enhancements * support client-side parameter resolution in athena.create_ctas_table by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2797 * add commit_transaction to postgres.to_sql by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2795 * add columns parameters support by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2814 * add overwrite_method to `postgresql.to_sql` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2820 * add u | Low | 6/5/2024 |
| 3.7.3 | ## Bug fixes - Iceberg schema evolution fails for map, array and struct types by @LeonLuttenberger in #2755 - trickle down `s3_output` in `athena.to_iceberg` by @jaidisido in #2767 - respect order of columns in `to_iceberg` by @jaidisido in #2768 - add PyArrow `fixed_size_binary` dtype support by @jaidisido in #2775 - Opensearch serverless vector search collections - remove default `_id` by @kukushking in #2784 - missing keys in `list_to_arrow_table` by @kukushking in #2778 - prevent ` | Low | 4/22/2024 |
| 3.7.2 | ## Features/Enhancements - Add support for DeltaLake's DynamoDB lock mechanism by @LeonLuttenberger in #2705 ## Bug fixes - `wr.athena.to_iceberg` - Insert query has mismatched column types #2678 by @GalvFionic in #2715 - allow `s3_output` in `athena.to_iceberg` by @jaidisido in #2727 - replace deprecated `np.split_array` by @jaidisido in #2735 - Athena `to_iceberg` fails with non-lowercase column names by @LeonLuttenberger in #2736 - Support Ray 2.10 by @kukushking in #2741 ## | Low | 3/27/2024 |
| 3.7.1 | ## Bug fixes * fix breaking change in `_create_table` by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2711 * pin pyarrow to version 8 and above by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2709 ## Documentation * fix `redshift.to_sql` doc indentation error by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2706 **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.7.0...3.7.1 | Low | 3/7/2024 |
| 3.7.0 | ## Breaking changes Lake Formation Governed tables are being phased out and we are dropping support (#2692). ## Features/Enhancements * support parquet client encryption (#2642) by @Marwen94 in https://github.com/aws/aws-sdk-pandas/pull/2674 ## Bug fixes * Index columns removed on s3.to_parquet by @robert-schmidtke in https://github.com/aws/aws-sdk-pandas/pull/2655 * Missing timezone metadata by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2682 * remove enforced | Low | 3/5/2024 |
| 3.6.0 | ## Features/Enhancements * Enable Iceberg row deletion & add `mode` parameter to `to_iceberg` by @LeonLuttenberger in #2632 * Add support for pyarrow type `large_string` by @joakibo in #2663 * Add `max_results` to `athena.list_query_executions` by @LeonLuttenberger in #2665 ## Bug fixes * Pyarrow 15 imports & remove unused code by @kukushking in #2649 ## New Contributors * @joakibo made their first contribution in https://github.com/aws/aws-sdk-pandas/pull/2663 **Full Changel | Low | 2/14/2024 |
| 3.5.2 | ## Bug fixes * DynamoDB key & filter expressions attribute overwrite by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2615 * Allow PostgreSQL reserved keywords as column names by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2619 * Add `to_iceberg` support for filling missing columns in the DataFrame with None by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2616 * Forward `ignore_nulls` for container types by @raaidarshad in #2636 ## Doc | Low | 1/25/2024 |
| 3.5.1 | ## Bug fixes * Deserialization error when reading from DynamoDB using `KeyConditionExpression` by @LeonLuttenberger in #2607 * Reading of chunked parquet when columns parameter is specified by @rchromik in #2599 ## Documentation * Add `show_create_table` to Athena API page by @MikeSchriefer in #2610 ## Other * chore: Replace `bump2version` with `bump-my-version` by @LeonLuttenberger in #2608 * chore(deps-dev): bump jinja2 from 3.1.2 to 3.1.3 by @dependabot in #2609 * chore(d | Low | 1/12/2024 |
| 3.5.0 | ## Breaking changes Due to [CVEs](https://www.anyscale.com/blog/update-on-ray-cves-cve-2023-6019-cve-2023-6020-cve-2023-6021-cve-2023-48022-cve-2023-48023), Ray is capped to patched version 2.9.x. As a result, the latest version of the library cannot be used on the Glue for Ray runtime. We have raised the CVEs issue to the Glue team ## Features/Enhancements * Add `spark_properties` to athena spark by @rajagurunath in https://github.com/aws/aws-sdk-pandas/pull/2508 * Add `MERGE INTO` | Low | 1/11/2024 |
| 3.4.2 | ## Features/Enhancements * Update pyarrow to 14.0.1 to fix [arbitrary code execution security vulnerability](https://github.com/aws/aws-sdk-pandas/security/dependabot/35) **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.4.1...3.4.2 | Low | 11/13/2023 |
| 3.4.1 | ## Features/Enhancements * feat: Add schema evolution to `athena.to_iceberg` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2465 * feat: Athena - add `client_request_token` by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2474 * feat: Redshift data api - allow all auth combinations by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2475 * feat: add columns comments to iceberg by @frenchytheasian in https://github.com/aws/aws-sdk-pandas/pull/2482 | Low | 10/24/2023 |
| 3.4.0 | ## Features/Enhancements * Geospatial - parse Athena geospatial types via geopandas by @kukushking in #2346 * Allow group identifiers to be used in `wr.cloudwatch` queries by @LeonLuttenberger in #2430 * Add ignore null store parquet metadata by @raaidarshad in #2450 ## Bug fixes * Add missing boto3 session in `athena.to_iceberg` wait_query by @jaidisido in #2428 * Add catalog ID in `athena.to_iceberg` by @jaidisido in #2446 * Return None for missing column and partition key comm | Low | 9/11/2023 |
| 3.3.0 | ## Features/Enhancements * Support Athena query prepared statements & Athena parameterized queries by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2344 * Add dtype parameter in to_iceberg function by @paulobrunheroto in https://github.com/aws/aws-sdk-pandas/pull/2359 * Add CleanRooms read module by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/2366 * Escape and validate table identifiers and literals in PostgreSQL by @kukushking in https://github.com/aws/aws- | Low | 8/1/2023 |
| 3.2.1 | ## Fixes 🛠️ * Fix error where library could not be imported on Windows due to `No module named 'pyarrow._orc'` by @LeonLuttenberger in #2341 #2337 * Lower `packaging` version requirement by @LeonLuttenberger in #2340 * Allow Ray 2.5 & downgrade tox by @kukushking in #2338 **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.2.0...3.2.1 | Low | 6/14/2023 |
| 3.2.0 | ### Features/Enhancements * Add `s3.read_orc` and `s3.to_orc` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2312 * Apache Spark on Amazon Athena - `wr.athena.create_spark_session` & `wr.athena.run_spark_calculation` by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2314 * EMR Serverless by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2304 * Add `to_sql` for RDS Data API by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pu | Low | 6/13/2023 |
| 3.1.1 | ## What's Changed * fix: Add missing `packaging` dependency by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2281 **Full Changelog**: https://github.com/aws/aws-sdk-pandas/compare/3.1.0...3.1.1 | Low | 5/16/2023 |
| 3.1.0 | ### Features/Enhancements * Add `neptune.bulk_load` for bulk loading data into Neptune by @LeonLuttenberger in #2238 #2267 * Add `s3.to_deltalake` function by @LeonLuttenberger in #2228 * Add Timestream Batch Load support by @jaidisido in #2214 * Add Iceberg insert by @kukushking in #2233 * Support upsert mode for OracleDB by @LeonLuttenberger in #2265 * Add `chunked` parameter to DynamoDB read functions by @LeonLuttenberger in #2227 * Upgrade Modin to 0.20.1 & allow Ray 2.4 by @kuku | Low | 5/15/2023 |
| 3.0.0 | ### Breaking changes * Move dependencies to optional by @jaidisido in #1992 * Dependencies required by the following modules have been moved to optional: redshift, mysql, postgres, sqlserver, oracle, gremlin, sparql, deltalake * The required dependencies can be easily installed with `pip install awswrangler[<MODULE_NAME>]`, for example `pip install awswrangler[redshift]` * Change SQL formatters for Athena and LakeFormation so that they properly format types by @Taragolis and @Le | Low | 4/13/2023 |
| 2.20.1 | ## What's Changed * (fix) Timestream - ignore None, NaN, and NaT measure values by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2072 * (docs) Minor - update opensearch api docs by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2085 * Correct documentation for `chunksize=True` by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/2087 * fix: timestream empty batches by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/2098 * enhancement: Add times | Low | 3/21/2023 |
| 3.0.0rc3 | ## What's Changed ### Breaking changes: * breaking change: Move dependencies to optional by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1992 * breaking change: Use ExecuteStatement instead of Scan for DynamoDB read_partiql by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1964 ### Features/Enhancements: * enhancement: Refactor engine switching when Ray is installed by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1792 * logging: Enable user to | Low | 3/9/2023 |
| 2.20.0 | ### Breaking changes - `dynamodb.read_partiql` no longer performs a Scan operation under the hood. Instead the `ExecuteStatement` API is used. It means that the `PartiQL*` IAM permission is required instead of `Scan` ### Noteworthy * (feat): opensearch serverless by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1922. See the [tutorial](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/035%20-%20OpenSearch%20Serverless.ipynb) * (breaking change): Use `ExecuteStateme | Low | 3/1/2023 |
| 2.19.0 | ## Noteworthy * Glue Data Quality now supported, check out the [tutorial](https://github.com/aws/aws-sdk-pandas/blob/main/tutorials/034%20-%20Glue%20Data%20Quality.ipynb) * Delta lake support by @fvaleye * New DynamoDB `read_items` method by @a-slice-of-py ## Features & enhancements * feat: add read_items to dynamodb module by @a-slice-of-py in https://github.com/aws/aws-sdk-pandas/pull/1877 * Add deltalake support in AWS S3 with Pandas by @fvaleye in https://github.com/aws/aws-sdk-pa | Low | 1/9/2023 |
| 2.18.0 | ## Noteworthy - Pyarrow 10 support by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1731 - Lambda layers now available in `af-south-1` (Cape Town) by @malachi-constant ## Features & enhancements - Add unload_approach to athena.read_sql_table by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1634 - Pass additional partition projection params to wr.s3.to_parquet & cat… by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1627 - Regenerate poetry.lock wi | Low | 12/2/2022 |
| 3.0.0rc2 | ## What's Changed * (enhancement): Enable missing unit tests and Redshift, Athena, LF load tests by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1736 * (enhancement): configure scheduling options, remove dependencies on internal ray impl by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1734 * (testing): Enable Athena and Redshift tests, and address errors by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1721 * (feat): Make tqdm progress reporting opt | Low | 11/23/2022 |
| 3.0.0rc1 | ## What's Changed * (enhancement): Move RayLogger out of non-distributed modules by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1686 * (perf): Distribute data types inference by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1692 * (docs): Update config tutorial to include new configuration values by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1696 * (fix): partition block overwriting by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1695 | Low | 10/27/2022 |
| 3.0.0b3 | ## What's Changed * (feat): Add partitioning on block level by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1653 * (refactor): Make room for additional distributed engines by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1646 * (feat): Distribute s3 write text by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1631 * (docs): Add "Introduction to Ray" Tutorial by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1661 * (fix): Return addre | Low | 10/12/2022 |
| 3.0.0b2 | ## What's Changed * (feat) Update to Ray 2.0 by @kukushking in https://github.com/aws/aws-sdk-pandas/pull/1635 * (feat) Ray logging by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1623 * (enhancement): Reduce LOC in S3 write methods create_table by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1626 * (docs) Tutorial: Run SDK for pandas job on ray cluster by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1616 **Full Changelog**: https://github | Low | 9/30/2022 |
| 3.0.0b1 | ## What's Changed * (test) Consolidate unit and load tests by @jaidisido in https://github.com/aws/aws-sdk-pandas/pull/1525 * (feat) Distribute S3 read text by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1567 * (feat) Distribute s3 wait_objects by @LeonLuttenberger in https://github.com/aws/aws-sdk-pandas/pull/1539 * (test) Ray Load Tests CDK Stack and Instructions for Load Testing by @malachi-constant in https://github.com/aws/aws-sdk-pandas/pull/1583 * (fix) Fix S3 rea | Low | 9/22/2022 |
| 2.17.0 | ## New Functionalities - RedshiftDataAPI serverless support #1530 - Check out the [tutorial](https://aws-sdk-pandas.readthedocs.io/en/latest/tutorials/030%20-%20Data%20Api.html) - Add `get_query_results` to the Athena module #1496 - Check out the [function documentation](https://aws-sdk-pandas.readthedocs.io/en/latest/stubs/awswrangler.athena.get_query_results.html#awswrangler.athena.get_query_results) - Add `generate_create_query` to the Athena module #1514 - Check out the | Low | 9/20/2022 |
| 3.0.0a2 | This is a pre-release for the Wrangler@Scale project ## What's Changed * (feat): Add directory for Distributed Wrangler Load Tests by @malachi-constant in https://github.com/awslabs/aws-data-wrangler/pull/1464 * (CI): Distribute tests in tox config by @malachi-constant in https://github.com/awslabs/aws-data-wrangler/pull/1469 * (feat): Distribute s3 delete objects by @malachi-constant in https://github.com/awslabs/aws-data-wrangler/pull/1474 * (CI): Enable new CI pipeline for standard & d | Low | 8/17/2022 |
| 3.0.0a1 | This is a pre-release for the Wrangler@Scale project ## What's Changed * (feat): Add distributed config flag and initialise method by @jaidisido in https://github.com/awslabs/aws-data-wrangler/pull/1389 * (feat): Add distributed Lake Formation read by @jaidisido in https://github.com/awslabs/aws-data-wrangler/pull/1397 * (feat): Distribute S3 select over multiple paths and scan ranges by @jaidisido in https://github.com/awslabs/aws-data-wrangler/pull/1445 * (refactor): Refactor threading/ | Low | 8/17/2022 |
| 2.16.1 | ### Noteworthy > Fixed issue introduced by `2.16.0` to method `s3.read_parquet()` ### Patch - Fix bug: pq_file.schema.names(): TypeError: 'list' object is not callable `s3.read_parquet()` #1412 --- ***P.S.*** The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. [Just upload it and run](https://aws-data-wrangler.readthedocs.io/en/stable/install.html) or [use](https://aws-data-wrangler.readthedocs.io/en/2.16.1/install.html#public-artifacts) them fro | Low | 6/28/2022 |
| 2.16.0 | ### Noteworthy > ⚠️ **For platforms without PyArrow 7 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### New Functionalities - Add support for Oracle Database #1259 Check out the [tutorial](https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/007%20-%20Redshift%2C%20M | Low | 6/22/2022 |
| 2.15.1 | ### Noteworthy > ⚠️ Dropped Python 3.6 support > ⚠️ **For platforms without PyArrow 7 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### Patch - Add `sparql` extra & make `SPARQLWrapper` dependency optional #1252 --- ***P.S.*** The AWS Lambda Layer file (.zip) and the | Low | 4/11/2022 |
| 2.15.0 | ### Noteworthy > ⚠️ Dropped Python 3.6 support > ⚠️ **For platforms without PyArrow 7 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### New Functionalities - Amazon Neptune module #1084 Check out the [tutorial](https://aws-data-wrangler.readthedocs.io/en/latest/tutorials/ | Low | 3/28/2022 |
| 2.14.0 | ### Caveats > ⚠️ **For platforms without PyArrow 6 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### New Functionalities - Support Athena Unload #1038 ### Enhancements - Add the `ExcludeColumnSchema=True` argument to the glue.get_partitions call to reduce response si | Low | 1/28/2022 |
| 2.13.0 | ### Caveats > ⚠️ **For platforms without PyArrow 6 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### Breaking changes - Fix sanitize methods to align with Glue/Hive naming conventions #579 ### New Functionalities - AWS Lake Formation Governed Tables #570 - Support for | Low | 12/3/2021 |
| 2.12.1 | ### Caveats > ⚠️ **For platforms without PyArrow 5 support (e.g. MWAA, [EMR](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#emr-cluster), [Glue PySpark Job](https://aws-data-wrangler.readthedocs.io/en/stable/install.html#aws-glue-pyspark-jobs)):**<br> ➡️ `pip install pyarrow==2 awswrangler` ### Patch - Removing unnecessary dev dependencies from main #961 --- ***P.S.*** The AWS Lambda Layer file (.zip) and the AWS Glue file (.whl) are available below. [Just uploa | Low | 10/18/2021 |

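The 3.0.0 entry in the table above moved the drivers for modules such as redshift, mysql and postgres into optional extras. As a minimal sketch of how calling code might check for such a driver before using the matching module, the helpers below rely only on the standard library. The names `has_module` and `require_extra` are hypothetical and not part of awswrangler, and the pairing of the `redshift` extra with a `redshift_connector` module is an assumption to verify against the install docs.

```python
import importlib.util


def has_module(module_name: str) -> bool:
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


def require_extra(module_name: str, extra: str) -> None:
    """Raise a descriptive error when an optional driver is missing.

    `module_name` is the importable driver; `extra` is the awswrangler extra
    assumed to provide it (hypothetical pairing, check the install docs).
    """
    if not has_module(module_name):
        raise ModuleNotFoundError(
            f"'{module_name}' is not installed; "
            f"try: pip install 'awswrangler[{extra}]'"
        )
```

A caller could run `require_extra("redshift_connector", "redshift")` before `wr.redshift.connect(...)` so that a missing driver surfaces with the install hint instead of a bare import error.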