Talk Python To Me - Python Conversations For Passionate Developers

#503: The PyArrow Revolution

Informações:

Synopsis

Pandas is at a the core of virtually all data science done in Python, that is virtually all data science. Since it's beginning, Pandas has been based upon numpy. But changes are afoot to update those internals and you can now optionally use PyArrow. PyArrow comes with a ton of benefits including it's columnar format which makes answering analytical questions faster, support for a range of high performance file formats, inter-machine data streaming, faster file IO and more. Reuven Lerner is here to give us the low-down on the PyArrow revolution. Episode sponsors NordLayer Auth0 Talk Python Courses Links from the show Reuven: github.com/reuven Apache Arrow: github.com Parquet: parquet.apache.org Feather format: arrow.apache.org Python Workout Book: manning.com Pandas Workout Book: manning.com Pandas: pandas.pydata.org PyArrow CSV docs: arrow.apache.org Future string inference in Pandas: pandas.pydata.org Pandas NA/nullable dtypes: pandas.pydata.org Pandas `.iloc` indexing: pandas.pydata.org DuckDB: duckdb.or