Conecta fits very well as a optimizable substitute of connectox
and can
easily be used in Polars.
Using conecta in polars.
You have several ways of using conecta
in Polars:
- Using
conecta.read_sql
directly and creating a Polars dataframe.
import polars as pl
import conecta as cn
table = cn.read_sql(
conn="postgres://postgres:postgres@localhost:5432",
queries=['select * from lineitem'],
partition_on='l_orderkey',
partition_num=4,
)
df = pl.DataFrame(table)
# or
from polars import from_arrow
df = from_arrow(table)
- Patching your current polars code.
conecta.ext.patch_polars
monkey patches polars.read_database_uri
to use conecta
instead
of connectorx
, this action is reversible with conecta.ext.unpatch_polars
.
import polars as pl
from conecta.ext import patch_polars
# This will run polars as usual, and will fail because connectorx
# does not implement `array_uuid`
df = pl.read_database_uri(
'select uuid_array from datatypes',
uri='postgres://postgres:[email protected]:5400')
# pyo3_runtime.PanicException: not implemented: _uuid
patch_polars() # Everything after this line will use `conecta.read_sql` instead.
# This will work!
df = pl.read_database_uri(
'select uuid_array from datatypes',
uri='postgres://postgres:[email protected]:5400')
# ┌─────────────────────────────────┐
# │ uuid_array │
# │ --- │
# │ list[binary] │
# ╞═════════════════════════════════╡
# │ [b"\x9b\xd0\xa0\xcf)\xbeK\xea\… │
# └─────────────────────────────────┘
If at some point you want to un-do the patching, you can call unpatch_polars
If you only want some methods to be patched, you can use it as a decorator:
import polars as pl
from conecta.ext import patch_polars
@patch_polars
def will_succeed():
df = pl.read_database_uri(
'select uuid_array from datatypes',
uri='postgres://postgres:[email protected]:5400')
# After will_succeed() exists, polars is unpatched.
def will_fail():
df = pl.read_database_uri(
'select uuid_array from datatypes',
uri='postgres://postgres:[email protected]:5400')
will_succeed()
will_fail()
Recommendations
The general recommendation is to use conecta whenever you can and optimize it for
your use case, you can achieve this by just calling patch_polars
at the beginning of
your code. If in some parts of your code you need to load from a database not yet supported
by conecta, you can either use unpatch_polars
before it or use patch_polars
decorator
where you want to use conecta.