Skip to content

Postgres

Source for PostgresSQL database.

URI

The supported URI format is:

scheme://[user[:password]@][netloc][:port][/dbname][?param1=value1&...]

For example:

postgresql://user:password@localhost:5432/postgres_database

Several schemes are supported, and others like sqlalchemy's are supported for compatability.

Valid schemes:

postgresql://
postgres://
postgres+psycopg2://
postgresql+psycopg://

In the case of sqlalchemy, the part on the right e.g. psycopg2 is ignored.

Types supported

Primitive datatypes

Postgres type Supported Native type Arrow Notes
BOOL bool DataType::Boolean
INT2 16 DataType::Int16
INT4 i32 DataType::Int32
INT8 i64 DataType::Int64
FLOAT4 f32 DataType::Float32
FLOAT8 f64 DataType::Float64
CHAR String DataType::Utf8
BPCHAR String DataType::Utf8
TEXT String DataType::Utf8
VARCHAR String DataType::Utf8
UUID uuid::Uuid DataType::FixedSizeBinary(16)
BYTEA &[u8] DataType::Binary max size is i32::MAX bytes
NUMERIC bigdecimal::BigDecimal DataType::Decimal128

Time datatypes

Postgres type Supported Native type Arrow Notes
DATE chrono::NaiveDate DataType::Date32 32 bit
TIME chrono::NaiveDateTime DataType::Time64(TimeUnit::Microsecond) precision is microseconds
TIMESTAMP chrono::NaiveDateTime DataType::Timestamp<TimeUnit::Microsecond> precision is microseconds
TIMESTAMPTZ chrono::DateTime<Utc> DataType::Timestamp

Geo-spatial datatypes

Native GEOSPATIAL types

Postgres type Supported Native type Arrow Notes
POINT geo::Point list<double> List with two elements representing a point (x, y)
CIRCLE conecta::postgres::Circle list<double> List with three elements representing the center (x, y) and r and radius. (x, y, r)
LINE conecta::postgres::Line list<double> List with three elements (a, b, c) from ax + by + c = 0 linear equation
BOX conecta::postgres::Box list<double> List with four elements, (x1, y1, x2, y2) where (x1, y1) and (x2, y2) are any two opposite corners of the box
LSEG conecta::postgres::Lseg list<double> List with four elements, (x1, y1, x2, y2) where (x1,y1) and (x2,y2) are the end points of the line segment.
PATH conecta::postgres::Path list<double> List with minimum of two elements, (o, c, x1, y1, x2, y2...) where o is whether the path is open or not, c is the total count of points and the rest are points components.
POLYGON conecta::postgres::Polygon list<double> List with points, (x1, y1, x2, y2... xn, yn)

PostGis

Postgres type Supported Native type Arrow Notes
Geometry conecta::postgres::PostgisBinary binary If ST_AsEWKT or ST_AsText is used, text will be returned.
from conecta import read_sql
table = read_sql(
    "postgres://postgres:postgres@localhost:32789/test" ,
    query="""
    SELECT 
        ST_GeomFromText( 'POLYGON((0 0,0 1,1 1,1 0,0 0))', 4326 ) as geo1,
        ST_AsEWKB( ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))', 4326) ) as geo2,
        ST_AsBinary( ST_GeomFromText('POLYGON((0 0,0 1,1 1,1 0,0 0))', 4326) ) as geo3,
        ST_AsText( ST_GeomFromText( 'POLYGON((0 0,0 1,1 1,1 0,0 0))', 4326 ) ) as geo4
    """,
)

print(table)
# pyarrow.Table
# geo1: binary
# geo2: binary
# geo3: binary
# st_astext: string
# ----
# geo1: [[0103000020E61000000100000005000000000000000000000000000000000000000000000000000000000000000000F03F00 (... 94 chars omitted)]]
# geo2: [[0103000020E61000000100000005000000000000000000000000000000000000000000000000000000000000000000F03F00 (... 94 chars omitted)]]
# geo3: [[01030000000100000005000000000000000000000000000000000000000000000000000000000000000000F03F0000000000 (... 86 chars omitted)]]
# geo4: [["POLYGON((0 0,0 1,1 1,1 0,0 0))"]]

# pip install geoarrow-pyarrow
from geoarrow.pyarrow import as_geoarrow, as_wkt, to_geopandas

for column in table:
    print(as_wkt(as_geoarrow(column)))
# [
#   [
#     "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"
#   ]
# ]
# [
#   [
#     "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"
#   ]
# ]
# [
#   [
#     "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"
#   ]
# ]
# [
#   [
#     "POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))"
#   ]
# ]

for column in table:
    print(to_geopandas(as_geoarrow(column)))

# 0    POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
# dtype: geometry
# 0    POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
# dtype: geometry
# 0    POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
# dtype: geometry
# 0    POLYGON ((0 0, 0 1, 1 1, 1 0, 0 0))
# dtype: geometry

Array datatypes

Postgres type Supported Native type Arrow Notes
INT2_ARRAY Vec<i16> DataType::List<i16>
INT4_ARRAY Vec<i32> DataType::List<i32>
INT8_ARRAY Vec<i64> DataType::List<i64>
FLOAT4_ARRAY Vec<f32> DataType::List<f32>
FLOAT8_ARRAY Vec<f64> DataType::List<f64>
TEXT_ARRAY Vec<String> DataType::List
UUID_ARRAY Vec<uuid::Uuid> DataType::List
BOOL_ARRAY Vec<bool> DataType::List
Array[DATE] Vec<NaiveDate> DataType::List
Array[TIMESTAMP] Vec<NaiveDateTime> DataType::List
Array[TIMESTAMPTZ] Vec<DateTime<Utc> DataType::List
Array[NUMERIC] Vec<BigDecimal> DataType::List
BYTEA_ARRAY Vec<Option<&[u8]>> DataType::List<Binary>

Example

from conecta import read_sql

t = read_sql(
    conn="postgres://postgres:postgres@localhost:5432",
    queries=['select * from lineitem'],
    partition_on='l_orderkey',
    partition_num=4,
)