Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add support for google spanner graph #622

Merged
merged 34 commits into from
Jan 22, 2025
Merged

add support for google spanner graph #622

merged 34 commits into from
Jan 22, 2025

Conversation

DataBoyTX
Copy link
Contributor

@DataBoyTX DataBoyTX commented Dec 21, 2024

Add Google Spanner GQL integration and demo notebooks showing how to connect to google spanner graph database, and visualize the results

graphistry/PlotterBase.py Outdated Show resolved Hide resolved
Returns:
PlotterBase: A PlotterBase instance configured with SpannerGraph.
"""
return Plotter().spannergraph(project_id, instance_id, database_id)
Copy link
Contributor

@lmeyerov lmeyerov Dec 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re:interface --

Connection

i wonder if we want something similar to the bolt connectors, where we can do streamlined auth + passing in a client

CONN_CFG = { ... }
g1 = graphistry.spanner_connect(**CONN_CFG)
g2 = g1.spanner_query("...")
native_client = google.spanner.client(...)
g1 = graphistry.spanner_connect(native_client)
g2 = g1.spanner_query("....")

Return-typed query methods

Separately, a pattern I'm finding to be type-friendly with remote-mode gfql has been separating return-shape-typed methods:

g2 = g1.spanner_g("get a graph")
df = g1.spanner_df("get a table"")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, in the neo4j & gremlin cases, when building real examples:

  • hydration: I found scenarios like we'd get either node IDs or edge IDs and would want to then hydrate them against the db
  • stored procedures: does GSQL have some notion of passing inputs
  • reads vs creates vs writes: any notions here? a good flow would be an example of going through uploading some data, querying it, doing some local enrichment, and then writing back the updates

Part of this makes me want to have some bigger examples to motivate helper functions on top. I suspect query_g() and query_df() can be good starts, but we need these examples to make more realistic

(The lack of bulk reads/writes such as via arrow is a bit puzzling, but that can wait?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I added CONF to register

hydration: I found scenarios like we'd get either node IDs or edge IDs and would want to then hydrate them against the db

I plan to take the two additional demo datasets that were shared (transit data, and FinVest) and create notebooks for those next, after that, I will work on the hydration from some other datasets we have

stored procedures: does GSQL have some notion of passing inputs

Yes, I believe so. But need to check syntax https://cloud.google.com/spanner/docs/reference/standard-sql/graph-query-statements

reads vs creates vs writes: any notions here? a good flow would be an example of going through uploading some data, querying it, doing some local enrichment, and then writing back the updates

agreed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found an example for parameterized queries: graph_snippets.py will have to look at adding this in v2

@DataBoyTX DataBoyTX marked this pull request as ready for review January 16, 2025 07:03
@lmeyerov
Copy link
Contributor

@DataBoyTX looks like ci issues?

Copy link
Contributor

@lmeyerov lmeyerov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

made a quick pass, see comments

  • lint, types, e.g., missing type signatures
  • public interface should be 'fluent' wrt plottable, so PlotterBase::spanner_gsql(self: Plottable, query: str... , using the passed in plottable instead of a fresh graphistry.edges(...)
  • remove stray top-level logging.setLevel
  • move remaining spanner top-level import into method bodies
  • rename to spanner_gql as spanner can be sql too?

Ideally but I get if hard in this pr, we can do a fast-follow next week?

  • where possible, better to do nested types like List[Dict[... or List[Union[Kind1, ...
  • unit tests on the json2df
  • document df cols in pydocs where relevant

probably good Help Wanted item:

  • infer source/dest name from passed in plottable or remote graph, vs current hard-coding

graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/plugins/spannergraph.py Outdated Show resolved Hide resolved
graphistry/pygraphistry.py Show resolved Hide resolved
graphistry/pygraphistry.py Outdated Show resolved Hide resolved
graphistry/pygraphistry.py Outdated Show resolved Hide resolved
@lmeyerov lmeyerov self-requested a review January 16, 2025 07:53
Copy link
Contributor

@lmeyerov lmeyerov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

see comments wrt ipynb

@DataBoyTX DataBoyTX requested a review from lmeyerov January 17, 2025 02:15
@DataBoyTX
Copy link
Contributor Author

@DataBoyTX looks like ci issues?

@lmeyerov - think everything is passing, can you re-review pls?

graphistry/Plottable.py Outdated Show resolved Hide resolved
graphistry/PlotterBase.py Outdated Show resolved Hide resolved
"id": "814017ae-af4e-4331-b0b0-ed016d010ade",
"metadata": {},
"source": [
"# Demo Notebook - Graphistry and Google Spanner Graph \n",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

intro is quite long...

Copy link
Contributor

@lmeyerov lmeyerov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataBoyTX

really close!!!! see comments

Landable with


Follow-up PR

Not blocking, esp useful before marketing push or future dev

  • see other noted nits: thin out pygraphistry/plotterbase calls to just call impls in plugin, cleaner docstrs for users/gpt/etc, ...

  • ipynb

    • shorten intro
    • include pregenerated graphistry viz in output so folks can click-to-watch
    • include output tables in inpyb, and ensure .head() so not full dump
    • various prose seems WIP/TODO notes, like various sentences starting with lowercase letters or parenthicals
    • cta:
      • 10 minutes to, not 10-mins
      • get rid of copyright, if we do that, should be via sphinx theme?
  • tests, esp on format conversions, and sample data to mock that

@DataBoyTX DataBoyTX requested a review from lmeyerov January 22, 2025 19:31
graphistry/PlotterBase.py Outdated Show resolved Hide resolved
Copy link
Contributor

@lmeyerov lmeyerov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DataBoyTX looking good --

  • please fix CI errors, then landable
  • optional nits, and on this pr or a follow up, would recommend saving g.plot() calls so live viz appear in docs

otherwise it's ready

@DataBoyTX DataBoyTX merged commit 85234d1 into master Jan 22, 2025
49 checks passed
@DataBoyTX DataBoyTX deleted the tcook-add-spanner branch January 22, 2025 22:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants