Taken at Hack The North 25

Starting Problem

Imagine you were given a bunch of tables with columns

Table 1

  1. birthday
  2. first_name
  3. last_name

Table 2

  1. birth_day
  2. full_name

Table 3

  1. date_of_birth
  2. family_name
  3. first_name

How would you know which columns are semantically equivalent? It’s easy with a few tables, but at a much larger scale, connected problems like this become difficult.
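A minimal sketch of the naive baseline, assuming we only compare column names as strings. The table names, the helper names, and the 0.8 threshold are all made up for illustration. It catches spelling variants such as birth_day, but it has no way to tell that date_of_birth means the same thing, which is the gap the graph approach in these notes is meant to close.

```python
from difflib import SequenceMatcher

# Column names modeled on the example tables (table names are hypothetical).
TABLES = {
    "table1": ["birthday", "first_name", "last_name"],
    "table2": ["birth_day", "full_name"],
    "table3": ["date_of_birth", "family_name", "first_name"],
}

def normalize(name: str) -> str:
    """Lowercase and drop separators so 'birth_day' and 'birthday' compare equal."""
    return name.lower().replace("_", "").replace("-", "")

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] between two normalized column names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def candidate_matches(tables, threshold=0.8):
    """Return cross-table column pairs whose names look alike."""
    names = [(t, c) for t, cols in tables.items() for c in cols]
    pairs = []
    for i, (t1, c1) in enumerate(names):
        for t2, c2 in names[i + 1:]:
            if t1 != t2 and similarity(c1, c2) >= threshold:
                pairs.append(((t1, c1), (t2, c2)))
    return pairs

if __name__ == "__main__":
    for left, right in candidate_matches(TABLES):
        print(left, "<->", right)
```

Every pair of columns is compared, so the work grows quadratically with the number of columns, and purely lexical matching still misses synonyms.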

Problem

Imagine you were given a bunch of tables with columns

  1. birthday
  2. first_name
  3. last_name

and also

  1. birth_Day
  2. full_name

and also more.

How would you know which columns are semantically equivalent? It’s easy with a few tables, but what happens when the number of tables grows out of control?

IRI: internationalized resource identifiers

  • Like a URI or URL, but allowing non-ASCII characters

You can often compress these into QNames (prefixed names).
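A small sketch of what "compressing" an IRI into a QName means, assuming a hand-written prefix table. The helper names expand and compress are hypothetical; RDF libraries ship their own namespace handling.

```python
# Prefix table: short prefix -> namespace IRI.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}

def expand(qname: str, prefixes=PREFIXES) -> str:
    """Turn 'prefix:local' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return prefixes[prefix] + local

def compress(iri: str, prefixes=PREFIXES) -> str:
    """Turn a full IRI into 'prefix:local' if a known namespace matches."""
    for prefix, namespace in prefixes.items():
        if iri.startswith(namespace):
            return f"{prefix}:{iri[len(namespace):]}"
    return iri  # no known namespace: leave the IRI unchanged

if __name__ == "__main__":
    print(expand("dbr:Tim_Cook"))
    print(compress("http://dbpedia.org/ontology/Person"))
```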

DBpedia is Wikipedia but in a graph form like we’ve been talking about.

Here are some examples

@prefix dbr: <http://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
dbr:Tim_Cook rdf:type dbo:Person .

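So we have two nodes and a directed edge.

Read it as: “dbr:Tim_Cook” has “rdf:type” “dbo:Person”, i.e. Tim Cook is a Person. The subject and object are the nodes; the predicate is the edge.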
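To make the "two nodes and a directed edge" reading concrete, here is a rough sketch that stores triples as plain tuples and builds an adjacency list. The second triple and the ex: prefix are made up for illustration and are not from the talk.

```python
from collections import defaultdict

# Each triple is (subject, predicate, object): two nodes and one labeled, directed edge.
TRIPLES = [
    ("dbr:Tim_Cook", "rdf:type", "dbo:Person"),
    ("dbr:Tim_Cook", "ex:worksFor", "ex:SomeCompany"),  # hypothetical extra edge
]

def build_graph(triples):
    """Return (nodes, out_edges) where out_edges maps subject -> [(predicate, object)]."""
    nodes = set()
    out_edges = defaultdict(list)
    for subject, predicate, obj in triples:
        nodes.update([subject, obj])
        out_edges[subject].append((predicate, obj))
    return nodes, dict(out_edges)

if __name__ == "__main__":
    nodes, out_edges = build_graph(TRIPLES)
    for subject, edges in out_edges.items():
        for predicate, obj in edges:
            print(f"{subject} --{predicate}--> {obj}")
```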

Example: https://colab.research.google.com/drive/1Tz-o5p4IrwM-_AYtuoUzT7odAWk1rytl?usp=sharing

Section 2: Graph Neural Networks

Classes of Graph ML Problems:

  • Node prediction
  • Edge prediction
  • Graph classification

We will focus on edge prediction. You can think of it as “generating a missing edge” or “creating an edge that makes sense”.

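Problem? Standard machine learning does not lend itself well to graph problems, because most models expect input with a fixed order, such as a sequence of text tokens or a matrix of pixels. A graph has no canonical node ordering. Why does that matter? It leads to permutation-invariant pooling, but that’s outside the scope of these notes.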
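Although the talk left this out of scope, a tiny sketch may help show what "permutation invariant" means. Assuming each neighbor of a node carries a small feature vector, summing the vectors gives the same result in any order, while simply concatenating them does not. The function names and numbers are made up.

```python
from itertools import permutations

def sum_pool(neighbor_features):
    """Aggregate neighbor feature vectors by element-wise sum (order-independent)."""
    return [sum(values) for values in zip(*neighbor_features)]

def flatten(neighbor_features):
    """Concatenate neighbor features in the given order (order-dependent)."""
    return [value for features in neighbor_features for value in features]

# Made-up features for three neighbors of one node.
NEIGHBORS = [[1, 0], [0, 2], [3, 3]]

if __name__ == "__main__":
    for ordering in permutations(NEIGHBORS):
        print(sum_pool(list(ordering)), flatten(list(ordering)))
```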

So we can use vector embeddings to approximate edges. For example, king + woman - man gives roughly queen, which is almost like an edge.
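A toy version of the king/queen arithmetic, assuming hand-made 2-D vectors whose dimensions stand for (royalty, maleness). Real embeddings are learned and have hundreds of dimensions, so this only illustrates the mechanics.

```python
import math

# Hand-made vectors: (royalty, maleness). Purely illustrative values.
VECTORS = {
    "king":  [0.9, 0.9],
    "queen": [0.9, 0.1],
    "man":   [0.1, 0.9],
    "woman": [0.1, 0.1],
    "apple": [0.0, 0.5],
}

def cosine(a, b):
    """Cosine similarity between two 2-D vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def analogy(a, b, c):
    """Return the word nearest to vec(a) + vec(b) - vec(c), excluding the inputs."""
    target = [x + y - z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(target, VECTORS[w]))

if __name__ == "__main__":
    print(analogy("king", "woman", "man"))
```

The offset woman - man acts like a reusable "edge" that can be applied to other nodes, which is the intuition behind predicting edges from embeddings.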

In order to continue, we need to classify:

  • Nodes
  • Edges
  • Local and global topology

Graph embeddings are a new concept.

Problem Statement: Recommend industries for a given company to enter.

These are the embeddings we need:

  1. Company Embeddings
  2. Industry Embeddings
  3. Conditions or requirements to say “this is a good fit”
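One way the three pieces above could fit together, as a rough sketch: score every (company, industry) pair that is not already an edge, and treat a score threshold as the "good fit" condition. The company and industry names, the vectors, the dot-product score, and the 0.5 threshold are all assumptions for illustration, not the presenters' model.

```python
# Made-up embeddings; in practice these would come from a trained graph model.
COMPANY_EMB = {"AcmeCloud": [0.9, 0.1, 0.3]}
INDUSTRY_EMB = {
    "Cloud Computing": [1.0, 0.0, 0.2],
    "Cybersecurity":   [0.8, 0.1, 0.5],
    "Agriculture":     [0.0, 1.0, 0.1],
}
# Edges that already exist in the graph, so they should not be recommended again.
EXISTING_EDGES = {("AcmeCloud", "Cloud Computing")}

def score(company_vec, industry_vec):
    """Dot product as a simple link score between a company and an industry."""
    return sum(c * i for c, i in zip(company_vec, industry_vec))

def recommend(company, threshold=0.5):
    """Return (industry, score) pairs for missing edges that pass the fit condition."""
    results = []
    for industry, vec in INDUSTRY_EMB.items():
        if (company, industry) in EXISTING_EDGES:
            continue
        s = score(COMPANY_EMB[company], vec)
        if s >= threshold:
            results.append((industry, s))
    return sorted(results, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    for industry, s in recommend("AcmeCloud"):
        print(f"{industry}: {s:.2f}")
```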

Agentic Graph RAG

  1. RAG
  2. Graph RAG
  3. Agentic Graph RAG (our approach)

To perform Agentic Graph RAG, we propose 4 agents:

  1. Planner Agent (START) - plan the search
  2. Retriever Agent (CONTINUE) - Explore the graph
  3. Critic Agent (STOP) - Know when to stop
  4. Generator Agent - Compose an answer
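The four roles above can be sketched as a simple control loop. This is only an assumed shape: in a real system each function would likely wrap an LLM call and a graph database query, whereas here they are plain Python over a made-up toy graph, and the stopping rule (an adjacent_to edge or a hop budget) is invented for the example.

```python
# Toy graph: node -> list of (relation, target). All names are hypothetical.
GRAPH = {
    "AcmeCloud": [("operates_in", "Cloud Computing")],
    "Cloud Computing": [("adjacent_to", "Cybersecurity")],
    "Cybersecurity": [],
}

def planner(question):
    """START: pick seed nodes whose names appear in the question."""
    return [node for node in GRAPH if node.lower() in question.lower()]

def retriever(frontier, seen):
    """CONTINUE: expand the frontier by one hop and collect the facts found."""
    facts, next_frontier = [], []
    for node in frontier:
        for relation, target in GRAPH.get(node, []):
            facts.append((node, relation, target))
            if target not in seen:
                seen.add(target)
                next_frontier.append(target)
    return facts, next_frontier

def critic(facts, frontier, hops, max_hops=3):
    """STOP: enough evidence, nothing left to explore, or budget exhausted."""
    found = any(relation == "adjacent_to" for _, relation, _ in facts)
    return found or not frontier or hops >= max_hops

def generator(question, facts):
    """Compose an answer; here just the question plus the evidence lines."""
    evidence = "\n".join(f"{s} {r} {o}" for s, r, o in facts)
    return f"Q: {question}\nEvidence:\n{evidence}"

def answer(question):
    frontier = planner(question)
    seen, facts, hops = set(frontier), [], 0
    while True:
        new_facts, frontier = retriever(frontier, seen)
        facts.extend(new_facts)
        hops += 1
        if critic(facts, frontier, hops):
            return generator(question, facts)

if __name__ == "__main__":
    print(answer("Which industries could AcmeCloud enter?"))
```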

Example

This example is simpler: just Graph RAG, not agentic.

  • It looks for peers of a given company.
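A rough sketch of what non-agentic Graph RAG for peers might look like, assuming "peer" means a company that shares at least one industry. The graph contents and function names are made up, and the final LLM call is left out: the sketch stops at the prompt that would be sent.

```python
# Toy graph: company -> set of industries it operates in. All names are hypothetical.
OPERATES_IN = {
    "AcmeCloud": {"Cloud Computing"},
    "NimbusSoft": {"Cloud Computing", "Cybersecurity"},
    "FarmCo": {"Agriculture"},
}

def find_peers(company):
    """Retrieve: companies that share at least one industry with the given company."""
    own = OPERATES_IN[company]
    return sorted(
        other for other, industries in OPERATES_IN.items()
        if other != company and own & industries
    )

def build_prompt(company):
    """Augment: put the retrieved graph facts into the prompt as context."""
    facts = [
        f"{peer} operates_in {', '.join(sorted(OPERATES_IN[peer]))}"
        for peer in find_peers(company)
    ]
    context = "\n".join(facts) if facts else "(no peers found)"
    return f"Context:\n{context}\n\nQuestion: Who are the peers of {company}?"

if __name__ == "__main__":
    print(build_prompt("AcmeCloud"))
```

Unlike the agentic version, there is a single fixed retrieval step with no planning and no critic deciding when to stop.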

New Concepts

Example of mapping tabular data to graph views: Virtualized Knowledge Graphs

RDF4J - SPARQL on top of tabular storage
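A concept sketch of a virtualized graph view, assuming a hand-written mapping from table columns to shared predicates. This is not RDF4J's API (RDF4J is a Java framework); the ex: predicates, the rows, and the helper names are invented. The point is that triples are generated from the rows on demand, and that differently named columns land on the same predicate, which answers the starting problem.

```python
# Column -> predicate mappings per table (hypothetical ex: vocabulary).
MAPPINGS = {
    "table1": {"birthday": "ex:birthDate", "first_name": "ex:givenName"},
    "table3": {"date_of_birth": "ex:birthDate", "first_name": "ex:givenName"},
}
# Made-up rows standing in for the tabular storage.
ROWS = {
    "table1": [{"id": 1, "birthday": "1990-01-01", "first_name": "Alex"}],
    "table3": [{"id": 7, "date_of_birth": "1985-06-15", "first_name": "Sam"}],
}

def virtual_triples():
    """Yield (subject, predicate, object) triples lazily; nothing is materialized."""
    for table, rows in ROWS.items():
        for row in rows:
            subject = f"ex:{table}/{row['id']}"
            for column, predicate in MAPPINGS[table].items():
                yield (subject, predicate, row[column])

def match(pattern):
    """Return triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return [
        triple for triple in virtual_triples()
        if all(p is None or p == value for p, value in zip(pattern, triple))
    ]

if __name__ == "__main__":
    # One query over both tables, despite their different column names.
    for triple in match((None, "ex:birthDate", None)):
        print(triple)
```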

10th Annual Machine Learning Conference by Bloomberg at Columbia

Sixth normal form

Personal note:

My favorite thing about these presenters is how directed they are about answering questions. I’ll ask something and if they didn’t completely understand, they immediately say “Could you summarize your question?” or “Could you ask that in a different way?” like so directed, to the point, beautiful.