Taken at Hack The North 25
Starting Problem
Imagine you were given a bunch of tables with columns
Table 1
- birthday
- first_name
- last_name
Table 2
- birth_day
- full_name
Table 3
- date_of_birth
- family_name
- first_name
How would you know which columns are semantically equivalent? It’s easy with a few tables, but at a much larger scale, connected problems like this become difficult.
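To make the difficulty concrete, here is a minimal sketch of a naive baseline: comparing column names as strings with Python's standard `difflib`. This is not something shown in the session. The helper names (`normalize`, `similarity`, `best_match`) are my own, and the column lists are the ones from the tables above.

```python
from difflib import SequenceMatcher

# Column names from the three tables in the notes.
TABLES = {
    "table_1": ["birthday", "first_name", "last_name"],
    "table_2": ["birth_day", "full_name"],
    "table_3": ["date_of_birth", "family_name", "first_name"],
}

def normalize(name: str) -> str:
    """Lowercase and drop separators so 'birth_day' and 'birthday' compare equal."""
    return name.lower().replace("_", "").replace("-", "")

def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1] based on matching character blocks."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def best_match(column: str, candidates: list[str]) -> tuple[str, float]:
    """Return the candidate most similar to `column`, together with its score."""
    scored = ((c, similarity(column, c)) for c in candidates)
    return max(scored, key=lambda pair: pair[1])

if __name__ == "__main__":
    for col in TABLES["table_1"]:
        for other in ("table_2", "table_3"):
            match, score = best_match(col, TABLES[other])
            print(f"{col:12} -> {other}.{match:14} ({score:.2f})")
```

String matching handles `birthday` vs `birth_day`, but it is misled elsewhere: as strings, `last_name` is closer to `first_name` than to `family_name`, even though only the latter means the same thing. That gap between spelling and meaning is what the rest of these notes try to address.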
Problem
Imagine you were given a bunch of tables with columns
- birthday
- first_name
- last_name
and also
- birth_Day
- full_name
and also more.
How would you know which columns are semantically equivalent? It’s easy with a few, but what happens when they grow out of control?
IRI: Internationalized Resource Identifier
- Like a URI or URL. You can often compress these into QNames.
DBpedia is Wikipedia but in a graph form like we’ve been talking about.
Here are some examples
@prefix dbr: <https://dbpedia.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix dbo: <https://dbpedia.org/ontology/> .
dbr:Tim_Cook rdf:type dbo:Person .
So we have two nodes and a directed edge.
“dbr:Tim_Cook” has “rdf:type” “dbo:Person”, meaning Tim Cook is a Person.
Example: https://colab.research.google.com/drive/1Tz-o5p4IrwM-_AYtuoUzT7odAWk1rytl?usp=sharing
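As a small illustration of how QNames compress IRIs, the sketch below expands the prefixed triple from the notes into full IRIs using plain Python. The `PREFIXES` table mirrors the prefix declarations above (with the standard `www.w3.org` RDF namespace). The function names `expand` and `expand_triple` are my own, not part of any RDF library.

```python
# Prefix table mirroring the @prefix declarations in the notes.
PREFIXES = {
    "dbr": "https://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbo": "https://dbpedia.org/ontology/",
}

def expand(qname: str) -> str:
    """Turn a prefixed name such as 'dbr:Tim_Cook' into a full IRI."""
    prefix, sep, local = qname.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {qname!r}")
    return PREFIXES[prefix] + local

def expand_triple(triple: tuple[str, str, str]) -> tuple[str, str, str]:
    """Expand subject, predicate and object of a (s, p, o) triple."""
    subject, predicate, obj = triple
    return (expand(subject), expand(predicate), expand(obj))

# Two nodes (subject, object) joined by one directed edge (predicate).
triple = ("dbr:Tim_Cook", "rdf:type", "dbo:Person")

if __name__ == "__main__":
    for part in expand_triple(triple):
        print(part)
```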
Section 2: Graph Neural Networks
Classes of Graph ML Problems:
- Node prediction
- Edge prediction
- Graph classification
We will focus on edge prediction. You can think of it as “generating a missing edge” or “creating an edge that makes sense”.
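Problem? Machine learning does not lend itself well to graph problems. Most models expect input with a fixed order, such as a sequence of text tokens or a matrix of pixels, and a graph has no canonical ordering of its nodes. Why? It comes down to permutation-invariant pooling, but that’s outside the scope of these notes.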
So we can use vector embeddings to approximate edges. For example, King + Woman - Man gives a vector close to Queen, which behaves almost like an edge.
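The King + Woman - Man arithmetic can be sketched with hand-made toy vectors. The numbers below are invented purely for illustration and do not come from a trained model: axis 0 stands for "royalty" and axis 1 for "gender". The `analogy` helper is my own name.

```python
import math

# Hand-made 2-D toy embeddings (assumption: not from a trained model).
# Axis 0 = royalty, axis 1 = gender (+1 male, -1 female).
EMBEDDINGS = {
    "king": (1.0, 1.0),
    "queen": (1.0, -1.0),
    "man": (0.0, 1.0),
    "woman": (0.0, -1.0),
    "prince": (0.8, 1.0),
    "princess": (0.8, -1.0),
}

def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def analogy(start: str, minus: str, plus: str) -> str:
    """Return the word nearest to start - minus + plus, excluding the inputs."""
    target = tuple(
        s - m + p
        for s, m, p in zip(EMBEDDINGS[start], EMBEDDINGS[minus], EMBEDDINGS[plus])
    )
    candidates = [w for w in EMBEDDINGS if w not in {start, minus, plus}]
    return max(candidates, key=lambda w: cosine(EMBEDDINGS[w], target))

if __name__ == "__main__":
    print(analogy("king", "man", "woman"))
```

The offset `woman - man` plays the role of an edge label here: adding it to a node's vector lands near the node at the other end of that edge.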
In order to continue, we need to classify:
- Nodes
- Edges
- Local and Global topology → Graph Embeddings is a new concept.
Problem Statement: Recommend industries for a given company to enter.
These are the embeddings we need:
- Company Embeddings
- Industry Embeddings
- Conditions or requirements to say “this is a good fit”
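One way to read the three ingredients above is as an edge-prediction scorer: compare a company embedding with each industry embedding and keep the candidates that pass a "good fit" condition. The sketch below assumes cosine similarity and a fixed threshold, neither of which was confirmed in the session. All names and vectors are hypothetical toy values.

```python
import math

# Hypothetical toy embeddings; a real system would learn these from the graph.
COMPANY_EMB = {"acme_robotics": (0.9, 0.1, 0.3)}
INDUSTRY_EMB = {
    "industrial_automation": (0.8, 0.2, 0.4),
    "consumer_retail": (0.1, 0.9, 0.2),
    "logistics": (0.6, 0.3, 0.7),
}
# Industries each company is already in (existing edges in the graph).
CURRENT_INDUSTRIES = {"acme_robotics": {"industrial_automation"}}

def cosine(u: tuple[float, ...], v: tuple[float, ...]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def recommend(company: str, threshold: float = 0.8) -> list[tuple[str, float]]:
    """Score missing company-industry edges and keep those above the threshold."""
    scored = [
        (industry, cosine(COMPANY_EMB[company], vec))
        for industry, vec in INDUSTRY_EMB.items()
        if industry not in CURRENT_INDUSTRIES.get(company, set())
    ]
    good_fits = [(name, score) for name, score in scored if score >= threshold]
    return sorted(good_fits, key=lambda pair: pair[1], reverse=True)

if __name__ == "__main__":
    for industry, score in recommend("acme_robotics"):
        print(f"{industry}: {score:.2f}")
```

The threshold stands in for the "conditions or requirements" bullet. In practice that condition could be a learned classifier rather than a fixed cutoff.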
Agentic Graph RAG
- RAG
- Graph RAG
- Agentic Graph RAG (our approach)
To perform Agentic Graph RAG, we propose 4 agents:
- Planner Agent (START) - plan the search
- Retriever Agent (CONTINUE) - Explore the graph
- Critic Agent (STOP) - Know when to stop
- Generator Agent - Compose an answer
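The control flow between the four agents can be sketched as a simple loop. This is only my reading of the START / CONTINUE / STOP labels: each "agent" below is a plain function with no language model behind it, the graph is a hypothetical toy, and the stopping rule (a hop limit) is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical toy graph as (subject, predicate, object) triples.
TRIPLES = [
    ("acme", "operates_in", "robotics"),
    ("robotics", "adjacent_to", "logistics"),
    ("logistics", "adjacent_to", "warehousing"),
    ("globex", "operates_in", "logistics"),
]

@dataclass
class State:
    question: str
    frontier: set[str]
    facts: list[tuple[str, str, str]] = field(default_factory=list)
    hops: int = 0

def planner(question: str, start: str) -> State:
    """START: decide where the search begins."""
    return State(question=question, frontier={start})

def retriever(state: State) -> State:
    """CONTINUE: expand one hop outward from the current frontier."""
    new_facts = [t for t in TRIPLES if t[0] in state.frontier and t not in state.facts]
    state.facts.extend(new_facts)
    state.frontier = {obj for _, _, obj in new_facts}
    state.hops += 1
    return state

def critic(state: State, max_hops: int = 2) -> bool:
    """STOP: return True when nothing is left to explore or the hop limit is hit."""
    return not state.frontier or state.hops >= max_hops

def generator(state: State) -> str:
    """Compose an answer from the collected facts."""
    evidence = "\n".join(f"{s} {p} {o}" for s, p, o in state.facts)
    return f"Q: {state.question}\nEvidence:\n{evidence}"

def run(question: str, start: str) -> str:
    state = planner(question, start)
    while True:
        state = retriever(state)
        if critic(state):
            break
    return generator(state)

if __name__ == "__main__":
    print(run("Which industries could acme enter?", "acme"))
```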
Example
This one is simpler: just Graph RAG, not agentic.
- This example also looks for peers of a company.
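A minimal sketch of what single-pass Graph RAG for peer lookup might look like, assuming "peers" means companies that share an industry node. This is not the presenters' example: the graph, the `operates_in` predicate, and the prompt format are all hypothetical, and the final call to a language model is left out.

```python
# Hypothetical toy graph; company and industry names are made up.
TRIPLES = [
    ("acme", "operates_in", "robotics"),
    ("globex", "operates_in", "robotics"),
    ("initech", "operates_in", "payroll_software"),
]

def industries_of(company: str) -> set[str]:
    """Industry nodes directly connected to a company."""
    return {o for s, p, o in TRIPLES if s == company and p == "operates_in"}

def find_peers(company: str) -> list[str]:
    """Peers are other companies that share at least one industry node."""
    targets = industries_of(company)
    return sorted(
        {s for s, p, o in TRIPLES if p == "operates_in" and o in targets and s != company}
    )

def build_prompt(company: str) -> str:
    """Format the retrieved subgraph as context for a language model."""
    relevant = {company, *find_peers(company)}
    facts = [f"{s} {p} {o}" for s, p, o in TRIPLES if s in relevant]
    context = "\n".join(facts)
    return f"Context:\n{context}\n\nQuestion: Which companies are peers of {company}?"

if __name__ == "__main__":
    print(build_prompt("acme"))
```

Retrieval happens once and the result goes straight to generation. There is no planner deciding where to look or critic deciding when to stop.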
New Concepts
Example of mapping tabular data to graph views: Virtualized Knowledge Graphs
RDF4J - SPARQL on top of tabular storage
10th Annual Machine Learning Conference by Bloomberg at Columbia
Sixth normal form
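To illustrate the "mapping tabular data to graph views" idea from the list above, here is a rough sketch that lazily turns table rows into triples through a per-table column mapping. Real virtualized knowledge graph systems rewrite graph queries into SQL rather than generating triples in Python, so this only shows the mapping idea. The `https://example.org/` base IRI and the sample row values are placeholders. The schema.org properties are real, but choosing them is my assumption.

```python
from typing import Iterator

BASE = "https://example.org/person/"  # placeholder namespace

# Per-table mappings: differently named columns point at the same predicate.
MAPPINGS = {
    "table_1": {
        "birthday": "https://schema.org/birthDate",
        "first_name": "https://schema.org/givenName",
        "last_name": "https://schema.org/familyName",
    },
    "table_3": {
        "date_of_birth": "https://schema.org/birthDate",
        "first_name": "https://schema.org/givenName",
        "family_name": "https://schema.org/familyName",
    },
}

def row_to_triples(table: str, row: dict) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, predicate, object) triples for one row, on demand."""
    subject = f"{BASE}{row['id']}"
    for column, predicate in MAPPINGS[table].items():
        value = row.get(column)
        if value is not None:
            yield (subject, predicate, str(value))

if __name__ == "__main__":
    # Placeholder rows describing the same made-up person in two schemas.
    row_a = {"id": 1, "birthday": "1990-01-01", "first_name": "Jane", "last_name": "Doe"}
    row_b = {"id": 1, "date_of_birth": "1990-01-01", "first_name": "Jane", "family_name": "Doe"}
    print(set(row_to_triples("table_1", row_a)) == set(row_to_triples("table_3", row_b)))
```

Once both tables map to the same predicates, the question of which columns are semantically equivalent is answered by the mapping itself rather than by the column names.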
Personal note:
My favorite thing about these presenters is how directed they are about answering questions. I’ll ask something and if they didn’t completely understand, they immediately say “Could you summarize your question?” or “Could you ask that in a different way?” like so directed, to the point, beautiful.