Using query output for Gephi social network analysis
Posted: 11 Jul 2021 19:47
In case any of you are interested in social network analysis, I thought I'd drop my query for Father relationships to set up in Gephi. (I hope to one day figure out the query setup for listing all people connections so I don't have to do Father, Mother, X separately.)
SNA in Gephi in general
You can use any program to run social network analysis. I prefer the free, cross-platform Gephi. Gephi SNA works by taking a list of IDs ("nodes table") and connecting them as a Source and Target in an "edges table".
- If the network is me and you, the nodes table will have two rows (me and you) possibly with characteristics for each of us, and the edges table will have one row (me connected to you) possibly with information about our connection.
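To make the two-table idea concrete, here is a minimal Python sketch of that me-and-you network written out as CSV text. It is purely illustrative: the IDs 1 and 2, the labels, and the helper name to_csv are made up, and the column headings are simply the ones used for the Gephi tables in this post.

```python
import csv
import io


def to_csv(rows, fieldnames):
    """Return the rows as CSV text with a header line (hypothetical helper)."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Nodes table: one row per person, plus any attributes you like.
nodes = [
    {"Id": "1", "Label": "Me"},
    {"Id": "2", "Label": "You"},
]

# Edges table: one row per connection between two node Ids.
edges = [
    {"Source": "1", "Target": "2", "Type": "Undirected", "Weight": "1"},
]

nodes_csv = to_csv(nodes, ["Id", "Label"])
edges_csv = to_csv(edges, ["Source", "Target", "Type", "Weight"])

print(nodes_csv)
print(edges_csv)
```

Two node rows and one edge row are the whole network; everything else in the post is a bigger version of the same pair of tables.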
I mostly use Gephi to analyze/visualize my triangulation data from GEDmatch (using the Modularity Class, Eigenvector Centrality, and Girvan-Newman Clustering statistics, and the ForceAtlas2 viz), but I occasionally use it for my family history database, too, when I want to see how each person is related to other people (or places, or whatever).
FH queries to set up Gephi data
The nodes table is simple; it requires two columns with specific headings, and you can have as many other columns with characteristics/attributes as you want after them:
- Id: =RecordId()
- Label: %INDI.NAME%
For the edges table, I use four columns: Source, Target, Type, and Weight. (Strictly, Gephi only needs Source and Target, but I set the other two explicitly.) Type can be either "directed" or "undirected", and Weight is a measure of importance. You can have other columns; in my FH queries, I add Individual and Father/Mother columns holding the names of the Source and Target (depending on which relationship table it is).
The Father Edges table has:
- Source: =RecordId()
- Target: =RecordId(%INDI.~FATH>%)
- Type: =ForceText("Directed")
- Weight: =NumberIf(IsAncestorOf(%INDI%,FileRoot()),2,1)
Explanation of Weight: I opted to give more weight to my direct ancestors. You could choose many different things (e.g., flags) for this, but it does need to be a number for Gephi, so I converted the boolean result of IsAncestorOf() into a number with NumberIf(). You don't have to choose 2 and 1 (though avoid using 0, because Gephi will ignore any edge with a weight of 0); for my triangulation data I use the centimorgan value (a decimal) for the weight.
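If you want to double-check the Weight column before importing, a small Python sketch along these lines would flag the rows to look at. The function name problem_weights is hypothetical, and treating anything that is not a number greater than 0 as a problem is my own assumption, based only on the note above about 0 weights being ignored.

```python
def problem_weights(edge_rows):
    """Return (row_number, raw_weight) pairs whose Weight is not a number > 0.

    edge_rows is any sequence of dicts with a "Weight" key, e.g. rows read
    with csv.DictReader from an exported edges table.
    """
    problems = []
    for row_number, row in enumerate(edge_rows, start=1):
        raw = (row.get("Weight") or "").strip()
        try:
            value = float(raw)
        except ValueError:
            # Blank or non-numeric text cannot be used as a weight.
            problems.append((row_number, raw))
            continue
        if not value > 0:
            # Assumption: zero (and anything not above zero) is worth flagging.
            problems.append((row_number, raw))
    return problems


sample_rows = [
    {"Source": "1", "Target": "2", "Weight": "2"},
    {"Source": "2", "Target": "3", "Weight": "0"},
    {"Source": "3", "Target": "4", "Weight": ""},
    {"Source": "4", "Target": "5", "Weight": "12.5"},
]
flagged = problem_weights(sample_rows)
print(flagged)
```

Decimal weights such as a centimorgan value pass the check; only the zero and blank rows in the sample are flagged.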
If you want people with no recorded father to show up in your network, you will need to do some post-processing in a spreadsheet program: add an "Unknown" ID to the nodes table, then fill in every blank Target in the Father edges table with that ID. Doing so results in an anchored ForceAtlas2 viz, because every line will eventually end in an unknown. By default, the whole graph is grey, but I have the Modularity Class statistic set up to color eight groups by default, and then I change the appearance by partitioning on the Modularity Class calculation.
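The same post-processing could also be scripted instead of done by hand. Here is a minimal Python sketch, assuming the nodes and edges tables have already been read into lists of dicts (for example with csv.DictReader) and use the column headings from this post. The function name add_unknown and the sample rows are hypothetical.

```python
UNKNOWN_ID = "Unknown"  # placeholder Id from the post; any unused Id would do


def add_unknown(nodes, edges, unknown_id=UNKNOWN_ID):
    """Return copies of both tables with blank Targets pointed at a placeholder.

    The input lists are left untouched.
    """
    filled_edges = []
    for row in edges:
        row = dict(row)
        if not (row.get("Target") or "").strip():
            # No father recorded: point the edge at the placeholder node.
            row["Target"] = unknown_id
        filled_edges.append(row)

    new_nodes = [dict(node) for node in nodes]
    if all(node.get("Id") != unknown_id for node in new_nodes):
        # The placeholder needs its own row in the nodes table.
        new_nodes.append({"Id": unknown_id, "Label": unknown_id})
    return new_nodes, filled_edges


# Made-up sample data: person 2 has no father on record.
sample_nodes = [{"Id": "1", "Label": "Child"}, {"Id": "2", "Label": "Father"}]
sample_edges = [
    {"Source": "1", "Target": "2", "Type": "Directed", "Weight": "2"},
    {"Source": "2", "Target": "", "Type": "Directed", "Weight": "1"},
]

fixed_nodes, fixed_edges = add_unknown(sample_nodes, sample_edges)
print(fixed_nodes)
print(fixed_edges)
```

To go the other way and leave the unknown fathers out of the network, you would drop the rows with a blank Target rather than filling them in.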
Here's a different viz (Fruchterman Reingold), with the unknown father rows removed, zoomed in and with the labels turned on:
In any case, Family Historian's query builder is awesome for exporting data for analysis. w00t!!