graphxplore.Basis.BaseGraph package
A base graph in graphxplore represents a relational dataset as a graph structure which can be stored in a Neo4J
database. This enables efficient data retrieval and forms the basis of all data exploration tasks.
The BaseGraph object is the graph structure that is created by
GraphTranslator. A BaseNode represents a
unique value of a variable. A node x for a primary key value has an outgoing
BaseEdge to another node y if the values of x and y appear in the same row
of the relational data table. As all variable/value combinations are unique within the graph, two primary key values
(representing their respective CSV rows) x1 and x2 with the same value for one variable will both have an outgoing
edge to the same node y. As a result, lookups by value (select statements in SQL) can be done very efficiently.
Foreign key relations are also stored this way, enabling efficient lookup across tables without tedious join statements.
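The mapping described above can be illustrated with a minimal sketch. It does not use graphxplore; the toy table, the `build_toy_graph` helper and the column names are hypothetical and only mimic the idea that each variable/value combination becomes exactly one node, with primary key nodes pointing to the values in their row.

```python
from typing import Dict, List, Tuple

# Hypothetical toy table: each dict is one CSV row, "patient_id" is the primary key.
rows = [
    {"patient_id": 1, "sex": "female", "city": "Berlin"},
    {"patient_id": 2, "sex": "male", "city": "Berlin"},
]


def build_toy_graph(rows: List[dict], primary_key: str):
    """Create one node per unique (variable, value) pair and one edge per cell."""
    node_ids: Dict[Tuple[str, object], int] = {}
    edges: List[Tuple[int, int]] = []

    def node_id(variable: str, value: object) -> int:
        # Reuse the existing node if this variable/value pair was seen before.
        key = (variable, value)
        if key not in node_ids:
            node_ids[key] = len(node_ids)
        return node_ids[key]

    for row in rows:
        source = node_id(primary_key, row[primary_key])
        for variable, value in row.items():
            if variable != primary_key:
                # Directed edge from the primary key node to the value node.
                edges.append((source, node_id(variable, value)))
    return node_ids, edges


node_ids, edges = build_toy_graph(rows, "patient_id")

# Lookup by value: which primary key nodes point to the shared "Berlin" node?
berlin = node_ids[("city", "Berlin")]
sources_for_berlin = sorted(s for s, t in edges if t == berlin)
```

Both rows share the single `("city", "Berlin")` node, so finding all rows with that value amounts to following the incoming edges of one node.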
Module contents
- class graphxplore.Basis.BaseGraph.BaseEdge(source: int, target: int, edge_type: BaseEdgeType)
Bases: object
This class is the parent of almost all other types of edges. It represents a directed edge pointing from a source node to a target node.
- Parameters:
source (int) – The ID of the source BaseNode
target (int) – The ID of the target BaseNode
edge_type (BaseEdgeType) – The type of base edge
- static check_csv_row(row: Dict[str, str]) → None
Checks if all required fields are present in the CSV row and have the correct data type.
- Parameters:
row (Dict[str, str]) – The CSV row to check
- Return type:
None
- data_for_cypher_write_query() → Tuple[str, Dict[str, Any]]
Returns edge type and empty parameter dictionary for a Cypher MERGE statement to insert the edge into a Neo4J database.
- Returns:
Returns the data for the Cypher statement as a pair of edge type and empty parameter dictionary
- Return type:
Tuple[str, Dict[str, Any]]
- static from_csv_row(row: Dict[str, str]) → BaseEdge
Parses an edge from a CSV row.
- Parameters:
row (Dict[str, str]) – The CSV row as a dictionary
- Returns:
Returns the parsed edge
- Return type:
BaseEdge
- class graphxplore.Basis.BaseGraph.BaseEdgeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: str, Enum
The type of BaseEdge.
UNASSIGNED: invalid, has to be reset later
HAS_ATTR_VAL: points from a primary key node to an attribute node contained in its relational table row
CONNECTED_TO: points from a foreign key node to the primary key node in the same relational table row
ASSIGNED_BIN: points from an attribute node of a metric variable to its assigned attribute bin node
- ASSIGNED_BIN = 'ASSIGNED_BIN'
- CONNECTED_TO = 'CONNECTED_TO'
- HAS_ATTR_VAL = 'HAS_ATTR_VAL'
- UNASSIGNED = 'UNASSIGNED'
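Because the enum derives from both str and Enum, its members can be looked up by their string value and compare equal to plain strings. The sketch below demonstrates this standard-library behaviour with a stand-in class, `EdgeTypeSketch`, which is hypothetical and merely mirrors the member names listed above without importing graphxplore.

```python
from enum import Enum


class EdgeTypeSketch(str, Enum):
    """Stand-in mirroring the documented members; not the library class."""
    UNASSIGNED = 'UNASSIGNED'
    HAS_ATTR_VAL = 'HAS_ATTR_VAL'
    CONNECTED_TO = 'CONNECTED_TO'
    ASSIGNED_BIN = 'ASSIGNED_BIN'


# Look up a member from its string value, e.g. a value read from a CSV cell.
parsed = EdgeTypeSketch('HAS_ATTR_VAL')

# The lookup returns the existing member, which also equals the raw string.
is_same_member = parsed is EdgeTypeSketch.HAS_ATTR_VAL
equals_plain_string = parsed == 'HAS_ATTR_VAL'
```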
- class graphxplore.Basis.BaseGraph.BaseGraph(nodes: List[BaseNode] | None = None, edges: List[BaseEdge] | None = None)
Bases: Graph
This is the graph holding BaseNode and BaseEdge objects. It forms the basis of all further data science procedures.
- class graphxplore.Basis.BaseGraph.BaseLabels(membership_labels: Tuple[str, ...], node_type: BaseNodeType)
Bases: object
The labels assigned to a BaseNode.
- Parameters:
membership_labels (Tuple[str, ...]) – One or more labels describing the node’s membership in categories. The origin table should always be one of the labels
node_type (BaseNodeType) – The type of node
- static from_label_string(label_string: str) → BaseLabels
Generates a BaseLabels object from a label string. The individual values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.
- Parameters:
label_string (str) – The input string from which the object is parsed
- Returns:
Returns the parsed object
- Return type:
BaseLabels
- to_label_string() → str
Converts the object to a string. The individual labels are joined by semicolons, and the BaseNodeType appears last.
- Returns:
Returns the converted string
- Return type:
str
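The label string format described for from_label_string and to_label_string can be sketched as a simple round trip. The two module-level functions below are hypothetical stand-ins written for illustration, not the library's implementation; in particular, the exact validation rules and the exception type (ValueError here) are assumptions.

```python
from typing import Tuple

# Node type names as listed for BaseNodeType in this module.
NODE_TYPES = ("Key", "Attribute", "AttributeBin")


def to_label_string(membership_labels: Tuple[str, ...], node_type: str) -> str:
    """Join the membership labels with semicolons and put the node type last."""
    return ";".join(membership_labels + (node_type,))


def from_label_string(label_string: str) -> Tuple[Tuple[str, ...], str]:
    """Split a label string; the last entry must be a known node type."""
    parts = tuple(label_string.split(";"))
    # Assumption: at least one membership label plus the node type is required.
    if len(parts) < 2 or parts[-1] not in NODE_TYPES:
        raise ValueError(f"Cannot parse label string: {label_string!r}")
    return parts[:-1], parts[-1]


label_string = to_label_string(("patients", "demographics"), "Attribute")
membership_labels, node_type = from_label_string(label_string)
```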
- class graphxplore.Basis.BaseGraph.BaseNode(node_id: int, labels: BaseLabels, name: str, val: str | int | float, desc: str | None = None, bin_info: BinBoundInfo | None = None)
Bases: object
The base node class from which most other node classes inherit. It contains the name of a column and the cell value, and additionally a description, labels and binning info (if the node is of type ‘AttributeBin’).
- Parameters:
node_id (int) – The ID of the node, used for various lookups.
labels (BaseLabels) – The labels of the node’s origin table and categories
name (str) – The column name
val (str | int | float) – The cell value
desc (str | None) – The description of the data column
bin_info (BinBoundInfo | None) – The lower and upper bound used for binning
- static check_csv_row(row: Dict[str, str]) → None
Checks if all required fields are present in the CSV row and have the correct data type.
- Parameters:
row (Dict[str, str]) – The CSV row to check
- Return type:
None
- data_for_cypher_write_query() → Tuple[List[str], Dict[str, Any]]
Returns labels and parameter dictionary for a Cypher MERGE statement to insert the node into a Neo4J database.
- Returns:
Returns the data for the Cypher statement as a pair of label list and parameter dictionary
- Return type:
Tuple[List[str], Dict[str, Any]]
- static from_csv_row(row: Dict[str, str]) → BaseNode
Parses a node from a CSV row.
- Parameters:
row (Dict[str, str]) – The CSV row as a dictionary
- Returns:
Returns the parsed node
- Return type:
BaseNode
- static get_csv_header(data_type: NodeDataType) → List[str]
Generates the header for a CSV file storing the nodes.
- Parameters:
data_type (NodeDataType) – The data type of the node’s value
- Returns:
Returns the generated header
- Return type:
List[str]
- class graphxplore.Basis.BaseGraph.BaseNodeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)
Bases: str, Enum
The type of BaseNode
- Attribute = 'Attribute'
- AttributeBin = 'AttributeBin'
- Key = 'Key'
- class graphxplore.Basis.BaseGraph.BinBoundInfo(ref_lower: float, ref_upper: float)
Bases: object
The lower and upper bound for a ‘normal’ value. Values above ref_upper are considered ‘high’, values below ref_lower are considered ‘low’.
- Parameters:
ref_lower (float) – The lower bound
ref_upper (float) – The upper bound
- ref_lower: float
- ref_upper: float
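The binning rule stated for BinBoundInfo can be sketched as follows. `BoundsSketch` and `classify` are hypothetical names used for illustration only. Treating values exactly equal to a bound as ‘normal’ is an assumption that follows from the wording "above" and "below"; the library may handle the boundaries differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundsSketch:
    """Stand-in with the same two fields as BinBoundInfo; not the library class."""
    ref_lower: float
    ref_upper: float


def classify(value: float, bounds: BoundsSketch) -> str:
    """Return 'low', 'normal' or 'high' relative to the reference bounds."""
    if value < bounds.ref_lower:
        return "low"
    if value > bounds.ref_upper:
        return "high"
    # Assumption: values equal to either bound count as 'normal'.
    return "normal"


# Hypothetical reference range chosen purely for the example.
bounds = BoundsSketch(ref_lower=3.5, ref_upper=5.0)
labels = [classify(v, bounds) for v in (2.9, 3.5, 4.2, 5.0, 6.1)]
```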