graphxplore.Basis.BaseGraph package

A base graph in graphxplore represents a relational dataset as a graph structure that can be stored in a Neo4J database. This enables efficient data retrieval and forms the basis of all data exploration tasks. The BaseGraph object is the graph structure created by GraphTranslator. A BaseNode represents a unique value of a variable. A node x for a primary key value has an outgoing BaseEdge to another node y if the values of x and y appear in the same row of the relational data table. Since every variable/value combination occurs only once in the graph, two primary key nodes x1 and x2 (each representing one CSV row) that share the same value for a variable both have an outgoing edge to the same node y. As a result, lookups by value (SELECT ... WHERE statements in SQL) can be done very efficiently. Foreign key relations are stored the same way, enabling efficient lookups across tables without tedious JOIN statements.
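
The row-to-graph mapping described above can be illustrated without Neo4J. The sketch below is plain Python and does not use the graphxplore API; the table, its column names and the tuple-based node identity are hypothetical. Each distinct (variable, value) pair becomes exactly one node, every primary key node gets an edge to each attribute node in its row, and a lookup by value only has to follow the incoming edges of a single node.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical table; "PatientID" plays the role of the primary key column.
rows: List[Dict[str, str]] = [
    {"PatientID": "P1", "Sex": "female", "Smoker": "yes"},
    {"PatientID": "P2", "Sex": "male", "Smoker": "yes"},
    {"PatientID": "P3", "Sex": "female", "Smoker": "no"},
]
PRIMARY_KEY = "PatientID"

# A node is identified by its (variable, value) pair, so each pair exists only once.
Node = Tuple[str, str]
nodes: Set[Node] = set()
# For each attribute node, the primary key nodes with an edge pointing to it.
incoming: Dict[Node, Set[Node]] = defaultdict(set)

for row in rows:
    key_node = (PRIMARY_KEY, row[PRIMARY_KEY])
    nodes.add(key_node)
    for variable, value in row.items():
        if variable == PRIMARY_KEY:
            continue
        attr_node = (variable, value)
        nodes.add(attr_node)               # reused by every row with this value
        incoming[attr_node].add(key_node)  # edge key_node -> attr_node


def select_keys(variable: str, value: str) -> List[str]:
    """Rough analogue of SELECT PatientID FROM table WHERE variable = value."""
    return sorted(key for _, key in incoming.get((variable, value), set()))


print(len(nodes))                    # 7: three key nodes, four attribute nodes
print(select_keys("Smoker", "yes"))  # ['P1', 'P2']
```

Here the node ("Sex", "female") is shared by P1 and P3, which is the deduplication that makes value lookups cheap.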

Module contents

class graphxplore.Basis.BaseGraph.BaseEdge(source: int, target: int, edge_type: BaseEdgeType)

Bases: object

This class is the parent of almost all other types of edges. It represents a directed edge pointing from a source node to a target node.

Parameters:
  • source (int) – The ID of the source BaseNode

  • target (int) – The ID of the target BaseNode

  • edge_type (BaseEdgeType) – The type of base edge

static check_csv_row(row: Dict[str, str]) → None

Checks if all required fields are present in the CSV row and have the correct data type.

Parameters:

row (Dict[str, str]) – The CSV row to check

Return type:

None

data_for_cypher_write_query() → Tuple[str, Dict[str, Any]]

Returns the edge type and an empty parameter dictionary for a Cypher MERGE statement that inserts the edge into a Neo4J database.

Returns:

Returns the data for the Cypher statement as a pair of edge type and empty parameter dictionary

Return type:

Tuple[str, Dict[str, Any]]
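
To show how such a pair might be consumed, the following sketch turns it into a Cypher query string. It is not part of graphxplore: the function build_edge_merge is hypothetical, and matching the endpoint nodes by an "id" property bound to $source and $target is an assumption about how a caller could look nodes up.

```python
from typing import Any, Dict, Tuple


def build_edge_merge(edge_data: Tuple[str, Dict[str, Any]]) -> str:
    """Build a MERGE query from an (edge type, parameters) pair.

    Assumption: nodes are matched by a hypothetical "id" property, bound to
    the query parameters $source and $target supplied by the caller.
    """
    edge_type, params = edge_data
    # The relationship type goes into the query text; any edge properties
    # (empty for a plain BaseEdge) are referenced as $-parameters.
    props = ", ".join(f"{key}: ${key}" for key in params)
    rel = f"[:{edge_type} {{{props}}}]" if props else f"[:{edge_type}]"
    return f"MATCH (a {{id: $source}}), (b {{id: $target}}) MERGE (a)-{rel}->(b)"


print(build_edge_merge(("HAS_ATTR_VAL", {})))
# MATCH (a {id: $source}), (b {id: $target}) MERGE (a)-[:HAS_ATTR_VAL]->(b)
```

The parameter dictionary, extended with the source and target IDs, would then be passed alongside the query to the Neo4J driver.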

static from_csv_row(row: Dict[str, str]) → BaseEdge

Parses an edge from a CSV row.

Parameters:

row (Dict[str, str]) – The CSV row as a dictionary

Returns:

Returns the parsed object

Return type:

BaseEdge

static get_csv_header() → List[str]

Generates the header for a CSV storing the edges.

Returns:

Returns the generated header

Return type:

List[str]

to_csv_row() → List[str | float | int]

Converts the object to a CSV row represented as a list.

Returns:

Returns the list

Return type:

List[str | float | int]
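
The CSV methods above fit together as a write/read round trip: get_csv_header and to_csv_row produce lines for a writer, while a reader such as csv.DictReader yields Dict[str, str] rows, which is why check_csv_row has to validate types before from_csv_row converts them. The sketch below uses a hypothetical stand-in class EdgeRecord rather than BaseEdge itself; its column names ("source", "target", "type") and its validation rules are assumptions, not graphxplore's actual format.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class EdgeRecord:
    """Hypothetical stand-in for BaseEdge; the CSV column names are assumed."""
    source: int
    target: int
    edge_type: str

    @staticmethod
    def get_csv_header() -> List[str]:
        return ["source", "target", "type"]

    def to_csv_row(self) -> List[Union[str, float, int]]:
        # Same order as the header.
        return [self.source, self.target, self.edge_type]

    @staticmethod
    def check_csv_row(row: Dict[str, str]) -> None:
        # csv.DictReader yields only strings, so presence and types are checked here.
        for field in EdgeRecord.get_csv_header():
            if field not in row:
                raise ValueError(f"missing field '{field}'")
        for field in ("source", "target"):
            if not row[field].isdigit():  # assumes non-negative integer node IDs
                raise ValueError(f"field '{field}' is not a node ID")

    @staticmethod
    def from_csv_row(row: Dict[str, str]) -> "EdgeRecord":
        EdgeRecord.check_csv_row(row)
        return EdgeRecord(int(row["source"]), int(row["target"]), row["type"])


edges = [EdgeRecord(0, 1, "HAS_ATTR_VAL"), EdgeRecord(2, 0, "CONNECTED_TO")]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(EdgeRecord.get_csv_header())
writer.writerows(edge.to_csv_row() for edge in edges)

buffer.seek(0)
parsed = [EdgeRecord.from_csv_row(row) for row in csv.DictReader(buffer)]
print(parsed == edges)  # True
```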

class graphxplore.Basis.BaseGraph.BaseEdgeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

The type of BaseEdge.

  • UNASSIGNED: invalid, has to be reset later

  • HAS_ATTR_VAL: points from a primary key node to an attribute node contained in its relational table row

  • CONNECTED_TO: points from a foreign key node to the primary key node in the same relational table row

  • ASSIGNED_BIN: points from an attribute node of a metric variable to its assigned attribute bin node

ASSIGNED_BIN = 'ASSIGNED_BIN'
CONNECTED_TO = 'CONNECTED_TO'
HAS_ATTR_VAL = 'HAS_ATTR_VAL'
UNASSIGNED = 'UNASSIGNED'
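
Because BaseEdgeType derives from both str and Enum, its members behave like plain strings in comparisons and can be recovered from raw strings. The sketch below reproduces the documented members in a local class EdgeTypeMirror (a hypothetical name, not imported from graphxplore) to show this standard Python behavior.

```python
from enum import Enum


class EdgeTypeMirror(str, Enum):
    """Local copy of the documented BaseEdgeType members, not imported from graphxplore."""
    UNASSIGNED = "UNASSIGNED"
    HAS_ATTR_VAL = "HAS_ATTR_VAL"
    CONNECTED_TO = "CONNECTED_TO"
    ASSIGNED_BIN = "ASSIGNED_BIN"


# Mixing in str makes members compare equal to their plain string values.
print(EdgeTypeMirror.HAS_ATTR_VAL == "HAS_ATTR_VAL")  # True

# A raw string (e.g. read from a CSV cell) converts back via a value lookup;
# an unknown string raises ValueError.
parsed = EdgeTypeMirror("CONNECTED_TO")
print(parsed is EdgeTypeMirror.CONNECTED_TO)  # True

# .value always gives the bare string, e.g. for writing into query text.
print(EdgeTypeMirror.ASSIGNED_BIN.value)  # ASSIGNED_BIN
```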

class graphxplore.Basis.BaseGraph.BaseGraph(nodes: List[BaseNode] | None = None, edges: List[BaseEdge] | None = None)

Bases: Graph

This is the graph holding BaseNode and BaseEdge objects. It forms the basis of all further data science procedures.

Parameters:
  • nodes (List[BaseNode] | None) – The list of nodes

  • edges (List[BaseEdge] | None) – The list of edges

class graphxplore.Basis.BaseGraph.BaseLabels(membership_labels: Tuple[str, ...], node_type: BaseNodeType)

Bases: object

The labels assigned to a BaseNode.

Parameters:
  • membership_labels (Tuple[str, ...]) – One or more labels describing the node's membership in categories. The origin table should always be one of these labels.

  • node_type (BaseNodeType) – The type of node

static from_label_string(label_string: str) → BaseLabels

Generates a BaseLabels object from a label string. The individual labels should be separated by semicolons, and the BaseNodeType label should appear last. Raises an exception if parsing fails.

Parameters:

label_string (str) – The input string from which the object is parsed

Returns:

Returns the parsed object

Return type:

BaseLabels

to_label_string() → str

Converts the object to a string. The individual labels are concatenated by semicolons, the BaseNodeType appears last.

Returns:

Returns the converted string

Return type:

str
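
The label string format described for from_label_string and to_label_string (semicolon-separated, node type last) can be sketched as follows. LabelsSketch is a hypothetical re-implementation for illustration, not graphxplore's code; the specific checks it performs, and the ValueError it raises, are assumptions based on the documented format.

```python
from dataclasses import dataclass
from typing import Tuple

# The documented BaseNodeType values; a label string must end with one of them.
NODE_TYPES = ("Attribute", "AttributeBin", "Key")


@dataclass(frozen=True)
class LabelsSketch:
    """Hypothetical re-implementation of the documented BaseLabels string format."""
    membership_labels: Tuple[str, ...]
    node_type: str

    def to_label_string(self) -> str:
        # Membership labels first, node type last, joined by semicolons.
        return ";".join(self.membership_labels + (self.node_type,))

    @staticmethod
    def from_label_string(label_string: str) -> "LabelsSketch":
        parts = label_string.split(";")
        # At least one membership label plus the trailing node type is required.
        if len(parts) < 2:
            raise ValueError(f"cannot parse labels from '{label_string}'")
        if parts[-1] not in NODE_TYPES:
            raise ValueError(f"unknown node type '{parts[-1]}'")
        return LabelsSketch(tuple(parts[:-1]), parts[-1])


labels = LabelsSketch(("Patient",), "Attribute")
label_string = labels.to_label_string()
print(label_string)  # Patient;Attribute
print(LabelsSketch.from_label_string(label_string) == labels)  # True
```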

class graphxplore.Basis.BaseGraph.BaseNode(node_id: int, labels: BaseLabels, name: str, val: str | int | float, desc: str | None = None, bin_info: BinBoundInfo | None = None)

Bases: object

The base node class from which most other node classes inherit. It contains the column name and the cell value, as well as labels, an optional description and, if the node is of type ‘AttributeBin’, binning information.

Parameters:
  • node_id (int) – The ID of the node, used for various lookups.

  • labels (BaseLabels) – The labels of the node’s origin table and categories

  • name (str) – The column name

  • val (str | int | float) – The cell value

  • desc (str | None) – The description of the data column

  • bin_info (BinBoundInfo | None) – The lower and upper bound used for binning

static check_csv_row(row: Dict[str, str]) → None

Checks if all required fields are present in the CSV row and have the correct data type.

Parameters:

row (Dict[str, str]) – The CSV row to check

Return type:

None

data_for_cypher_write_query() → Tuple[List[str], Dict[str, Any]]

Returns labels and parameter dictionary for a Cypher MERGE statement to insert the node into a Neo4J database.

Returns:

Returns the data for the Cypher statement as a pair of label list and parameter dictionary

Return type:

Tuple[List[str], Dict[str, Any]]
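
A pair of label list and parameter dictionary maps naturally onto a MERGE clause: the labels are written into the query text and the node properties are referenced as parameters. The sketch below is illustrative only; build_node_merge is a hypothetical helper, and the parameter keys in the usage example ("id", "name", "value") are assumptions, not the keys graphxplore actually returns.

```python
from typing import Any, Dict, List, Tuple


def build_node_merge(node_data: Tuple[List[str], Dict[str, Any]]) -> str:
    """Build a MERGE query from a (labels, parameters) pair."""
    labels, params = node_data
    # Labels are written into the query text (backtick-quoted in case they
    # contain spaces); property values stay parameters ($name, ...).
    label_part = "".join(f":`{label}`" for label in labels)
    props = ", ".join(f"{key}: ${key}" for key in params)
    return f"MERGE (n{label_part} {{{props}}})"


# Hypothetical output of data_for_cypher_write_query(); the actual keys may differ.
labels = ["Patient", "Attribute"]
params = {"id": 4, "name": "Sex", "value": "female"}
print(build_node_merge((labels, params)))
# MERGE (n:`Patient`:`Attribute` {id: $id, name: $name, value: $value})
```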

static from_csv_row(row: Dict[str, str]) → BaseNode

Parses a node from a CSV row.

Parameters:

row (Dict[str, str]) – The CSV row as a dictionary

Returns:

Returns the parsed object

Return type:

BaseNode

static get_csv_header(data_type: NodeDataType) → List[str]

Generates the header for a CSV file storing the nodes.

Parameters:

data_type (NodeDataType) – The data type of the nodes' values

Returns:

Returns the generated header

Return type:

List[str]

to_csv_row() → List[str | float | int]

Converts the object to a CSV row represented as a list.

Returns:

Returns the list

Return type:

List[str | float | int]

class graphxplore.Basis.BaseGraph.BaseNodeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

The type of BaseNode.

Attribute = 'Attribute'
AttributeBin = 'AttributeBin'
Key = 'Key'

class graphxplore.Basis.BaseGraph.BinBoundInfo(ref_lower: float, ref_upper: float)

Bases: object

The lower and upper bounds of the ‘normal’ value range. Values above ref_upper are considered ‘high’, and values below ref_lower ‘low’.

Parameters:
  • ref_lower (float) – The lower bound

  • ref_upper (float) – The upper bound

ref_lower: float
ref_upper: float
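
The low/normal/high rule stated above can be written as a small classification function. BinBounds mirrors the two documented fields and assign_bin is a hypothetical helper; treating values exactly on a bound as ‘normal’ is an assumption, since the description only says which values count as ‘high’ and ‘low’.

```python
from dataclasses import dataclass


@dataclass
class BinBounds:
    """Mirrors the documented BinBoundInfo fields."""
    ref_lower: float
    ref_upper: float


def assign_bin(value: float, bounds: BinBounds) -> str:
    # Values exactly on a bound are treated as 'normal' here (an assumption).
    if value < bounds.ref_lower:
        return "low"
    if value > bounds.ref_upper:
        return "high"
    return "normal"


bounds = BinBounds(ref_lower=12.0, ref_upper=16.0)  # hypothetical reference range
print([assign_bin(v, bounds) for v in (10.5, 12.0, 14.2, 17.3)])
# ['low', 'normal', 'normal', 'high']
```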

class graphxplore.Basis.BaseGraph.NodeDataType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

The datatype of the value parameter of a BaseNode.

Bin = 'Bin'
Decimal = 'Decimal'
Integer = 'Integer'
String = 'String'