graphxplore.Basis.AttributeAssociationGraph package

Attribute association graphs capture statistical traits of attributes (unique variable values) within groups of primary keys as nodes, and the conditional dependencies between attributes as edges. These graphs can later be explored visually in Neo4J without coding or scripting skills. Statistical traits are encoded by color, size and arrow thickness. AttributeAssociationGraph objects are created by AttributeAssociationGraphGenerator.

An AttributeAssociationNode object inherits from and represents a BaseNode. It captures the absolute count, missing rate and prevalence of its attribute within each defined group. Additionally, it compares the prevalence between groups by difference and ratio. Positive and negative groups can be defined, and colors will then encode the association of each node with these groups in the visualization.

AttributeAssociationEdge objects inherit from BaseEdge and capture the conditional relationship between the attributes of their source and target nodes. They contain the absolute co-occurrence, the conditional probability of the target attribute given the source attribute, and the impact of the added condition on the prevalence of the target node’s attribute.

Module contents

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationEdge(source: int, target: int, groups: List[str], edge_type: AttributeAssociationEdgeType = AttributeAssociationEdgeType.UNASSIGNED, positive_group: str | None = None, negative_group: str | None = None, group_size: Dict[str, int] | None = None, co_occurrence: Dict[str, int] | None = None, conditional_prevalence: Dict[str, float] | None = None, conditional_increase: Dict[str, float] | None = None, increase_ratio: Dict[str, float] | None = None)[source]

Bases: BaseEdge

This class describes the conditional relationship between the attributes of two AttributeAssociationNode objects. It contains statistical parameters for the absolute co-occurrence and the conditional prevalence of the target attribute given the source attribute. Additionally, it contains the difference and the ratio between the conditional prevalence and the prevalence of the target node, which express the influence of adding the source attribute as a condition. These statistical measurements are stored for one or multiple groups of primary keys. Based on the maximum difference and ratio, the edge is assigned a type reflecting the degree of the conditional relationship.

Parameters:
  • source (int) – The ID of the source AttributeAssociationNode

  • target (int) – The ID of the target AttributeAssociationNode

  • edge_type (AttributeAssociationEdgeType) – The type of edge describing the degree of conditional implication

  • positive_group (str | None) – The name of the positive group (e.g. the disease cohort) or None

  • negative_group (str | None) – The name of the negative group (e.g. the control cohort) or None

  • group_size (Dict[str, int] | None) – The number of group members. Will be initialized with 0 for each group if None

  • co_occurrence (Dict[str, int] | None) – The absolute count of group members having both the source and target attribute. Specified for each group. Will be initialized with 0 for each group if None

  • conditional_prevalence (Dict[str, float] | None) – The co-occurrence divided by absolute count of the source attribute, resulting in the conditional prevalence of the target attribute given the source attribute. Specified for each group. Will be initialized with 0.0 for each group if None

  • conditional_increase (Dict[str, float] | None) – The conditional prevalence minus the prevalence of the target node. Specified for each group. Might be negative. Will be initialized with 0.0 for each group if None

  • increase_ratio (Dict[str, float] | None) – The conditional prevalence divided by the prevalence of the target node. Specified for each group. Might be smaller than 1. Will be initialized with 0.0 for each group if None

  • groups (List[str]) – The names of the groups
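The statistical parameters above can be illustrated with a small worked example for a single group. This is a minimal sketch in plain Python, not the library's implementation: the helper `edge_statistics` and the example data are hypothetical, and it assumes a non-empty source attribute and a non-zero target prevalence.

```python
from typing import Dict, Set


def edge_statistics(source_members: Set[int],
                    target_members: Set[int],
                    target_prevalence: float) -> Dict[str, float]:
    """Hypothetical helper: computes the documented edge measures for one group."""
    # Absolute count of group members having both the source and target attribute
    co_occurrence = len(source_members & target_members)
    # Co-occurrence divided by the absolute count of the source attribute
    conditional_prevalence = co_occurrence / len(source_members)
    # Conditional prevalence minus the prevalence of the target node (may be negative)
    conditional_increase = conditional_prevalence - target_prevalence
    # Conditional prevalence divided by the prevalence of the target node (may be < 1)
    increase_ratio = conditional_prevalence / target_prevalence
    return {
        "co_occurrence": co_occurrence,
        "conditional_prevalence": conditional_prevalence,
        "conditional_increase": conditional_increase,
        "increase_ratio": increase_ratio,
    }


# Hypothetical group of 10 primary keys (IDs 0 to 9) without missing values
source = {0, 1, 2, 3}        # keys having the source attribute
target = {1, 2, 3, 4, 5}     # keys having the target attribute
stats = edge_statistics(source, target, target_prevalence=len(target) / 10)
print(stats)
```

In this example the target attribute occurs in half of the group, but in three out of four members that have the source attribute, so the conditional increase is positive and the increase ratio is larger than 1.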

static check_csv_row(row: Dict[str, str]) None[source]

Checks if all required fields are present in the CSV row and have the correct data type.

Parameters:

row (Dict[str, str]) – The CSV row to check

Return type:

None

data_for_cypher_write_query() Tuple[str, Dict[str, Any]][source]

Returns edge type and parameter dictionary for a Cypher MERGE statement to insert the edge into a Neo4J database.

Returns:

Returns the data for the Cypher statement as a pair of edge type and empty parameter dictionary

Return type:

Tuple[str, Dict[str, Any]]

static from_csv_row(row: Dict[str, str]) AttributeAssociationEdge[source]

Parses an edge from a CSV row.

Parameters:

row (Dict[str, str]) – The CSV row as a dictionary

Returns:

Returns the parsed object

Return type:

AttributeAssociationEdge

static get_csv_header() List[str][source]

Generates the header for a CSV storing the edges.

Returns:

Returns the generated header

Return type:

List[str]

to_csv_row() List[str | float | int][source]

Converts the object to a CSV row as a list.

Returns:

Returns the list

Return type:

List[str | float | int]
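The three CSV methods are designed to work together: get_csv_header and to_csv_row produce lists suitable for a CSV writer, while from_csv_row consumes a dictionary of strings such as the ones yielded by csv.DictReader. The following sketch shows this round trip with a stand-in class. `ToyEdge` and its four columns are hypothetical and only mirror the documented method names and signatures; the actual column names of the library are not shown here.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class ToyEdge:
    """Stand-in class, NOT the graphxplore class; the columns are hypothetical."""
    source: int
    target: int
    co_occurrence: int
    conditional_prevalence: float

    @staticmethod
    def get_csv_header() -> List[str]:
        return ["source", "target", "co_occurrence", "conditional_prevalence"]

    def to_csv_row(self) -> List[Union[str, float, int]]:
        return [self.source, self.target, self.co_occurrence,
                self.conditional_prevalence]

    @staticmethod
    def from_csv_row(row: Dict[str, str]) -> "ToyEdge":
        # Every value read from a CSV file is a string, so each field
        # is converted back to its numeric type explicitly
        return ToyEdge(int(row["source"]), int(row["target"]),
                       int(row["co_occurrence"]),
                       float(row["conditional_prevalence"]))


# Write header and one row into an in-memory CSV file
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow(ToyEdge.get_csv_header())
writer.writerow(ToyEdge(1, 2, 3, 0.75).to_csv_row())

# Read the rows back as dictionaries keyed by the header
buffer.seek(0)
restored = [ToyEdge.from_csv_row(row) for row in csv.DictReader(buffer)]
print(restored)
```

Because the parsed rows contain only strings, a check such as check_csv_row is useful before parsing: it can report a missing column or a non-numeric value early.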

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationEdgeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

The type of edge, specifying the degree of conditional relationship between the source and target node.

HIGH_RELATION = 'HIGH_RELATION'
LOW_RELATION = 'LOW_RELATION'
MEDIUM_RELATION = 'MEDIUM_RELATION'
UNASSIGNED = 'UNASSIGNED'

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationGraph(nodes: List[AttributeAssociationNode] | None = None, edges: List[AttributeAssociationEdge] | None = None)[source]

Bases: Graph

This is the graph holding AttributeAssociationNode and AttributeAssociationEdge objects. It captures statistical measurements about the occurrence of attributes within one or multiple groups of primary keys, as well as the conditional relations between attributes within these groups.

Parameters:
  • nodes (List[AttributeAssociationNode] | None)

  • edges (List[AttributeAssociationEdge] | None)

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationLabels(membership_labels: Tuple[str, ...], node_type: BaseNodeType, frequency_label: FrequencyLabel | None = None, distinction_label: DistinctionLabel | None = None)[source]

Bases: BaseLabels

These labels describe AttributeAssociationNode objects and inherit from BaseLabels.

Parameters:
  • membership_labels (Tuple[str, ...]) – One or more labels describing the membership of the node in categories. The origin table should always be one of these labels

  • node_type (BaseNodeType) – The type of node

  • frequency_label (FrequencyLabel | None) – Describes how frequently the attribute appears in the single group or, if multiple groups of primary keys exist, in at least one of them

  • distinction_label (DistinctionLabel | None) – Describes the difference and quotient in frequencies between primary key groups

static from_label_list(label_list: List[str]) AttributeAssociationLabels[source]

Generates an AttributeAssociationLabels object from a list of strings. The single values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.

Parameters:

label_list (List[str]) – The input list from which the object is parsed

Returns:

Returns the parsed object

Return type:

AttributeAssociationLabels

static from_label_string(label_string: str) AttributeAssociationLabels[source]

Generates an AttributeAssociationLabels object from a label string. The single values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.

Parameters:

label_string (str) – The input string from which the object is parsed

Returns:

Returns the parsed object

Return type:

AttributeAssociationLabels

to_label_string() str[source]

Converts the object to a string. The individual labels are concatenated with semicolons, and the FrequencyLabel appears last.

Returns:

Returns the converted string

Return type:

str

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationNode(node_id: int, labels: AttributeAssociationLabels, name: str, val: str | int | float, groups: List[str], desc: str = 'NaN', bin_info: BinBoundInfo | None = None, positive_group: str | None = None, negative_group: str | None = None, group_size: Dict[str, int] | None = None, count: Dict[str, int] | None = None, missing: Dict[str, float] | None = None, prevalence: Dict[str, float] | None = None, prevalence_difference: float = nan, prevalence_ratio: float = nan)[source]

Bases: BaseNode

This class inherits from BaseNode and contains the information of a BaseNode of type BaseNodeType.Attribute or BaseNodeType.AttributeBin. In addition, it captures several statistical traits of the node’s attribute within one or multiple groups of primary keys: its absolute count, its prevalence, and the ratio of group members with a missing value for the variable name. Moreover, if multiple groups are defined, the absolute difference and the ratio of the prevalence values are calculated. If positive_group and negative_group are specified, they are calculated between the prevalence values of these two groups, otherwise between the maximum and minimum prevalence.

Parameters:
  • node_id (int) – The internal Neo4J ID of the BaseNode. Used for identity checks. As a result, nodes can only be compared if originating from the same BaseGraph

  • labels (AttributeAssociationLabels) – The labels of the BaseNode and potentially a FrequencyLabel and DistinctionLabel

  • name (str) – The name of the BaseNode

  • val (str | int | float) – The value of the BaseNode

  • groups (List[str]) – The names of the groups

  • desc (str) – The description of the BaseNode

  • bin_info (BinBoundInfo | None) – The binning info of the BaseNode

  • positive_group (str | None) – The name of the positive group (e.g. the disease cohort) or None

  • negative_group (str | None) – The name of the negative group (e.g. the control cohort) or None

  • group_size (Dict[str, int] | None) – The number of group members. Will be initialized with 0 for each group if None

  • count (Dict[str, int] | None) – The absolute counts of group members having this attribute. Will be initialized with 0 for each group if None

  • missing (Dict[str, float] | None) – The ratio of group members with a missing value for the variable name. Will be initialized with 0.0 for each group if None

  • prevalence (Dict[str, float] | None) – The count divided by the number of group members not having a missing value for the variable name. Will be initialized with 0.0 for each group if None

  • prevalence_difference (float) – The absolute difference between the prevalence of the positive_group and negative_group if defined, or between the maximum and minimum prevalence. Defaults to NaN

  • prevalence_ratio (float) – The larger divided by the smaller prevalence of the positive_group and negative_group if defined, or the quotient of the maximum and minimum prevalence. Defaults to NaN
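The node statistics described above can be reproduced by hand for a small example. The following is a minimal sketch in plain Python, not the library's implementation: the helper `node_statistics`, the group names and the data are hypothetical, missing values are represented by None, and every group is assumed to contain at least one non-missing value and a non-zero prevalence.

```python
from typing import Dict, List, Optional, Tuple


def node_statistics(values: List[Optional[str]],
                    attribute: str) -> Tuple[int, float, float]:
    """Hypothetical helper: count, missing ratio and prevalence for one group."""
    group_size = len(values)
    n_missing = sum(value is None for value in values)
    # Absolute count of group members having this attribute
    count = sum(value == attribute for value in values)
    # Ratio of group members with a missing value for the variable
    missing = n_missing / group_size
    # Count divided by the number of members WITHOUT a missing value
    prevalence = count / (group_size - n_missing)
    return count, missing, prevalence


# Hypothetical values of one variable for two groups of primary keys
cohorts: Dict[str, List[Optional[str]]] = {
    "disease": ["yes", "yes", "yes", "no", None],
    "control": ["yes", "no", "no", "no"],
}

count: Dict[str, int] = {}
missing: Dict[str, float] = {}
prevalence: Dict[str, float] = {}
for group, values in cohorts.items():
    count[group], missing[group], prevalence[group] = node_statistics(values, "yes")

# Comparison between the positive ("disease") and negative ("control") group
positive, negative = prevalence["disease"], prevalence["control"]
prevalence_difference = abs(positive - negative)
prevalence_ratio = max(positive, negative) / min(positive, negative)
print(count, missing, prevalence, prevalence_difference, prevalence_ratio)
```

Note that the member with the missing value is excluded from the denominator of the prevalence, so the "disease" group has a prevalence of 3/4 rather than 3/5.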

static check_csv_row(row: Dict[str, str]) None[source]

Checks if all required fields are present in the CSV row and have the correct data type.

Parameters:

row (Dict[str, str]) – The CSV row to check

Return type:

None

data_for_cypher_write_query() Tuple[List[str], Dict[str, Any]][source]

Returns labels and parameter dictionary for a Cypher MERGE statement to insert the node into a Neo4J database.

Returns:

Returns the data for the Cypher statement as a pair of label list and parameter dictionary

Return type:

Tuple[List[str], Dict[str, Any]]
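One reason for returning the labels separately from the parameter dictionary is that Cypher does not accept a plain $parameter in place of a label, so labels have to be written into the query text while property values can be passed as parameters. The sketch below shows one possible way to combine such a pair into a MERGE statement. The function `build_merge_query`, the key property `node_id` and the example labels and properties are assumptions for illustration and do not reflect how the library builds its queries.

```python
from typing import Any, Dict, List, Tuple


def build_merge_query(labels: List[str],
                      parameters: Dict[str, Any],
                      key: str = "node_id") -> Tuple[str, Dict[str, Any]]:
    """Hypothetical helper: combines labels and parameters into a Cypher MERGE."""
    # Labels are inserted into the query text, quoted with backticks
    label_part = "".join(f":`{label}`" for label in labels)
    # The node is matched or created by its key; all properties are set afterwards
    query = f"MERGE (n{label_part} {{{key}: ${key}}}) SET n += $properties"
    query_parameters = {key: parameters[key], "properties": parameters}
    return query, query_parameters


# Hypothetical labels and properties of a single node
labels = ["Patient", "Attribute", "Frequent"]
parameters = {"node_id": 7, "name": "diagnosis", "value": "diabetes"}
query, query_parameters = build_merge_query(labels, parameters)
print(query)
```

The resulting query string and parameter dictionary could then be handed to a Neo4J driver session, which keeps the property values out of the query text.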

static from_csv_row(row: Dict[str, str]) AttributeAssociationNode[source]

Parses a node from a CSV row.

Parameters:

row (Dict[str, str]) – The CSV row as a dictionary

Returns:

Returns the parsed object

Return type:

AttributeAssociationNode

static get_csv_header(data_type: NodeDataType) List[str][source]

Generates the header for a CSV storing the nodes.

Parameters:

data_type (NodeDataType) – The data type of the node’s value

Returns:

Returns the generated header

Return type:

List[str]

to_csv_row() List[str | float | int][source]

Converts the object to a CSV row as a list.

Returns:

Returns the list

Return type:

List[str | float | int]

class graphxplore.Basis.AttributeAssociationGraph.DistinctionLabel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Describes how much the relative attribute shares differ between the groups (if multiple groups exist).

HighlyInverse = 'HighlyInverse'
HighlyRelated = 'HighlyRelated'
Inverse = 'Inverse'
Related = 'Related'
Unrelated = 'Unrelated'

class graphxplore.Basis.AttributeAssociationGraph.FrequencyLabel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]

Bases: str, Enum

Describes how frequently the property associated with an AttributeAssociationNode appears in the single group or, if multiple groups of primary keys exist, in at least one of them.

Frequent = 'Frequent'
HighlyFrequent = 'HighlyFrequent'
Infrequent = 'Infrequent'
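Since the enumerations in this module are documented with the bases str and Enum, their members behave like plain strings in comparisons, which is convenient when labels are parsed from CSV files or label strings. The sketch below demonstrates this standard Python behavior with a stand-in class. `FrequencyLabelSketch` is hypothetical and only mirrors the documented members.

```python
from enum import Enum


class FrequencyLabelSketch(str, Enum):
    """Stand-in mirroring the documented members, NOT the graphxplore class."""
    Frequent = 'Frequent'
    HighlyFrequent = 'HighlyFrequent'
    Infrequent = 'Infrequent'


# Lookup by value, e.g. for a string read from a file
label = FrequencyLabelSketch('HighlyFrequent')
is_same_member = label is FrequencyLabelSketch.HighlyFrequent

# Due to the str base, a member compares equal to its plain string value
equals_string = label == 'HighlyFrequent'

# An unknown value raises a ValueError
try:
    FrequencyLabelSketch('Sometimes')
    lookup_failed = False
except ValueError:
    lookup_failed = True

print(is_same_member, equals_string, lookup_failed)
```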