graphxplore.Basis.AttributeAssociationGraph package

Attribute association graphs capture statistical traits of attributes (unique variable values) within groups of primary keys as nodes, and the conditional dependencies between attributes as edges. These graphs can later be explored visually in Neo4J without any coding or scripting skills. Statistical traits are encoded by color, size and arrow thickness. AttributeAssociationGraph objects are created by AttributeAssociationGraphGenerator.

An AttributeAssociationNode object inherits from and represents a BaseNode. It captures the absolute count, missing rate and prevalence of its attribute within each defined group. Additionally, it compares the prevalence between groups by difference and ratio. Positive and negative groups can be defined, and colors will then encode the association of each node with these groups in the visualization.

AttributeAssociationEdge objects inherit from BaseEdge and capture the conditional relationship between the attributes of their source and target nodes. They contain the absolute co-occurrence, the conditional probability of the target attribute given the source attribute, and the impact of the added condition on the prevalence of the target node’s attribute.

Module contents

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationEdge(source: int, target: int, groups: List[str], edge_type: AttributeAssociationEdgeType = AttributeAssociationEdgeType.UNASSIGNED, positive_group: str | None = None, negative_group: str | None = None, group_size: Dict[str, int] | None = None, co_occurrence: Dict[str, int] | None = None, conditional_prevalence: Dict[str, float] | None = None, conditional_increase: Dict[str, float] | None = None, increase_ratio: Dict[str, float] | None = None)

Bases: BaseEdge

This class describes the conditional relationship between the attributes of two AttributeAssociationNode objects. It contains statistical parameters for the absolute co-occurrence and the conditional prevalence of the target attribute given the source attribute. Additionally, it stores the difference and the ratio between the conditional prevalence and the prevalence of the target node, which express the influence of adding the source attribute as a condition. These statistical measurements are stored for one or multiple groups of primary keys. Based on the maximum difference and ratio, the edge is assigned a type reflecting the degree of the conditional relationship.

Parameters:

source (int) – The ID of the source AttributeAssociationNode
target (int) – The ID of the target AttributeAssociationNode
edge_type (AttributeAssociationEdgeType) – The type of edge describing the degree of conditional implication
positive_group (str | None) – The name of the positive group (e.g. the disease cohort) or None
negative_group (str | None) – The name of the negative group (e.g. the control cohort) or None
group_size (Dict[str, int] | None) – The number of group members. Will be initialized with 0 for each group if None
co_occurrence (Dict[str, int] | None) – The absolute count of group members having both the source and target attribute. Specified for each group. Will be initialized with 0 for each group if None
conditional_prevalence (Dict[str, float] | None) – The co-occurrence divided by absolute count of the source attribute, resulting in the conditional prevalence of the target attribute given the source attribute. Specified for each group. Will be initialized with 0.0 for each group if None
conditional_increase (Dict[str, float] | None) – The conditional prevalence minus the prevalence of the target node. Specified for each group. Might be negative. Will be initialized with 0.0 for each group if None
increase_ratio (Dict[str, float] | None) – The conditional prevalence divided by the prevalence of the target node. Specified for each group. Might be smaller than 1. Will be initialized with 0.0 for each group if None
groups (List[str]) – The names of the groups
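
The edge statistics described above can be reproduced by hand for a toy data set. The following is a minimal sketch in plain Python, not the library's implementation: the function name edge_statistics and the input layout (one set of attributes per primary key) are assumptions made for illustration. For simplicity it also assumes that there are no missing values, so the prevalence of the target attribute is its count divided by the group size.

```python
from typing import Dict, List, Set


def edge_statistics(members: List[Set[str]], source: str, target: str) -> Dict[str, float]:
    """Compute the edge measures for one group (hypothetical helper).

    `members` holds one set of attributes per primary key of the group.
    """
    group_size = len(members)
    source_count = sum(1 for attrs in members if source in attrs)
    target_count = sum(1 for attrs in members if target in attrs)
    co_occurrence = sum(1 for attrs in members if source in attrs and target in attrs)

    # Prevalence of the target attribute without any condition.
    target_prevalence = target_count / group_size if group_size else 0.0
    # Conditional prevalence: co-occurrence divided by the source count.
    conditional_prevalence = co_occurrence / source_count if source_count else 0.0
    return {
        "group_size": group_size,
        "co_occurrence": co_occurrence,
        "conditional_prevalence": conditional_prevalence,
        # Difference to the unconditional prevalence; may be negative.
        "conditional_increase": conditional_prevalence - target_prevalence,
        # Ratio to the unconditional prevalence; may be smaller than 1.
        "increase_ratio": conditional_prevalence / target_prevalence if target_prevalence else 0.0,
    }


# Toy group of eight primary keys (made-up data).
disease_cohort = [
    {"smoker", "cough"},
    {"smoker", "cough"},
    {"smoker", "cough"},
    {"smoker"},
    {"cough"},
    set(),
    set(),
    set(),
]
stats = edge_statistics(disease_cohort, source="smoker", target="cough")
print(stats)
```

In this toy group, half of all members have the target attribute, but three out of four members with the source attribute have it, so the conditional increase is positive and the increase ratio is larger than 1.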

static check_csv_row(row: Dict[str, str]) → None

Checks if all required fields are present in the CSV row and have the correct data type.

Parameters: row (Dict[str, str]) – The CSV row to check
Return type: None

data_for_cypher_write_query() → Tuple[str, Dict[str, Any]]

Returns edge type and parameter dictionary for a Cypher MERGE statement to insert the edge into a Neo4J database.

Returns: The data for the Cypher statement as a pair of edge type and empty parameter dictionary
Return type: Tuple[str, Dict[str, Any]]

static from_csv_row(row: Dict[str, str]) → AttributeAssociationEdge

Parses an edge from a CSV row.

Parameters: row (Dict[str, str]) – The CSV row as a dictionary
Returns: The parsed edge
Return type: AttributeAssociationEdge

static get_csv_header() → List[str]

Generates the header for a CSV storing the edges.

Returns: The generated header
Return type: List[str]

to_csv_row() → List[str | float | int]

Converts the object to a CSV row as a list.

Returns: The CSV row as a list
Return type: List[str | float | int]
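
The methods get_csv_header, to_csv_row, check_csv_row and from_csv_row are designed to work together with a CSV writer and reader. The sketch below shows such a round trip with Python's csv module. To stay self-contained it uses a hypothetical stand-in class, ToyEdge, with a reduced set of assumed column names. The actual header produced by AttributeAssociationEdge is not reproduced here.

```python
import csv
import io
from typing import Dict, List, Union


class ToyEdge:
    """Hypothetical stand-in mimicking the CSV interface of an edge."""

    def __init__(self, source: int, target: int, co_occurrence: int):
        self.source = source
        self.target = target
        self.co_occurrence = co_occurrence

    @staticmethod
    def get_csv_header() -> List[str]:
        # Assumed column names, not the library's actual header.
        return ["source", "target", "co_occurrence"]

    def to_csv_row(self) -> List[Union[str, float, int]]:
        return [self.source, self.target, self.co_occurrence]

    @staticmethod
    def check_csv_row(row: Dict[str, str]) -> None:
        # Every column must be present and hold an integer.
        for field in ToyEdge.get_csv_header():
            if field not in row:
                raise ValueError(f"Missing field '{field}'")
            int(row[field])  # raises ValueError for non-integer text

    @staticmethod
    def from_csv_row(row: Dict[str, str]) -> "ToyEdge":
        ToyEdge.check_csv_row(row)
        return ToyEdge(int(row["source"]), int(row["target"]), int(row["co_occurrence"]))


# Write two edges to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(ToyEdge.get_csv_header())
for edge in (ToyEdge(1, 2, 30), ToyEdge(2, 3, 12)):
    writer.writerow(edge.to_csv_row())

# Read them back; DictReader yields each row as Dict[str, str].
buffer.seek(0)
parsed = [ToyEdge.from_csv_row(row) for row in csv.DictReader(buffer)]
print([(e.source, e.target, e.co_occurrence) for e in parsed])
```

Because csv.DictReader returns every cell as a string, the parsing step has to convert each field back to its numeric type, which is why a check of the row precedes the conversion in this sketch.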

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationEdgeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

The type of edge, specifying the degree of conditional relationship between the source and target node.

HIGH_RELATION = 'HIGH_RELATION'

LOW_RELATION = 'LOW_RELATION'

MEDIUM_RELATION = 'MEDIUM_RELATION'

UNASSIGNED = 'UNASSIGNED'
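
Because the enum derives from both str and Enum, its members compare equal to their plain string values and can be looked up by value, for example after reading an edge type from a text file. The following sketch re-declares the enum locally so that it runs without the package; the members mirror those documented above.

```python
from enum import Enum


class AttributeAssociationEdgeType(str, Enum):
    # Local re-declaration mirroring the documented members.
    HIGH_RELATION = 'HIGH_RELATION'
    LOW_RELATION = 'LOW_RELATION'
    MEDIUM_RELATION = 'MEDIUM_RELATION'
    UNASSIGNED = 'UNASSIGNED'


# Look up a member by its string value, e.g. after reading a CSV cell.
edge_type = AttributeAssociationEdgeType('HIGH_RELATION')
print(edge_type is AttributeAssociationEdgeType.HIGH_RELATION)

# Members are also plain strings and compare equal to their value.
print(edge_type == 'HIGH_RELATION')
print(isinstance(edge_type, str))

# An unknown value raises a ValueError.
try:
    AttributeAssociationEdgeType('VERY_HIGH_RELATION')
except ValueError as error:
    print(f"rejected: {error}")
```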

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationGraph(nodes: List[AttributeAssociationNode] | None = None, edges: List[AttributeAssociationEdge] | None = None)

Bases: Graph

This is the graph holding AttributeAssociationNode and AttributeAssociationEdge objects. It captures statistical measurements about the occurrence of attributes within one or multiple groups of primary keys, as well as the conditional relations between attributes within these groups.

Parameters:

nodes (List[AttributeAssociationNode] | None) – The list of nodes
edges (List[AttributeAssociationEdge] | None) – The list of edges

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationLabels(membership_labels: Tuple[str, ...], node_type: BaseNodeType, frequency_label: FrequencyLabel | None = None, distinction_label: DistinctionLabel | None = None)

Bases: BaseLabels

These labels describe AttributeAssociationNode objects and inherit from BaseLabels.

Parameters:

membership_labels (Tuple[str, ...]) – One or more labels describing the membership of the node in categories. The origin table should always be one label
node_type (BaseNodeType) – The type of node
frequency_label (FrequencyLabel | None) – Describes how frequently the attribute appears in one group or in at least one of multiple groups of primary keys
distinction_label (DistinctionLabel | None) – Describes the difference and quotient in frequencies between primary key groups

static from_label_list(label_list: List[str]) → AttributeAssociationLabels

Generates an AttributeAssociationLabels object from a list of strings. The individual values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.

Parameters: label_list (List[str]) – The input list from which the object is parsed
Returns: The parsed object
Return type: AttributeAssociationLabels

static from_label_string(label_string: str) → AttributeAssociationLabels

Generates an AttributeAssociationLabels object from a label string. The individual values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.

Parameters: label_string (str) – The input string from which the object is parsed
Returns: The parsed object
Return type: AttributeAssociationLabels

to_label_string() → str

Converts the object to a string. The individual labels are concatenated with semicolons; the FrequencyLabel appears last.

Returns: The converted string
Return type: str

class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationNode(node_id: int, labels: AttributeAssociationLabels, name: str, val: str | int | float, groups: List[str], desc: str = 'NaN', bin_info: BinBoundInfo | None = None, positive_group: str | None = None, negative_group: str | None = None, group_size: Dict[str, int] | None = None, count: Dict[str, int] | None = None, missing: Dict[str, float] | None = None, prevalence: Dict[str, float] | None = None, prevalence_difference: float = nan, prevalence_ratio: float = nan)

Bases: BaseNode

This class contains the information of (and inherits from) a BaseNode of type BaseNodeType.Attribute or BaseNodeType.AttributeBin. In addition, it captures several statistical traits of the node’s attribute within one or multiple groups of primary keys: its absolute count, its prevalence, and the ratio of group members with a missing value for the variable name. Moreover, if multiple groups are defined, the absolute difference and the ratio of the prevalence are calculated. If positive_group and negative_group are specified, the difference and ratio are calculated between their prevalence values; otherwise, between the maximum and minimum prevalence.

Parameters:

node_id (int) – The internal Neo4J ID of the BaseNode. Used for identity checks. As a result, nodes can only be compared if originating from the same BaseGraph
labels (AttributeAssociationLabels) – The labels of the BaseNode and potentially a FrequencyLabel and DistinctionLabel
name (str) – The name of the BaseNode
val (str | int | float) – The value of the BaseNode
groups (List[str]) – The names of the groups
desc (str) – The description of the BaseNode
bin_info (BinBoundInfo | None) – The binning info of the BaseNode
positive_group (str | None) – The name of the positive group (e.g. the disease cohort) or None
negative_group (str | None) – The name of the negative group (e.g. the control cohort) or None
group_size (Dict[str, int] | None) – The number of group members. Will be initialized with 0 for each group if None
count (Dict[str, int] | None) – The absolute counts of group members having this attribute. Will be initialized with 0 for each group if None
missing (Dict[str, float] | None) – The ratio of group members with a missing value for the variable name. Will be initialized with 0.0 for each group if None
prevalence (Dict[str, float] | None) – The count divided by the number of group members not having a missing value for the variable name. Will be initialized with 0.0 for each group if None
prevalence_difference (float) – The absolute difference between the prevalence of the positive_group and negative_group if defined, or between the maximum and minimum prevalence. Defaults to NaN
prevalence_ratio (float) – The larger divided by the smaller prevalence of the positive_group and negative_group if defined, or the quotient of the maximum and minimum prevalence. Defaults to NaN
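
The node statistics described above can be worked through for a small example. This is a sketch in plain Python and not the library's implementation: the helper names node_statistics and compare_prevalence, the input layout (one value per primary key, with None marking a missing value) and the handling of a zero denominator are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple


def node_statistics(values: List[Optional[str]], attribute_value: str) -> Dict[str, float]:
    """Compute count, missing ratio and prevalence for one group (hypothetical helper).

    `values` holds the value of one variable per primary key; None marks a missing value.
    """
    group_size = len(values)
    count = sum(1 for value in values if value == attribute_value)
    missing_count = sum(1 for value in values if value is None)
    non_missing = group_size - missing_count
    return {
        "group_size": group_size,
        "count": count,
        "missing": missing_count / group_size if group_size else 0.0,
        # Members with a missing value are excluded from the denominator.
        "prevalence": count / non_missing if non_missing else 0.0,
    }


def compare_prevalence(first: float, second: float) -> Tuple[float, float]:
    """Return absolute difference and larger-over-smaller ratio of two prevalences."""
    difference = abs(first - second)
    larger, smaller = max(first, second), min(first, second)
    # Assumption: the ratio is left as NaN when the smaller prevalence is zero.
    ratio = larger / smaller if smaller > 0 else float("nan")
    return difference, ratio


# Toy data for a variable "smoking" in a disease cohort and a control cohort.
disease = node_statistics(["yes", "yes", "yes", "no", None, None, "no", "yes", "no", "no"], "yes")
control = node_statistics(["yes", "no", "no", "no", "no", "no", "no", "no"], "yes")
difference, ratio = compare_prevalence(disease["prevalence"], control["prevalence"])
print(disease, control, difference, ratio)
```

In the disease cohort, two of ten members have a missing value, so the prevalence is four out of the remaining eight members. The control cohort has no missing values and a prevalence of one in eight, which gives a prevalence difference of 0.375 and a prevalence ratio of 4.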

static check_csv_row(row: Dict[str, str]) → None

Checks if all required fields are present in the CSV row and have the correct data type.

Parameters: row (Dict[str, str]) – The CSV row to check
Return type: None

data_for_cypher_write_query() → Tuple[List[str], Dict[str, Any]]

Returns labels and parameter dictionary for a Cypher MERGE statement to insert the node into a Neo4J database.

Returns: The data for the Cypher statement as a pair of label list and parameter dictionary
Return type: Tuple[List[str], Dict[str, Any]]
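
To illustrate how a pair of label list and parameter dictionary can feed a Cypher MERGE statement, the sketch below assembles a parameterized query string. The helper build_merge_query as well as the example labels and property names are hypothetical; the library may assemble its statement differently.

```python
from typing import Any, Dict, List, Tuple


def build_merge_query(labels: List[str], parameters: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Assemble a parameterized Cypher MERGE statement (hypothetical helper)."""
    # Labels cannot be passed as query parameters, so they are written into
    # the statement, quoted with backticks.
    label_part = "".join(f":`{label}`" for label in labels)
    # Property values are referenced as $parameters and sent separately.
    property_part = ", ".join(f"{key}: ${key}" for key in parameters)
    return f"MERGE (n{label_part} {{{property_part}}})", parameters


# Made-up labels and properties for a single node.
query, params = build_merge_query(
    ["patient_table", "Attribute", "Frequent"],
    {"name": "smoking", "value": "yes", "prevalence_disease": 0.5},
)
print(query)
```

Keeping the property values in a separate dictionary means they never have to be interpolated into the query text. With the official Neo4j Python driver, such a pair could be passed to a session as session.run(query, params).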

static from_csv_row(row: Dict[str, str]) → AttributeAssociationNode

Parses a node from a CSV row.

Parameters: row (Dict[str, str]) – The CSV row as a dictionary
Returns: The parsed node
Return type: AttributeAssociationNode

static get_csv_header(data_type: NodeDataType) → List[str]

Generates the header for a CSV storing the nodes.

Parameters: data_type (NodeDataType) – The data type of the node’s value
Returns: The generated header
Return type: List[str]

to_csv_row() → List[str | float | int]

Converts the object to a CSV row as a list.

Returns: The CSV row as a list
Return type: List[str | float | int]

class graphxplore.Basis.AttributeAssociationGraph.DistinctionLabel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Describes how much the relative attribute shares differ between the groups (if multiple groups exist).

HighlyInverse = 'HighlyInverse'

HighlyRelated = 'HighlyRelated'

Inverse = 'Inverse'

Related = 'Related'

Unrelated = 'Unrelated'

class graphxplore.Basis.AttributeAssociationGraph.FrequencyLabel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: str, Enum

Describes how frequently the property associated with an AttributeAssociationNode appears in one group or in at least one of multiple groups of primary keys.

Frequent = 'Frequent'

HighlyFrequent = 'HighlyFrequent'

Infrequent = 'Infrequent'