graphxplore.Basis.AttributeAssociationGraph package
Attribute association graphs capture statistical traits of attributes (unique variable values) within groups of
primary keys as nodes, and the conditional dependencies between attributes as edges. These graphs can later be explored
visually in Neo4J without the need for coding/scripting skills. Statistical traits will be encoded by color, size and
arrow thickness.
AttributeAssociationGraph objects are created by
AttributeAssociationGraphGenerator.
An AttributeAssociationNode object inherits from and represents a
BaseNode. It captures the absolute count, missing rate and prevalence of its
attribute within each defined group. Additionally, it compares the prevalence between groups by difference and ratio.
Positive and negative groups can be defined, and colors will then encode the association of each node with these groups
in the visualization. AttributeAssociationEdge objects inherit from
BaseEdge and capture the conditional relationship between the attributes of their
source and target node. They contain the absolute co-occurrence, the conditional
probability of the target attribute given the source attribute, and the impact of the added condition on the prevalence
of the target node’s attribute.
Module contents
- class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationEdge(source: int, target: int, groups: List[str], edge_type: AttributeAssociationEdgeType = AttributeAssociationEdgeType.UNASSIGNED, positive_group: str | None = None, negative_group: str | None = None, group_size: Dict[str, int] | None = None, co_occurrence: Dict[str, int] | None = None, conditional_prevalence: Dict[str, float] | None = None, conditional_increase: Dict[str, float] | None = None, increase_ratio: Dict[str, float] | None = None)[source]
Bases: BaseEdge
This class describes the conditional relationship between the attributes of two AttributeAssociationNode objects. It contains statistical parameters for the absolute co-occurrence and the conditional prevalence of the target attribute given the source attribute. Additionally, the difference and ratio of the conditional prevalence and the prevalence of the target node are contained. This way, the influence of the added condition of the source attribute is expressed. These statistical measurements are stored for one or multiple groups of primary keys. Based on the maximum difference and ratio, the edge is assigned a type reflecting the degree of the conditional relationship.
- Parameters:
source (int) – The ID of the source AttributeAssociationNode
target (int) – The ID of the target AttributeAssociationNode
edge_type (AttributeAssociationEdgeType) – The type of edge describing the degree of conditional implication
positive_group (str | None) – The name of the positive group (e.g. the disease cohort) or None
negative_group (str | None) – The name of the negative group (e.g. the control cohort) or None
group_size (Dict[str, int] | None) – The number of group members. Will be initialized with 0 for each group if None
co_occurrence (Dict[str, int] | None) – The absolute count of group members having both the source and target attribute. Specified for each group. Will be initialized with 0 for each group if None
conditional_prevalence (Dict[str, float] | None) – The co-occurrence divided by absolute count of the source attribute, resulting in the conditional prevalence of the target attribute given the source attribute. Specified for each group. Will be initialized with 0.0 for each group if None
conditional_increase (Dict[str, float] | None) – The conditional prevalence minus the prevalence of the target node. Specified for each group. Might be negative. Will be initialized with 0.0 for each group if None
increase_ratio (Dict[str, float] | None) – The conditional prevalence divided by the prevalence of the target node. Specified for each group. Might be smaller than 1. Will be initialized with 0.0 for each group if None
groups (List[str])
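The edge measures described in the parameter list can be illustrated with a minimal sketch for a single group. This is not the graphxplore implementation: the function `edge_statistics`, the use of sets of primary keys, and the fallback to 0.0 on a zero denominator are assumptions made for this example only.

```python
from typing import Dict, Set


def edge_statistics(
    source_members: Set[int],
    target_members: Set[int],
    target_prevalence: float,
) -> Dict[str, float]:
    """Compute the edge measures for one group (hypothetical helper).

    source_members / target_members hold the primary keys of the group
    members that have the source / target attribute.
    """
    # Absolute count of members having both attributes
    co_occurrence = len(source_members & target_members)
    source_count = len(source_members)
    # Co-occurrence divided by the absolute count of the source attribute
    # (assumption: 0.0 if the source attribute never occurs)
    conditional_prevalence = (
        co_occurrence / source_count if source_count else 0.0
    )
    # Difference to the unconditional prevalence, may be negative
    conditional_increase = conditional_prevalence - target_prevalence
    # Quotient to the unconditional prevalence, may be smaller than 1
    # (assumption: 0.0 if the target prevalence is zero)
    increase_ratio = (
        conditional_prevalence / target_prevalence if target_prevalence else 0.0
    )
    return {
        "co_occurrence": co_occurrence,
        "conditional_prevalence": conditional_prevalence,
        "conditional_increase": conditional_increase,
        "increase_ratio": increase_ratio,
    }


# Made-up group of 10 primary keys: 4 have the source attribute,
# 5 have the target attribute (prevalence 0.5), 3 have both.
stats = edge_statistics({1, 2, 3, 4}, {2, 3, 4, 5, 6}, target_prevalence=0.5)
print(stats)
```

In this made-up group, adding the source attribute as a condition raises the prevalence of the target attribute from 0.5 to 0.75.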
- static check_csv_row(row: Dict[str, str]) None[source]
Checks if all required fields are present in the CSV row and have the correct data type.
- Parameters:
row (Dict[str, str]) – The CSV row to check
- Return type:
None
- data_for_cypher_write_query() Tuple[str, Dict[str, Any]][source]
Returns edge type and parameter dictionary for a Cypher MERGE statement to insert the edge into a Neo4J database.
- Returns:
Returns the data for the Cypher statement as a pair of edge type and empty parameter dictionary
- Return type:
Tuple[str, Dict[str, Any]]
- static from_csv_row(row: Dict[str, str]) AttributeAssociationEdge[source]
Parses an edge from a CSV row.
- Parameters:
row (Dict[str, str]) – The CSV row as a dictionary
- Returns:
Returns the parsed object
- Return type:
AttributeAssociationEdge
- class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationEdgeType(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases: str, Enum
The type of edge, specifying the degree of conditional relationship between the source and target node.
- HIGH_RELATION = 'HIGH_RELATION'
- LOW_RELATION = 'LOW_RELATION'
- MEDIUM_RELATION = 'MEDIUM_RELATION'
- UNASSIGNED = 'UNASSIGNED'
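Because the enumeration derives from both str and Enum, its members can be compared with and parsed from plain strings, which is convenient when edge types are read from CSV rows. The sketch below mirrors the four members in a stand-in class named `EdgeTypeSketch` (a hypothetical name) so that it runs without graphxplore installed.

```python
from enum import Enum


class EdgeTypeSketch(str, Enum):
    """Stand-in with the same members as AttributeAssociationEdgeType."""

    HIGH_RELATION = "HIGH_RELATION"
    LOW_RELATION = "LOW_RELATION"
    MEDIUM_RELATION = "MEDIUM_RELATION"
    UNASSIGNED = "UNASSIGNED"


# Look up a member by its string value, e.g. a cell read from a CSV file
parsed = EdgeTypeSketch("HIGH_RELATION")

# The lookup returns the singleton member itself
is_same_member = parsed is EdgeTypeSketch.HIGH_RELATION

# Thanks to the str mixin, a member compares equal to its plain string
equals_plain_string = parsed == "HIGH_RELATION"

# Unknown values raise a ValueError instead of silently passing
try:
    EdgeTypeSketch("VERY_HIGH_RELATION")
    unknown_rejected = False
except ValueError:
    unknown_rejected = True

print(parsed.value, is_same_member, equals_plain_string, unknown_rejected)
```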
- class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationGraph(nodes: List[AttributeAssociationNode] | None = None, edges: List[AttributeAssociationEdge] | None = None)[source]
Bases: Graph
This is the graph holding AttributeAssociationNode and AttributeAssociationEdge objects. It captures statistical measurements about the occurrence of attributes within one or multiple groups of primary keys, as well as the conditional relations between attributes within these groups.
- Parameters:
nodes (List[AttributeAssociationNode] | None) – The list of nodes
edges (List[AttributeAssociationEdge] | None) – The list of edges
- class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationLabels(membership_labels: Tuple[str, ...], node_type: BaseNodeType, frequency_label: FrequencyLabel | None = None, distinction_label: DistinctionLabel | None = None)[source]
Bases: BaseLabels
These labels describe AttributeAssociationNode objects and inherit from BaseLabels.
- Parameters:
membership_labels (Tuple[str, ...]) – One or more labels describing the membership of the node into categories. The origin table should always be one label
node_type (BaseNodeType) – The type of node
frequency_label (FrequencyLabel | None) – Describes how frequently the attribute appears in one or in at least one of multiple groups of primary keys
distinction_label (DistinctionLabel | None) – Describes the difference and quotient in frequencies between primary key groups
- static from_label_list(label_list: List[str]) AttributeAssociationLabels[source]
Generates an AttributeAssociationLabels object from a list of strings. The single values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.
- Parameters:
label_list (List[str]) – The input list from which the object is parsed
- Returns:
Returns the parsed object
- Return type:
AttributeAssociationLabels
- static from_label_string(label_string: str) AttributeAssociationLabels[source]
Generates an AttributeAssociationLabels object from a label string. The single values should be separated by semicolons and the BaseNodeType label should appear last. Raises an exception if parsing fails.
- Parameters:
label_string (str) – The input string from which the object is parsed
- Returns:
Returns the parsed object
- Return type:
AttributeAssociationLabels
- to_label_string() str[source]
Converts the object to a string. The individual labels are concatenated by semicolons, and the FrequencyLabel appears last.
- Returns:
Returns the converted string
- Return type:
str
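The semicolon-separated label format can be illustrated with a small round trip on plain string lists. This is only a sketch: `join_labels` and `split_labels` are hypothetical helpers, the membership label "patients" is made up, and the order of the labels in the example is illustrative rather than the order graphxplore enforces.

```python
from typing import List


def join_labels(labels: List[str]) -> str:
    """Concatenate individual labels with semicolons (hypothetical helper)."""
    return ";".join(labels)


def split_labels(label_string: str) -> List[str]:
    """Split a semicolon-separated label string back into its labels.

    Raises a ValueError if any label is empty, mimicking the documented
    behavior of raising an exception when parsing fails.
    """
    labels = [part.strip() for part in label_string.split(";")]
    if any(not part for part in labels):
        raise ValueError(f"Invalid label string: {label_string!r}")
    return labels


# Made-up label set: an origin table, a node type and a frequency label
labels = ["patients", "Attribute", "Frequent"]
label_string = join_labels(labels)
round_trip = split_labels(label_string)
print(label_string)
```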
- class graphxplore.Basis.AttributeAssociationGraph.AttributeAssociationNode(node_id: int, labels: AttributeAssociationLabels, name: str, val: str | int | float, groups: List[str], desc: str = 'NaN', bin_info: BinBoundInfo | None = None, positive_group: str | None = None, negative_group: str | None = None, group_size: Dict[str, int] | None = None, count: Dict[str, int] | None = None, missing: Dict[str, float] | None = None, prevalence: Dict[str, float] | None = None, prevalence_difference: float = nan, prevalence_ratio: float = nan)[source]
Bases: BaseNode
This class contains the information of a (and inherits from) BaseNode of type BaseNodeType.Attribute or BaseNodeType.AttributeBin. In addition, it captures several statistical traits of the node’s attribute within one or multiple groups of primary keys: its absolute count, its prevalence, and the ratio of group members with a missing value for the variable name. Moreover, if multiple groups are defined, the absolute difference and ratio of prevalence are calculated. If positive_group and negative_group are specified, the difference and ratio are calculated between their prevalence values. Otherwise, they are calculated between the maximum and minimum prevalence.
- Parameters:
node_id (int) – The internal Neo4J ID of the BaseNode. Used for identity checks. As a result, nodes can only be compared if originating from the same BaseGraph
labels (AttributeAssociationLabels) – The labels of the BaseNode and potentially a FrequencyLabel and DistinctionLabel
name (str) – The name of the BaseNode
val (str | int | float) – The value of the BaseNode
groups (List[str]) – The names of the groups
desc (str) – The description of the BaseNode
bin_info (BinBoundInfo | None) – The binning info of the BaseNode
positive_group (str | None) – The name of the positive group (e.g. the disease cohort) or None
negative_group (str | None) – The name of the negative group (e.g. the control cohort) or None
group_size (Dict[str, int] | None) – The number of group members. Will be initialized with 0 for each group if None
count (Dict[str, int] | None) – The absolute counts of group members having this attribute. Will be initialized with 0 for each group if None
missing (Dict[str, float] | None) – The ratio of group members with a missing value for variable name. Will be initialized with 0.0 for each group if None
prevalence (Dict[str, float] | None) – The count divided by the number of group members not having a missing value for the variable name. Will be initialized with 0.0 for each group if None
prevalence_difference (float) – The absolute difference between the prevalence of the positive_group and negative_group if defined, or between the maximum and minimum prevalence. Defaults to NaN
prevalence_ratio (float) – The larger divided by the smaller prevalence of the positive_group and negative_group if defined, or quotient between the maximum and minimum prevalence. Defaults to NaN
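The node measures described above can be sketched for a toy data set as follows. The helpers `group_statistics` and `compare_prevalence` are hypothetical and not part of graphxplore; representing missing values as None and returning NaN when the smaller prevalence is zero or fewer than two groups exist are assumptions of this example.

```python
import math
from typing import Dict, List, Optional, Tuple


def group_statistics(values: List[Optional[str]], attribute: str) -> Dict[str, float]:
    """Count, missing rate and prevalence of one attribute within one group."""
    group_size = len(values)
    missing_count = sum(1 for value in values if value is None)
    count = sum(1 for value in values if value == attribute)
    non_missing = group_size - missing_count
    return {
        "group_size": group_size,
        "count": count,
        # Ratio of group members with a missing value
        "missing": missing_count / group_size if group_size else 0.0,
        # Count divided by the number of members without a missing value
        "prevalence": count / non_missing if non_missing else 0.0,
    }


def compare_prevalence(
    prevalence: Dict[str, float],
    positive_group: Optional[str] = None,
    negative_group: Optional[str] = None,
) -> Tuple[float, float]:
    """Absolute difference and larger-by-smaller ratio of prevalence values."""
    if len(prevalence) < 2:
        return math.nan, math.nan
    if positive_group is not None and negative_group is not None:
        first, second = prevalence[positive_group], prevalence[negative_group]
    else:
        first, second = max(prevalence.values()), min(prevalence.values())
    larger, smaller = max(first, second), min(first, second)
    # Assumption: the ratio is left as NaN if the smaller prevalence is zero
    ratio = larger / smaller if smaller > 0 else math.nan
    return larger - smaller, ratio


# Made-up variable with the value "yes" as attribute, None marks a missing value
cohorts = {
    "disease": ["yes", "yes", "no", None, "yes"],
    "control": ["yes", "no", "no", "no"],
}
stats = {group: group_statistics(values, "yes") for group, values in cohorts.items()}
prevalence = {group: s["prevalence"] for group, s in stats.items()}
difference, ratio = compare_prevalence(prevalence, "disease", "control")
print(stats, difference, ratio)
```

Note that the missing value in the disease cohort reduces the denominator of the prevalence from 5 to 4, so the prevalence is 0.75 rather than 0.6.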
- static check_csv_row(row: Dict[str, str]) None[source]
Checks if all required fields are present in the CSV row and have the correct data type.
- Parameters:
row (Dict[str, str]) – The CSV row to check
- Return type:
None
- data_for_cypher_write_query() Tuple[List[str], Dict[str, Any]][source]
Returns labels and parameter dictionary for a Cypher MERGE statement to insert the node into a Neo4J database.
- Returns:
Returns the data for the Cypher statement as a pair of label list and parameter dictionary
- Return type:
Tuple[List[str], Dict[str, Any]]
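How a pair of label list and parameter dictionary could be turned into a Cypher MERGE statement is sketched below. The function `build_merge_query`, the example labels and the property keys are hypothetical; the statement that graphxplore actually generates may differ.

```python
from typing import Any, Dict, List, Tuple


def build_merge_query(
    labels: List[str], parameters: Dict[str, Any]
) -> Tuple[str, Dict[str, Any]]:
    """Assemble a parameterized Cypher MERGE statement (hypothetical helper)."""
    # Backticks allow labels that contain characters such as spaces
    label_part = "".join(f":`{label}`" for label in labels)
    # Each property refers to a query parameter of the same name
    property_part = ", ".join(f"{key}: ${key}" for key in parameters)
    query = f"MERGE (n{label_part} {{{property_part}}})"
    return query, parameters


# Made-up labels and properties standing in for the method's return value
labels = ["patients", "Attribute"]
parameters = {"name": "diagnosis", "value": "diabetes"}
query, query_parameters = build_merge_query(labels, parameters)
print(query)
```

Passing the values as query parameters instead of formatting them into the statement keeps the values out of the query text and avoids quoting problems.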
- static from_csv_row(row: Dict[str, str]) AttributeAssociationNode[source]
Parses a node from a CSV row.
- Parameters:
row (Dict[str, str]) – The CSV row as a dictionary
- Returns:
Returns the parsed object
- Return type:
AttributeAssociationNode
- static get_csv_header(data_type: NodeDataType) List[str][source]
Generates the header for a CSV storing the nodes
- Parameters:
data_type (NodeDataType) – The data type of the node’s value
- Returns:
Returns the generated header
- Return type:
List[str]
- class graphxplore.Basis.AttributeAssociationGraph.DistinctionLabel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases: str, Enum
Describes how much the relative attribute shares differ between the groups (if multiple groups exist).
- HighlyInverse = 'HighlyInverse'
- HighlyRelated = 'HighlyRelated'
- Inverse = 'Inverse'
- Related = 'Related'
- class graphxplore.Basis.AttributeAssociationGraph.FrequencyLabel(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)[source]
Bases: str, Enum
Describes how frequently the property associated with an AttributeAssociationNode appears in one or in at least one of multiple groups of primary keys.
- Frequent = 'Frequent'
- HighlyFrequent = 'HighlyFrequent'
- Infrequent = 'Infrequent'