jelli.utils.data_io

escape(name)

Escape the special characters `\` and `|` in a name so that it can be joined with `|` separators before hashing.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| name | str | The name to be escaped. | required |

Returns:

| Type | Description |
| --- | --- |
| str | The escaped name. |

Source code in jelli/utils/data_io.py
def escape(name: str) -> str:
    '''
    Escape special characters in a name for hashing.

    Parameters
    ----------
    name : str
        The name to be escaped.

    Returns
    -------
    str
        The escaped name.
    '''
    return name.replace('\\', '\\\\').replace('|', '\\|')
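
As a minimal sketch of what the escaping does, the snippet below repeats the one-line function from the listing above so that it runs on its own. The input strings are made up for illustration.

```python
def escape(name: str) -> str:
    # Copied from the listing above so the example is self-contained.
    # Backslashes are doubled first, then each pipe gets a leading backslash.
    return name.replace('\\', '\\\\').replace('|', '\\|')


plain = escape('observable')   # no special characters, returned unchanged
piped = escape('a|b')          # the pipe is prefixed with a backslash
slashed = escape('a\\b')       # the single backslash is doubled

print(plain, piped, slashed)
```

Backslashes are replaced before pipes. Doing it in the other order would double the backslash that was just inserted in front of each pipe.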

get_json_schema(json_data)

Extract the schema name and version from the JSON data.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| json_data | dict | The JSON data containing the schema information. | required |

Returns:

| Type | Description |
| --- | --- |
| tuple | A tuple containing the schema name and version. Returns (None, None) if the `$schema` key is missing or its value does not match the expected pattern. |

Source code in jelli/utils/data_io.py
def get_json_schema(json_data):
    '''
    Extract the schema name and version from the JSON data.

    Parameters
    ----------
    json_data : dict
        The JSON data containing the schema information.

    Returns
    -------
    tuple
        A tuple containing the schema name and version. If not found, returns `(None, None)`.
    '''
    schema_name = None
    schema_version = None
    if '$schema' in json_data:
        schema = json_data['$schema']
        if isinstance(schema, (np.ndarray, list)):
            schema = str(schema[0])
        else:
            schema = str(schema)
        match = json_schema_name_pattern.search(schema)
        if match:
            schema_name = match.group(1)
            schema_version = match.group(3)
    return schema_name, schema_version
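
The control flow above can be tried out with the sketch below. It makes two assumptions. First, `json_schema_name_pattern` is defined elsewhere in the module and is not shown on this page, so `example_pattern` is a hypothetical stand-in chosen only so that group 1 captures a name and group 3 a version. Second, the `np.ndarray` branch is left out to keep the sketch free of third-party imports. The function name `get_json_schema_sketch` and the URLs are likewise illustrative.

```python
import re

# Hypothetical pattern, for illustration only. The real
# json_schema_name_pattern lives in jelli/utils/data_io.py and may differ.
example_pattern = re.compile(r'/schemas/([a-z]+)(/v(\d+))?')


def get_json_schema_sketch(json_data, pattern=example_pattern):
    schema_name = None
    schema_version = None
    if '$schema' in json_data:
        schema = json_data['$schema']
        # The original also accepts np.ndarray here; omitted in this sketch.
        if isinstance(schema, list):
            schema = str(schema[0])
        else:
            schema = str(schema)
        match = pattern.search(schema)
        if match:
            schema_name = match.group(1)      # name captured by group 1
            schema_version = match.group(3)   # version captured by group 3
    return schema_name, schema_version


with_version = get_json_schema_sketch({'$schema': 'https://example.org/schemas/demo/v2'})
from_list = get_json_schema_sketch({'$schema': ['https://example.org/schemas/demo/v2']})
no_version = get_json_schema_sketch({'$schema': 'https://example.org/schemas/demo'})
missing = get_json_schema_sketch({'data': [1, 2, 3]})

print(with_version, from_list, no_version, missing)
```

With this stand-in pattern the version comes back as a string, and it is `None` when the optional version group does not take part in the match.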

hash_names(*name_groups)

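Generate a deterministic hash for a combination of name groups. Names are sorted within each group, so their order inside a group does not affect the result; empty groups are skipped.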

Parameters:

Name Type Description Default
*name_groups Iterable[str]

Variable number of iterables, each containing names (strings).

()

Returns:

| Type | Description |
| --- | --- |
| str | A hexadecimal MD5 hash representing the combination of name groups. |

Source code in jelli/utils/data_io.py
def hash_names(*name_groups: Iterable[str]) -> str:
    '''
    Generate a unique hash for a combination of name groups.

    Parameters
    ----------
    *name_groups : Iterable[str]
        Variable number of iterables, each containing names (strings).

    Returns
    -------
    str
        A hexadecimal MD5 hash representing the combination of name groups.
    '''
    parts = []
    for group in name_groups:
        if group:
            escaped = '|'.join(escape(o) for o in sorted(group))
            parts.append(escaped)
    block_id = '||'.join(parts)
    return hashlib.md5(block_id.encode('utf-8')).hexdigest()
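
The following sketch repeats `escape` and `hash_names` from the listings on this page so that it runs on its own, and probes a few properties that follow from the code. The names such as `obs_a` and `param_x` are made up for illustration.

```python
import hashlib
from typing import Iterable


def escape(name: str) -> str:
    # Copied from the listing on this page.
    return name.replace('\\', '\\\\').replace('|', '\\|')


def hash_names(*name_groups: Iterable[str]) -> str:
    # Copied from the listing on this page (docstring omitted).
    parts = []
    for group in name_groups:
        if group:
            escaped = '|'.join(escape(o) for o in sorted(group))
            parts.append(escaped)
    block_id = '||'.join(parts)
    return hashlib.md5(block_id.encode('utf-8')).hexdigest()


reference = hash_names(['obs_b', 'obs_a'], ['param_x'])

# Names are sorted inside each group, so their order there is irrelevant.
reordered_within = hash_names(['obs_a', 'obs_b'], ['param_x'])

# The groups themselves are not sorted, so swapping them changes the input string.
swapped_groups = hash_names(['param_x'], ['obs_a', 'obs_b'])

# Empty groups are skipped and leave no trace in the hashed string.
with_empty_group = hash_names(['obs_a', 'obs_b'], [], ['param_x'])

# Escaping keeps one name containing '|' distinct from two separate names.
one_piped_name = hash_names(['a|b'])
two_names = hash_names(['a', 'b'])

print(reference == reordered_within, reference == swapped_groups)
```

Because empty groups are dropped, a call with an empty group in the middle hashes the same string as the call without it, which may matter if the position of a group is meant to carry meaning.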

pad_arrays(arrays)

Pad arrays to the same length by repeating the last element.

Parameters:

Name Type Description Default
| arrays | list of np.ndarray | List of 1D numpy arrays to be padded. | required |

Returns:

| Type | Description |
| --- | --- |
| ndarray | A 2D numpy array where each row corresponds to a padded input array. |

Source code in jelli/utils/data_io.py
def pad_arrays(arrays):
    '''
    Pad arrays to the same length by repeating the last element.

    Parameters
    ----------
    arrays : list of np.ndarray
        List of 1D numpy arrays to be padded.

    Returns
    -------
    np.ndarray
        A 2D numpy array where each row corresponds to a padded input array.
    '''
    max_len = max(len(arr) for arr in arrays)
    return np.array([
        np.pad(arr, (0, max_len - len(arr)), mode='edge')
        for arr in arrays
    ])
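
A small usage sketch, assuming NumPy is installed. The function body is repeated from the listing above so that the snippet runs on its own, and the input values are arbitrary.

```python
import numpy as np


def pad_arrays(arrays):
    # Copied from the listing above (docstring omitted).
    max_len = max(len(arr) for arr in arrays)
    return np.array([
        np.pad(arr, (0, max_len - len(arr)), mode='edge')
        for arr in arrays
    ])


ragged = [np.array([1, 2, 3]), np.array([4]), np.array([5, 6])]
padded = pad_arrays(ragged)

# Each shorter row is extended on the right with copies of its last element.
print(padded)
```

Here the longest input has three elements, so the result has shape `(3, 3)`: the second row becomes `4, 4, 4` and the third becomes `5, 6, 6`.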