Modules
Copyright (c) 2020 Bell Eapen
This software is released under the MIT License. https://opensource.org/licenses/MIT
Fhiry
Bases: BaseFhiry
Read and process FHIR Bundles (.json) from file or folder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config_json | | Optional JSON string or file path with column transforms. | None |
Source code in src/fhiry/fhiry.py
delete_col_raw_coding
property
writable
bool: Whether to drop raw coding/display columns after extraction.
df
property
pd.DataFrame | None: The current working dataframe, if any.
filename
property
writable
str: The path to the currently selected input file, if any.
folder
property
writable
str: The path to the input folder containing Bundle JSON files.
process_bundle_dict(bundle_dict)
Process a FHIR Bundle dictionary and return its dataframe.
Source code in src/fhiry/fhiry.py
process_file(filename)
Process a single Bundle JSON file and return its dataframe.
Source code in src/fhiry/fhiry.py
process_source()
Process either the selected file or the entire folder.
Only columns common across resources will be mapped.
Source code in src/fhiry/fhiry.py
read_bundle_from_file(filename)
Load a FHIR Bundle JSON file and normalize its entries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
filename | str | Path to a FHIR Bundle JSON file. | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | Dataframe of the Bundle entries. |
Source code in src/fhiry/fhiry.py
Copyright (c) 2020 Bell Eapen
This software is released under the MIT License. https://opensource.org/licenses/MIT
BaseFhiry
Bases: object
Base class providing common dataframe processing utilities for FHIR.
This class encapsulates common logic for transforming FHIR bundle data into a pandas DataFrame, including column cleanup, code extraction, and patient ID derivation.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config_json | | Either a JSON string or a path to a JSON file specifying transformations with keys "REMOVE" (list[str] of column prefixes to remove) and "RENAME" (dict[str, str] mapping old to new column names). If None, a default configuration is used. | None |
Source code in src/fhiry/base_fhiry.py
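As an illustration of the configuration shape described above, the sketch below builds a JSON string with the two documented keys. The column names in it are made-up placeholders, not defaults shipped with the library.

```python
import json

# Hypothetical column names; only the "REMOVE" and "RENAME" keys come from
# the parameter description above.
config = {
    "REMOVE": ["resource.text", "fullUrl"],
    "RENAME": {"resource.id": "id"},
}

# config_json is documented to accept either a JSON string or a file path,
# so the dictionary is serialized to a string here.
config_json = json.dumps(config)
print(config_json)
```

The same content saved to a file could be passed by path instead, per the parameter description.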
delete_col_raw_coding
property
writable
bool: Whether to drop raw coding/display columns after extraction.
df
property
pd.DataFrame | None: The current working dataframe, if any.
add_patient_id()
Add a patientId column inferred from resource fields.
If the resource type is Patient, uses the resource id; otherwise attempts to derive the patient identifier from known subject/patient reference fields.
Source code in src/fhiry/base_fhiry.py
check_subject_reference(row)
Extract patient id from subject/patient reference fields.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
row | Mapping[str, Any] | A dataframe row as a mapping. | required |
Returns:
Type | Description |
---|---|
str | The patient id (without the "Patient/" or "urn:uuid:" prefix), or an empty string if not found. |
Source code in src/fhiry/base_fhiry.py
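The prefix handling described for check_subject_reference can be sketched in plain Python. This is not the library's implementation: the function name and the two field names are assumptions chosen for illustration, and the real method may look at other columns.

```python
from typing import Any, Mapping

# Assumed column names; the fields the library actually inspects may differ.
REFERENCE_FIELDS = ("subject.reference", "patient.reference")
PREFIXES = ("Patient/", "urn:uuid:")


def extract_patient_id(row: Mapping[str, Any]) -> str:
    """Return the id from the first reference with a known prefix, else ''."""
    for field in REFERENCE_FIELDS:
        value = row.get(field)
        if not isinstance(value, str):
            continue  # missing or NaN-like value
        for prefix in PREFIXES:
            if value.startswith(prefix):
                return value[len(prefix):]
    return ""


print(extract_patient_id({"subject.reference": "Patient/123"}))     # 123
print(extract_patient_id({"patient.reference": "urn:uuid:ab-cd"}))  # ab-cd
```

Returning an empty string rather than None keeps the resulting column a plain string column, which matches the documented return type.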
convert_object_to_list()
Extract codes/display from nested objects into flat list columns.
For columns containing "coding" or "display" in their names, extract a list of codes or display texts into new columns with ".codes" or ".display" suffixes. Optionally drops raw source columns.
Source code in src/fhiry/base_fhiry.py
delete_unwanted_cols()
Delete unwanted columns from the dataframe.
Uses the "REMOVE" list from the configuration. Any column that equals a listed value or starts with that value followed by a dot will be removed. Safely no-ops if the dataframe or configuration is missing.
Source code in src/fhiry/base_fhiry.py
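The matching rule stated for delete_unwanted_cols (exact match, or the listed value followed by a dot) can be shown on a plain list of column names. The helper name below is hypothetical and the sketch works on names only, not on a dataframe.

```python
def columns_to_keep(columns, remove):
    """Keep columns that neither equal a REMOVE entry nor start with it plus '.'."""
    def unwanted(col):
        return any(col == r or col.startswith(r + ".") for r in remove)

    return [c for c in columns if not unwanted(c)]


cols = ["id", "text", "text.div", "textual", "meta.versionId"]
print(columns_to_keep(cols, ["text", "meta"]))  # ['id', 'textual']
```

Note that "textual" survives: requiring the dot after the prefix prevents unrelated columns that merely share leading characters from being dropped.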
drop_empty_cols()
Drop columns that are completely empty (all NaN values).
Source code in src/fhiry/base_fhiry.py
empty_list_to_nan()
Convert empty list values in object columns to NaN.
Source code in src/fhiry/base_fhiry.py
get_info()
Return a concise info string for the current dataframe.
Returns:
Type | Description |
---|---|
str | Dataframe info text or a message if no dataframe is set. |
Source code in src/fhiry/base_fhiry.py
llm_query(query, llm, embed_model=None, verbose=True)
Execute a natural language query against the dataframe using LLM tools.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str | The natural language question. | required |
llm | Any | The language model instance usable by llama_index. | required |
embed_model | str \| None | Optional HuggingFace embedding model name. | None |
verbose | bool | Whether to enable verbose output from the query engine. | True |
Raises:
Type | Description |
---|---|
Exception | If required libraries are not installed. |
Exception | If the dataframe is empty. |
Returns:
Type | Description |
---|---|
Any | The query result from the underlying engine. |
Source code in src/fhiry/base_fhiry.py
process_bundle_dict(bundle_dict)
Load and process a FHIR Bundle dictionary.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bundle_dict | dict | A FHIR Bundle object. | required |
Returns:
Type | Description |
---|---|
pd.DataFrame \| None | The processed dataframe, or None if empty. |
Source code in src/fhiry/base_fhiry.py
process_df()
Run the standard transformation pipeline on the dataframe.
Steps include:
- Extracting codes from coding/display objects to flat columns
- Adding a patientId column
- Removing common prefix from column names
- Converting empty lists to NaN
- Dropping empty columns
- Deleting unwanted columns
- Renaming columns per config
Returns:
Type | Description |
---|---|
pd.DataFrame \| None | The processed dataframe, or None if unset. |
Source code in src/fhiry/base_fhiry.py
process_list(myList)
Extract code or display strings from a list of coding-like dicts.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
myList | list | A list of dictionaries that may contain "code" or "display" keys. | required |
Returns:
Type | Description |
---|---|
list[str] | A list of extracted codes/display texts. |
Source code in src/fhiry/base_fhiry.py
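To make the extraction concrete, here is a minimal stand-alone sketch of pulling "code" or "display" values out of a list of coding-like dictionaries. The function name and the explicit key argument are assumptions for illustration; how process_list itself chooses between the two keys is not stated above.

```python
def extract_values(items, key):
    """Collect item[key] for every dict in items that has a non-empty value."""
    values = []
    for item in items or []:
        if isinstance(item, dict) and item.get(key):
            values.append(str(item[key]))
    return values


codings = [
    {"system": "http://loinc.org", "code": "8867-4", "display": "Heart rate"},
    {"system": "http://snomed.info/sct", "code": "364075005"},
]
print(extract_values(codings, "code"))     # ['8867-4', '364075005']
print(extract_values(codings, "display"))  # ['Heart rate']
```

Entries lacking the requested key are skipped rather than padded, so the two output lists can differ in length.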
read_bundle_from_bundle_dict(bundle_dict)
Normalize a FHIR Bundle dict to a dataframe of entries.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bundle_dict | dict | A FHIR Bundle object with an "entry" list. | required |
Returns:
Type | Description |
---|---|
pd.DataFrame | Dataframe where each row corresponds to a Bundle entry. |
Source code in src/fhiry/base_fhiry.py
remove_string_from_columns(string_to_remove='resource.')
Remove a literal substring from all column names.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
string_to_remove | | Substring to remove from column names. | 'resource.' |
Returns:
Type | Description |
---|---|
pd.DataFrame \| None | The updated dataframe or None if unset. |
Source code in src/fhiry/base_fhiry.py
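The effect of removing a literal substring from column names can be seen on a plain list of names. The helper below is a hypothetical stand-in, not the method itself, and it operates on strings rather than on a dataframe.

```python
def strip_from_names(columns, string_to_remove="resource."):
    """Remove every literal occurrence of the substring from each name."""
    return [name.replace(string_to_remove, "") for name in columns]


print(strip_from_names(["resource.id", "resource.name.given", "fullUrl"]))
# ['id', 'name.given', 'fullUrl']
```

Because the match is literal, the dot in 'resource.' is an ordinary character here. With pandas, a comparable literal replacement on column labels would need `regex=False` in `df.columns.str.replace(...)` to avoid the dot being read as a wildcard.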
rename_cols()
Rename dataframe columns according to the configuration.
Uses the "RENAME" mapping from the configuration. Safely no-ops if the dataframe is empty.
Source code in src/fhiry/base_fhiry.py
Copyright (c) 2023 Bell Eapen
This software is released under the MIT License. https://opensource.org/licenses/MIT
BQsearch
Bases: BaseFhiry
Query FHIR datasets in Google BigQuery and process results.
Source code in src/fhiry/bqsearch.py
search(query=None)
Run a BigQuery SQL query and return a processed dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
query | str \| None | Either a SQL string, a path to a .sql file, or None to run a default sample query. | None |
Returns:
Type | Description |
---|---|
pd.DataFrame | The query results after standard processing. |
Source code in src/fhiry/bqsearch.py
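One plausible way to support the three accepted forms of the query argument is sketched below. The helper name, the ".sql" suffix check, and the placeholder default are all assumptions; the library's own detection logic and sample query are not shown on this page.

```python
from pathlib import Path

# Placeholder only; the library's actual sample query is not reproduced here.
DEFAULT_QUERY = "SELECT 1"


def resolve_query(query=None):
    """Return SQL text from a literal string, a .sql file path, or a default."""
    if query is None:
        return DEFAULT_QUERY
    if query.strip().lower().endswith(".sql"):
        # Treat the argument as a path and read the SQL from the file.
        return Path(query).read_text(encoding="utf-8")
    return query


print(resolve_query("SELECT 2"))  # SELECT 2
```

The resolved text would then be handed to whatever BigQuery client is in use; that step is omitted because it needs credentials and network access.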
Fhirsearch
Bases: BaseFhiry
Search FHIR servers and aggregate results into a dataframe.
This client pages through FHIR search results and builds a unified pandas DataFrame using the BaseFhiry processing pipeline.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fhir_base_url | str | Base URL of the FHIR server (e.g., "https://.../fhir"). | required |
config_json | | Optional JSON string or file path with column transforms. | None |
Source code in src/fhiry/fhirsearch.py
search(resource_type='Patient', search_parameters={})
Search the FHIR server and return the combined results.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
resource_type | str | FHIR resource type to search (e.g., "Patient"). | 'Patient' |
search_parameters | dict | Query parameters per FHIR spec; _count is auto-set to the configured page size if absent. | {} |
Returns:
Type | Description |
---|---|
pd.DataFrame | Combined search results across all pages. |
Source code in src/fhiry/fhirsearch.py
get_next_page_url(bundle_dict)
Return the URL of the next page from a FHIR Bundle, if present.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
bundle_dict | dict | The FHIR Bundle JSON object. | required |
Returns:
Type | Description |
---|---|
str \| None | The 'next' page URL, or None if no more pages. |
Source code in src/fhiry/fhirsearch.py
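In a FHIR Bundle, paging links live in the "link" array, each with a "relation" and a "url". The sketch below shows how a 'next' link might be located and followed; the function names and the injectable fetch callable are illustrative assumptions, and the fake pages stand in for a server so the example runs offline.

```python
def next_page_url(bundle):
    """Return the url of the link whose relation is 'next', else None."""
    for link in bundle.get("link", []):
        if link.get("relation") == "next":
            return link.get("url")
    return None


def collect_entries(first_url, fetch):
    """Follow 'next' links until exhausted; fetch(url) must return a Bundle dict."""
    entries, url = [], first_url
    while url:
        bundle = fetch(url)
        entries.extend(bundle.get("entry", []))
        url = next_page_url(bundle)
    return entries


# Two fake pages keyed by URL; a real client would issue HTTP requests instead.
pages = {
    "page1": {"entry": [{"resource": {"id": "a"}}],
              "link": [{"relation": "self", "url": "page1"},
                       {"relation": "next", "url": "page2"}]},
    "page2": {"entry": [{"resource": {"id": "b"}}],
              "link": [{"relation": "self", "url": "page2"}]},
}
print(len(collect_entries("page1", pages.__getitem__)))  # 2
```

Passing the fetch function in as an argument keeps the paging logic testable without network access.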
Copyright (c) 2024 Bell Eapen
This software is released under the MIT License. https://opensource.org/licenses/MIT
FlattenFhir
Bases: ABC
Flatten FHIR resources to concise human-readable text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
fhirobject | dict | A FHIR resource or Bundle to flatten. | {} |
config_json | | Currently unused placeholder for future options. | None |
Source code in src/fhiry/flattenfhir.py
fhirobject
property
writable
Prodict: The current FHIR object as Prodict.
flattened
property
str: The last flattened output string.
flatten()
Compute the flattened text for the current FHIR object.
Returns:
Type | Description |
---|---|
str | The flattened string. |
Source code in src/fhiry/flattenfhir.py
flatten_allergyintolerance(allergyintolerance)
Flatten an AllergyIntolerance into a short sentence.
Source code in src/fhiry/flattenfhir.py
flatten_condition(condition)
Flatten a Condition into a short sentence.
Source code in src/fhiry/flattenfhir.py
flatten_documentreference(documentreference)
Flatten a DocumentReference into a short sentence.
Source code in src/fhiry/flattenfhir.py
flatten_medication(medication)
Flatten a Medication into a short sentence.
Source code in src/fhiry/flattenfhir.py
flatten_observation(observation)
Flatten an Observation into a short sentence.
Source code in src/fhiry/flattenfhir.py
flatten_patient(patient)
Flatten a Patient into a short sentence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
patient | | Patient resource object. | required |
Returns:
Type | Description |
---|---|
str | Flattened snippet. |
Source code in src/fhiry/flattenfhir.py
flatten_procedure(procedure)
Flatten a Procedure into a short sentence.
Source code in src/fhiry/flattenfhir.py
get_flattened_text(entry)
Append flattened text for a single FHIR entry to the buffer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entry | Prodict | A FHIR resource object. | required |
Returns:
Type | Description |
---|---|
str | The updated flattened string. |
Source code in src/fhiry/flattenfhir.py
get_timeago(datestring)
Return a human-friendly time-ago string for the given date.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
datestring | str | ISO-like date string (YYYY-MM-DD...). | required |
Returns:
Type | Description |
---|---|
str | Human-friendly relative time. |
Source code in src/fhiry/flattenfhir.py
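For a sense of what a time-ago string involves, here is a coarse standard-library stand-in. It is not the library's implementation: the function name, the optional today argument, the unit thresholds, and the exact wording are all assumptions, and the real output may be phrased differently.

```python
from datetime import date


def time_ago(datestring, today=None):
    """Describe how long ago a YYYY-MM-DD... date was, in coarse units."""
    then = date.fromisoformat(datestring[:10])  # ignore any time component
    today = today or date.today()
    days = (today - then).days
    if days < 0:
        return "in the future"
    if days == 0:
        return "today"
    if days < 30:
        return f"{days} day{'s' if days != 1 else ''} ago"
    if days < 365:
        months = days // 30  # approximate month length
        return f"{months} month{'s' if months != 1 else ''} ago"
    years = days // 365  # approximate year length
    return f"{years} year{'s' if years != 1 else ''} ago"


print(time_ago("2020-01-15T10:00:00Z", today=date(2024, 1, 15)))  # 4 years ago
```

Accepting today as a parameter makes the function deterministic in tests; by default it falls back to the current date.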
Copyright (c) 2020 Bell Eapen
This software is released under the MIT License. https://opensource.org/licenses/MIT
Fhirndjson
Bases: BaseFhiry
Read and process NDJSON FHIR resources from a folder.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
config_json | | Optional JSON string or file path with column transforms. | None |
Source code in src/fhiry/fhirndjson.py
df
property
pd.DataFrame | None: The current working dataframe, if any.
folder
property
writable
str: The folder containing NDJSON files to process.
process_file(file)
Process a single NDJSON file and append its rows to the dataframe.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
file | str | Filename within the configured folder to process. | required |
Returns:
Type | Description |
---|---|
pd.DataFrame \| None | The updated dataframe. |
Source code in src/fhiry/fhirndjson.py
process_source()
Process all NDJSON files in the folder into a single dataframe.
Only columns common across resources will be mapped.
Source code in src/fhiry/fhirndjson.py
read_resource_from_line(line)
Normalize a single NDJSON line (JSON object) to a dataframe row.
Source code in src/fhiry/fhirndjson.py
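Each NDJSON line is one JSON object, so normalizing a line amounts to parsing it and flattening nested fields into dotted column names. The pure-Python helper below is a hypothetical sketch of that idea and produces a dictionary rather than a dataframe row.

```python
import json


def flatten(obj, prefix=""):
    """Flatten nested dicts into one dict with dotted keys; lists are kept."""
    flat = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, name + "."))
        else:
            flat[name] = value
    return flat


line = '{"resourceType": "Patient", "id": "p1", "meta": {"versionId": "2"}}'
row = flatten(json.loads(line))
print(row)
# {'resourceType': 'Patient', 'id': 'p1', 'meta.versionId': '2'}
```

pandas offers `pandas.json_normalize` for the same kind of flattening; whether the method uses it is not stated on this page.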
ndjson(folder, config_json=None)
Process many NDJSON files in parallel.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | Directory path or a single file path. | required |
config_json | | Optional JSON string or file path with column transforms. | None |
Returns:
Type | Description |
---|---|
pd.DataFrame | Concatenated dataframe across all processed files. |
Source code in src/fhiry/parallel.py
process(folder, config_json=None)
Process many Bundle JSON files in parallel.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
folder | str | Directory path or a single file path. | required |
config_json | | Optional JSON string or file path with column transforms. | None |
Returns:
Type | Description |
---|---|
pd.DataFrame | Concatenated dataframe across all processed files. |
Source code in src/fhiry/parallel.py
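The general pattern of accepting either a directory or a single file and fanning the work out to a pool can be sketched with the standard library. Everything here is an assumption made for illustration: the helper names, the thread pool (the library may use processes instead), and the stand-in worker that only counts Bundle entries rather than building dataframes.

```python
import json
import os
from concurrent.futures import ThreadPoolExecutor


def list_inputs(path, suffix=".json"):
    """Accept a directory or a single file and return the files to process."""
    if os.path.isdir(path):
        return sorted(
            os.path.join(path, name)
            for name in os.listdir(path)
            if name.endswith(suffix)
        )
    return [path]


def count_entries(filename):
    """Stand-in worker: load one Bundle and report how many entries it has."""
    with open(filename, encoding="utf-8") as fh:
        return len(json.load(fh).get("entry", []))


def process_all(path, worker=count_entries, max_workers=4):
    """Run the worker over every input file and return results in file order."""
    files = list_inputs(path)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, files))
```

In a dataframe-producing variant, the worker would return one dataframe per file and the results would be concatenated at the end, which is what the documented return value suggests.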