Demo¶
Setup¶
- CRISP-T is a Python package that can be installed using pip and used from the command line. Your system should have Python 3.11 or higher and pip installed. You can download and install Python for your operating system here. Optionally, CRISP-T can be imported in Python scripts or Jupyter notebooks, but this is not covered in this demo. See the documentation for more details.
- Install CRISP-T with `pip install crisp-t[ml]` or `uv pip install crisp-t[ml]`.
- (Optional) Download the COVID narratives data to a `crisp_source` folder in your home directory or the current directory using `crisp --covid covidstories.omeka.net --source crisp_source`. You may use any other source of textual data (e.g. journal articles, interview transcripts) in .txt or .pdf format in the `crisp_source` folder or the folder you specify with the --source option.
- (Optional) Download the Psychological Effects of COVID dataset to the `crisp_source` folder. You may use any other numeric dataset in .csv format in the `crisp_source` folder or the folder you specify with the --source option.
- Create a `crisp_input` folder in your home directory or the current directory for keeping imported data for analysis.
Import data¶
- Run the following command to import data from the `crisp_source` folder to the `crisp_input` folder. --source reads data from a directory (.txt and .pdf files and a single .csv) or from a URL.
crisp --source crisp_source --out crisp_input
- Ignore warnings related to pdf files.
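Conceptually, the import step walks the source folder and collects text into a corpus. A minimal stdlib-only sketch of the .txt side of this (the function name and corpus layout are illustrative, not CRISP-T's internals):

```python
import tempfile
from pathlib import Path

def load_txt_corpus(source_dir):
    """Collect every .txt file in source_dir into a list of documents."""
    docs = []
    for path in sorted(Path(source_dir).glob("*.txt")):
        docs.append({"id": path.stem, "text": path.read_text(encoding="utf-8")})
    return docs

# Build a throwaway source folder with two narratives
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "story1.txt").write_text("Lockdown changed my routine.")
    (Path(d) / "story2.txt").write_text("Masks became part of daily life.")
    corpus = load_txt_corpus(d)

print(len(corpus))  # → 2
```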
Perform Exploratory tasks using NLP¶
- Run the following command to perform topic modelling and assign topics (keywords) to each narrative.
- --inp crisp_input below is optional, as it defaults to the `crisp_input` folder.
crisp --inp crisp_input --assign --out crisp_input
- The results will be saved in the same `crisp_input` folder, overwriting the corpus file.
- You may run several other analyses (see documentation for details) and tweak parameters as needed.
- Hints will be provided in the terminal.
From now on, we will use the `crisp_input` folder as the input folder unless specified otherwise, as that is the default.
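Topic/keyword assignment can be pictured as picking each narrative's most frequent informative terms. A toy term-frequency sketch (not CRISP-T's actual model, which does proper topic modelling):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "to", "in", "was", "is", "my"}

def assign_keywords(text, k=3):
    """Assign the k most frequent non-stopword tokens as keywords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    return [word for word, _ in counts.most_common(k)]

doc = ("During lockdown I wore a mask every day. The mask mandate changed "
       "shopping, and shopping online became normal during lockdown.")
keywords = assign_keywords(doc)
print(keywords)
```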
Explore results¶
crisp --print "documents 10"
- Notice that we have omitted --inp, as it defaults to the `crisp_input` folder. If you want to use a different folder, use --inp to specify it. The --out option helps to save intermediate results in a different folder.
- The above command prints the first 10 documents in the corpus.
- Next, let us see the metadata assigned to each document.
crisp --print "documents metadata"
- Notice keywords/topics assigned to each narrative.
- You will notice interviewee and interviewer keywords. These are assigned based on the presence of these words in the narratives and may not be useful.
- You may remove these keywords by using --ignore with --assign and checking the results again.
crisp --clear --assign --ignore interviewee,interviewer --out crisp_input
crisp --print "documents metadata"
- The --clear option clears the cache before running the analysis.
⚠️ While analysing multiple datasets, use the `crisp --clear` option to clear the cache before switching datasets. ⚠️
- Now you will see that these keywords are removed from the results.
- It prints the first 5 documents by default.
crisp --print "metadata clusters"
- Prints the clusters assigned to each document based on keywords.
- There are many other options to explore the results. See documentation for details.
- Let us choose narratives that contain 'mask' keyword and show the concepts/topics in these narratives.
crisp --inp crisp_input --clear --filters keywords=mask --topics
- The above results will not be saved as --out is not specified.
- Notice time, people as topics in this subset of narratives.
- If --filters is used, only the filtered documents are used for the analysis. When using filters you should explicitly specify --inp and --out options with different folders to avoid overwriting the input data.
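Filtering by keyword metadata, as --filters keywords=mask does, amounts to a simple membership test. An illustrative sketch with a hypothetical document layout:

```python
def filter_by_keyword(docs, keyword):
    """Keep only documents whose assigned keywords include `keyword`."""
    return [d for d in docs if keyword in d.get("keywords", [])]

docs = [
    {"id": 1, "keywords": ["mask", "lockdown"]},
    {"id": 2, "keywords": ["vaccine"]},
    {"id": 3, "keywords": ["mask", "work"]},
]
subset = filter_by_keyword(docs, "mask")
print([d["id"] for d in subset])  # → [1, 3]
```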
Quantitative exploratory analysis¶
- Let us do a k-means clustering of the COVID .csv dataset.
crisp --include relaxed,self_time,sleep_bal,time_dp,travel_time,home_env --kmeans
- Notice 3 clusters with different centroids (the number of clusters can be changed with the --num option).
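k-means itself is simple enough to sketch in a few lines; this toy version (seeded from the first k points, unlike production implementations) shows the assign-then-recompute loop behind --kmeans:

```python
import math

def kmeans(points, k, iters=20):
    """Plain k-means: assign each point to its nearest centroid, then
    recompute each centroid as the mean of its cluster."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[j].append(p)
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, clusters

# Two well-separated groups of toy 2-D survey scores
points = [(1, 1), (8, 8), (1.2, 0.8), (0.9, 1.1), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(points, k=2)
print(sorted(len(c) for c in clusters))  # → [3, 3]
```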
Confirmation¶
- Let us add a relationship between numb:self_time and text:work in the corpus for future confirmation with LLMs.
crispt --add-rel "text:work|numb:self_time|correlates"
- Let us do a regression analysis to see how `relaxed` is affected by other variables.
crisp --include relaxed,self_time,sleep_bal,time_dp,travel_time,home_env --regression --outcome relaxed
- self_time has a positive correlation with relaxed.
- What about a decision tree analysis?
crisp --include relaxed,self_time,sleep_bal,time_dp,travel_time,home_env --cls --outcome relaxed
- Relaxed is converted to binary variable internally for classification.
- Ideally, you should do the binary conversion externally based on domain knowledge.
- Notice that self_time is the most important variable in predicting relaxed.
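The internal binary conversion can be pictured as a median split; the sketch below (the `threshold` parameter is where domain knowledge would come in) is an illustration, not CRISP-T's exact rule:

```python
from statistics import median

def binarize(values, threshold=None):
    """Convert a numeric outcome to 0/1 by a cut-off (median by default).
    In practice the cut-off should come from domain knowledge."""
    cut = median(values) if threshold is None else threshold
    return [1 if v > cut else 0 for v in values]

relaxed = [2, 5, 3, 7, 6, 1]
print(binarize(relaxed))  # median is 4.0 → [0, 1, 0, 1, 1, 0]
```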
Topological Data Analysis (Rudkin, S., & Dlotko, P., 2024)¶
- Let us do a TDA analysis to see the shape of the data.
- Parameters to --tdabm are specified as follows: outcome:variables:radius
crispt --tdabm relaxed:self_time,sleep_bal,time_dp,travel_time:0.6 --out crisp_input
- Let us visualize the TDA network.
crispviz --tdabm --out viz_out/
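The Ball Mapper construction used in this style of TDA can be sketched as a greedy cover, assuming the standard algorithm: every point not yet inside a landmark's ball of the given radius becomes a new landmark, and landmarks whose balls share a point are joined by an edge. A toy 2-D version:

```python
import math

def ball_mapper(points, radius):
    """Greedy Ball Mapper cover: a point becomes a new landmark when no
    existing landmark ball (of the given radius) covers it; landmarks
    whose balls share a point are connected by an edge."""
    landmarks = []
    for p in points:
        if not any(math.dist(p, l) <= radius for l in landmarks):
            landmarks.append(p)
    balls = [{i for i, p in enumerate(points) if math.dist(p, l) <= radius}
             for l in landmarks]
    edges = [(a, b)
             for a in range(len(balls))
             for b in range(a + 1, len(balls))
             if balls[a] & balls[b]]
    return landmarks, balls, edges

# Three nearby points on a line plus one far away
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (3.0, 0.0)]
landmarks, balls, edges = ball_mapper(points, radius=0.6)
print(len(landmarks), edges)  # → 3 [(0, 1)]
```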
Sense-making by triangulation¶
Now let us try out a csv dataset with text and numeric data.¶
- Download the SMS Smishing Collection Data Set from Kaggle and convert the text file to csv, adding the headers id, CLASS and SMS. Convert CLASS to numeric (0 for ham, 1 for smish) and add id as serial numbers.
- Place the csv file in a new `crisp_source` folder.
- Import the csv file to the `crisp_input` folder using the following command.
crisp --source crisp_source/ --unstructured SMS
- Notice that the text column SMS is specified with --unstructured option. This creates CRISP documents from the text column.
- Now assign topics to the documents. Note that this also assigns clusters.
crisp --assign
- Now print the results to examine.
crisp --print "metadata clusters"
- Let us choose cluster 1 and see the SMS classes in this cluster (0 = ham, 1 = smish).
crisp --filters cluster=1 --print "dataframe stats"
- Next, let us check if the SMS texts converge towards predicting the CLASS (ham/smish) variable with an LSTM model.
crisp --lstm --outcome CLASS
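As a stand-in for the LSTM (which needs the ml extra and real training data), a bag-of-words perceptron on toy SMS-like messages shows the same idea of text converging towards the binary CLASS label. This is a deliberately different, simpler technique than what --lstm trains:

```python
# Toy training data standing in for the SMS corpus (1 = smish, 0 = ham)
train = [
    ("win a free prize now", 1),
    ("urgent claim your prize", 1),
    ("free entry win cash", 1),
    ("see you at lunch", 0),
    ("call me when you are home", 0),
    ("meeting moved to friday", 0),
]

def featurize(text):
    return set(text.split())

weights = {}
for _ in range(10):  # perceptron epochs
    for text, label in train:
        score = sum(weights.get(w, 0) for w in featurize(text))
        pred = 1 if score > 0 else 0
        if pred != label:  # nudge word weights toward the true label
            for w in featurize(text):
                weights[w] = weights.get(w, 0) + (1 if label == 1 else -1)

def predict(text):
    return 1 if sum(weights.get(w, 0) for w in featurize(text)) > 0 else 0

print(predict("claim your free prize"), predict("lunch on friday"))  # → 1 0
```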
MCP Server for agentic AI. (Optional, but LLMs may be better at sense-making!)¶
Try out the MCP server with the following command. (LLMs will offer course corrections and suggestions)¶
- load corpus from /Users/your-user-id/crisp_input
- use available tools
- What are the columns in df?
- Do a regression using time_bp,time_dp,travel_time,self_time with relaxed as outcome
- Interpret the results
- Do self_time or related concepts occur frequently in documents?
- can you ignore "interviewer,interviewee" and assign topics again? Yes.
- What are the topics in documents with keyword "work"?
Visualization¶
Let's visualize the clusters in 2D space using PCA.¶
crispviz --ldavis --out viz_out/
- The visualization will be saved in the `viz_out` folder. Open the HTML file in a browser to explore.
Let's generate a word cloud of keywords in the corpus.¶
crispviz --wordcloud --out viz_out/
- The word cloud will be saved in the `viz_out` folder.
More examples — comprehensive CLI usage¶
The following grouped examples show common and advanced usage patterns for the three CLIs: crisp, crispt, and crispviz. These are practical, copy-pasteable command lines that demonstrate option combinations and formats discussed in this demo and cheatsheet.
A. Data import & basic workflow (crisp)¶
Import a folder with text files and a CSV; specify unstructured text column¶
crisp --source ./raw_data --unstructured "comments" --out ./crisp_input
Import but limit text files and CSV rows when ingesting large sources¶
crisp --source ./raw_data --out ./crisp_input --num 10 --rec 500
Import CSV placed in the source folder; ignore specific stopwords/columns¶
crisp --source ./survey --unstructured "comments" --ignore "interviewer,interviewee" --out ./survey_corpus
B. Filtering and linking (crisp + crispt) — examples¶
Exact-match filters (both = and : separators supported)¶
crisp --inp ./crisp_input --filters category=Health --topics
crisp --inp ./crisp_input --filters category:Health --topics
Special link filters (text→df and df→text)¶
# Filter dataframe rows that are linked from documents via embeddings
crispt --inp ./crisp_input --filters embedding:text --out ./linked_by_embedding
# Filter documents that are linked from dataframe rows via temporal links
crispt --inp ./crisp_input --filters temporal:df --out ./linked_docs
Legacy shorthand mappings — both map to embedding:text or temporal:text¶
crispt --inp ./crisp_input --filters =embedding
crispt --inp ./crisp_input --filters :temporal
ID linkage: filter to a single ID, or sync remaining docs↔rows with blank value¶
# Filter to specific id
crisp --inp ./crisp_input --filters id=12345 --nlp
# Sync documents and dataframe rows by ID after other filters
crisp --inp ./crisp_input --filters id: --out ./synced_output
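The blank-value id: filter can be read as an intersection on shared IDs; an illustrative sketch with hypothetical document and row layouts:

```python
def sync_by_id(docs, rows):
    """Keep only documents and rows that share an id (the `id:` filter)."""
    shared = {d["id"] for d in docs} & {r["id"] for r in rows}
    return ([d for d in docs if d["id"] in shared],
            [r for r in rows if r["id"] in shared])

docs = [{"id": 1}, {"id": 2}, {"id": 4}]
rows = [{"id": 2, "score": 7}, {"id": 3, "score": 5}, {"id": 4, "score": 9}]
docs2, rows2 = sync_by_id(docs, rows)
print([d["id"] for d in docs2], [r["id"] for r in rows2])  # → [2, 4] [2, 4]
```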
C. Text analysis quick examples (crisp)¶
Topic modeling and then assign topics to documents¶
crisp --inp ./crisp_input --topics --assign --out ./crisp_input_analyzed
Run sentiment and summary together¶
crisp --inp ./crisp_input --sentiment --summary --num 5
Run all NLP analyses (coding dictionary, topics, categories, summary, sentiment)¶
crisp --inp ./crisp_input --nlp
D. Machine learning & cross-modal examples (crisp)¶
Run k-means clustering on numeric CSV columns¶
crisp --inp ./survey_corpus --kmeans --num 4 --include age,income,score
Classification (SVM + Decision Tree) using a DataFrame outcome column¶
crisp --inp ./survey_corpus --cls --outcome satisfaction_binary --include a,b,c --aggregation majority
Neural net (requires crisp-t[ml])¶
crisp --inp ./survey_corpus --nnet --outcome target_col --include feat1,feat2
LSTM using text documents aligned by id column in CSV¶
crisp --inp ./survey_corpus --lstm --outcome CLASS
E. Corpus management & inspection (crispt) — examples¶
Create a new corpus and add documents¶
crispt --id my_corpus --name "Study A" --doc "1|Intro|This is the first document" --out ./my_corpus
Add metadata and a relationship¶
crispt --inp ./my_corpus --meta "source=field" --add-rel "text:work|numb:self_time|correlates" --out ./my_corpus
Remove a document and clear relationships¶
crispt --inp ./my_corpus --remove-doc 1 --clear-rel --out ./my_corpus
Inspect dataset columns, row counts, or specific rows¶
crispt --inp ./my_corpus --df-cols
crispt --inp ./my_corpus --df-row-count
crispt --inp ./my_corpus --df-row 12
Print usage: two supported formats¶
# Multi-flag form
crispt --inp ./my_corpus --print documents --print 10
# Single-string form
crispt --inp ./my_corpus --print "dataframe metadata"
F. Semantic & embedding features (crispt) — examples¶
Semantic search for similar documents (requires embedding backend)¶
crispt --inp ./my_corpus --semantic "patient anxiety" --num 8 --rec 0.45
Find documents similar to a list of document IDs¶
crispt --inp ./my_corpus --similar-docs "1,2,3" --num 5
Semantic-chunks: search within specific document chunks (use with --doc-id)¶
crispt --inp ./my_corpus --doc-id 5 --semantic-chunks "query phrase" --rec 0.6
Embedding linking and stats¶
crispt --inp ./my_corpus --embedding-link "cosine:3:0.7" --embedding-stats --out ./emb_links
crispt --inp ./emb_links --filters embedding:df --out ./docs_linked_to_rows
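Assuming the `cosine:3:0.7` argument reads as metric:top_k:threshold, embedding linking can be sketched as a cosine-similarity top-k match. Toy 2-D vectors stand in for real embeddings:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

def link_by_embedding(doc_vecs, row_vecs, top_k, threshold):
    """For each document, link the top_k most similar rows above threshold."""
    links = {}
    for d, dv in doc_vecs.items():
        scored = sorted(((cosine(dv, rv), r) for r, rv in row_vecs.items()),
                        reverse=True)
        links[d] = [r for s, r in scored[:top_k] if s >= threshold]
    return links

doc_vecs = {"doc1": (1.0, 0.0), "doc2": (0.0, 1.0)}
row_vecs = {"row1": (0.9, 0.1), "row2": (0.1, 0.9)}
links = link_by_embedding(doc_vecs, row_vecs, top_k=1, threshold=0.7)
print(links)  # → {'doc1': ['row1'], 'doc2': ['row2']}
```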
G. Temporal utilities (crispt) — examples¶
Link by time (nearest, window with seconds, or sequence)¶
crispt --inp ./my_corpus --temporal-link "nearest:timestamp"
crispt --inp ./my_corpus --temporal-link "window:timestamp:300" # ±300 seconds
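The window variant can be pictured as keeping rows whose timestamp falls within ± the given number of seconds of each document; a sketch with hypothetical timestamp fields:

```python
from datetime import datetime, timedelta

def window_link(docs, rows, seconds):
    """Link each document to rows whose timestamp is within ±seconds of it."""
    delta = timedelta(seconds=seconds)
    return {
        d["id"]: [r["id"] for r in rows if abs(r["ts"] - d["ts"]) <= delta]
        for d in docs
    }

t0 = datetime(2021, 3, 1, 12, 0, 0)
docs = [{"id": "d1", "ts": t0}]
rows = [
    {"id": "r1", "ts": t0 + timedelta(seconds=120)},   # inside ±300 s
    {"id": "r2", "ts": t0 + timedelta(seconds=900)},   # outside ±300 s
]
print(window_link(docs, rows, seconds=300))  # → {'d1': ['r1']}
```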
Temporal summaries, sentiment trends, and topics over periods¶
crispt --inp ./my_corpus --temporal-summary W
crispt --inp ./my_corpus --temporal-sentiment W:mean
crispt --inp ./my_corpus --temporal-topics W:5
H. Visualization examples (crispviz)¶
Word frequency + topic wordcloud + LDA interactive visualization¶
crispviz --inp ./crisp_input_analyzed --out viz_out --freq --wordcloud --ldavis
Top terms with custom top-n and bins¶
crispviz --inp ./crisp_input --out viz_out --top-terms --top-n 30 --bins 80
Correlation heatmap with selected numeric columns¶
crispviz --inp ./survey_corpus --out viz_out --corr-heatmap --corr-columns "age,income,score"
Graph visualization filtered by node types and a different layout¶
crispviz --inp ./my_corpus --out viz_out --graph --graph-nodes document,keyword --graph-layout circular
I. Small tips & parameter semantics¶
- --rec for `crispt` semantic commands can be a similarity threshold (float, default 0.4), while --rec for some `crisp` commands is used as an integer count; check the command context.
- --num defaults differ by context (e.g., `crispt` search default is 5; `crisp` analysis default is 3).
- --aggregation accepts `majority|mean|first|mode` and controls how values from multiple documents mapping to one numeric row are aggregated for ML tasks.
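The --aggregation choices can be sketched directly; `majority` here assumes binary (0/1) per-document values, which is an illustrative simplification:

```python
from statistics import mean, mode

def aggregate(values, how):
    """Collapse several per-document values onto one numeric row."""
    if how == "majority":
        return 1 if sum(1 for v in values if v == 1) > len(values) / 2 else 0
    if how == "mean":
        return mean(values)
    if how == "first":
        return values[0]
    if how == "mode":
        return mode(values)
    raise ValueError(how)

preds = [1, 0, 1, 1]  # four documents mapped to the same row
print(aggregate(preds, "majority"), aggregate(preds, "mean"))  # → 1 0.75
```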
J. Full-run example (import → analyze → visualize)¶
# 1) Import
crisp --source ./raw_data --unstructured "comments" --out ./crisp_input
# 2) Run NLP + sentiment + save
crisp --inp ./crisp_input --topics --assign --sentiment --out ./crisp_input_analyzed
# 3) Link by embedding and run regression on linked set
crispt --inp ./crisp_input_analyzed --embedding-link "cosine:1:0.7" --out ./linked
crisp --inp ./linked --outcome satisfaction_score --regression --out ./final_results
# 4) Create visualizations
crispviz --inp ./final_results --out viz_out --ldavis --wordcloud --corr-heatmap