Due date: Wednesday, November 19, 2025.
In this exercise, we will create a knowledge graph that represents the content of a text document. The graph will have words (tokens, stems) as vertices, each with an assigned score that reflects its importance in the document. Edges connect words that appear in the same sentence, with a weight that represents how strong the connection between the words is. We will use the graph to find the most important words in the document based on their frequency, as a way to characterize the content of the document.
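To make the structure concrete, here is a minimal sketch, not the provided Word_Graph class, of one way such a graph could be stored in Python; all names here are illustrative.

# Illustrative structure only: vertex scores in one dictionary, edge weights
# in another keyed by word pairs.
class ToyWordGraph:
    def __init__(self):
        self.scores = {}    # word -> importance score
        self.weights = {}   # (word1, word2) -> edge weight

    def add_word(self, word):
        # Every occurrence of a word raises its score.
        self.scores[word] = self.scores.get(word, 0) + 1

    def add_pair(self, word1, word2):
        # Words from the same sentence get an edge; repeated co-occurrence
        # makes the connection stronger.
        key = tuple(sorted((word1, word2)))
        self.weights[key] = self.weights.get(key, 0) + 1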
Download the following files:
make_graph.py: contains a class that reads information from the files and uses the graph class to build the knowledge graph.
word_graph.py: contains the graph class specific to this problem.
porter_stemmer.py: contains functions to stem a word and reduce it to its root or stem.
stop_words.txt: contains a list of common words that are usually not indexed by search engines because they are too common.
frequent_words.txt: contains the 1000 most common words in the English language according to Project Gutenberg, listed in order of frequency.
three_piggies.txt: contains a short story that can be used to test the program. Source: https://reedsy.com/short-stories/bedtime/.
In the class Word_Graph, implement the function print_top that prints the information for the vertices with the highest scores in the graph. You will need to sort the vertices by score and then output as many of the top ones as requested.
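As a starting point, here is a hedged sketch of print_top, assuming the graph keeps its vertices in a dictionary self.vertices whose values carry word and score attributes; adapt the names to the actual Word_Graph class.

# Sketch of a method to add inside Word_Graph; attribute names are assumptions.
def print_top(self, count):
    # Sort the vertices by score, highest first.
    ranked = sorted(self.vertices.values(), key=lambda v: v.score, reverse=True)
    for vertex in ranked[:count]:
        print(vertex.word, vertex.score)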
In the same class, implement the function output_file that receives a file name and outputs the graph data to it. We will use the simplest pairwise file format. The file will contain a line for each edge in the graph showing the following data:
name1 name2 weight score
where the weight is the weight of the edge and the score is the score of the first vertex. The values are separated by a tab ('\t').
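A possible sketch of output_file, assuming the edges are available as (vertex1, vertex2, weight) triples in self.edges and each vertex carries word and score attributes; the real internals of Word_Graph may differ.

# Sketch of a method to add inside Word_Graph; storage details are assumptions.
def output_file(self, filename):
    with open(filename, "w", encoding="utf-8") as out:
        for vertex1, vertex2, weight in self.edges:
            # One line per edge: name1, name2, edge weight, score of the first vertex.
            out.write(f"{vertex1.word}\t{vertex2.word}\t{weight}\t{vertex1.score}\n")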
In the class Make_Graph, implement the function read_index_file that receives the name of a file containing a text document to be indexed. The function should read the document, split it into sentences, reduce each word to its stem while skipping the stop words, add the resulting words as vertices of the graph, and connect the words that appear together in the same sentence.
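Below is a rough sketch of the indexing step; it assumes a stem() helper from porter_stemmer, a set self.stop_words loaded from stop_words.txt, and Word_Graph methods add_vertex and add_edge, none of which are guaranteed to match the names in the provided files.

import re
from porter_stemmer import stem  # assumed name; use the real function from porter_stemmer.py

# Sketch of a method to add inside Make_Graph; all helper names are assumptions.
def read_index_file(self, filename):
    with open(filename, "r", encoding="utf-8") as infile:
        text = infile.read()
    # Naive sentence split on ., !, or ?; the provided code may be more careful.
    for sentence in re.split(r"[.!?]", text):
        # Keep only alphabetic tokens, lowercased.
        words = re.findall(r"[a-z]+", sentence.lower())
        # Drop stop words, then reduce the remaining words to their stems.
        stems = [stem(word) for word in words if word not in self.stop_words]
        for i, word1 in enumerate(stems):
            self.graph.add_vertex(word1)
            # Connect every pair of words that share this sentence.
            for word2 in stems[i + 1:]:
                self.graph.add_edge(word1, word2)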
Find a small to medium-sized document to test the program with. It can be anything you find interesting: a story, a poem, song lyrics, a news article, and so on. Use the program to build a graph from it and save the graph to a file.
We will use the visualization tool Graphia from graphia.app. If you are using a Windows computer, you can download and install it. Otherwise, you can use it directly from the web site.
To open the web application, go to web.graphia.app. You can go through the tutorial to get familiar with the tool or skip it. From the File menu, open one of the graph files you created with the program.
After playing around with the visualization and deciding whether it helps you understand the data better, take a screenshot of one of the visualizations and save it as a png file to upload with your submission.
Upload all the Python files you have modified, the text file you chose for the test, the resulting graph files, and the screenshot of the Graphia visualization on Canvas under Assignments - Homework 10.