Galatia
Do you recall back in 2023 when I mentioned failing my C programming class in college? So long ago! Going into this holiday break (a two-week vacation for me), I got bored, picked up the old source code and input file, and finished the assignment…on the 30th anniversary of my Incomplete.
Assignment:
Based on knowledge learned through the semester with file management, text processing, memory allocation, data structures, B-Trees, linked lists, and so on, write a program that can take a text file representing a book of the bible and produce a concordance of the important words, listing each word in alphabetical order with a list of references for each word in book/chapter/verse order. Extra credit if you include a parser to stem the words (so that instead of “write”, “writes”, and “wrote”, you get “write”).
My book was actually Galatians, not Ephesians like I recalled earlier, but whatever.
Input format (sample snippet, no newlines):
@$GAL@ 01:01 Paul, an apostle - sent not from men nor by man, but by Jesus Christ and God the Father, who raised him from the dead -
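If it helps to picture the format: each verse appears to start with an @$BOOK@ CC:VV header, and its text runs until the next @$ marker. That is my reading of the snippet, not a spec. Here's a minimal C sketch of reading one header under that assumption. VerseHeader, parse_header, and verse_end are hypothetical names, not taken from the original code.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical type for one parsed verse header such as "@$GAL@ 01:01 ". */
typedef struct {
    char book[8];      /* book code, e.g. "GAL" */
    int chapter;
    int verse;
    const char *text;  /* first character of the verse text (not NUL-terminated per verse) */
} VerseHeader;

/* Parse a header at p. Returns 1 and fills *out on success, 0 otherwise.
   %d (not %i) is used so "08" and "09" are read as decimal, not octal.
   %n records how many characters the header (plus trailing space) used. */
int parse_header(const char *p, VerseHeader *out)
{
    int consumed = 0;

    if (sscanf(p, "@$%7[^@]@ %d:%d %n",
               out->book, &out->chapter, &out->verse, &consumed) != 3
        || consumed == 0)
        return 0;
    out->text = p + consumed;
    return 1;
}

/* Assumption: the verse text runs until the next "@$" marker or the end of the buffer. */
const char *verse_end(const VerseHeader *h)
{
    const char *next = strstr(h->text, "@$");
    return next != NULL ? next : h->text + strlen(h->text);
}
```

A driver would call parse_header at the start of the buffer, tokenize the text up to verse_end, and repeat from there.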
My old code was written for Borland C on Windows 3 to be run on the command line. That’s how old this code is. It had decent bones, and I gave it an honest try in 1994, but I just couldn’t get the bones to stick together. Something-something about my obvious misunderstanding of fundamentals like pointers, recursion, source file flow, something-something.
Once I got the gcc toolchain up on my Linux box and got VSCode going, I tried building what I had. There were so many dependency issues and syntax errors that I moved everything aside and rebuilt the code from the ground up, reusing old pieces in new files and following the lessons I learned a decade ago when I built my JX3P Tape Dump Decoder tool (also written in C).
Twenty years of professional and hobby code development has taught me far more than I ever could've grokked in 4 months at that thumb-headed age of 22.
The new code did it proper:
- makefile with real and .PHONY targets (all, clean)
- *.c source files under ./src/
- *.h headers under ./include/
- *.o object files under ./obj/
- source management with git
Invocation, with explicit source and destination files (can also read from stdin and print to stdout for piping):
user@host:~/concordance$ ./concordance books/galatians.txt final-gal.txt
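For the stdin/stdout fallback, one common convention is to treat a missing argument, or a lone "-", as the standard stream. Below is a small sketch under that assumption. open_streams and close_streams are hypothetical helpers, and the "-" handling is a guess at a convention, not necessarily what this program does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pick input/output streams from argv.
   Assumed convention: a missing argument or "-" means stdin/stdout.
   Returns 0 on success, -1 (after printing a message to stderr) on failure. */
int open_streams(int argc, char *argv[], FILE **in, FILE **out)
{
    *in = stdin;
    *out = stdout;

    if (argc > 1 && strcmp(argv[1], "-") != 0) {
        *in = fopen(argv[1], "r");
        if (*in == NULL) {
            perror(argv[1]);
            return -1;
        }
    }
    if (argc > 2 && strcmp(argv[2], "-") != 0) {
        *out = fopen(argv[2], "w");
        if (*out == NULL) {
            perror(argv[2]);
            if (*in != stdin)
                fclose(*in);   /* don't leak the input file we already opened */
            return -1;
        }
    }
    return 0;
}

/* Close only the streams we opened ourselves; never close stdin/stdout. */
void close_streams(FILE *in, FILE *out)
{
    if (in != stdin)
        fclose(in);
    if (out != stdout)
        fclose(out);
}
```

Under this convention, something like cat books/galatians.txt | ./concordance - final-gal.txt would read from the pipe and write to the named file.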
Sample output from final-gal.txt:
gained GAL 2:21
Galatia GAL 1:2
Galatians GAL 3:1
gave GAL 1:4, 2:9, 2:20, 3:18
Gentile GAL 2:14, 2:15
Gentiles GAL 1:16, 2:2, 2:7, 2:8, 2:9, 2:12, 2:12, 2:14, 3:8, 3:14
gentleness GAL 5:23
gently GAL 6:1
get GAL 1:18, 4:30
give GAL 2:5, 3:5, 6:9
given GAL 2:9, 3:14, 3:21, 3:22, 3:22, 4:15
glad GAL 4:27
glory GAL 1:5
go GAL 1:17, 2:9, 5:12
goal GAL 3:3
God GAL 1:1, 1:3, 1:4, 1:10, 1:13, 1:15, 1:20, 1:24, 2:6, 2:8, 2:19, 2:20, 2:21, 3:5, 3:6, 3:8, 3:11, 3:17, 3:18, 3:20, 3:21, 3:26, 4:4, 4:6, 4:7, 4:8, 4:9, 4:9, 4:14, 5:21, 6:7, 6:16
gods GAL 4:8
good GAL 4:17, 4:18, 5:7, 6:6, 6:9, 6:10, 6:12
goodness GAL 5:22
gospel GAL 1:6, 1:7, 1:7, 1:8, 1:9, 1:11, 2:2, 2:5, 2:7, 2:14, 3:8, 4:13
grace GAL 1:3, 1:6, 1:15, 2:9, 2:21, 3:18, 5:4, 6:18
gratify GAL 5:16
Greek GAL 2:3, 3:28
group GAL 2:12
guardians GAL 4:2
As an aside, I grabbed the entire bible from Gutenberg.org and modified the formatting to fit the concordance parser, and — hoo-boy — it took 5 minutes for a single thread to chew through that 4MB text file to produce a 3MB concordance. Mighty. Just look at that sample output:
account 1CH 27:24; 2CH 26:11; JOB 33:13; PSA 144:3; ECC 7:27; MAT 12:36, 18:23; LUK 16:2; ACT 19:40; ROM 14:12; 1CO 4:1; PHI 1:18, 4:17; HEB 13:17; 1PE 4:5; 2PE 3:15; 2KA 12:4
accounted DEU 2:11, 2:20; 1KI 10:21; 2CH 9:20; PSA 22:30; ISA 2:22; MAR 10:42; LUK 20:35, 21:36, 22:24; ROM 8:36; GAL 3:6
Accounting HEB 11:19
accounts DAN 6:2
accursed DEU 21:23; JOS 6:17, 6:18, 6:18, 6:18, 7:1, 7:1, 7:11, 7:12, 7:12, 7:13, 7:13, 7:15, 22:20; 1CH 2:7; ISA 65:20; ROM 9:3; 1CO 12:3; GAL 1:8, 1:9
accusation JUD 1:9; EZR 4:6; MAT 27:37; MAR 15:26; LUK 6:7, 19:8; JOH 18:29; ACT 25:18; 1TI 5:19; 2PE 2:11
accuse PRO 30:10; MAT 12:10; MAR 3:2; LUK 3:14, 11:54, 23:2, 23:14; JOH 5:45, 8:6; ACT 24:2, 24:8, 24:13, 25:5, 25:11, 28:19; 1PE 3:16
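The full-Bible output shows the reference format clearly. The book code appears once per run of references, verses within a book are separated by commas, and a new book starts after a semicolon. Here's a sketch of printing one entry that way. It assumes each word's references sit in a linked list that is already in book/chapter/verse order. Ref and print_entry are hypothetical names, and the function does no sorting of its own.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical reference node: one occurrence of a word. */
typedef struct Ref {
    char book[8];       /* book code, e.g. "GAL" */
    int chapter;
    int verse;
    struct Ref *next;
} Ref;

/* Print "word BOOK c:v, c:v; BOOK c:v" followed by a newline.
   The list is printed in its existing order; no sorting happens here. */
void print_entry(FILE *out, const char *word, const Ref *refs)
{
    const char *current_book = NULL;

    fputs(word, out);
    for (const Ref *r = refs; r != NULL; r = r->next) {
        if (current_book == NULL)
            fprintf(out, " %s ", r->book);      /* first book group */
        else if (strcmp(current_book, r->book) != 0)
            fprintf(out, "; %s ", r->book);     /* new book group */
        else
            fputs(", ", out);                   /* same book, next verse */
        fprintf(out, "%d:%d", r->chapter, r->verse);
        current_book = r->book;
    }
    fputc('\n', out);
}
```

For example, a list holding ROM 9:3, GAL 1:8, and GAL 1:9 prints as "accursed ROM 9:3; GAL 1:8, 1:9".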
I included code to preserve capitalization if words appear to be known names, and to bring capitalized words to lowercase if they're seen in lowercase elsewhere (useful for words first seen at the start of sentences). I didn't include any stemming code, so no extra credit; the parser is naive. And I did cheat a little and use the standard library's string search/case/copy functions, and I don't feel guilty about it. But I did write the recursive B-Tree and linked list code from scratch, so there's that.
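The capitalization rule and the tree/list structure fit together naturally, so here's a stand-in sketch. It is not the code from this project. It uses a plain (unbalanced) binary search tree keyed with the POSIX strcasecmp rather than a true multiway B-tree, and each node keeps a tail-appended linked list of references. A lowercase sighting demotes a capitalized spelling, so a word that is only ever capitalized, like a name, keeps its capital. Occ, Node, tree_insert, and collect_words are hypothetical names.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* One occurrence of a word (book/chapter/verse), kept in a singly linked list. */
typedef struct Occ {
    char book[8];
    int chapter, verse;
    struct Occ *next;
} Occ;

/* Tree node: one distinct word (compared case-insensitively) plus its references. */
typedef struct Node {
    char *word;                 /* display spelling; may be lowercased later */
    Occ *head, *tail;           /* tail pointer keeps appends O(1) and in input order */
    struct Node *left, *right;
} Node;

/* Sketch-level allocation helper: give up loudly on out-of-memory. */
static void *xcalloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (p == NULL) {
        fputs("out of memory\n", stderr);
        exit(EXIT_FAILURE);
    }
    return p;
}

static void append_occ(Node *n, const char *book, int chapter, int verse)
{
    Occ *o = xcalloc(1, sizeof *o);
    snprintf(o->book, sizeof o->book, "%s", book);
    o->chapter = chapter;
    o->verse = verse;
    if (n->tail != NULL)
        n->tail->next = o;
    else
        n->head = o;
    n->tail = o;
}

/* Recursive insert. Words compare case-insensitively, so "Grace" and "grace"
   share a node; if the incoming copy starts lowercase, the stored spelling
   is lowercased (it was probably just sentence-initial). */
Node *tree_insert(Node *root, const char *word, const char *book, int chapter, int verse)
{
    if (root == NULL) {
        size_t len = strlen(word) + 1;
        root = xcalloc(1, sizeof *root);
        root->word = xcalloc(len, 1);
        memcpy(root->word, word, len);
    } else {
        int cmp = strcasecmp(word, root->word);
        if (cmp < 0) {
            root->left = tree_insert(root->left, word, book, chapter, verse);
            return root;
        }
        if (cmp > 0) {
            root->right = tree_insert(root->right, word, book, chapter, verse);
            return root;
        }
        if (islower((unsigned char)word[0]))
            for (char *c = root->word; *c != '\0'; c++)
                *c = (char)tolower((unsigned char)*c);
    }
    append_occ(root, book, chapter, verse);
    return root;
}

/* In-order walk: stores words alphabetically into out[] (caller sizes it),
   starting at index i; returns the next free index. */
size_t collect_words(const Node *n, const char **out, size_t i)
{
    if (n == NULL)
        return i;
    i = collect_words(n->left, out, i);
    out[i++] = n->word;
    return collect_words(n->right, out, i);
}
```

Each node's list stays in input order, which matches book/chapter/verse order only as long as the source file does. An unbalanced tree like this one also degrades toward a linked list when words arrive already sorted, which is one argument for a real B-tree or a balanced tree. A complete program would free the nodes and lists when done.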
I won’t be posting the code. I’m proud of it and happy it works, and it’s clean and neat, but I’m not a fan of public git repo sites (especAIlly now). And tarballs seem excessive for how silly this project is.
I still chafe that Dr. H made us do this with a book of the bible, but, honestly, it's an interesting project with other applications. I tested myself and believe I would've passed had I had enough experience and patience.
So take that, Doctor H! I hope you’re doing well, wherever you are.