Python for Newbies: A Simple Anagram Finder

Once you get past Hello World, there are a several good entry points for a new Python developer to get acquainted with the language.  Three areas which really helped me were simple math problems, the Python challenge, and writing some word game solvers.

The best source of basic math problems for a new python developer is Project Euler. This site is basically an online math test - where you need to write a small program to perform the required calculations. Most of these problems can be solved by a new developer in under an hour (some faster). It will broaden your understanding of mathematics and give you a good grounding in basic Python.

The Python Challenge is another good learning environment. The participant is presented with a web page that contains a riddle, with clues buried somewhere on the page. Your job is to use Python to solve the riddle. This generally requiring you to take a deep dive into one of the modules in the Standard Library. We've used this as a warmup exercise in one of the meetups I attend - I can recall doing modules around regular expressions, urllib (simple web scraping), and image processing.

My deepest dive into the language occured when I started doing some "code doodling" around solving word puzzles. For example, the script below takes a list of words and searches a dictionary to identify their anagrams. For a dictionary, I opted to use Enable (Google hosts a copy of it - look for enable.txt on the Google Code directory behind this link). 

A couple of insights made this script possible:

  • Exact anagrams of a word have the same number of letters - just in a different order.  If you sort the letters of each word in the dictionary into alphabetical order (eg. cart => acrt), words that are anagrams of each other will have the same value.
  • This guides us towards a solution. Use Python's sort function to map the anagrams in a list of words into a common key; then construct a dictionary using this common key to accumulate a list of anagrams mapped to that value.
  • The anagrams for a particular word can be retrieved using the "get" method of the dictionary.
  • Finally - just for fun - we use Python's map function to demonstrate how you can run this check for a whole list of candidates

Since it's five o'clock somewhere, I opted to run this script against a list of beverages.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
testlist = ["wine","beer","mead","liquor","scotch","gin","whiskey","rum"]
 
 
def gethandle(word):
    return ("").join(sorted(word))
 
 
anagrams={}
 
 
wordlist =open("enable1.txt")
for item in wordlist:
     handle = gethandle(item.strip())
     if handle not in anagrams:
         anagrams[handle] = [item.strip()]
     else:
         anagrams[handle].append(item.strip())
 
 
def testforanagrams(word):
    results = anagrams.get(gethandle(word),[])
 
 
    if len(results)>1:
          print word, ":",", ".join([item for item in results if item != word])
    else:
          print word, ":", "None"
 
 
map(testforanagrams,testlist)

producing the following output:

1
2
3
4
5
6
7
8
wine : None
beer : bree
mead : dame, made
liquor : None
scotch : None
gin : None
whiskey : None
rum : None

This is a fairly basic and slightly verbose script and there are a number of ways to improve upon it. I'll suggest a couple of the more obvious ones, the implementation of which is left as an exercise for the student.

First, it could benefit by reorganizing all of the code into proper functions. Several elements of the script execute sequentially with the script vs. being packaged into reusable functions. I'd actually recommend reorganizing the entire script into a class object. This neatly encapsulates the above activities and helps you separate the act of loading the dataset from searching the dataset. From a performance perspective, you spend a lot of IO and CPU time parsing and loading the initial dictionary. If you operated this as some type of "service", you should leverage this setup time across as many requests as possible.

On a personal note, this little script actually got me started on a larger journey - involving two additional steps.

Once I had a basic anagram finder, I started looking at adding other word games into the package. This forced me to do some optimization around the underlying algorithm using several articles I found on Python data structures and processing performance. In the process, I was able to learn a lot about how to benchmark the memory usage and CPU speed for a Python program. While the final product was much more complicated, it was a lean, mean "word searching machine".

While I was building the class object, I was also learning about several python web frameworks for work and wanted to find a safe test project. Once I built the word search class object, I used a web framework to link a URL Route (/solveword) to a class method (solve-word function). This provided me with the server-side chassis for a word game solver website (the first "production grade" solver I wrote is here, a fancier version of the solver is here). I gave a talk on this project at my local Python meetup (overview and slides are here). This series of steps actually prompted me to learn a lot more than Python - HTML, CSS, Jquery, basic devops, etc. 

Don't be afraid of taking little steps when learning a new language. Little steps and silly problems (eg. anagram finding, finding palindromes, many of the other Project Euler exercises) are frequently the gateway to working on more serious applications. Having "code in production" tends to prompt you to create more code - forcing you to learn more than you ever expected about topics you never knew existed until they were suddenly relevant to bringing your baby to life. Silly little steps sometimes can turn into respectable deliverables....

P.S. And don't be afraid to play around with projects that seem hopeless from an adoption perspective. I wrote the website's scrabble solver purely to round out the collection of solvers; while that particular solver received zero public love (too many people already build better solvers and are way too intense about promoting them), the site actually gets traffic from a simpler form of the scrabble solver that was rebranded into tools to unscramble words and serve as a word jumble solver. Look for adjacent spaces and see if your program works there; you may be very surprised at the outcome.

Addendum (three years later): I've apparently found my version of a code kata project. I've actually replicated this project about four times since then. First version was done in two three hour sprints as a challenge (the word unscrambler implements the same algorithm above, except as a database). And replicated it using PHP (version 1 is here and another version with more polished internals is here).




Leave comments

authimage
  • Great post!
    Another dimension of optimization would be to use something like parallelpython. Check out
    http://www.parallelpython.com

    -Vick

    • Vick
  • Ahh...yes, I agree. Will check it out.

    I actually did an extension of this project where I wrote a map-reduce job to run every word in the dictionary through one of the methods in the word-solver class.

    • john

Copyright(c) 2017 - PythonBlogs.com
By using this website, you signify your acceptance of Terms and Conditions and Privacy Policy
All rights reserved