This time, in the third installment in this series, we're going to start delving deeper into working with strings of text and lists of words. But first, we need to learn how to write a program.
Up until now, we've been typing code directly into the Python interpreter. This is sometimes a convenient thing to do, as it's very easy to test a few lines of code to see if you get the desired result. However, as we want to create more complex programs it's going to be annoying to have to type everything from scratch every time we want to repeat a task. Thus, we will write programs.
Writing a program in Python is no more complicated than writing code into a text file and saving it with the .py file extension. For this task, you can use any program that can edit plain text, such as Notepad on Windows or TextEdit on Mac, though I recommend that you get an editor which supports syntax highlighting. If you really want to become a serious hacker I recommend vim, of course, but other good alternatives are TextMate for Mac, Sublime Text for Windows, Mac and Linux, Notepad++ for Windows, and many more. Do a little searching, perhaps watch some videos, and find one that suits you.
Assuming that you now have the editor in place, let's write our first program. Open a file, type
print 'Hullo,Pippin! This is a pleasant surprise!'
and save the file as hello.py. Now open a terminal, navigate to the folder in which you saved the file, and type
python hello.py
If everything went well, you should see the text printed to the terminal, and you have written your first Python program. That's really all there is to it: all the code we've looked at previously can go directly into a program. The line at the top is just there to tell the system that this is a Python file.
One difference to notice is that while typing an expression, for example a calculation, into the interpreter will print the result, it won't if typed into a program. To print output from a program, you need to use, for example, the print statement, as we did above.
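Here is a minimal sketch of that difference. The file name calc.py and the variable name result are made up for illustration. I've written print with parentheses around a single value, which behaves the same in the Python 2 used in this series and in Python 3.

```python
# calc.py (hypothetical file name)
1 + 1           # evaluated, but nothing is shown when run as a program
result = 1 + 1  # store the value in a variable instead
print(result)   # only this line produces output: 2
```

In the interpreter, the first line alone would have shown 2. In a program it runs silently.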
Another convenient thing to remember is that you can comment your code. This isn't really much use in the interpreter, but when writing a program, it can be helpful to include comments to help you remember what you were thinking if you need to change the program later on. The comment sign in Python is #. What this means is that if you include a # anywhere on a line (outside of a string), the rest of that line won't be read by Python. For example, we can edit the file hello.py to look like this:
# This is a comment
print 'Hullo,Pippin! This is a pleasant surprise!' # This is also a comment
# print 'This bit will not be printed, since it is also a comment'
If you run it, it will still produce the same output as before. A good rule is to remember that code is often written once but read many times, so it is worth taking a little extra time when writing in order to save time later on. This applies not only to comments, but also to the code itself. For example, it is a good habit to choose descriptive variable names. For other good habits, have a look at PEP 8 -- Style Guide for Python Code.
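To make the point about variable names concrete, here is a small sketch. All the names in it are made up for illustration, and print is written with parentheses so that it runs the same way in Python 2 and Python 3.

```python
# Hard to read later: what do a and b stand for?
a = 'Fog everywhere.'
b = a.split(' ')

# Easier to read: the names say what the values hold
sentence = 'Fog everywhere.'
words = sentence.split(' ')
print(words)
```

Both versions do exactly the same thing, but the second one is much easier to pick up again in a few months.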
Moving on to the topic of today, further dealings with strings of text, let me introduce the method replace. It is a method that works with strings, and what it does is to replace one substring with another. Let's look at an example in the interpreter (I often use the interpreter to test out a function, to make sure the output is what I expect, before I include it in a program):
>>> a = 'hello'
>>> a.replace('e', 'u')
'hullo'
>>> a.replace('o', '')
'hell'
>>> a.replace('hello', 'Hey there, sailor!')
'Hey there, sailor!'
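Two details are easy to miss in the session above, so here is a short sketch of them (the variable names greeting and changed are made up for illustration): replace swaps every occurrence of the substring, not just the first, and it returns a new string while leaving the original untouched. I've used print with parentheses so the example runs in both Python 2 and Python 3.

```python
greeting = 'hello'
changed = greeting.replace('l', 'L')  # both l's are replaced
print(changed)   # heLLo
print(greeting)  # hello (the original string is unchanged)

# To keep the result, assign it to a variable (here, the same one)
greeting = greeting.replace('hello', 'hullo')
print(greeting)  # hullo
```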
As we can see, replace can replace one or more consecutive characters in a string with zero or more characters. (In fact, it can replace the empty string, '', as well, but that's a bit weird I think.) This is going to come in handy when we want to remove special characters from our text. We'll create a new program with the following contents:
text = "Fog everywhere. Fog up the river, where it flows among green aits and meadows; fog down the river, where it rolls defiled among the tiers of shipping and the waterside pollutions of a great (and dirty) city. Fog on the Essex marshes, fog on the Kentish heights. Fog creeping into the cabooses of collier-brigs; fog lying out on the yards and hovering in the rigging of great ships; fog drooping on the gunwales of barges and small boats. Fog in the eyes and throats of ancient Greenwich pensioners, wheezing by the firesides of their wards; fog in the stem and bowl of the afternoon pipe of the wrathful skipper, down in his close cabin; fog cruelly pinching the toes and fingers of his shivering little 'prentice boy on deck. Chance people on the bridges peeping over the parapets into a nether sky of fog, with fog all round them, as if they were up in a balloon and hanging in the misty clouds."
word_list = []
for word in text.split(' '):
    # Strip each special character in turn
    cleaned_word = word.replace('.', '')
    cleaned_word = cleaned_word.replace(',', '')
    cleaned_word = cleaned_word.replace(';', '')
    cleaned_word = cleaned_word.replace(':', '')
    cleaned_word = cleaned_word.replace(')', '')
    cleaned_word = cleaned_word.replace('(', '')
    word_list.append(cleaned_word)
for word in word_list:
    print word
If you run this code, you will see that it prints all the words in the paragraph, one on each line, with special characters removed (except for the word 'prentice, which keeps its apostrophe). What's happening is, first of all, that we create the variable text, which holds the text. Then we create an empty list named word_list. Looping through all the words in the text, we first use the replace method to replace any instance of '.', i.e., full stop, with '', i.e., nothing, and store the result in the variable cleaned_word. Then we repeat the process, removing the other special characters from cleaned_word, each time storing the result back into cleaned_word. Finally, we add each cleaned word to word_list, and a second loop prints the words one at a time.
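To see the steps on a single word, here is a small trace using '(and' from the paragraph. It only shows two of the six replace calls, and print is written with parentheses so that it runs in both Python 2 and Python 3.

```python
word = '(and'
cleaned_word = word.replace('.', '')          # no full stop, so still '(and'
cleaned_word = cleaned_word.replace('(', '')  # now 'and'
print(cleaned_word)  # and

# The apostrophe is not among the characters we remove, so it survives
print("'prentice".replace('.', ''))  # 'prentice
```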
Now, if you feel pretty sure that there ought to be a simpler way to do this, with less repeated code, you are of course right. The principle of DRY, Don't Repeat Yourself, is an important one, and the way I see it, it's one of the main reasons to learn to program. Next time, we'll look at defining functions, which are an easy way to reuse code several times.
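In the meantime, here is one possible sketch of a less repetitive version that uses nothing but a second for loop. It is not necessarily the approach we'll take next time. The names sample and special_characters are made up for illustration, a short excerpt stands in for the full paragraph, and print is written with parentheses so that it runs in both Python 2 and Python 3.

```python
# A short excerpt from the paragraph, to keep the example small
sample = "a great (and dirty) city. Fog on the Essex marshes, fog on the Kentish heights."
special_characters = '.,;:()'  # hypothetical name: the characters to strip

word_list = []
for word in sample.split(' '):
    cleaned_word = word
    # One inner loop replaces the six near-identical replace lines
    for character in special_characters:
        cleaned_word = cleaned_word.replace(character, '')
    word_list.append(cleaned_word)

for word in word_list:
    print(word)
```

Adding another character to remove now means changing one string instead of adding another line of code.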