Benford’s law describes the (surprising) distribution of first (leading) digits of many different sets of numbers:
Benford’s law states that in listings, tables of statistics, etc., the digit 1 tends to occur with probability ~30%, much greater than the expected 11.1% (i.e., one digit out of 9). Benford’s law can be observed, for instance, by examining tables of logarithms and noting that the first pages are much more worn and smudged than later pages (Newcomb 1881).
Read about it on Wikipedia or MathWorld.
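The ~30% figure for the digit 1 matches the usual statement of Benford's law, P(d) = log10(1 + 1/d) for d = 1, …, 9. That formula is not spelled out in the quote above, so treat the following as a background sketch rather than part of the exercise; the name benford_probs is made up for illustration.

```python
import math

# Expected probability of each leading digit d under Benford's law,
# assuming the standard formula P(d) = log10(1 + 1/d).
benford_probs = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

for d, p in benford_probs.items():
    print(d, round(p, 3))
```

The nine probabilities sum to 1, and the value for d = 1 is about 0.301, compared with 1/9 ≈ 0.111 if all leading digits were equally likely.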
We’ll write a Python function benford_count that tabulates the occurrence of leading digits in a set of numbers.
f = open(filename, mode) opens a file (mode = r for reading, w for writing, a for appending), in text mode by default, and returns a file object.
f = open("test.txt")
type(f)
print(f)
print(f.closed)
f.close()
print(f.closed)
.closed is a data attribute: it’s a value rather than a function (unlike a method attribute like list.append()), so no parentheses!
Use f.close() to close the file once you’re done with it.
open() looks in the current working directory: if you want to get a file from somewhere else, you need to specify the full path, e.g.
f = open('/Users/bolker/Documents/temp/temp.txt', 'r')
f.close()
Use import os; os.getcwd() to check the current working directory and os.listdir() to see what files are there.
Once a file has been opened, its entire contents can be read in as a str using the read() method on the associated file object.
f = open('test.txt', 'r')
contents = f.read()
print(contents)
print(repr(contents))
f.close()
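\n is the newline character that marks the end of each line.
Pass a number to the .read() method to read a specified number of characters from the file: f.read(10) will read the next 10 characters from the file.
Once the end of the file has been reached, .read() returns '', the empty string.
f = open('test.txt', 'r')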
print('first 10 characters of the file:')
print(f.read(10))
print('next 6 characters (including the end of line character `\\n`):')
print(f.read(6))
print("this is the rest of the file:")
contents = f.read()
print(contents)
f.close()
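Because .read(n) returns the empty string at the end of the file, it can drive a loop that reads a file in fixed-size pieces. This is a minimal sketch; the temporary file, its contents, and the chunk size of 4 are made up so that the example is self-contained.

```python
import os
import tempfile

# Create a small throwaway file (name and contents are illustrative only).
path = os.path.join(tempfile.mkdtemp(), "chunks.txt")
f = open(path, "w")
f.write("abcdefghij\nklmno\n")
f.close()

f = open(path, "r")
chunks = []
chunk = f.read(4)            # read at most 4 characters
while chunk != "":           # '' means the end of the file was reached
    chunks.append(chunk)
    chunk = f.read(4)
f.close()
print(chunks)
```

Joining the chunks back together reproduces the original contents, newlines included.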
Use the readlines() method to read the file into a list of strings.
f = open('test.txt')
lines = f.readlines()
# print the list of lines.
# Each line ends with a new line character \n.
print(lines)
print(lines[2])
f.close()
A for loop automatically reads lines too: for line in f: do_something_with(line). This is probably the most common way to process a file. For example, if you want to convert each line to uppercase:
f = open("test.txt")
for line in f:
    print(line)
    line_u = line.upper()
    print(line_u)
f.close()
If you wanted instead to print the square of a single number occurring on every line, you could use print(int(line)**2) (int() ignores the \n at the end of the line).
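Here is a small sketch of that idea, collecting the squares in a list. The temporary file and the numbers in it are invented for the example.

```python
import os
import tempfile

# Write a throwaway file with one integer per line (illustrative data).
path = os.path.join(tempfile.mkdtemp(), "numbers.txt")
f = open(path, "w")
f.write("3\n10\n-4\n")
f.close()

squares = []
f = open(path)
for line in f:
    # int() accepts surrounding whitespace, so the trailing \n is harmless
    squares.append(int(line) ** 2)
f.close()
print(squares)   # [9, 100, 16]
```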
The .strip() method for strings gets rid of leading and trailing whitespace (spaces, tabs, newlines …).
f = open("test.txt")
L = f.readline() ## read one line
print(repr(L))
print(repr(L.strip()))
f.close()
The .split() method for strings breaks a string into a list of pieces at a separator.
f = open("test.txt")
L = f.readline() ## read one line
LL = L.strip().split(" ")
print(LL)
f.close()
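The two methods are often chained to pull one field out of a line, such as the last word. The sketch below uses an invented line of text; note that .split() with no argument splits on any run of whitespace, while .split(" ") produces empty strings when there are repeated spaces.

```python
line = "Springfield  Ontario Canada 12345\n"   # made-up line with a double space

words_any = line.strip().split()       # split on runs of whitespace
words_one = line.strip().split(" ")    # split on every single space

last_word = words_any[-1]              # the last word on the line
leading_char = last_word[0]            # its first character

print(words_any)
print(words_one)
print(last_word, leading_char)
```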
Use import os to find out the working directory (os.getcwd()) or to set it (os.chdir(newpath)); use a full path, or use (e.g.) .. to go up one level.
To read from a URL: import urllib.request as ur; ur.urlopen(url), wrapped in io.TextIOWrapper() to get text rather than bytes.
To read CSV files: import csv, use csv.reader().
next(), and more flow control
f = open("test.txt")
line = next(f) ## read one line
print(line)
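next() also works on a csv.reader, which makes it a convenient way to pull off a header row before looping over the rest. In this sketch io.StringIO stands in for an open text file, and the town names and numbers are invented.

```python
import csv
import io

# io.StringIO behaves like an open text file; the data are made up.
f = io.StringIO("town,population\nAlphaville,1200\nBetaburg,34000\n")

reader = csv.reader(f)
header = next(reader)        # first row: the column names
rows = []
for row in reader:           # remaining rows, each a list of strings
    rows.append(row)

print(header)
print(rows)
```

Every field comes back as a str, so numeric columns still need int() or float().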
If you call next() after the last line has been read, you get a StopIteration error; use try/except to handle it safely.
f = open("test.txt")
finished = False
while not finished:
    try:
        line = next(f)
    except StopIteration:
        finished = True
f.close()
try is a general way to handle errors safely.
except tells Python what to do if try doesn’t work. You can specify the particular type of error you’re looking for, or use except: to catch any kind of error.
break exits the enclosing loop:
while True:
    try:
        x = input("enter a number: ")
        print(int(x))
        break
    except ValueError:
        print("Try again!")
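The same try/except pattern works without interactive input, for example when converting words read from a file. This is a sketch only; the function name to_int_or_none is hypothetical.

```python
def to_int_or_none(text):
    """Return int(text), or None if text cannot be converted to an integer."""
    try:
        return int(text)
    except ValueError:       # raised by int() for strings like "abc" or "12.5"
        return None

print(to_int_or_none("42"))
print(to_int_or_none("forty-two"))
```

Catching only ValueError, rather than using a bare except:, means unrelated errors still show up instead of being silently swallowed.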
Write a function benford_count that tabulates the occurrence of leading digits from a set of numbers.
It produces digits_count, of length 10 (one for each digit), such that digits_count[i] is equal to the number of occurrences of the digit i as the leading digit of some number from the file, for i = 1, 2, …, 9.
For example, digits_count[2] is the count of the number of times the digit 2 occurred as the leading digit of some number from the file.
To properly process a data file, we need to make some assumptions about its format.
- We’ll assume that each line contains a number of words, for example, the name of a town, followed by its province/state and country, etc., and that the last word will be a number that represents the size of the quantities being counted in the file.
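Under the format assumption above (the number is the last word on each line), one possible sketch of benford_count looks like this. It is not the only reasonable design: skipping blank lines and ignoring last words that do not start with a digit are assumptions made here, and the test file and its contents are invented.

```python
import os
import tempfile

def benford_count(filename):
    """Count the leading digits of the last word on each line of a file."""
    digits_count = [0] * 10                # a list, so entries can be updated
    f = open(filename)
    for line in f:
        words = line.strip().split()
        if len(words) > 0:                 # assumption: skip blank lines
            first_char = words[-1][0]      # first character of the last word
            if first_char in "0123456789": # assumption: ignore non-numbers
                digits_count[int(first_char)] += 1
    f.close()
    return tuple(digits_count)

# Try it on a small made-up file.
path = os.path.join(tempfile.mkdtemp(), "towns.txt")
f = open(path, "w")
f.write("Alphaville 1200\nBetaburg 34000\nGammaton 150\nDeltaport 9\n")
f.close()

counts = benford_count(path)
print(counts)   # (0, 2, 0, 1, 0, 0, 0, 0, 0, 1)
```

Index 0 is kept so that digits_count[i] lines up with digit i; it only becomes nonzero if a number in the file starts with 0.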
Start with a digits_count list of length 10, filled with zeros (why a list??).
For each line of the file, update digits_count.
Return tuple(digits_count).
Test code can be run only when __name__ is __main__. This will be the case when the file itself is being executed and not imported.
sets are mutable objects and so can be altered.
Sets are written with curly braces ({ and }).
vowels = {'a', 'e', 'i', 'o', 'u'}
print(type(vowels))
print(vowels)
vowels = {'a', 'e', 'i', 'o', 'u', 'e', 'a'}
print(vowels)
print({1, 2, 3, 'a'} == {'a', 1, 1, 3, 2, 'a'})
Other objects can be converted to sets with set.
set() produces the empty set.
If L is a list and S is a str, set(L) and set(S) produce the set of items/characters from the list/string.
print(set())
print(set([1, 2, 0, -1, 3, 1, 1, 2]))
print(set('hello world!'))
print(set(range(0, 10, 2)))
Set methods include add, remove, intersection, union, …
small = {0, 1, 2, 3}
mid = {3, 4, 5, 6, 7}
big = {7, 8, 9}
big.add(10)
small.remove(0)
print(small, big)
print(small.intersection(mid))
print(small.union(big))
Sets can be compared using methods or set operators.
d = {0,1}
print(d.issubset({0, 1, 2, 3}))
print(d <= {0, 1, 2, 3})
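Several other set methods have operator forms as well. The sketch below pairs a few of them using small made-up sets: & for intersection, | for union, - for difference, and < for a proper subset.

```python
small = {1, 2, 3}
mid = {3, 4, 5, 6, 7}

common = small & mid          # same as small.intersection(mid)
either = small | mid          # same as small.union(mid)
only_small = small - mid      # same as small.difference(mid)

print(common, either, only_small)
print({1, 2} < small)         # proper subset: True
print(small < small)          # a set is not a proper subset of itself: False
print(small <= small)         # but it is a subset of itself: True
```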
Code for testing whether a string is hexadecimal:
hex_char = "0123456789abcdef"
word = "12aac"
def is_hex(word):  # illustrative name; return is only valid inside a function
    for char in word:
        if char not in hex_char:
            return False
    return True
print(is_hex(word))
this can be replaced by:
set(word) <= set(hex_char)