CSC 142/Chapter 6
From charlesreid1
Chapter 6: File Processing
Sections:
6.1 File reading basics
6.2 Token based processing
6.3 Line based processing
6.4 Advanced file processing
6.5 Case study: zip code lookup
Chapter 3 focused on a scanner for user input. Chapter 6 focuses on a scanner for file reading.
Many intro programming classes see this as a complicated topic, and Java doesn't make it easy. It's awkward, but it's manageable.
We will also explore exceptions related to file processing.
(Python makes this a dream.)
with open('data.txt','r') as f:
    lines = f.readlines()
Done.
Section 6.1: File Reading Basics
Definitions
Definitions:
- File
- File extension
- binary
- ASCII
- Checked exception
- Throws clause
Material
Examples of the deluge of data available:
- Landmark-project (earthquakes, pollution, baseball, history, weather, etc)
- Gutenberg - see ciphertexts
- ncbi.nlm.nih.gov - biological/genomic data
- IMDB
- Fedstats.gov
- US census
- World bank
- CIA world factbook
Files and file objects:
- Data stored on computer as files
- Files have extensions
- Files can be stored as text, or as binary
- To deal with a file, use a File object
- This provides various methods
- Java API lookup/reference
- Note: we aren't constructing a NEW FILE, we're constructing a new object to represent an existing file
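A minimal sketch of a few File methods, to make the last point concrete. The class name FileInfoDemo, the helper describe, and the file name hamlet.txt are assumptions for illustration; the File methods themselves (getName, exists, canRead, length) are part of java.io.File.

```java
import java.io.File;

public class FileInfoDemo {
    // Builds a short report about whatever the File object points at.
    public static String describe(File f) {
        return "name: " + f.getName()
            + ", exists: " + f.exists()
            + ", can read: " + f.canRead()
            + ", length in bytes: " + f.length();
    }

    public static void main(String[] args) {
        // "hamlet.txt" is a hypothetical file name. Constructing the File
        // object does not create anything on disk; it only represents a path.
        File f = new File("hamlet.txt");
        System.out.println(describe(f));
    }
}
```

If hamlet.txt is not in the current directory, the report simply says exists: false, and no file is created as a side effect.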
Reading files with scanner:
- Useful methods of File objects: (see list)
- File object is like a pipe: doesn't care much about what kind of fluid flowing thru, or where it comes from
- File object is the delivery system
- You can then pass the File object into a scanner
- Again, scanner is like nozzle at end of pipe - does not care much about File type or details of File object, just like nozzle doesn't care about type of fluid
- Need to deal with potential problems; file not there
- Checked exception - like "check" in chess
- Must be dealt with (can't just say, ignore and keep going)
- To handle this exception, either catch it or declare it with a throws clause on the header of the method whose code may cause the error
Throws clause: diapers for your code
More on throws/catch clauses:
- You're anticipating a particular kind of mess
- Like an if statement, for exceptions
- If we see this kind of exception, catch it this particular way
public static void main (String[] args) throws FileNotFoundException {
...
}
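For comparison, a sketch of the "catch it this particular way" alternative: instead of declaring the exception with throws, the method deals with it in a try/catch block. The class name CatchDemo, the helper countTokens, the -1 return convention, and the file name are all assumptions for illustration.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class CatchDemo {
    // Counts the tokens in the named file, or returns -1 if it cannot be opened.
    public static int countTokens(String fileName) {
        try {
            Scanner input = new Scanner(new File(fileName));
            int count = 0;
            while (input.hasNext()) {
                input.next();
                count++;
            }
            input.close();
            return count;
        } catch (FileNotFoundException e) {
            // The mess we anticipated: report it and carry on.
            System.out.println("Could not open " + fileName);
            return -1;
        }
    }

    // No throws clause needed here, because countTokens catches the exception.
    public static void main(String[] args) {
        System.out.println(countTokens("hamlet.txt"));
    }
}
```

Either approach satisfies the compiler. The throws version passes the problem up to the caller; the catch version handles it on the spot.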
Other exceptions:
- If you reach the end of a file, then ask for more
- NoSuchElementException
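A small sketch of how this exception comes up, and the usual guard against it. To keep it runnable without a data file, the Scanner here reads from a String; the class and method names are hypothetical.

```java
import java.util.NoSuchElementException;
import java.util.Scanner;

public class EndOfInputDemo {
    // Asks for one token more than the input holds; returns true if that
    // triggered a NoSuchElementException.
    public static boolean readPastEnd(Scanner input, int available) {
        try {
            for (int i = 0; i < available + 1; i++) {
                input.next();
            }
            return false;
        } catch (NoSuchElementException e) {
            return true;
        }
    }

    // Safer: only call next() when hasNext() says a token is waiting.
    public static int countTokens(Scanner input) {
        int count = 0;
        while (input.hasNext()) {
            input.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd(new Scanner("to be or"), 3)); // prints true
        System.out.println(countTokens(new Scanner("to be or")));    // prints 3
    }
}
```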
A word on the correct way:
Scanner input = new Scanner(new File("hamlet.txt"));
versus the incorrect way:
Scanner input = new Scanner("hamlet.txt");
(Latter compiles, but the Scanner reads the literal text "hamlet.txt" as its input rather than opening a file by that name)
NOTE: This is overloading in action (Scanner can take multiple data types)
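A short sketch of the difference between the two constructors. The class name ConstructorDemo and the helper firstToken are assumptions for illustration; the file branch only runs if a hamlet.txt happens to exist in the current directory.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class ConstructorDemo {
    // Returns the first token the scanner delivers.
    public static String firstToken(Scanner input) {
        return input.next();
    }

    public static void main(String[] args) throws FileNotFoundException {
        // Scanner(String): the text itself is the input.
        System.out.println(firstToken(new Scanner("hamlet.txt"))); // prints hamlet.txt

        // Scanner(File): the contents of the file are the input.
        File f = new File("hamlet.txt");
        if (f.exists()) {
            Scanner fromFile = new Scanner(f);
            if (fromFile.hasNext()) {
                System.out.println(firstToken(fromFile));
            }
        } else {
            System.out.println("No hamlet.txt in the current directory.");
        }
    }
}
```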
Section 6.2: Token-Based Processing
Definitions
Definitions:
- Token-based processing
- Input cursor
- Consuming input
- File path
- Current directory
Token - a single chunk of letters or character data
- Usually WORDS separated by SPACES
- But could also be NUMBERS separated by COMMAS
- Or, other stuff...
Example: file with 5 numbers
- Read in the first 5 numbers
- Cumulative sum of first 5 numbers
- don't forget the throws
Output:
- Program outputs sum as 337.19999999 instead of 337.2 (floating-point roundoff, not a bug in the loop)
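One way the five-number example could look. This is a sketch, not the textbook's exact program: the class name SumNumbers, the helper sumFirst, and the file name numbers.dat are assumptions. The printf line shows one common way to hide roundoff when displaying the result.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class SumNumbers {
    // Reads n numbers from the scanner and returns their sum.
    public static double sumFirst(Scanner input, int n) {
        double sum = 0.0;
        for (int i = 0; i < n; i++) {
            sum += input.nextDouble();
        }
        return sum;
    }

    // Don't forget the throws: new Scanner(File) can fail.
    public static void main(String[] args) throws FileNotFoundException {
        File f = new File("numbers.dat"); // hypothetical file name
        if (!f.exists()) {
            System.out.println("Put five numbers in numbers.dat to try this.");
            return;
        }
        double sum = sumFirst(new Scanner(f), 5);
        System.out.println("Sum = " + sum);     // may show roundoff digits
        System.out.printf("Sum = %.1f%n", sum); // rounded to one decimal place for display
    }
}
```

Because sumFirst takes a Scanner rather than a file name, the same method works on a file, on console input, or on a String.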
Utilize scanner functions:
- Scanners have next(), nextInt(), nextDouble(), etc. to read next values
Structure of files:
- Computer sees a one-dimensional stream of characters: everything else is our own invention (e.g., a line break is just another character in the stream, and token-based reading treats it like any other whitespace, so the Scanner doesn't even see lines)
- Scanner handles details of, e.g., what to do when it gets to a newline char or a number char
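A quick sketch of the one-dimensional view: the same four numbers, laid out on one line or spread over several, look identical to token-based reading. The input comes from Strings so the example runs without a data file; the class and method names are hypothetical.

```java
import java.util.Scanner;

public class TokenStreamDemo {
    // Counts whitespace-separated tokens until the input runs out.
    public static int countTokens(Scanner input) {
        int count = 0;
        while (input.hasNext()) {
            input.next();
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String oneLine = "1 2 3 4";
        String manyLines = "1 2\n3\n\n4"; // \n is just another whitespace character here
        System.out.println(countTokens(new Scanner(oneLine)));   // prints 4
        System.out.println(countTokens(new Scanner(manyLines))); // prints 4
    }
}
```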
Exceptions from wrong data type:
- InputMismatchException
- Pay close attention to errors: not clear, but provide you with hints
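A sketch of how InputMismatchException arises and how hasNextDouble() can guard against it. The class name MismatchDemo, the helper sumNumbers, and the sample input are assumptions for illustration; skipping bad tokens is just one possible policy.

```java
import java.util.InputMismatchException;
import java.util.Scanner;

public class MismatchDemo {
    // Sums the numeric tokens, skipping anything that is not a number.
    public static double sumNumbers(Scanner input) {
        double sum = 0.0;
        while (input.hasNext()) {
            if (input.hasNextDouble()) {
                sum += input.nextDouble();
            } else {
                input.next(); // consume and ignore the non-numeric token
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Scanner bad = new Scanner("3 hello 4");
        bad.nextDouble(); // reads 3
        try {
            bad.nextDouble(); // next token is "hello", not a number
        } catch (InputMismatchException e) {
            System.out.println("\"hello\" is not a double");
        }
        System.out.println(sumNumbers(new Scanner("3 hello 4"))); // prints 7.0
    }
}
```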
Moving through a file:
- Computer sees 1D stream of text
- Can't jump around - like a VCR tape
- So, current location/position is important (input cursor)
- Cursor moves down one char at a time
- Scanner handles details:
- nextFloat() knows what to look for
- advances cursor to next word
Scanner object info:
- if we repeatedly call methods on the same Scanner (or pass it to a method more than once), it doesn't reset the cursor
- one scanner --> one File, one position
- processTokens(input,2) --> first 2 tokens
- processTokens(input,3) --> processes tokens 3, 4, and 5 (not 1, 2, 3)
etc.
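The processTokens calls above can be sketched as follows. The body of processTokens is an assumption (the notes only show how it is called); this version returns the consumed tokens joined by spaces so the cursor behavior is easy to see. The Scanner reads from a String so the example runs without a data file.

```java
import java.util.Scanner;

public class CursorDemo {
    // Consumes the next n tokens and returns them joined by spaces.
    public static String processTokens(Scanner input, int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            if (i > 0) {
                result += " ";
            }
            result += input.next();
        }
        return result;
    }

    public static void main(String[] args) {
        // One Scanner, one input cursor, shared across both calls.
        Scanner input = new Scanner("t1 t2 t3 t4 t5");
        System.out.println(processTokens(input, 2)); // prints t1 t2
        System.out.println(processTokens(input, 3)); // prints t3 t4 t5
    }
}
```

To start over from the first token, construct a new Scanner on the same input.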
Paths and directories:
Flags
CSC 142 - Intro to Programming I, South Seattle College.