File Handling
File handling connects your Python scripts to real data: CSV exports from Excel, JSON from APIs, text logs from systems. Business analysts read files to analyze data, write files to share results, and automate report generation. Master file I/O and you'll never copy-paste data by hand again.
Estimated reading time: 25–30 minutes
File Operations
- open() with 'r' → read existing files
- open() with 'w' → create/overwrite files
- open() with 'a' → append to existing files
- with statement → auto-close files (safer)
Great for: loading data, saving reports
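Of the modes listed above, append ('a') is the only one not demonstrated later in this chapter. A minimal sketch of using it for a running log file (the filename activity.log and the log lines are illustrative):

```python
# Append mode ("a") creates the file if it doesn't exist,
# otherwise it adds to the end without erasing earlier content
with open("activity.log", "a", encoding="utf-8") as f:
    f.write("2024-01-15 09:00 report generated\n")

# Re-opening in append mode preserves the earlier line
with open("activity.log", "a", encoding="utf-8") as f:
    f.write("2024-01-15 09:05 report emailed\n")

with open("activity.log", "r", encoding="utf-8") as f:
    print(f.read())
```

Contrast with 'w', which would truncate the file on every open; that's why 'a' is the right choice for logs that accumulate across runs.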
Common Formats
- CSV → tabular data (Excel exports, databases)
- JSON → structured data (APIs, configs)
- TXT → logs, notes, simple reports
- Encoding → use utf-8 for international characters
Great for: ETL pipelines, data integration
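The encoding bullet above matters in practice: without an explicit encoding, Python falls back to a platform default that varies between systems. A short sketch of a round-trip with non-ASCII text (the filename and names are illustrative):

```python
# Writing text containing accented and non-Latin characters
with open("names.txt", "w", encoding="utf-8") as f:
    f.write("José,München,北京\n")

# Reading it back with the same encoding round-trips cleanly;
# omitting encoding= can garble these characters on some platforms
with open("names.txt", "r", encoding="utf-8") as f:
    content = f.read()

print(content)
```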
Reading and Writing Text Files
Text files are the simplest format. Use the with statement to auto-close files.
# Writing a text file
with open("report.txt", "w") as f:
    f.write("Sales Report\n")
    f.write("Total: $125,000\n")

# Reading a text file
with open("report.txt", "r") as f:
    content = f.read()
    print(content)

# Reading line by line (memory efficient)
with open("report.txt", "r") as f:
    for line in f:
        print(line.strip())

Working with CSV Files
CSV is the standard format for tabular data. Python's csv module handles parsing and writing.
import csv

# Reading CSV
with open("sales.csv", "r") as f:
    reader = csv.DictReader(f)  # returns dicts with column names as keys
    for row in reader:
        print(row['name'], ":", row['amount'])

# Writing CSV
data = [
    {"name": "Ana", "amount": 1200},
    {"name": "Bob", "amount": 1500},
]
with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "amount"])
    writer.writeheader()
    writer.writerows(data)

CSV Best Practices
- Use DictReader/DictWriter for named columns (clearer than lists)
- Always set newline="" when writing (prevents blank rows on Windows)
- Specify encoding="utf-8" for international characters
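Putting the three practices together, a sketch of a CSV round-trip that applies all of them (the filename clients.csv and the data are illustrative):

```python
import csv

rows = [
    {"name": "José", "amount": 1200},
    {"name": "Zoë", "amount": 950},
]

# newline="" prevents blank rows on Windows; utf-8 keeps accented names intact
with open("clients.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "amount"])
    writer.writeheader()
    writer.writerows(rows)

# Reading back with the same settings; note that CSV values come back as strings
with open("clients.csv", "r", newline="", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        print(row["name"], row["amount"])
```

One detail worth remembering: everything read from CSV is a string, so numeric columns like amount need an explicit float() or int() before doing math on them.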
Working with JSON Files
JSON stores structured data (dicts, lists). Common for API responses and configs.
import json

# Reading JSON
with open("config.json", "r") as f:
    data = json.load(f)
    print(data["api_key"])

# Writing JSON
report = {
    "date": "2024-01-15",
    "sales": [
        {"rep": "Ana", "amount": 1200},
        {"rep": "Bob", "amount": 1500}
    ],
    "total": 2700
}
with open("report.json", "w") as f:
    json.dump(report, f, indent=2)  # indent=2 for readability

JSON Tips
- Use indent=2 for human-readable output
- JSON keys must be strings (not ints or tuples)
- Use json.dumps() to convert to a string without writing to a file
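The last two tips can be shown in isolation. A short sketch of json.dumps() and its inverse json.loads() (the report values are illustrative):

```python
import json

report = {"total": 2700, "reps": ["Ana", "Bob"]}

# dumps() returns a string instead of writing to a file -
# useful for sending over HTTP or logging
text = json.dumps(report, indent=2)
print(text)

# loads() is the inverse: it parses a JSON string back into Python objects
assert json.loads(text) == report

# Non-string dict keys are silently converted to strings on the way out,
# so a round-trip does NOT give you back the original int key
roundtrip = json.loads(json.dumps({1: "one"}))
print(roundtrip)
```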
Error Handling for Files
Files can be missing, locked, or corrupted. Handle errors gracefully.
try:
    with open("data.csv", "r") as f:
        content = f.read()
except FileNotFoundError:
    print("Error: File not found")
except PermissionError:
    print("Error: No permission to read file")
except Exception as e:
    print("Unexpected error:", e)

Cornerstone Project — Sales Report Generator (step-by-step)
Build a tool that reads sales data from CSV, calculates totals and top performers, and exports results to both JSON (for APIs) and TXT (for email). This is a real workflow analysts use to automate weekly reports.
Step 1 — Create sample input CSV
Simulate a sales export from your CRM.
import csv

# Create sample data
sales_data = [
    {"rep": "Ana", "region": "North", "amount": 1200},
    {"rep": "Bob", "region": "South", "amount": 1500},
    {"rep": "Carol", "region": "North", "amount": 980},
    {"rep": "Ana", "region": "North", "amount": 1100},
]
with open("sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["rep", "region", "amount"])
    writer.writeheader()
    writer.writerows(sales_data)

Step 2 — Read and parse CSV
Load data into a list of dicts for processing.
def load_sales(filename):
    with open(filename, "r") as f:
        reader = csv.DictReader(f)
        return [row for row in reader]

sales = load_sales("sales.csv")
print("Loaded", len(sales), "records")

Step 3 — Calculate totals by rep
Aggregate sales by rep name.
from collections import defaultdict

def totals_by_rep(sales):
    totals = defaultdict(float)
    for row in sales:
        rep = row["rep"]
        amount = float(row["amount"])
        totals[rep] += amount
    return dict(totals)

rep_totals = totals_by_rep(sales)
print(rep_totals)  # {"Ana": 2300.0, "Bob": 1500.0, "Carol": 980.0}

Step 4 — Find top performers
Sort by total sales descending.
def top_performers(totals, n=3):
    sorted_reps = sorted(totals.items(), key=lambda x: x[1], reverse=True)
    return sorted_reps[:n]

top_3 = top_performers(rep_totals)
print("Top performers:", top_3)

Step 5 — Export to JSON
Save structured data for APIs or dashboards.
import json

def export_json(totals, top, filename):
    report = {
        "date": "2024-01-15",
        "total_sales": sum(totals.values()),
        "rep_totals": totals,
        "top_performers": [{"rep": r, "amount": a} for r, a in top]
    }
    with open(filename, "w") as f:
        json.dump(report, f, indent=2)
    print("JSON report saved to", filename)

export_json(rep_totals, top_3, "sales_report.json")

Step 6 — Export to TXT for email
Create a human-readable summary.
def export_txt(totals, top, filename):
    with open(filename, "w", encoding="utf-8") as f:
        f.write("Sales Report - Week of Jan 15\n")
        f.write("=" * 40 + "\n\n")
        total = sum(totals.values())
        f.write("Total Sales: $" + str(round(total, 2)) + "\n\n")
        f.write("Top Performers:\n")
        for i, (rep, amount) in enumerate(top, 1):
            f.write("  " + str(i) + ". " + rep + ": $" + str(round(amount, 2)) + "\n")
        f.write("\nAll Reps:\n")
        for rep, amount in sorted(totals.items()):
            f.write("  • " + rep + ": $" + str(round(amount, 2)) + "\n")
    print("Text report saved to", filename)

export_txt(rep_totals, top_3, "sales_report.txt")

Step 7 — Put it all together
Combine into a single pipeline function.
def generate_sales_report(input_csv):
    # Load
    sales = load_sales(input_csv)

    # Process
    totals = totals_by_rep(sales)
    top = top_performers(totals, n=3)

    # Export
    export_json(totals, top, "sales_report.json")
    export_txt(totals, top, "sales_report.txt")
    print("Report generation complete!")

# Run it
generate_sales_report("sales.csv")

How this helps at work
- Weekly automation → run every Monday, email results to team
- Multi-format output → JSON for dashboards, TXT for humans
- Scalable → handles 10 rows or 10,000 rows the same way
- Reusable pattern → adapt for expenses, inventory, customer data
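For the weekly-automation scenario above, the pipeline should survive a missing or unreadable export rather than crash. A sketch combining the loader with the error handling shown earlier; the function name load_sales_safe and the return-empty-list fallback are assumptions, not part of the project code:

```python
import csv

def load_sales_safe(filename):
    """Load CSV rows as dicts, returning [] with a message if the file is unusable."""
    try:
        with open(filename, "r", newline="", encoding="utf-8") as f:
            return list(csv.DictReader(f))
    except FileNotFoundError:
        print("Error:", filename, "not found - did the CRM export run?")
        return []
    except PermissionError:
        print("Error: no permission to read", filename)
        return []

# A missing file no longer crashes the weekly run
rows = load_sales_safe("missing.csv")
print("Loaded", len(rows), "records")
```

With this guard in place, a scheduled Monday-morning run produces a clear message instead of a traceback when the upstream export is late.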
Key Takeaways
- with statement → auto-closes files, prevents leaks
- CSV → use DictReader/DictWriter for named columns
- JSON → structured data, perfect for APIs and configs
- Error handling → catch FileNotFoundError and PermissionError
- Encoding → always specify utf-8 for international text
- Cornerstone → sales report generator automates weekly workflows
Next Steps
You have mastered file I/O. Next, explore error handling to make your file scripts robust, or dive into working with libraries like pandas for advanced CSV/Excel processing.