File Handling

File handling connects your Python scripts to real data: CSV exports from Excel, JSON from APIs, text logs from systems. Business analysts read files to analyze data, write files to share results, and automate report generation. Master file I/O and you never manually copy-paste data again.

Estimated reading time: 25–30 minutes

File Operations

  • open() with 'r' → read existing files
  • open() with 'w' → create/overwrite files
  • open() with 'a' → append to existing files
  • with statement → auto-close files (safer)

Great for: loading data, saving reports

Common Formats

  • CSV → tabular data (Excel exports, databases)
  • JSON → structured data (APIs, configs)
  • TXT → logs, notes, simple reports
  • Encoding → use utf-8 for international characters

Great for: ETL pipelines, data integration

Reading and Writing Text Files

Text files are the simplest format. Use the with statement to auto-close files.

python
# Writing a text file
with open("report.txt", "w") as f:
    f.write("Sales Report\n")
    f.write("Total: $125,000\n")

# Reading a text file
with open("report.txt", "r") as f:
    content = f.read()
    print(content)

# Reading line by line (memory efficient)
with open("report.txt", "r") as f:
    for line in f:
        print(line.strip())

Working with CSV Files

CSV is the standard for tabular data. Python csv module handles parsing and writing.

python
import csv

# Reading CSV
with open("sales.csv", "r") as f:
    reader = csv.DictReader(f)  # returns dicts with column names as keys
    for row in reader:
        print(row['name'], ":", row['amount'])

# Writing CSV
data = [
    {"name": "Ana", "amount": 1200},
    {"name": "Bob", "amount": 1500},
]

with open("output.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["name", "amount"])
    writer.writeheader()
    writer.writerows(data)

CSV Best Practices

  • Use DictReader / DictWriter for named columns (clearer than lists)
  • Always set newline="" when writing (prevents blank rows on Windows)
  • Specify encoding="utf-8" for international characters

Working with JSON Files

JSON stores structured data (dicts, lists). Common for API responses and configs.

python
import json

# Reading JSON
with open("config.json", "r") as f:
    data = json.load(f)
    print(data["api_key"])

# Writing JSON
report = {
    "date": "2024-01-15",
    "sales": [
        {"rep": "Ana", "amount": 1200},
        {"rep": "Bob", "amount": 1500}
    ],
    "total": 2700
}

with open("report.json", "w") as f:
    json.dump(report, f, indent=2)  # indent=2 for readability

JSON Tips

  • Use indent=2 for human-readable output
  • JSON keys must be strings (not ints or tuples)
  • Use json.dumps() to convert to string without writing to file

Error Handling for Files

Files can be missing, locked, or corrupted. Handle errors gracefully.

python
try:
    with open("data.csv", "r") as f:
        content = f.read()
except FileNotFoundError:
    print("Error: File not found")
except PermissionError:
    print("Error: No permission to read file")
except Exception as e:
    print("Unexpected error:", e)

Cornerstone Project — Sales Report Generator (step-by-step)

Build a tool that reads sales data from CSV, calculates totals and top performers, and exports results to both JSON (for APIs) and TXT (for email). This is a real workflow analysts use to automate weekly reports.

Step 1 — Create sample input CSV

Simulate a sales export from your CRM.

python
import csv

# Create sample data
sales_data = [
    {"rep": "Ana", "region": "North", "amount": 1200},
    {"rep": "Bob", "region": "South", "amount": 1500},
    {"rep": "Carol", "region": "North", "amount": 980},
    {"rep": "Ana", "region": "North", "amount": 1100},
]

with open("sales.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["rep", "region", "amount"])
    writer.writeheader()
    writer.writerows(sales_data)

Step 2 — Read and parse CSV

Load data into a list of dicts for processing.

python
def load_sales(filename):
    with open(filename, "r") as f:
        reader = csv.DictReader(f)
        return [row for row in reader]

sales = load_sales("sales.csv")
print("Loaded", len(sales), "records")

Step 3 — Calculate totals by rep

Aggregate sales by rep name.

python
from collections import defaultdict

def totals_by_rep(sales):
    totals = defaultdict(float)
    for row in sales:
        rep = row["rep"]
        amount = float(row["amount"])
        totals[rep] += amount
    return dict(totals)

rep_totals = totals_by_rep(sales)
print(rep_totals)  # {"Ana": 2300.0, "Bob": 1500.0, "Carol": 980.0}

Step 4 — Find top performers

Sort by total sales descending.

python
def top_performers(totals, n=3):
    sorted_reps = sorted(totals.items(), key=lambda x: x[1], reverse=True)
    return sorted_reps[:n]

top_3 = top_performers(rep_totals)
print("Top performers:", top_3)

Step 5 — Export to JSON

Save structured data for APIs or dashboards.

python
def export_json(totals, top, filename):
    report = {
        "date": "2024-01-15",
        "total_sales": sum(totals.values()),
        "rep_totals": totals,
        "top_performers": [{"rep": r, "amount": a} for r, a in top]
    }
    
    with open(filename, "w") as f:
        json.dump(report, f, indent=2)
    
    print("JSON report saved to", filename)

export_json(rep_totals, top_3, "sales_report.json")

Step 6 — Export to TXT for email

Create a human-readable summary.

python
def export_txt(totals, top, filename):
    with open(filename, "w") as f:
        f.write("Sales Report - Week of Jan 15\n")
        f.write("=" * 40 + "\n\n")
        
        total = sum(totals.values())
        f.write("Total Sales: $" + str(round(total, 2)) + "\n\n")
        
        f.write("Top Performers:\n")
        for i, (rep, amount) in enumerate(top, 1):
            f.write("  " + str(i) + ". " + rep + ": $" + str(round(amount, 2)) + "\n")
        
        f.write("\nAll Reps:\n")
        for rep, amount in sorted(totals.items()):
            f.write("  • " + rep + ": $" + str(round(amount, 2)) + "\n")
    
    print("Text report saved to", filename)

export_txt(rep_totals, top_3, "sales_report.txt")

Step 7 — Put it all together

Combine into a single pipeline function.

python
def generate_sales_report(input_csv):
    # Load
    sales = load_sales(input_csv)
    
    # Process
    totals = totals_by_rep(sales)
    top = top_performers(totals, n=3)
    
    # Export
    export_json(totals, top, "sales_report.json")
    export_txt(totals, top, "sales_report.txt")
    
    print("Report generation complete!")

# Run it
generate_sales_report("sales.csv")

How this helps at work

  • Weekly automation → run every Monday, email results to team
  • Multi-format output → JSON for dashboards, TXT for humans
  • Scalable → handles 10 rows or 10,000 rows the same way
  • Reusable pattern → adapt for expenses, inventory, customer data

Key Takeaways

  • with statement → auto-closes files, prevents leaks
  • CSV → use DictReader/DictWriter for named columns
  • JSON → structured data, perfect for APIs and configs
  • Error handling → catch FileNotFoundError and PermissionError
  • Encoding → always specify utf-8 for international text
  • Cornerstone → sales report generator automates weekly workflows

Next Steps

You have mastered file I/O. Next, explore error handling to make your file scripts robust, or dive into working with libraries like pandas for advanced CSV/Excel processing.