Training Configuration

Configure your AI model training in one place

🎯 ESSENTIALS

Make your core choices - we'll handle the optimization

📊 Optimal Batch Size

-- total batch
0% GPU Memory Used
✅ Ready to proceed!

💻 Batch Size Calculator

Ready
$ GPU: --
$ Model: --
$ Precision: FP16 | Optimizer: AdamW
$ Calculating optimal batch size...
$ 📊 Optimal Batch Size: --
$ 💾 GPU Utilization: --

Requirements

PyTorch + Transformers Compatibility & Package Management

Auto-Generated Dependencies

Just click "Generate Requirements" and we'll create 66+ optimized packages with perfect compatibility. All conflicts are automatically resolved and installation order is optimized!

🔗 PyTorch + Transformers Compatibility

Select the best combination for your setup

📦 Dependencies & Libraries

Required software packages and versions

Click "Generate Requirements" to create your dependency list.

🔑 API Keys & Access

Manage all your API keys and tokens in one place

🔐 Centralized Secure Storage

All API keys are encrypted and stored securely. Configure them once here, and they'll sync automatically to the Cloud Deploy section.

🤗 HuggingFace Token

Not configured

Your HuggingFace token is used to download and upload models to your account.

Get your token →

🐙 GitHub Token (Optional)

Not configured

GitHub token enables discovery of models from research repositories and increases API rate limits.

Create token →

🌐 Vast.ai API Key

Not configured

Your Vast.ai API key is used to browse and rent GPUs from their marketplace.

Get your API key →

🎮 RunPod API Key

Not configured

Your RunPod API key is used to create and manage GPU pods.

Get your API key →

💡 Request a Cloud Provider

Want us to add support for another GPU cloud provider? Let us know! We're actively expanding our integrations.

✅ Vast.ai ✅ RunPod 🔜 Lambda Labs 🔜 CoreWeave
💡

API Keys Sync Across Platform

Configure your API keys once here, and they'll automatically populate in:

  • ☁️ Cloud Deploy - Full GPU browsing and management
  • Quick Deploy - Instant model training

Security: All keys are encrypted at rest and in transit. We never log or expose your API keys.

📚

Multi-Repository Support

EzEpoch searches multiple AI model repositories to give you access to the latest models:

  • 🤗 HuggingFace Hub - 500K+ models (requires token for gated models)
  • 🐙 GitHub Repositories - Research models and implementations (optional token)
  • 📄 Papers with Code - State-of-the-art research models (no token needed)
  • 🔥 PyTorch Hub - Native PyTorch models (no token needed)

AI Auto-Analysis: All discovered models get optimized settings generated automatically!

🤖 AI Model Library

Browse, search, and manage AI models from HuggingFace

🔍 100,000+ Models Available

Browse and search from the world's largest AI model library. Use filters to find models compatible with your GPU and training goals. Your favorites and recent models are automatically saved!

🔐 Repository Access Status

Connect to repositories to discover and access their models

🤗

HuggingFace Hub

500K+ models, gated access available

🔴 Not Connected
🐙

GitHub Repositories

Research models and implementations

🔴 Not Connected
📄

Papers with Code

State-of-the-art research models

🟢 Available
🔥

PyTorch Hub

Native PyTorch models

🟢 Available

📚 Browse Models

Select a repository and browse available models

Select a repository to view available models...

👤 User Profile

Manage your account information and preferences

🔐 Account Security

Update your profile information, manage your subscription, and configure notification preferences. All changes are saved automatically.

🚀

Founding Beta Program

Limited to 50 spots — refer friends who subscribe to keep your discount

Join the first builders shaping EzEpoch. Earn exclusive discounts by referring friends who actually subscribe — real users, real savings.

🔑 All tiers include access to 3 platforms: EzEpoch · MyEZSetup · DataLabPro
🏆 FOUNDING (1-25)
2 mo free + 30% off for life
2 paid referrals in 30 days
⚡ EARLY ACCESS (26-50)
1 mo free + 25% off 12 mo
1 paid referral in 30 days
🎯 ALL BETA SIGNUPS
25% off first 3 months
No referral needed
🎁 Bonus: every friend who subscribes through your link = +1 free month (up to 3)

Apply for Beta Access

Loading...

📝 Personal Information

📞 Contact Information

🤖 AI Monitoring Preferences

🔐 API Hash & Sessions

This hash links your training sessions to your account

📦 Build Package

Create a complete training package for cloud deployment

📦 One-Click Package Creation

Everything is ready! Just click "Create Package" and we'll generate a complete training package with all dependencies, scripts, monitoring tools, and your data folders. Ready for any cloud GPU platform!

📁 Package Contents

File Status Size Description
requirements.txt ⏳ Pending ~20KB Python dependencies (~22 selective packages, EzSetup validated)
main.py ✅ Complete ~5KB Primary training script with repo-specific model loading
all.env ✅ Complete ~600B API keys, model repos, and training configuration
setup.sh ✅ Complete ~750B Environment setup script with EzSetup ordering + venv creation
README.md ✅ Complete ~1KB Setup instructions and quick-start guide for RunPod/Vast.ai

🚀 Package Creation

☁️ Cloud GPU Deployment

Select a package and deploy to Vast.ai or RunPod with one click

🚀 One-Click Cloud Deployment

Connect your Vast.ai or RunPod account, browse available GPUs, and deploy your training package directly to the cloud.

📦
(Auto-set from package)
🌐

Vast.ai Auto-Deploy

Searches marketplace for best matching GPU

Strategy: Try up to 15 offers
Sorted by: Price (cheapest first)
🎮

RunPod Auto-Deploy

Creates pod with your exact specifications

💡 Higher CPU/RAM = faster data loading & preprocessing

☁️ Your Running Instances

⏱️
Just created an instance?
New instances take 15-30 seconds to appear. Please refresh if needed.
🔌
SSH Connection Info:
Cloud providers can take 30-90 seconds for SSH to be fully ready after "running" status. Our system automatically retries with intelligent backoff if the initial connection fails. This is normal!
Loading...
📋 Full Requirements Details
📦 Package: --
Model: --
Memory: --
GPUs: --
Cost: --

Connect your API keys above to browse available GPUs

📊 Training Dashboard

Monitor your active training sessions in real-time

📁 Your Training Sessions

💳 Subscription & Billing

Manage your plan, sessions, and billing

♾️ Unlimited Training - Netflix Model

All paid plans include unlimited training sessions. Train as much as you need - we don't host GPUs, you do. Our job is to make sure your training works the first time.

📊 Current Plan

Loading...
Plan: Loading...
Monthly Cost: Loading...
Sessions This Month: 0
Sessions Available: Loading...
Next Billing: Loading...

💰 Why EzEpoch Pays for Itself

Train unlimited with confidence - one saved failure pays for months of subscription

❌ Without EzEpoch

  • OOM errors waste $20-50+ in GPU time
  • Config mistakes require restart from scratch
  • Crashes lose hours of training progress
  • 40% of training jobs fail industry-wide

✅ With EzEpoch

  • AI calculates perfect settings every time
  • Crash prevention catches issues before failure
  • Auto-recovery resumes from checkpoints
  • 99% success rate - guaranteed

🚀 Available Plans

Choose the plan that fits your training needs

🆓 Free Trial
$0
✅ 1 free training session
✅ Up to 8B models
✅ Full platform access
✅ AI Guardian monitoring
🚀 BETA PRICING
⚡ Essential
$59/month (was $79.99)
♾️ UNLIMITED Training Sessions
✅ Up to 20B models
✅ AI Guardian Auto-Pilot
✅ Crash prevention & recovery
🎁 EzSetup Included FREE
✅ Email support

🛒 Add-On Products & One-Time Purchases

Enhance your workflow with standalone tools or buy sessions as needed

🎯 Single Session
$9.99/one-time
✅ 1 training session
✅ Up to 70B models
✅ Full AI monitoring
✅ No subscription required
💡 Perfect for one-off training
🎯 Large Model Session
$19.99/one-time
✅ 1 training session
✅ Up to 200B models
✅ Full AI monitoring
✅ No subscription required
💡 For the biggest models
📊 DataLab Pro
$9.99/month
✅ Smart data cleaning & chunking
✅ AI-powered training pair generation
✅ Model quantization (8-bit, 4-bit, GGUF)
✅ RAG database builder
✅ Format converter (JSON/JSONL/CSV)
✅ Quality analysis with before/after scores
✅ Image & audio linking for multimodal
✅ Windows/Mac/Linux desktop app
💡 FREE with Pro+
🔧 EzSetup v1.0
$4.99/month
✅ AI dependency analysis & resolution
✅ Auto-fix version conflicts
✅ Correct install order detection
✅ Platform-specific handling
✅ PyTorch/CUDA compatibility checks
✅ Manual install flagging (flash-attn, etc)
❌ No API access (basic)
💡 FREE with Essential+
🔧 EzSetup API v1.0
$9.99/month
✅ Everything in EzSetup
✅ Full REST API access
✅ CI/CD pipeline integration
✅ Automated dependency fixing
✅ Bulk requirements processing
💡 FREE with Pro

💡 We Want You To Succeed!

Unlimited training means unlimited support - we're here to help you every step of the way

🔄 Crash Recovery

AI Guardian automatically analyzes crashes and restarts training with corrected settings. Resumes from last checkpoint - no progress lost!

🔁 Unlimited Restarts

Training keeps going until YOU succeed. No limits on restart attempts - our goal is your successful model.

🎯 Direct Developer Support

Stuck? You get direct access to the developer who built this platform. I'll personally help debug your setup, review your settings, and guide you to success.

📊 Dashboard Control

Monitor live metrics, adjust settings on-the-fly, and let AI Guardian optimize every 100 steps automatically.

🚀 So Easy, Anyone Can Do It!

Industry Standard: 40% of AI training jobs fail. With EzEpoch: 99% success rate because our AI calculates perfect settings and auto-recovers from crashes.

👨‍💻 Built by One Developer. For Real People.

The story behind EzEpoch

🧑‍💻

Hi, I'm Wil Hurley

EzEpoch is a one-person operation. I designed, built, and maintain every line of code — the AI training engine, the crash prevention system, the dependency resolver, DataLab Pro, and everything in between.

I built EzEpoch because I was frustrated watching people waste hundreds of dollars on failed training runs due to wrong configs, OOM crashes, and dependency nightmares. There had to be a better way.

When you subscribe, you're not paying a faceless corporation — you're supporting an indie developer who personally reads every support message and ships updates weekly. I care about every single user's success because your success is my success.

🛠️
100,000+ lines of code
🤖
AI Guardian patent pending
📊
3 products — 1 developer
💬
Direct support from the creator

© 2025-2026 Wil Hurley / ARC Technologies LLC — Patent Pending

📚 EzEpoch Help & Guide

Step-by-step visual guide to every feature — from first login to live training

🚀 Quick Start 🔐 Login 🔑 API Keys & HF Token ⚙️ Training Config 🔬 Advanced / Expert 📋 Requirements 📦 Package Builder ☁️ Cloud Deploy 🧙 Deployment Wizard 📊 Dashboard 🤗 AI Models 📖 Settings Dictionary

🚀 Quick Start — 5 Steps to Your First Training Run

From zero to a deployable training package in minutes

1️⃣ Log In

Create your account or sign in. Your session is remembered so you stay logged in.

2️⃣ Add API Keys

Add your HuggingFace token (required), plus Vast.ai or RunPod key for cloud GPU access.

3️⃣ Pick a Model & GPU

Choose your model. GPUs with a ⭐ star have enough memory for full fine-tuning.

4️⃣ Configure & Generate

Review the Training Config tab. Click Create Package — EzEpoch resolves all conflicts automatically.

5️⃣ Deploy to Cloud

Use the Deployment Wizard to rent a GPU, upload your package, and start training — all from the browser.

🔐 Login & Account

Sign in to access your saved configs, API keys, and training history

Login page

What to know

  • Use your email + password to sign in.
  • Forgot password? Use the reset link on the login page.
  • All your API keys and training configs are saved to your account and encrypted at rest.
  • You can stay logged in across sessions — your data is always waiting for you.

🔑 API Keys & HuggingFace Token

Connect your cloud accounts — all keys are encrypted and stored securely

API Keys page HuggingFace token type selection

The 4 HuggingFace token permission levels

🤗 How to create a HuggingFace token (required)

  1. Go to huggingface.co/settings/tokens
  2. Click + Create new token
  3. Give it a name (e.g. "EzEpoch")
  4. Choose a permission level:
    Read — download public & gated models (recommended for most users)
    Write — upload fine-tuned models back to HF Hub
    Fine-grained — custom per-repo permissions
    Legacy — older token format (avoid for new tokens)
  5. Click Generate token and copy it (starts with hf_)
  6. Paste it into the HuggingFace Token field in EzEpoch and click Test & Save
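If you'd rather use the token from a script, the huggingface_hub client accepts it directly. A minimal sketch, assuming the token is exported as HF_TOKEN (the environment-variable name is our assumption, not an EzEpoch setting):

```python
import os

def hf_login_from_env():
    """Log in to the HuggingFace Hub using a token from the environment."""
    token = os.environ.get("HF_TOKEN", "")
    if not token.startswith("hf_"):
        # User access tokens always start with hf_
        raise ValueError("HF_TOKEN missing or malformed (tokens start with hf_)")
    # Imported lazily so the validation above works even without the package
    from huggingface_hub import login  # pip install huggingface_hub
    login(token=token)
```

A Read-level token is enough to download gated models; Write is needed only to push fine-tuned weights back to the Hub.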

Other API keys

  • Vast.ai — get your key from cloud.vast.ai/account
  • RunPod — get your key from runpod.io → Settings
  • GitHub — optional, for repo access. Use a token with repo scope.
  • All keys are encrypted before storage. Use 👁 to reveal, or Clear to remove.

⚙️ Training Configuration

Select your model, GPU, dataset, and method — EzEpoch auto-optimises the rest

Default view — clean and simple

Training config default view

All settings expanded for full control

Training config expanded view
⭐ GPU Star System

Stars appear on GPUs with enough VRAM for full fine-tuning with AdamW. No star = use LoRA/QLoRA.

🤖 Auto-Optimization

Incompatible settings are greyed out automatically. The system picks the best defaults for your model + GPU combo.

📁 Dataset Upload

Upload a JSONL file or paste a HuggingFace dataset path. The system validates format automatically.

🔢 Batch Size

Set manually or use Auto — the AI probes your GPU and finds the largest safe batch size.
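Auto mode probes the GPU at runtime. For intuition, a first-order offline estimate looks roughly like this; the per-token activation constant and 10% headroom are our assumptions, not EzEpoch's actual heuristic:

```python
GB = 1024 ** 3

def estimate_batch_size(vram_gb, params_b, seq_len=2048,
                        precision="bf16", optimizer="adamw",
                        act_bytes_per_token=800_000):
    """Rough largest batch size that fits a VRAM budget.

    params_b is model size in billions of parameters.
    act_bytes_per_token is a hand-tuned illustrative constant.
    """
    bytes_per_param = {"fp32": 4, "bf16": 2, "fp16": 2}[precision]
    weights = params_b * 1e9 * bytes_per_param
    grads = params_b * 1e9 * bytes_per_param
    # Mixed-precision AdamW keeps fp32 master weights + two moments (~12 B/param)
    opt = params_b * 1e9 * 12 if optimizer == "adamw" else 0
    free = vram_gb * GB * 0.9 - weights - grads - opt  # keep 10% headroom
    per_sample = seq_len * act_bytes_per_token          # activation cost per sample
    return max(int(free // per_sample), 0)
```

Under these assumptions a 1B model in bf16 fits a small batch on a 24 GB card, while a 7B full fine-tune with AdamW returns 0 even on 80 GB, which is why the star system points smaller GPUs at LoRA/QLoRA.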

🔬 Advanced & Expert Settings

Fine-tune precision, optimizer, scheduler, and low-level hyperparameters

Advanced tab — precision & optimizer settings

Advanced settings tab

Expert tab — scheduler, warmup, weight decay

Expert settings tab
🎯 Precision

bf16 (recommended on Ampere and newer GPUs), fp16, or fp32. BF16 halves memory versus FP32 and avoids FP16's overflow issues, with minimal impact on quality.

⚡ Optimizer

AdamW (default), 8-bit Adam (saves memory), Adafactor (very low memory), or SGD.

📉 Scheduler

Cosine, linear, constant, or warmup variants. Cosine with warmup is recommended for most fine-tuning.

💾 Gradient Checkpointing

Reduces VRAM by recomputing activations during backward pass. Slightly slower but essential for large models.
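Taken together, the four cards above map onto a handful of HuggingFace TrainingArguments fields. The values below are illustrative defaults for a sketch, not the ones EzEpoch computes for your setup:

```python
from transformers import TrainingArguments  # pip install transformers

args = TrainingArguments(
    output_dir="out",
    bf16=True,                      # Precision: bf16 on Ampere+ (fp16=True on older GPUs)
    optim="adamw_torch",            # Optimizer: or "adamw_bnb_8bit", "adafactor", "sgd"
    lr_scheduler_type="cosine",     # Scheduler: cosine decay
    warmup_ratio=0.03,              # Warmup: first 3% of steps
    weight_decay=0.01,
    gradient_checkpointing=True,    # Recompute activations to save VRAM
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,  # Effective batch per GPU = 4 x 8 = 32
    num_train_epochs=3,
)
```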

📋 Requirements Tab

Auto-generated requirements.txt — conflict-free and CUDA-aware

Requirements tab

What EzEpoch does for you

  • Generates a complete requirements.txt matched to your model + training method
  • Resolves version conflicts between torch, transformers, peft, and other packages
  • Handles CUDA 11.8 / 12.1 / CPU-only variants automatically
  • Flags packages requiring special install steps (flash-attn, xformers, bitsandbytes)
  • Add extra packages — the system validates them before including
  • All dependencies install in the correct order in your deploy script

📦 Package Builder

One click generates a complete, ready-to-run training package

Package builder

What's inside every package

  • train.py — full fine-tuning script with your exact config
  • requirements.txt — conflict-free, versioned dependencies
  • install.sh — installs packages in the correct order
  • run.sh — launch script with all env vars set
  • config.json — all training hyperparameters captured
✅ Conflict-free guarantee: Every package is validated before download. If there's a conflict, the system tells you exactly what to change.

☁️ Cloud Deployment Tab

Manage your Vast.ai / RunPod instances from inside EzEpoch

Cloud deployment tab

What you can do here

  • Browse available GPUs on Vast.ai and RunPod with live pricing
  • Filter by VRAM, cost/hr, location, and CUDA version
  • Rent a GPU instance directly — no separate website needed
  • See your active instances: status, cost, start time
  • SSH into a running instance for manual access
  • Destroy instances when training completes to stop billing

🧙 Deployment Wizard — 3-Step Auto-Deploy

Rent GPU → Upload package → Start training. Fully automated.

Step 1 — Select GPU & provider

Deployment wizard step 1

Step 2 — Review & confirm

Deployment wizard step 2

Step 3 — Launch & monitor

Deployment wizard step 3
Step 1 — Choose GPU

Pick from Vast.ai or RunPod. Filter by VRAM required, price per hour, and CUDA version.

Step 2 — Review & Confirm

See the full cost estimate, instance specs, and package contents before committing. Estimated training time shown.

Step 3 — Auto-Launch

EzEpoch rents the GPU, uploads your package, and starts training. Live progress streams to your Dashboard.

📊 Training Dashboard

Real-time training metrics, live logs, and GPU stats — embedded or standalone

Embedded dashboard — inside EzEpoch

App embedded dashboard

Standalone dashboard — dashboard.ezepoch.com

Standalone dashboard
📈 Live Loss Curve

Training and validation loss plotted in real time. Divergence is flagged automatically.

🖥️ GPU Stats

VRAM usage, GPU utilisation %, temperature, and power draw — all streaming live.

📜 Live Logs

Full training stdout/stderr streamed to the browser. Click any line to jump to that epoch.

🌐 Standalone

Visit dashboard.ezepoch.com to monitor from any device.

🤗 AI Models & Repos

Browse, search, and manage HuggingFace models & your saved repos

AI Models and Repos tab

What you can do here

  • Search HuggingFace Hub for models by name, task, or architecture
  • See model size, license, downloads, and VRAM requirements at a glance
  • Save favourite models to your account for quick access
  • Link HuggingFace repos to auto-push fine-tuned weights after training
  • View gated model access status — EzEpoch verifies that your HF token has the required permission

📖 Settings Dictionary

Quick reference for every training parameter

Core Settings
Epochs
Full passes through your training dataset
Batch size
Samples processed per gradient update step
Learning rate
Step size for weight updates (1e-4 to 1e-5 typical for fine-tuning)
Max seq len
Maximum token length per sample — longer = more VRAM
Grad accum
Accumulate gradients over N steps before updating — simulates a larger batch size
Training Methods
Full Fine-tune
All model weights updated. Best quality, needs most VRAM.
LoRA
Trains small adapters alongside a frozen base model. ~10-30% of full fine-tune VRAM.
QLoRA
LoRA on a 4-bit quantised base. Fine-tune 30B-class models on a single 24GB GPU.
DoRA
Direction + magnitude LoRA variant. Often better than standard LoRA.
Multi-GPU Strategies
DDP
Distributed Data Parallel. Each GPU holds a full model copy. Simplest setup, best throughput.
ZeRO-2
Shards optimizer states + gradients. Cuts VRAM ~50% vs DDP.
ZeRO-3
Also shards model weights. Required for very large models across GPUs.
FSDP
Fully Sharded Data Parallel, PyTorch's native sharding. Good for models too large for a single GPU.
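The ZeRO stages are usually configured through a DeepSpeed config file handed to the trainer. A minimal ZeRO-2 sketch (field names follow DeepSpeed's JSON schema; "auto" defers the value to the trainer, and the filename is our choice):

```python
import json

# ZeRO stage 2 shards optimizer states and gradients across GPUs;
# stage 3 would also shard the model weights themselves.
ds_config = {
    "zero_optimization": {"stage": 2},
    "bf16": {"enabled": True},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

# Written to disk so it can be passed as TrainingArguments(deepspeed="ds_zero2.json")
with open("ds_zero2.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```

Multi-GPU runs are then launched with a distributed launcher, e.g. `torchrun --nproc_per_node=4 train.py` (script name illustrative).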
LoRA Parameters
Rank (r)
Adapter size. Higher = more capacity but more VRAM. 8-64 typical.
Alpha
Scales the LoRA update. Usually set equal to rank or 2x rank.
Dropout
Regularisation for adapter layers. 0.05-0.1 typical.
Target modules
Which layers get adapters. EzEpoch auto-selects the optimal set.
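The four parameters above correspond one-to-one to fields on peft's LoraConfig. A hedged sketch; the target modules shown are typical for Llama-style attention layers, not EzEpoch's auto-selected set:

```python
from peft import LoraConfig  # pip install peft

lora = LoraConfig(
    r=16,                      # Rank: adapter capacity
    lora_alpha=32,             # Alpha: commonly rank or 2x rank
    lora_dropout=0.05,         # Dropout: light regularisation
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
# Apply with: model = get_peft_model(base_model, lora)
```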