Initial commit to git.yoin
198
csv-data-summarizer/README.md
Normal file
@@ -0,0 +1,198 @@
<div align="center">

[Join Our Community](https://www.skool.com/ai-for-your-business)
[GitHub](https://github.com/coffeefuelbump)

[All My Links](https://linktr.ee/corbin_brown)
[Become a Builder](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)

</div>

---

# 📊 CSV Data Summarizer - Claude Skill

A Claude Skill that automatically analyzes CSV files and generates statistical summaries with visualizations. Upload any CSV and get a complete analysis right away, without first being asked what you want.

<div align="center">

[GitHub repository](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill)
[Python](https://www.python.org/)
[License](LICENSE)

</div>

## 🚀 Features

- **🤖 Intelligent & Adaptive** - Inspects the data to recognize its type (sales, customer, financial, survey, etc.) and applies the relevant analysis
- **📈 Comprehensive Analysis** - Generates statistics, correlations, distributions, and trends
- **🎨 Auto Visualizations** - Creates multiple charts based on what's in your data:
  - Time-series plots for date-based data
  - Correlation heatmaps for numeric relationships
  - Distribution histograms
  - Categorical breakdowns
- **⚡ Proactive** - No questions asked. Upload a CSV and get the complete analysis immediately
- **🔍 Data Quality Checks** - Automatically detects and reports missing values
- **📊 Multi-Industry Support** - Adapts to e-commerce, healthcare, finance, operations, surveys, and more

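The data quality check can be illustrated without pandas. The sketch below is a standard-library approximation, not the skill's own code (which relies on pandas). The helper name `missing_report` and the inline sample data are hypothetical, and only empty cells are treated as missing.

```python
import csv
import io


def missing_report(csv_text):
    """Count empty cells per column.

    Returns (total_missing, pct_of_all_cells, per_column_counts).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return 0, 0.0, {}
    columns = list(rows[0].keys())
    per_column = {
        col: sum(1 for row in rows if (row[col] or "").strip() == "")
        for col in columns
    }
    total_missing = sum(per_column.values())
    pct = total_missing / (len(rows) * len(columns)) * 100
    return total_missing, pct, per_column


# Hypothetical three-row dataset with two empty cells.
SAMPLE = """date,product,revenue
2024-01-15,Widget A,129.99
2024-01-16,,89.97
2024-01-17,Widget A,
"""

total, pct, by_col = missing_report(SAMPLE)
print(f"Missing values: {total} ({pct:.2f}% of total data)")
for col, count in by_col.items():
    if count:
        print(f" • {col}: {count}")
```

The overall percentage is taken over all cells (rows × columns), which is the same convention `analyze.py` uses for its headline missing-value figure.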
## 📥 Quick Download

<div align="center">

### Get Started in 2 Steps

**1️⃣ Download the Skill**

[Download `csv-data-summarizer.zip`](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/csv-data-summarizer.zip)

**2️⃣ Try the Demo Data**

[Download `showcase_financial_pl_data.csv`](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/raw/main/examples/showcase_financial_pl_data.csv)

</div>

---

## 📦 What's Included

```
csv-data-summarizer-claude-skill/
├── SKILL.md                              # Claude Skill definition
├── analyze.py                            # Comprehensive analysis engine
├── requirements.txt                      # Python dependencies
├── examples/
│   └── showcase_financial_pl_data.csv    # Demo P&L financial dataset (15 months, 25 columns)
└── resources/
    ├── sample.csv                        # Example dataset
    └── README.md                         # Usage documentation
```

## 🎯 How It Works

1. **Upload** any CSV file to Claude.ai
2. **Skill activates** automatically when a CSV is detected
3. **Analysis runs** immediately: the skill inspects the data structure and adapts
4. **Results delivered** as a complete analysis with multiple visualizations

No prompting needed. No options to choose. Just instant, comprehensive insights!

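To make the "inspects data structure" step more concrete, here is a minimal standard-library sketch of column classification. It is an illustration under stated assumptions, not the skill's implementation: the skill relies on pandas dtype inference, while the hypothetical helpers `classify_column` and `inspect_structure` below only recognise floats and ISO `YYYY-MM-DD` dates.

```python
import csv
import io
from datetime import datetime


def classify_column(values):
    """Return 'numeric', 'date', or 'categorical' for a list of strings."""
    present = [v for v in values if v != ""]
    if not present:
        return "categorical"

    def all_parse(parser):
        # True when every non-empty value is accepted by the parser.
        try:
            for v in present:
                parser(v)
            return True
        except ValueError:
            return False

    if all_parse(float):
        return "numeric"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "categorical"


def inspect_structure(csv_text):
    """Map each column name to its inferred kind."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    columns = list(rows[0].keys()) if rows else []
    return {col: classify_column([row[col] for row in rows]) for col in columns}


# Two rows in the layout of resources/sample.csv.
SAMPLE = """date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
"""

structure = inspect_structure(SAMPLE)
for name, kind in structure.items():
    print(f"{name}: {kind}")
```

Which analyses run then depends on which kinds of column turn up, which is how a single entry point can adapt to different datasets.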
## 📥 Installation

### For Claude.ai Users

1. Download the latest release: [`csv-data-summarizer.zip`](https://github.com/coffeefuelbump/csv-data-summarizer-claude-skill/releases)
2. Go to [Claude.ai](https://claude.ai) → Settings → Capabilities → Skills
3. Upload the zip file
4. Enable the skill
5. Done! Upload any CSV and watch it work ✨

### For Developers

```bash
git clone git@github.com:coffeefuelbump/csv-data-summarizer-claude-skill.git
cd csv-data-summarizer-claude-skill
pip install -r requirements.txt
```

## 📊 Sample Dataset Highlights

The included demo CSV contains **15 months of P&L data** (January 2023 through March 2024) with:
- 3 product lines (SaaS Platform, Enterprise Solutions, Professional Services)
- 25 columns covering revenue, expenses, margins, CAC, LTV, and more
- Quarterly trends showing business growth
- A good fit for showcasing time-series analysis, correlations, and financial insights

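As a worked example of how the P&L columns relate, the sketch below recomputes the derived figures for the first data row of the demo file (Jan 2023, SaaS Platform). The formulas are inferred from the data rather than documented by the author, and `derive_metrics` is a hypothetical helper, so treat this as a sanity check, not a specification.

```python
# Values copied from the first data row of
# examples/showcase_financial_pl_data.csv (Jan 2023, SaaS Platform).
row = {
    "total_revenue": 450000,
    "cost_of_goods_sold": 135000,
    "gross_profit": 315000,
    "gross_margin_pct": 70.0,
    "total_operating_expenses": 230000,
    "operating_income": 85000,
    "operating_margin_pct": 18.9,
    "interest_expense": 5000,
    "tax_expense": 16000,
    "net_income": 64000,
    "net_margin_pct": 14.2,
}


def derive_metrics(r):
    """Recompute profit lines and margins (assumed formulas, see lead-in)."""
    revenue = r["total_revenue"]
    gross_profit = revenue - r["cost_of_goods_sold"]
    operating_income = gross_profit - r["total_operating_expenses"]
    net_income = operating_income - r["interest_expense"] - r["tax_expense"]
    return {
        "gross_profit": gross_profit,
        "gross_margin_pct": round(gross_profit / revenue * 100, 1),
        "operating_income": operating_income,
        "operating_margin_pct": round(operating_income / revenue * 100, 1),
        "net_income": net_income,
        "net_margin_pct": round(net_income / revenue * 100, 1),
    }


derived = derive_metrics(row)
for name, value in derived.items():
    status = "ok" if value == row[name] else "MISMATCH"
    print(f"{name}: {value} ({status})")
```

For this row every recomputed value matches the stored one, for example 450,000 − 135,000 = 315,000 gross profit, a 70.0% gross margin.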
## 🎨 Example Use Cases

- **📊 Sales Data** → Revenue trends, product performance, regional analysis
- **👥 Customer Data** → Demographics, segmentation, geographic patterns
- **💰 Financial Data** → Transaction analysis, trend detection, correlations
- **⚙️ Operational Data** → Performance metrics, time-series analysis
- **📋 Survey Data** → Response distributions, cross-tabulations

## 🛠️ Technical Details

**Dependencies:**
- Python 3.8+
- pandas 2.0+
- matplotlib 3.7+
- seaborn 0.12+

**Visualizations Generated:**
- Time-series trend plots
- Correlation heatmaps
- Distribution histograms
- Categorical bar charts

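The values in a correlation heatmap are pairwise correlation coefficients; pandas' `DataFrame.corr()` computes Pearson correlation by default. The dependency-free sketch below shows what a single cell of that matrix means. The `pearson` helper is hypothetical, and the three data points are the Jan to Mar 2023 SaaS Platform rows of the demo dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


# SaaS Platform, Jan-Mar 2023, from the demo dataset.
revenue = [450000, 475000, 520000]
cogs = [135000, 142500, 156000]
marketing = [65000, 68000, 75000]

r_cogs = pearson(revenue, cogs)
r_marketing = pearson(revenue, marketing)
print(f"revenue vs cost_of_goods_sold: {r_cogs:.3f}")
print(f"revenue vs marketing_expense:  {r_marketing:.3f}")
```

In these rows cost of goods sold is exactly 30% of revenue, so the two columns are perfectly correlated, and a heatmap would show that cell at the top of the colour scale.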
## 📝 Example Output

```
============================================================
📊 DATA OVERVIEW
============================================================
Rows: 100 | Columns: 15

📋 DATA TYPES:
• order_date: object
• total_revenue: float64
• customer_segment: object
...

🔍 DATA QUALITY:
✓ No missing values - dataset is complete!

📈 NUMERICAL ANALYSIS:
[Summary statistics for all numeric columns]

🔗 CORRELATIONS:
[Correlation matrix showing relationships]

📅 TIME SERIES ANALYSIS:
Date range: 2024-01-05 to 2024-04-11
Span: 97 days

📊 VISUALIZATIONS CREATED:
✓ correlation_heatmap.png
✓ time_series_analysis.png
✓ distributions.png
✓ categorical_distributions.png
```

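The "Span" line in the time-series section is the difference between the latest and earliest date. A quick standard-library check of the figures in the example output (the skill itself does this with pandas timestamps):

```python
from datetime import date

start = date(2024, 1, 5)
end = date(2024, 4, 11)
span_days = (end - start).days

print(f"Date range: {start} to {end}")
print(f"Span: {span_days} days")  # Span: 97 days
```

The count includes 29 February because 2024 is a leap year.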
## 🌟 Connect & Learn More

<div align="center">

[Join Our Community](https://www.skool.com/ai-for-your-business/about)

[All My Links](https://linktr.ee/corbin_brown)

[Become a Builder](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)

[Follow on Twitter](https://twitter.com/corbin_braun)

</div>

## 🤝 Contributing

Contributions are welcome! Feel free to:
- Report bugs
- Suggest new features
- Submit pull requests
- Share your use cases

## 📄 License

MIT License - feel free to use this skill for personal or commercial projects!

## 🙏 Acknowledgments

Built for the Claude Skills platform by [Anthropic](https://www.anthropic.com/news/skills).

---

<div align="center">

**Made with ❤️ for the AI community**

⭐ Star this repo if you find it useful!

</div>

148
csv-data-summarizer/SKILL.md
Normal file
@@ -0,0 +1,148 @@
---
name: csv-data-summarizer
description: CSV data analysis skill. Analyzes CSV files with Python and pandas, producing statistical summaries and quick visualizations. Used automatically when the user uploads or mentions a CSV file or needs tabular data analyzed.
metadata:
  version: "2.1.0"
  dependencies: python>=3.8, pandas>=2.0.0, matplotlib>=3.7.0, seaborn>=0.12.0
---

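# CSV Data Summarizer

This skill analyzes CSV files and provides a comprehensive summary with statistical insights and visualizations.

## When to Use This Skill

Use this skill when the user:
- Uploads or mentions a CSV file
- Asks to summarize, analyze, or visualize tabular data
- Requests insights from CSV data
- Wants to understand data structure and quality

## How It Works
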
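## ⚠️ Critical Behavior Requirements ⚠️

**Do not ask the user what they want to do with the data.**
**Do not offer options or choices.**
**Do not say "What would you like me to help you with?"**
**Do not list possible analysis options.**

**Immediately and automatically:**
1. Run the comprehensive analysis
2. Generate all relevant visualizations
3. Present the complete results
4. Ask no questions, offer no options, and do not wait for user input

**The user wants a complete analysis right away, so just do it.**
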
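### Automatic Analysis Steps:

**The skill adapts intelligently to different data types and industries by inspecting the data first and then determining the most relevant analysis.**

1. **Load and inspect** the CSV file into a pandas DataFrame
2. **Identify the data structure** - column types, date columns, numeric columns, categories
3. **Determine the relevant analysis** based on the data content:
   - **Sales/e-commerce data** (order dates, revenue, products): time-series trends, revenue analysis, product performance
   - **Customer data** (demographics, segments, regions): distribution analysis, segmentation, geographic patterns
   - **Financial data** (transactions, amounts, dates): trend analysis, statistical summaries, correlations
   - **Operational data** (timestamps, metrics, status): time series, performance metrics, distributions
   - **Survey data** (categorical responses, ratings): frequency analysis, cross-tabulations, distributions
   - **Generic tabular data**: adapt to the column types found

4. **Create only the visualizations that make sense** for the specific dataset:
   - Time-series plots only when date/timestamp columns exist
   - Correlation heatmaps only when multiple numeric columns exist
   - Category distributions only when categorical columns exist
   - Histograms of numeric distributions, where relevant

5. **Automatically generate comprehensive output** including:
   - Data overview (rows, columns, types)
   - Key statistics and metrics relevant to the data type
   - Missing data analysis
   - Multiple relevant visualizations (only those that apply)
   - Actionable insights based on the patterns found in this specific dataset

6. **Present everything at once** - no follow-up questions

**Adaptation examples:**
- Healthcare data with patient IDs → focus on demographics, treatment patterns, time trends
- Inventory data with stock levels → focus on quantity distributions, restocking patterns, SKU analysis
- Web analytics with timestamps → focus on traffic patterns, conversion metrics, time-of-day analysis
- Survey responses → focus on response distributions, demographic breakdowns, sentiment patterns
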
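The chart-selection rules in step 4 can be written down as a small decision function. This is a sketch only: `select_charts` and the kind labels (`date`, `numeric`, `categorical`) are hypothetical names, and the conditions mirror the checks that `analyze.py` performs before drawing each figure.

```python
def select_charts(column_kinds):
    """Pick chart types from a mapping of column name -> kind.

    Kinds are 'date', 'numeric', or 'categorical' (hypothetical labels).
    """
    kinds = list(column_kinds.values())
    n_numeric = kinds.count("numeric")
    charts = []
    if "date" in kinds and n_numeric >= 1:
        charts.append("time_series")          # needs a date axis and a value
    if n_numeric >= 2:
        charts.append("correlation_heatmap")  # needs at least one pair
    if n_numeric >= 1:
        charts.append("histograms")
    if "categorical" in kinds:
        charts.append("categorical_bars")
    return charts


sales = {"date": "date", "quantity": "numeric",
         "revenue": "numeric", "region": "categorical"}
survey = {"answer": "categorical", "segment": "categorical"}

print(select_charts(sales))
print(select_charts(survey))
```

A dataset with only categorical columns therefore produces a single bar-chart figure, while a typical sales export produces all four.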
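### Behavior Guidelines

✅ **Correct approach - say this:**
- "I'm running a comprehensive analysis of this data now."
- "Here is the complete analysis with visualizations:"
- "I identified this as [type] data and generated the relevant insights:"
- Then present the complete analysis immediately

✅ **Do:**
- Run the analysis script immediately
- Generate all relevant charts automatically
- Provide complete insights without being asked
- Be thorough and complete in the first response
- Act decisively without asking permission

❌ **Never say:**
- "What would you like to do with this data?"
- "What would you like me to help you with?"
- "Here are some common options:"
- "Let me know what you'd like help with"
- "If you'd like, I can create a comprehensive analysis!"
- Any sentence ending in "?" that asks the user for direction
- Any list of options or choices
- Any conditional "If you want, I can do X"

❌ **Forbidden behaviors:**
- Asking what the user wants
- Listing options for the user to choose from
- Waiting for user direction before analyzing
- Providing a partial analysis that requires follow-up
- Describing what you could do instead of doing it
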
### Usage

The skill provides the Python function `summarize_csv(file_path)`, which:
- Accepts the path to a CSV file
- Returns a comprehensive text summary with statistics
- Automatically generates multiple visualizations based on the data structure

### Example Prompts

> "Here's `sales_data.csv`. Can you summarize this file?"

> "Analyze this customer data CSV and show me the trends."

> "What insights can you find in `orders.csv`?"

### Example Output

**Dataset Overview**
- 5,000 rows × 8 columns
- 3 numeric columns, 1 date column

**Summary Statistics**
- Average order value: $58.2
- Standard deviation: $12.4
- Missing values: 2% (100 cells)

**Insights**
- Sales trend upward over time
- Activity peaks in Q4
*(Attached: trend chart)*

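## Files

- `analyze.py` - Core analysis logic
- `requirements.txt` - Python dependencies
- `resources/sample.csv` - Example dataset for testing
- `resources/README.md` - Additional documentation

## Notes

- Automatically detects date columns (columns whose names contain 'date' or 'time')
- Handles missing data gracefully
- Generates time-series plots only when a date column is present; the other charts depend on the numeric and categorical columns found
- All numeric columns are included in the summary statistics
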
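Name-based date detection is simple but has a known pitfall: a plain substring test also matches names that merely contain the letters "date" or "time". The sketch below contrasts it with a token-based alternative. Both helper names and the token set are hypothetical, and the token approach is a suggested variation rather than what the skill ships.

```python
import re


def substring_match(name):
    """Substring test: 'date' or 'time' anywhere in the name."""
    lowered = name.lower()
    return "date" in lowered or "time" in lowered


def token_match(name):
    """Stricter test: a whole word of the name must be a date-like token."""
    tokens = re.split(r"[^a-z0-9]+", name.lower())
    return any(tok in {"date", "time", "datetime", "timestamp"} for tok in tokens)


columns = ["order_date", "created_time", "customer_lifetime_value", "updated"]

by_substring = [c for c in columns if substring_match(c)]
by_token = [c for c in columns if token_match(c)]

print(by_substring)
# ['order_date', 'created_time', 'customer_lifetime_value', 'updated']
print(by_token)
# ['order_date', 'created_time']
```

`customer_lifetime_value` is a real column in the demo P&L dataset, so the false positive is not just theoretical. The trade-off is that token matching misses run-together names such as `orderdate`.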
182
csv-data-summarizer/analyze.py
Normal file
@@ -0,0 +1,182 @@
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns


def summarize_csv(file_path):
    """
    Comprehensively analyzes a CSV file and generates multiple visualizations.

    Args:
        file_path (str): Path to the CSV file

    Returns:
        str: Formatted comprehensive analysis of the dataset
    """
    df = pd.read_csv(file_path)
    summary = []
    charts_created = []

    # Basic info
    summary.append("=" * 60)
    summary.append("📊 DATA OVERVIEW")
    summary.append("=" * 60)
    summary.append(f"Rows: {df.shape[0]:,} | Columns: {df.shape[1]}")
    summary.append(f"\nColumns: {', '.join(map(str, df.columns))}")

    # Data types
    summary.append("\n📋 DATA TYPES:")
    for col, dtype in df.dtypes.items():
        summary.append(f" • {col}: {dtype}")

    # Missing data analysis (guarded so an empty file cannot divide by zero)
    missing = int(df.isnull().sum().sum())
    missing_pct = (missing / df.size) * 100 if df.size else 0.0
    summary.append("\n🔍 DATA QUALITY:")
    if missing:
        summary.append(f"Missing values: {missing:,} ({missing_pct:.2f}% of total data)")
        summary.append("Missing by column:")
        for col in df.columns:
            col_missing = df[col].isnull().sum()
            if col_missing > 0:
                col_pct = (col_missing / len(df)) * 100
                summary.append(f" • {col}: {col_missing:,} ({col_pct:.1f}%)")
    else:
        summary.append("✓ No missing values - dataset is complete!")

    # Numeric analysis
    numeric_cols = df.select_dtypes(include='number').columns.tolist()
    if numeric_cols:
        summary.append("\n📈 NUMERICAL ANALYSIS:")
        summary.append(str(df[numeric_cols].describe()))

        # Correlations if multiple numeric columns
        if len(numeric_cols) > 1:
            summary.append("\n🔗 CORRELATIONS:")
            corr_matrix = df[numeric_cols].corr()
            summary.append(str(corr_matrix))

            # Create correlation heatmap
            plt.figure(figsize=(10, 8))
            sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', center=0,
                        square=True, linewidths=1)
            plt.title('Correlation Heatmap')
            plt.tight_layout()
            plt.savefig('correlation_heatmap.png', dpi=150)
            plt.close()
            charts_created.append('correlation_heatmap.png')

    # Date detection: use the first column whose name contains 'date' or
    # 'time' and whose values actually parse as dates. Numeric columns are
    # skipped so that names such as 'customer_lifetime_value' (which contains
    # 'time') are not misread as timestamps.
    date_col = None
    for col in df.columns:
        name = str(col).lower()
        if ('date' in name or 'time' in name) and col not in numeric_cols:
            parsed = pd.to_datetime(df[col], errors='coerce')
            if parsed.notna().any():
                df[col] = parsed
                date_col = col
                break

    # Categorical analysis (the parsed date column is no longer object-typed,
    # so it is not listed as a category)
    categorical_cols = df.select_dtypes(include=['object']).columns.tolist()
    categorical_cols = [c for c in categorical_cols if 'id' not in c.lower()]

    if categorical_cols:
        summary.append("\n📊 CATEGORICAL ANALYSIS:")
        for col in categorical_cols[:5]:  # Limit to first 5
            value_counts = df[col].value_counts()
            summary.append(f"\n{col}:")
            for val, count in value_counts.head(10).items():
                pct = (count / len(df)) * 100
                summary.append(f" • {val}: {count:,} ({pct:.1f}%)")

    # Time series analysis
    if date_col is not None:
        summary.append("\n📅 TIME SERIES ANALYSIS:")
        date_range = df[date_col].max() - df[date_col].min()
        summary.append(f"Date range: {df[date_col].min()} to {df[date_col].max()}")
        summary.append(f"Span: {date_range.days} days")

        # Create time-series plots for up to three numeric columns
        if numeric_cols:
            n_plots = min(3, len(numeric_cols))
            fig, axes = plt.subplots(n_plots, 1, figsize=(12, 4 * n_plots),
                                     squeeze=False)

            for ax, num_col in zip(axes.flatten(), numeric_cols[:3]):
                daily_mean = df.groupby(date_col)[num_col].mean()
                daily_mean.plot(ax=ax, label='Average', linewidth=2)
                ax.set_title(f'{num_col} Over Time')
                ax.set_xlabel('Date')
                ax.set_ylabel(num_col)
                ax.legend()
                ax.grid(True, alpha=0.3)

            plt.tight_layout()
            plt.savefig('time_series_analysis.png', dpi=150)
            plt.close()
            charts_created.append('time_series_analysis.png')

    # Distribution plots for up to four numeric columns
    if numeric_cols:
        fig, axes = plt.subplots(2, 2, figsize=(12, 10))
        axes = axes.flatten()

        for idx, col in enumerate(numeric_cols[:4]):
            axes[idx].hist(df[col].dropna(), bins=30, edgecolor='black', alpha=0.7)
            axes[idx].set_title(f'Distribution of {col}')
            axes[idx].set_xlabel(col)
            axes[idx].set_ylabel('Frequency')
            axes[idx].grid(True, alpha=0.3)

        # Hide unused subplots
        for idx in range(len(numeric_cols[:4]), 4):
            axes[idx].set_visible(False)

        plt.tight_layout()
        plt.savefig('distributions.png', dpi=150)
        plt.close()
        charts_created.append('distributions.png')

    # Categorical distributions for up to four categorical columns
    if categorical_cols:
        fig, axes = plt.subplots(2, 2, figsize=(14, 10))
        axes = axes.flatten()

        for idx, col in enumerate(categorical_cols[:4]):
            value_counts = df[col].value_counts().head(10)
            axes[idx].barh(range(len(value_counts)), value_counts.values)
            axes[idx].set_yticks(range(len(value_counts)))
            axes[idx].set_yticklabels(value_counts.index)
            axes[idx].set_title(f'Top Values in {col}')
            axes[idx].set_xlabel('Count')
            axes[idx].grid(True, alpha=0.3, axis='x')

        # Hide unused subplots
        for idx in range(len(categorical_cols[:4]), 4):
            axes[idx].set_visible(False)

        plt.tight_layout()
        plt.savefig('categorical_distributions.png', dpi=150)
        plt.close()
        charts_created.append('categorical_distributions.png')

    # Summary of visualizations
    if charts_created:
        summary.append("\n📊 VISUALIZATIONS CREATED:")
        for chart in charts_created:
            summary.append(f" ✓ {chart}")

    summary.append("\n" + "=" * 60)
    summary.append("✅ COMPREHENSIVE ANALYSIS COMPLETE")
    summary.append("=" * 60)

    return "\n".join(summary)


if __name__ == "__main__":
    # Test with sample data
    import sys
    if len(sys.argv) > 1:
        file_path = sys.argv[1]
    else:
        file_path = "resources/sample.csv"

    print(summarize_csv(file_path))

46
csv-data-summarizer/examples/showcase_financial_pl_data.csv
Normal file
@@ -0,0 +1,46 @@
month,year,quarter,product_line,total_revenue,cost_of_goods_sold,gross_profit,gross_margin_pct,marketing_expense,sales_expense,rd_expense,admin_expense,total_operating_expenses,operating_income,operating_margin_pct,interest_expense,tax_expense,net_income,net_margin_pct,customer_acquisition_cost,customer_lifetime_value,units_sold,avg_selling_price,headcount,revenue_per_employee
Jan,2023,Q1,SaaS Platform,450000,135000,315000,70.0,65000,85000,45000,35000,230000,85000,18.9,5000,16000,64000,14.2,125,2400,1200,375,45,10000
Jan,2023,Q1,Enterprise Solutions,280000,112000,168000,60.0,35000,55000,25000,20000,135000,33000,11.8,3000,6600,23400,8.4,450,8500,450,622,45,6222
Jan,2023,Q1,Professional Services,125000,50000,75000,60.0,15000,22000,8000,12000,57000,18000,14.4,1500,3600,12900,10.3,200,3200,95,1316,45,2778
Feb,2023,Q1,SaaS Platform,475000,142500,332500,70.0,68000,89000,47000,36000,240000,92500,19.5,5200,18500,68800,14.5,120,2500,1300,365,47,10106
Feb,2023,Q1,Enterprise Solutions,295000,118000,177000,60.0,38000,58000,27000,22000,145000,32000,10.8,3200,6400,22400,7.6,440,8600,470,628,47,6277
Feb,2023,Q1,Professional Services,135000,54000,81000,60.0,16000,24000,9000,13000,62000,19000,14.1,1600,3800,13600,10.1,195,3300,105,1286,47,2872
Mar,2023,Q1,SaaS Platform,520000,156000,364000,70.0,75000,95000,52000,40000,262000,102000,19.6,5500,19250,77250,14.9,115,2650,1450,359,50,10400
Mar,2023,Q1,Enterprise Solutions,325000,130000,195000,60.0,42000,63000,30000,25000,160000,35000,10.8,3500,7000,24500,7.5,425,8800,520,625,50,6500
Mar,2023,Q1,Professional Services,148000,59200,88800,60.0,18000,26000,10000,14000,68000,20800,14.1,1800,4160,14840,10.0,190,3400,115,1287,50,2960
Apr,2023,Q2,SaaS Platform,555000,166500,388500,70.0,80000,100000,55000,42000,277000,111500,20.1,5800,22300,83400,15.0,110,2750,1550,358,52,10673
Apr,2023,Q2,Enterprise Solutions,340000,136000,204000,60.0,45000,65000,32000,26000,168000,36000,10.6,3700,7200,25100,7.4,420,9000,540,630,52,6538
Apr,2023,Q2,Professional Services,158000,63200,94800,60.0,19000,27000,11000,15000,72000,22800,14.4,1900,4560,16340,10.3,185,3500,125,1264,52,3038
May,2023,Q2,SaaS Platform,590000,177000,413000,70.0,85000,105000,58000,44000,292000,121000,20.5,6000,24200,90800,15.4,105,2850,1650,358,55,10727
May,2023,Q2,Enterprise Solutions,365000,146000,219000,60.0,48000,68000,35000,28000,179000,40000,11.0,4000,8000,28000,7.7,410,9200,580,629,55,6636
May,2023,Q2,Professional Services,172000,68800,103200,60.0,21000,29000,12000,16000,78000,25200,14.7,2100,5040,18060,10.5,180,3600,135,1274,55,3127
Jun,2023,Q2,SaaS Platform,625000,187500,437500,70.0,90000,110000,62000,46000,308000,129500,20.7,6200,25850,97450,15.6,100,2950,1750,357,58,10776
Jun,2023,Q2,Enterprise Solutions,385000,154000,231000,60.0,50000,70000,37000,29000,186000,45000,11.7,4200,9000,31800,8.3,400,9400,610,631,58,6638
Jun,2023,Q2,Professional Services,185000,74000,111000,60.0,22000,31000,13000,17000,83000,28000,15.1,2200,5580,20220,10.9,175,3700,145,1276,58,3190
Jul,2023,Q3,SaaS Platform,665000,199500,465500,70.0,95000,115000,65000,48000,323000,142500,21.4,6500,28500,107500,16.2,95,3050,1850,359,60,11083
Jul,2023,Q3,Enterprise Solutions,410000,164000,246000,60.0,53000,73000,40000,31000,197000,49000,12.0,4400,9800,34800,8.5,390,9600,650,631,60,6833
Jul,2023,Q3,Professional Services,198000,79200,118800,60.0,24000,33000,14000,18000,89000,29800,15.1,2400,5960,21440,10.8,170,3800,155,1277,60,3300
Aug,2023,Q3,SaaS Platform,705000,211500,493500,70.0,100000,120000,68000,50000,338000,155500,22.1,6800,31100,117600,16.7,90,3150,1950,362,63,11190
Aug,2023,Q3,Enterprise Solutions,435000,174000,261000,60.0,56000,76000,42000,33000,207000,54000,12.4,4600,10800,38600,8.9,380,9800,690,630,63,6905
Aug,2023,Q3,Professional Services,210000,84000,126000,60.0,25000,35000,15000,19000,94000,32000,15.2,2500,6400,23100,11.0,165,3900,165,1273,63,3333
Sep,2023,Q3,SaaS Platform,750000,225000,525000,70.0,108000,128000,72000,53000,361000,164000,21.9,7200,33360,123440,16.5,88,3250,2080,360,65,11538
Sep,2023,Q3,Enterprise Solutions,465000,186000,279000,60.0,60000,80000,45000,35000,220000,59000,12.7,5000,11800,42200,9.1,370,10000,735,633,65,7154
Sep,2023,Q3,Professional Services,225000,90000,135000,60.0,27000,37000,16000,20000,100000,35000,15.6,2700,6920,25380,11.3,160,4000,175,1286,65,3462
Oct,2023,Q4,SaaS Platform,795000,238500,556500,70.0,115000,135000,75000,55000,380000,176500,22.2,7500,35870,133130,16.7,85,3350,2200,361,68,11691
Oct,2023,Q4,Enterprise Solutions,490000,196000,294000,60.0,63000,83000,47000,36000,229000,65000,13.3,5200,13000,46800,9.6,360,10200,770,636,68,7206
Oct,2023,Q4,Professional Services,238000,95200,142800,60.0,29000,39000,17000,21000,106000,36800,15.5,2800,7360,26640,11.2,158,4100,185,1286,68,3500
Nov,2023,Q4,SaaS Platform,840000,252000,588000,70.0,122000,142000,78000,58000,400000,188000,22.4,7800,38440,141760,16.9,82,3450,2320,362,70,12000
Nov,2023,Q4,Enterprise Solutions,520000,208000,312000,60.0,67000,87000,50000,38000,242000,70000,13.5,5500,14100,50400,9.7,355,10400,815,638,70,7429
Nov,2023,Q4,Professional Services,252000,100800,151200,60.0,31000,41000,18000,22000,112000,39200,15.6,3000,7728,28472,11.3,155,4200,195,1292,70,3600
Dec,2023,Q4,SaaS Platform,895000,268500,626500,70.0,130000,150000,82000,62000,424000,202500,22.6,8200,41145,153155,17.1,80,3550,2480,361,72,12431
Dec,2023,Q4,Enterprise Solutions,555000,222000,333000,60.0,72000,92000,53000,40000,257000,76000,13.7,6000,15400,54600,9.8,350,10600,870,638,72,7708
Dec,2023,Q4,Professional Services,268000,107200,160800,60.0,33000,43000,19000,23000,118000,42800,16.0,3200,8352,31248,11.7,152,4300,205,1307,72,3722
Jan,2024,Q1,SaaS Platform,925000,277500,647500,70.0,135000,155000,85000,64000,439000,208500,22.5,8500,42070,157930,17.1,78,3650,2550,363,75,12333
Jan,2024,Q1,Enterprise Solutions,575000,230000,345000,60.0,75000,95000,55000,42000,267000,78000,13.6,6200,15760,56040,9.7,345,10800,900,639,75,7667
Jan,2024,Q1,Professional Services,280000,112000,168000,60.0,34000,45000,20000,24000,123000,45000,16.1,3300,8770,32930,11.8,150,4400,215,1302,75,3733
Feb,2024,Q1,SaaS Platform,965000,289500,675500,70.0,140000,160000,88000,66000,454000,221500,23.0,8800,44510,168190,17.4,75,3750,2660,363,77,12532
Feb,2024,Q1,Enterprise Solutions,600000,240000,360000,60.0,78000,98000,57000,43000,276000,84000,14.0,6400,16800,60800,10.1,340,11000,940,638,77,7792
Feb,2024,Q1,Professional Services,295000,118000,177000,60.0,36000,47000,21000,25000,129000,48000,16.3,3500,9420,35080,11.9,148,4500,225,1311,77,3831
Mar,2024,Q1,SaaS Platform,1020000,306000,714000,70.0,148000,168000,92000,69000,477000,237000,23.2,9200,47880,179920,17.6,73,3850,2810,363,80,12750
Mar,2024,Q1,Enterprise Solutions,635000,254000,381000,60.0,82000,103000,60000,45000,290000,91000,14.3,6800,18200,66000,10.4,335,11200,990,641,80,7938
Mar,2024,Q1,Professional Services,312000,124800,187200,60.0,38000,49000,22000,26000,135000,52200,16.7,3700,10230,38270,12.3,145,4600,240,1300,80,3900
4
csv-data-summarizer/requirements.txt
Normal file
@@ -0,0 +1,4 @@
pandas>=2.0.0
matplotlib>=3.7.0
seaborn>=0.12.0
83
csv-data-summarizer/resources/README.md
Normal file
@@ -0,0 +1,83 @@
# CSV Data Summarizer - Resources

---

## 🌟 Connect & Learn More

<div align="center">

### 🚀 **Join Our Community**
[Skool](https://www.skool.com/ai-for-your-business/about)

### 🔗 **All My Links**
[Linktree](https://linktr.ee/corbin_brown)

### 🛠️ **Become a Builder**
[YouTube](https://www.youtube.com/channel/UCJFMlSxcvlZg5yZUYJT0Pug/join)

### 🐦 **Follow on Twitter**
[Twitter](https://twitter.com/corbin_braun)

</div>

---

## Sample Data

The `sample.csv` file contains example sales data with the following columns:

- **date**: Transaction date
- **product**: Product name (Widget A, B, or C)
- **quantity**: Number of items sold
- **revenue**: Total revenue from the transaction
- **customer_id**: Unique customer identifier
- **region**: Geographic region (North, South, East, West)

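To get a feel for these columns, the sketch below totals quantity by product and revenue by region for the first five rows of `sample.csv`, using only the standard library. The helper `totals_by` is hypothetical; the skill itself performs its analysis with pandas.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

# First five data rows of resources/sample.csv.
SAMPLE_ROWS = """date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
2024-01-18,Widget C,2,199.98,C001,North
2024-01-19,Widget B,4,119.96,C004,West
"""


def totals_by(csv_text, key, value, convert):
    """Sum column `value` grouped by column `key`, parsing cells with `convert`."""
    totals = defaultdict(lambda: convert("0"))
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row[key]] += convert(row[value])
    return dict(totals)


quantity_by_product = totals_by(SAMPLE_ROWS, "product", "quantity", int)
# Decimal avoids floating-point noise when adding currency amounts.
revenue_by_region = totals_by(SAMPLE_ROWS, "region", "revenue", Decimal)

print(quantity_by_product)
for region, total in revenue_by_region.items():
    print(f"{region}: {total}")
```

With these rows, North totals 329.97 because customer C001 appears twice.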
## Usage Examples

### Basic Summary
```
Analyze sample.csv
```

### With Custom CSV
```
Here's my sales_data.csv file. Can you summarize it?
```

### Focus on Specific Insights
```
What are the revenue trends in this dataset?
```

## Testing the Skill

You can test the skill locally before uploading it to Claude. From the `resources/` directory:

```bash
# Install dependencies
pip install -r ../requirements.txt

# Run the analysis
python ../analyze.py sample.csv
```

## Expected Output

The analysis will provide:

1. **Dataset dimensions** - Row and column counts
2. **Column information** - Names and data types
3. **Summary statistics** - Mean, median, std dev, min/max for numeric columns
4. **Data quality** - Missing value detection and counts
5. **Visualizations** - Correlation heatmap, distribution histograms, and categorical bar charts, plus time-series plots when date columns are present

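As a reference point for the summary statistics, the sketch below computes them for the `quantity` column of `sample.csv` with the standard `statistics` module. The `describe` helper is hypothetical; `statistics.stdev` is the sample standard deviation, the same convention pandas uses by default for `std`.

```python
import statistics

# The 20 values of the quantity column in resources/sample.csv, in file order.
quantities = [5, 3, 7, 2, 4, 6, 1, 8, 3, 5, 2, 9, 3, 6, 4, 7, 5, 8, 2, 10]


def describe(values):
    """Basic summary statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


stats = describe(quantities)
for name, value in stats.items():
    print(f"{name}: {value}")
```

The quantities sum to 100 over 20 rows, so the mean is 5, and the median is 5 as well.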
## Customization

To adapt this skill for your specific use case:

1. Modify `analyze.py` to include domain-specific calculations
2. Add custom visualization types in the plotting section
3. Include validation rules specific to your data
4. Add more sample datasets to test different scenarios

22
csv-data-summarizer/resources/sample.csv
Normal file
@@ -0,0 +1,22 @@
date,product,quantity,revenue,customer_id,region
2024-01-15,Widget A,5,129.99,C001,North
2024-01-16,Widget B,3,89.97,C002,South
2024-01-17,Widget A,7,181.98,C003,East
2024-01-18,Widget C,2,199.98,C001,North
2024-01-19,Widget B,4,119.96,C004,West
2024-01-20,Widget A,6,155.94,C005,South
2024-01-21,Widget C,1,99.99,C002,South
2024-01-22,Widget B,8,239.92,C006,East
2024-01-23,Widget A,3,77.97,C007,North
2024-01-24,Widget C,5,499.95,C003,East
2024-01-25,Widget B,2,59.98,C008,West
2024-01-26,Widget A,9,233.91,C004,West
2024-01-27,Widget C,3,299.97,C009,North
2024-01-28,Widget B,6,179.94,C010,South
2024-01-29,Widget A,4,103.96,C005,South
2024-01-30,Widget C,7,699.93,C011,East
2024-01-31,Widget B,5,149.95,C012,West
2024-02-01,Widget A,8,207.92,C013,North
2024-02-02,Widget C,2,199.98,C014,South
2024-02-03,Widget B,10,299.90,C015,East