
Python projects

Project: Cats Dataset


Context

This project applies Exploratory Data Analysis (EDA) and data visualization in Python to a dataset about cats. The main objective was to summarize the dataset's fundamental characteristics and turn raw records into findings that are easy to read.

Data source

Public database obtained via Kaggle, containing detailed records about cats, including variables such as breed, age, weight, color, and gender.

Link: https://www.kaggle.com/datasets/waqi786/cats-dataset

Data preparation

Structuring and cleaning were performed with the Pandas library. The process involved:

Manipulation and Organization: Loading the CSV file and verifying data integrity.

Aggregation: Grouping information by category (breed and gender) to enable statistical analyses.
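The preparation steps described here could look roughly like the sketch below. It is only an illustration: the column names (`Breed`, `Age (Years)`, `Weight (kg)`, `Color`, `Gender`) are assumptions about the CSV headers, and the rows are made-up sample data rather than values from the Kaggle file.

```python
import pandas as pd

# Hypothetical sample rows. The column names are assumptions and may
# differ from the headers in the actual Kaggle CSV.
cats = pd.DataFrame({
    "Breed": ["Siamese", "Siamese", "Maine Coon", "Maine Coon", "Persian"],
    "Age (Years)": [3, 7, 5, 2, 10],
    "Weight (kg)": [4.0, 5.0, 8.0, 7.0, 4.5],
    "Color": ["Cream", "White", "Brown", "Black", "White"],
    "Gender": ["Female", "Male", "Male", "Female", "Female"],
})
# With the real file, the frame would come from pd.read_csv(<path to the CSV>).

# Integrity checks: missing values per column and fully duplicated rows.
missing_per_column = cats.isna().sum()
duplicate_rows = int(cats.duplicated().sum())

# Aggregation: average weight per breed and number of cats per gender.
avg_weight_by_breed = cats.groupby("Breed")["Weight (kg)"].mean()
count_by_gender = cats.groupby("Gender").size()

print(missing_per_column)
print("Duplicated rows:", duplicate_rows)
print(avg_weight_by_breed)
print(count_by_gender)
```

If either check reports a non-zero count, the affected rows would need to be inspected before any aggregation is trusted.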

Tool used

Technology: Python.

Libraries:

Pandas: For data manipulation and processing.

Matplotlib & Seaborn: For creating and aesthetically customizing graphs, ensuring visual clarity and professionalism.

Conclusion

The analysis was carried out using three distinct visualization techniques:

1. Age Distribution (Histogram): Reveals the age frequency of the cats, allowing identification of the predominant age range in the sample.

2. Average Weight by Breed (Bar Chart): Facilitates direct visual comparison and immediate identification of the heaviest and lightest breeds.

3. Analysis by Color and Gender (Count Chart): A bivariate analysis that shows how coat colors are distributed across genders and whether any color appears more often in one gender than the other.

Recommended actions

1. Care Segmentation: Use the predominant age range identified in the histogram to target health campaigns or specific products for that age.

2. Dietary Standardization: Adjust nutritional recommendations based on the average weight by breed chart, focusing on the breeds with the highest average weight.

3. Market Analysis: Cross-reference the distribution of colors and genders to understand adoption preferences or pet market trends.

Analysis: Sales and Profit


Context

This project aims to address the lack of clarity regarding sales volume and the real financial impact per product category. The objective was to transform a raw list of transactions into a strategic analysis using Python, allowing the identification of which items have the highest inventory turnover and which categories are the pillars of total revenue.

Data source

Operational database containing records of products sold, quantities, and unit values, simulating the sales flow of a retail operation.

Data preparation

The structuring and processing were performed using Pandas, focusing on:

Data Modeling: Creation of DataFrames to organize the raw data structure.

Automatic Calculations: Generation of calculated revenue columns (Quantity × Unit Value) to enable financial weight analysis.
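As an illustration of the calculated revenue column, here is a minimal sketch. The product names for Electronics, the quantities, the prices, and the column names are all hypothetical, since the original transaction list is not shown.

```python
import pandas as pd

# Hypothetical transactions; names and numbers are illustrative only.
sales = pd.DataFrame({
    "Product": ["Keyboard", "Mouse", "Laptop", "Monitor"],
    "Category": ["Accessories", "Accessories", "Electronics", "Electronics"],
    "Quantity": [40, 60, 5, 8],
    "Unit Value": [25.0, 10.0, 900.0, 250.0],
})

# Revenue per row = Quantity x Unit Value, computed for the whole column at once.
sales["Revenue"] = sales["Quantity"] * sales["Unit Value"]

print(sales)
```

Because the multiplication is vectorized, no explicit loop over rows is needed.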

Tool used

Technology: Developed entirely in Python.

Libraries:

Pandas: For data manipulation, cleaning, and aggregation.

Matplotlib & Seaborn: For generating graphs with professional aesthetics and coordinated color palettes, facilitating technical interpretation.

Conclusion

The analysis yielded two fundamental insights through distinct visualizations:

1. Inventory Turnover (Units Sold): The Accessories category (keyboards and mice) has the highest sales volume, making it essential for daily cash flow.

2. Revenue Mix (Pie Chart): Despite its lower sales volume, the Electronics category dominates total revenue because of its high unit value, making it the largest contributor to the business's revenue.
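The contrast between the two insights can be reproduced with a short aggregation. The sketch below uses invented figures and hypothetical column names, so the percentages it prints are not the project's results. It only shows how one category can lead in units while another leads in revenue.

```python
import pandas as pd

# Hypothetical transactions; names and numbers are illustrative only.
sales = pd.DataFrame({
    "Product": ["Keyboard", "Mouse", "Laptop", "Monitor"],
    "Category": ["Accessories", "Accessories", "Electronics", "Electronics"],
    "Quantity": [40, 60, 5, 8],
    "Unit Value": [25.0, 10.0, 900.0, 250.0],
})
sales["Revenue"] = sales["Quantity"] * sales["Unit Value"]

# Total units and revenue per category.
by_category = sales.groupby("Category")[["Quantity", "Revenue"]].sum()

# Each category's share of total revenue, in percent.
by_category["Revenue Share (%)"] = (
    by_category["Revenue"] / by_category["Revenue"].sum() * 100
).round(1)

top_by_units = by_category["Quantity"].idxmax()
top_by_revenue = by_category["Revenue"].idxmax()

print(by_category)
print("Most units sold:", top_by_units)
print("Most revenue:", top_by_revenue)
```

The revenue shares are the values a pie chart would display, for example through `by_category["Revenue"].plot.pie()`.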

Recommended actions
