date-normalizer

Installs: 38
Rank: #18497

Install

npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill date-normalizer

Date Normalizer

Parse dates written in many different formats and normalize them to a single, consistent format for data cleaning and ETL pipelines.

Purpose

Date standardization for:

Data cleaning and ETL pipelines
Database imports with mixed date formats
Log file parsing and analysis
International data harmonization
Report generation with consistent dating

Features

Smart Parsing: Automatically detect and parse 100+ date formats
Format Conversion: Convert to ISO 8601, US, EU, or custom formats
Batch Processing: Normalize entire CSV columns
Ambiguity Detection: Flag dates that could be interpreted multiple ways
Timezone Handling: Convert and normalize timezones
Relative Dates: Parse "today", "yesterday", "next week"
Validation: Detect and report invalid dates

Quick Start

from date_normalizer import DateNormalizer

Normalize single date

normalizer = DateNormalizer()
result = normalizer.normalize("03/14/2024")
print(result)  # {'normalized': '2024-03-14', 'format': 'iso8601'}

Normalize to specific format

result = normalizer.normalize("March 14, 2024", output_format="us")
print(result)  # {'normalized': '03/14/2024', 'format': 'us'}

Batch normalize CSV column

normalizer.normalize_csv(
    'data.csv',
    date_column='created_at',
    output='normalized.csv',
    output_format='iso8601'
)

CLI Usage

Normalize single date

python date_normalizer.py --date "March 14, 2024"

Convert to specific format

python date_normalizer.py --date "14/03/2024" --format us

Normalize CSV column

python date_normalizer.py --csv data.csv --column date --format iso8601 --output normalized.csv

Detect ambiguous dates

python date_normalizer.py --date "01/02/03" --detect-ambiguous
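The flags used in the commands above could be wired up with argparse roughly as follows. This is only a sketch: the flag names come from the commands above, but the parser layout, the mutually exclusive input group, the defaults, and the help text are assumptions and may not match the script's actual implementation.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the flags shown above (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="date_normalizer.py")
    # Assumption: exactly one input source, a single date string or a CSV file.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--date", help="single date string to normalize")
    source.add_argument("--csv", help="path to a CSV file")
    parser.add_argument("--column", help="name of the date column (with --csv)")
    parser.add_argument("--format", default="iso8601", help="output format")
    parser.add_argument("--output", help="path for the normalized CSV")
    parser.add_argument("--detect-ambiguous", action="store_true",
                        help="report whether the date has several readings")
    return parser


args = build_parser().parse_args(["--date", "01/02/03", "--detect-ambiguous"])
print(args.date, args.format, args.detect_ambiguous)  # 01/02/03 iso8601 True
```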

API Reference

DateNormalizer

class DateNormalizer:
    def normalize(self, date_string: str, output_format: str = 'iso8601',
                  dayfirst: bool = False, yearfirst: bool = False) -> Dict
    def normalize_batch(self, dates: List[str], **kwargs) -> List[Dict]
    def normalize_csv(self, csv_path: str, date_column: str,
                      output: str = None, **kwargs) -> str
    def detect_format(self, date_string: str) -> str
    def is_valid(self, date_string: str) -> bool
    def is_ambiguous(self, date_string: str) -> bool
    def parse_relative(self, relative_string: str) -> datetime

Output Formats

ISO 8601 (default):

'2024-03-14'                 # Date only
'2024-03-14T15:30:00'        # With time
'2024-03-14T15:30:00+00:00'  # With timezone

US Format:

'03/14/2024' # MM/DD/YYYY

EU Format:

'14/03/2024' # DD/MM/YYYY

Long Format:

'March 14, 2024'

Custom Format:

normalizer.normalize(date, output_format='%Y%m%d') # '20240314'
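One way the named formats and the custom strftime pattern could share a single code path is a lookup table that falls back to treating the argument as a pattern. The sketch below uses only the standard library; `NAMED_FORMATS`, `format_date`, and the key name `long` are illustrative assumptions, not the library's internals.

```python
from datetime import date

# Hypothetical mapping from format names to strftime patterns.
NAMED_FORMATS = {
    "iso8601": "%Y-%m-%d",
    "us": "%m/%d/%Y",
    "eu": "%d/%m/%Y",
    "long": "%B %d, %Y",  # assumed key name for the long format
}


def format_date(value: date, output_format: str = "iso8601") -> str:
    """Render with a named format, or treat the argument as a strftime pattern."""
    pattern = NAMED_FORMATS.get(output_format, output_format)
    return value.strftime(pattern)


d = date(2024, 3, 14)
print(format_date(d))            # 2024-03-14
print(format_date(d, "us"))      # 03/14/2024
print(format_date(d, "eu"))      # 14/03/2024
print(format_date(d, "%Y%m%d"))  # 20240314
```

Month names produced by `%B` follow the process locale, so the long format is English only under the default C locale.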

Supported Input Formats

Numeric:

2024-03-14 (ISO)
03/14/2024 (US)
14/03/2024 (EU)
14.03.2024 (German)
2024/03/14 (Japanese)
20240314 (Compact)

Textual:

March 14, 2024
14 March 2024
Mar 14, 2024
14-Mar-2024
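A common way to accept several numeric and textual layouts is to try a list of candidate strptime patterns in order. The sketch below is a standard-library illustration of that idea and is not necessarily how DateNormalizer detects formats; `CANDIDATE_PATTERNS` and `parse_date` are hypothetical names.

```python
from datetime import datetime
from typing import Optional

# Candidate patterns for the numeric and textual layouts listed above.
# Order matters: the first match wins, so US (MM/DD) is tried before EU (DD/MM).
CANDIDATE_PATTERNS = [
    "%Y-%m-%d",   # 2024-03-14
    "%m/%d/%Y",   # 03/14/2024
    "%d/%m/%Y",   # 14/03/2024
    "%d.%m.%Y",   # 14.03.2024
    "%Y/%m/%d",   # 2024/03/14
    "%Y%m%d",     # 20240314
    "%B %d, %Y",  # March 14, 2024
    "%d %B %Y",   # 14 March 2024
    "%b %d, %Y",  # Mar 14, 2024
    "%d-%b-%Y",   # 14-Mar-2024
]


def parse_date(text: str) -> Optional[datetime]:
    """Return the first successful parse, or None if no pattern matches."""
    for pattern in CANDIDATE_PATTERNS:
        try:
            return datetime.strptime(text.strip(), pattern)
        except ValueError:
            continue
    return None


print(parse_date("14-Mar-2024"))  # 2024-03-14 00:00:00
print(parse_date("not a date"))   # None
```

Because the first matching pattern wins, a string such as 01/02/2024 is silently read month-first here; that is the ambiguity the dayfirst and yearfirst options exist to resolve.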

Relative:

today, yesterday, tomorrow
next week, last month
2 days ago, in 3 weeks
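Relative expressions have to be resolved against a reference date. The sketch below handles a small subset with the standard library; `resolve_relative` is a hypothetical helper, reading "next week" as plus seven days is an assumption, and month-based phrases such as "last month" are deliberately left out because a month is not a fixed number of days.

```python
import re
from datetime import date, timedelta
from typing import Optional

UNIT_DAYS = {"day": 1, "week": 7}
FIXED_OFFSETS = {"today": 0, "yesterday": -1, "tomorrow": 1,
                 "next week": 7, "last week": -7}  # assumed meanings


def resolve_relative(text: str, today: date) -> Optional[date]:
    """Resolve a relative expression against `today`; None if unsupported."""
    s = text.strip().lower()
    if s in FIXED_OFFSETS:
        return today + timedelta(days=FIXED_OFFSETS[s])
    # Forms like "2 days ago"
    m = re.fullmatch(r"(\d+) (day|week)s? ago", s)
    if m:
        return today - timedelta(days=int(m.group(1)) * UNIT_DAYS[m.group(2)])
    # Forms like "in 3 weeks"
    m = re.fullmatch(r"in (\d+) (day|week)s?", s)
    if m:
        return today + timedelta(days=int(m.group(1)) * UNIT_DAYS[m.group(2)])
    return None


ref = date(2024, 3, 14)
print(resolve_relative("yesterday", ref))   # 2024-03-13
print(resolve_relative("2 days ago", ref))  # 2024-03-12
print(resolve_relative("in 3 weeks", ref))  # 2024-04-04
```

Passing the reference date explicitly keeps the function deterministic and easy to test.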

With Time:

2024-03-14 15:30:00
03/14/2024 3:30 PM
2024-03-14T15:30:00Z

Ambiguity Handling

Dates like 01/02/03 are ambiguous: the same digits can be read month-first, day-first, or year-first. Specify the interpretation with dayfirst or yearfirst:

Day first (EU)

normalizer.normalize("01/02/03", dayfirst=True)

Result: 2003-02-01

Month first (US)

normalizer.normalize("01/02/03", dayfirst=False)

Result: 2003-01-02

Year first

normalizer.normalize("01/02/03", yearfirst=True)

Result: 2001-02-03
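Ambiguity can be detected by enumerating every reading of the three fields and keeping the ones that are valid calendar dates. The sketch below is a standard-library illustration and not the library's `is_ambiguous` implementation; `interpretations` and `has_multiple_readings` are hypothetical names, and mapping every two-digit year into the 2000s is a simplifying assumption.

```python
import re
from datetime import date
from typing import Dict


def interpretations(text: str, century: int = 2000) -> Dict[str, date]:
    """Return each valid reading of a numeric date with three 1-2 digit fields."""
    match = re.fullmatch(r"(\d{1,2})[/.-](\d{1,2})[/.-](\d{1,2})", text.strip())
    if not match:
        return {}
    a, b, c = (int(g) for g in match.groups())
    candidates = {
        "month_first": (century + c, a, b),  # MM/DD/YY
        "day_first": (century + c, b, a),    # DD/MM/YY
        "year_first": (century + a, b, c),   # YY/MM/DD
    }
    readings = {}
    for name, (year, month, day) in candidates.items():
        try:
            readings[name] = date(year, month, day)
        except ValueError:
            pass  # e.g. month 13 or day 32: not a valid reading
    return readings


def has_multiple_readings(text: str) -> bool:
    """True when the valid readings disagree about which day is meant."""
    return len(set(interpretations(text).values())) > 1


for name, value in interpretations("01/02/03").items():
    print(name, value)
# month_first 2003-01-02
# day_first 2003-02-01
# year_first 2001-02-03
```

A string such as 05/05/05 yields three readings that all name the same day, so it is not flagged.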

Use Cases

Clean Messy Data:

messy_dates = [
    "March 14, 2024",
    "2024-03-15",
    "03/16/2024",
    "17-Mar-2024"
]

normalized = normalizer.normalize_batch(messy_dates)

The 'normalized' values of the results, in order: ['2024-03-14', '2024-03-15', '2024-03-16', '2024-03-17']

CSV Normalization:

Input CSV with mixed date formats

Convert all to ISO 8601

normalizer.normalize_csv(
    'orders.csv',
    date_column='order_date',
    output='orders_normalized.csv',
    output_format='iso8601'
)
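For readers who want to see what a column rewrite involves, here is a minimal standard-library sketch of the same idea. It does not use DateNormalizer; `to_iso`, `normalize_column`, and the short `PATTERNS` list are hypothetical, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime
from typing import Iterable, TextIO

PATTERNS = ["%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y", "%d-%b-%Y"]  # assumed inputs


def to_iso(text: str, patterns: Iterable[str] = PATTERNS) -> str:
    """Return the date as YYYY-MM-DD, or the original text if nothing matches."""
    for pattern in patterns:
        try:
            return datetime.strptime(text.strip(), pattern).date().isoformat()
        except ValueError:
            continue
    return text


def normalize_column(src: TextIO, dst: TextIO, column: str) -> int:
    """Copy CSV rows from src to dst, rewriting one column; return the row count."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        row[column] = to_iso(row[column])
        writer.writerow(row)
        count += 1
    return count


src = io.StringIO('id,order_date\n1,"March 14, 2024"\n2,03/16/2024\n3,17-Mar-2024\n')
dst = io.StringIO()
normalize_column(src, dst, "order_date")
print(dst.getvalue())
```

Values that match no pattern are written back unchanged, so a bad cell does not abort the whole file; a real pipeline would likely log or flag those rows.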

Validation:

if not normalizer.is_valid("invalid date"):
    print("Invalid date detected")

Timezone Conversion:

normalizer.normalize(
    "2024-03-14 15:30:00+00:00",
    output_timezone='America/New_York'
)
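The underlying conversion can be sketched with the standard library alone. `convert_timezone` is a hypothetical helper, not the library's `output_timezone` implementation, and the example uses a fixed UTC-4 offset as a stand-in for New York on that date so that it runs without a time zone database.

```python
from datetime import datetime, timedelta, timezone, tzinfo


def convert_timezone(timestamp: str, target: tzinfo) -> str:
    """Parse an ISO 8601 timestamp with a UTC offset and re-express it in `target`."""
    moment = datetime.fromisoformat(timestamp)
    if moment.tzinfo is None:
        raise ValueError("timestamp has no UTC offset; cannot convert unambiguously")
    return moment.astimezone(target).isoformat()


# Fixed offset standing in for US Eastern daylight time (UTC-4) on this date.
# Where the tz database is available, zoneinfo.ZoneInfo("America/New_York")
# can be passed instead and follows daylight-saving transitions.
eastern_daylight = timezone(timedelta(hours=-4))
print(convert_timezone("2024-03-14 15:30:00+00:00", eastern_daylight))
# 2024-03-14T11:30:00-04:00
```

Rejecting timestamps without an offset is a design choice in this sketch: guessing the source zone of a naive timestamp can shift the result by hours.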

Limitations

Cannot parse dates from images or PDFs (use OCR first)
Ambiguous dates require manual specification of format
Very old dates (<1900) may have limited support
Non-Gregorian calendars not supported
Some regional formats may need explicit configuration
