sqlite-db-truncate

安装量: 36
排名: #19485

安装

npx skills add https://github.com/letta-ai/skills --skill sqlite-db-truncate

SQLite Truncated Database Recovery This skill provides systematic approaches for recovering data from SQLite database files that have been corrupted through binary truncation. It emphasizes understanding the SQLite file format before attempting recovery and avoiding common pitfalls that lead to multiple failed iterations. When to Use This Skill This skill applies when: A SQLite database file cannot be opened with standard sqlite3 commands The database error indicates corruption or malformed data File size is smaller than expected (suggesting truncation) Standard recovery tools like .recover command fail Manual binary parsing of SQLite page structure is required Initial Assessment Strategy Before writing any recovery code, perform a thorough analysis of the corrupted file: Step 1: Examine File Characteristics

Check file size and basic properties

ls -lh database.db file database.db

Create hex dump for analysis

hexdump
-C
database.db
|
head
-100
Key observations to make:
File size
SQLite pages are typically 4096 bytes. Check if size aligns with page boundaries
Magic bytes
Valid SQLite files start with "SQLite format 3\000" (16 bytes)
First byte after header
Identifies page type (0x0d = table leaf page with actual data)
Step 2: Identify Corruption Pattern
Common truncation scenarios:
Header-only file
Only the 100-byte header remains
Missing header
File starts with a data page (first byte is 0x0d, 0x05, 0x0a, or 0x02)
Partial page
File ends mid-page, truncating some cells If the file lacks the standard "SQLite format 3" magic header but starts with 0x0d, this indicates the file contains only a table leaf page without the database header. Step 3: Try Standard Tools First Always attempt standard recovery before manual parsing:

Check if sqlite3 can read the file

sqlite3 database.db ".schema" 2

&1 sqlite3 database.db "SELECT * FROM sqlite_master" 2

&1

Try built-in recovery

sqlite3 database.db ".recover"

recovered.sql 2

&1

Try integrity check

sqlite3 database.db "PRAGMA integrity_check;" If these fail with "database disk image is malformed" or similar errors, proceed to manual binary parsing. SQLite Page Structure Overview Understanding the page structure is essential before writing recovery code. Table Leaf Page Layout (Page Type 0x0d) Offset Size Description ------ ---- ----------- 0 1 Page type (0x0d for table leaf) 1 2 First freeblock offset (big-endian) 3 2 Number of cells on page (big-endian) 5 2 Cell content area start offset (big-endian) 7 1 Fragmented free bytes count 8+ varies Cell pointer array (2 bytes per cell, big-endian) ... [Gap/free space] End Cell data (grows backward from page end) Cell Structure Each cell contains a database row: [Payload size: varint] [Row ID: varint] [Header size: varint] [Serial type 1: varint] [Serial type 2: varint] ... [Column 1 value] [Column 2 value] ... Varint Encoding SQLite uses variable-length integers (varints): Bytes 1-8: Use 7 bits for data, high bit (0x80) indicates continuation Byte 9: Uses all 8 bits (no continuation) Serial Types Serial types indicate how to interpret column data: Type Size Meaning 0 0 NULL 1 1 8-bit signed integer 2 2 16-bit big-endian signed integer 3 3 24-bit big-endian signed integer 4 4 32-bit big-endian signed integer 7 8 IEEE 754 64-bit float (big-endian) 8 0 Integer constant 0 9 0 Integer constant 1 N >= 12, even (N-12)/2 BLOB N >= 13, odd (N-13)/2 Text string (UTF-8) Example: Serial type 0x21 (33) = text string of length (33-13)/2 = 10 bytes. Recovery Approach Build a Single, Modular Script Avoid creating multiple separate recovery scripts. Instead, build one script iteratively with clear debug output: import struct import json DEBUG = True def read_varint ( data , offset ) : """Read SQLite variable-length integer.""" value = 0 for i in range ( 9 ) : if offset + i

= len ( data ) : return None , offset byte = data [ offset + i ] if i == 8 : value = ( value << 8 ) | byte return value , offset + i + 1 value = ( value << 7 ) | ( byte & 0x7f ) if ( byte & 0x80 ) == 0 : return value , offset + i + 1 return value , offset def decode_value ( data , offset , serial_type ) : """Decode value based on serial type.""" if serial_type == 0 : return None , offset elif serial_type == 1 : return struct . unpack ( '>b' , data [ offset : offset + 1 ] ) [ 0 ] , offset + 1 elif serial_type == 2 : return struct . unpack ( '>h' , data [ offset : offset + 2 ] ) [ 0 ] , offset + 2 elif serial_type == 4 : return struct . unpack ( '>i' , data [ offset : offset + 4 ] ) [ 0 ] , offset + 4 elif serial_type == 7 : return struct . unpack ( '>d' , data [ offset : offset + 8 ] ) [ 0 ] , offset + 8 elif serial_type == 8 : return 0 , offset elif serial_type == 9 : return 1 , offset elif serial_type = 12 : if serial_type % 2 == 0 : length = ( serial_type - 12 ) // 2 return data [ offset : offset + length ] , offset + length else : length = ( serial_type - 13 ) // 2 return data [ offset : offset + length ] . decode ( 'utf-8' , errors = 'replace' ) , offset + length return None , offset Parse Incrementally with Debug Output Parse one cell completely and verify before processing all cells: def parse_cell ( data , cell_offset , debug = DEBUG ) : """Parse a single cell with detailed debug output.""" if debug : print ( f"\nParsing cell at offset { cell_offset } (0x { cell_offset : 04x } )" )

Read payload size

payload_size , offset = read_varint ( data , cell_offset ) if debug : print ( f" Payload size: { payload_size } " )

Read row ID

row_id , offset = read_varint ( data , offset ) if debug : print ( f" Row ID: { row_id } " )

Read header size

header_size , header_start = read_varint ( data , offset ) if debug : print ( f" Header size: { header_size } " )

Parse serial types

serial_types

[ ] current = header_start header_end = offset + header_size while current < header_end : st , current = read_varint ( data , current ) serial_types . append ( st ) if debug : print ( f" Serial types: { serial_types } " )

Parse values

values

[
]
for
st
in
serial_types
:
val
,
current
=
decode_value
(
data
,
current
,
st
)
values
.
append
(
val
)
if
debug
:
print
(
f" Values:
{
values
}
"
)
return
{
'row_id'
:
row_id
,
'values'
:
values
}
Common Pitfalls and Prevention
Pitfall 1: Not Understanding the Corruption Pattern
Mistake
Assuming the file has a standard SQLite header when it may only contain a data page.
Prevention
Always examine the first few bytes with hexdump. If the file starts with 0x0d instead of "SQLite format 3", the header is missing. Adjust parsing offsets accordingly (no 100-byte header offset needed).
Pitfall 2: Multiple Script Iterations
Mistake
Creating many separate recovery scripts (recover1.py, recover2.py, etc.) based on trial and error.
Prevention
:
Read the hex dump thoroughly first and annotate the structure manually
Build one script with debug flags
Reference the SQLite file format specification before coding
Pitfall 3: Reading Strings Beyond Their Boundaries
Mistake
Reading string data without checking the serial type length, resulting in incorrect strings (e.g., "testword052" instead of "testword05").
Prevention
Always calculate string length from serial type:
length = (serial_type - 13) // 2
. Read exactly that many bytes.
Pitfall 4: Syntax Errors in Generated Code
Mistake
Missing spaces in operators like
if48
instead of
if 48
, or
12and
instead of
12 and
.
Prevention
Validate syntax before running:
python3
-m
py_compile recovery_script.py
Pitfall 5: Wrong Byte Order
Mistake
Reading multi-byte integers with little-endian instead of big-endian.
Prevention
SQLite uses big-endian for all multi-byte integers. Always use
struct.unpack('>...', data)
with the
>
prefix.
Pitfall 6: Not Handling Truncation Gracefully
Mistake
Script crashes when encountering truncated data at end of file.
Prevention
Check bounds before every read operation: def safe_read ( data , offset , length ) : if offset + length

len ( data ) : return None return data [ offset : offset + length ] Verification Strategy Step 1: Validate Cell Count Compare the number of cells reported in the page header (offset 3-4) with actual cells found. Step 2: Validate Data Patterns If expected patterns are known (e.g., words matching "testwordXY"), verify extracted strings match the pattern. Step 3: Check Value Ranges Verify extracted numeric values are within expected ranges. Watch for: Unexpected negative numbers (sign bit interpretation) Very large numbers (byte order issues) NaN or infinity for floats Step 4: Compare with Expected Output Format Before finalizing output, ensure JSON structure matches requirements:

Validate output structure

for record in recovered_data : assert 'word' in record and 'value' in record assert isinstance ( record [ 'word' ] , str ) assert isinstance ( record [ 'value' ] , ( int , float ) ) Output Generation Format recovered data according to the required output specification: def generate_output ( recovered_rows , output_path ) : """Format and save recovered data.""" results = [ ] for row in recovered_rows : if len ( row [ 'values' ] )

= 2 : results . append ( { 'word' : row [ 'values' ] [ 0 ] , 'value' : row [ 'values' ] [ 1 ] } ) with open ( output_path , 'w' ) as f : json . dump ( results , f , indent = 2 ) print ( f"Recovered { len ( results ) } records to { output_path } " ) return results Summary Checklist Before writing recovery code: Examined file with hexdump to understand corruption extent Identified whether header is present or missing Tried standard SQLite tools first Reviewed SQLite file format specification During implementation: Using a single script with debug output (not multiple scripts) Validated Python syntax before running Using big-endian byte order for all multi-byte integers Calculating string lengths from serial types Handling truncation with bounds checking After recovery: Verified cell count matches expectation Validated string patterns if known Checked numeric value ranges Confirmed output format matches requirements

返回排行榜