Peek's purpose is to dissect a file and display its content in a hierarchical way. It runs entirely in a web browser on the client computer (your computer), without any connection to the server.
Peek is experimental. It relies on technologies Calerga has developed for other products, such as Sysquake js, for customer projects, or for its own internal use. Peek's feature set is incomplete and its user interface is a little rough. Nevertheless, we think it can be useful to analyze files, to realize how much metadata some formats like MS Office or PDF contain and the privacy leaks this can cause, and maybe to better understand malware reaching your mailbox.
Data displayed by Peek can be intimidating. Unless you're familiar with file formats, some jargon will seem meaningless. But the user interface is very simple: open a file with the File menu, or just drag and drop a file to the browser window, and Peek will happily work for you without further question. At least some metadata should be easy to spot. Peek doesn't change your files, doesn't execute them, doesn't send them to anybody.
Some formats let files embed subfiles. For instance, a PDF document can contain images stored as JPEG. Peek doesn't dissect everything at once, because for some large files it can take a while and clutter your display with information you may not be interested in. What “large” means depends on the file itself and on your computer; typically, files up to a few megabytes are parsed very quickly. You can dissect elements displayed with a wheel icon further by clicking them, or select the menu View>Analyze All. Most links are internal cross-references; links with a file icon let you extract data as a file.
File formats
Here is a brief overview of the file formats recognized by Peek.
- ar
- AR files are flat archives, typically used for static libraries on Linux. They are also used internally for Debian packages (.deb).
- asf
- ASF (Advanced Systems Format) is the multimedia format of Microsoft; it's used for video (wmv) and audio (wma) files. ASF files contain one or more media streams, such as audio or video. Metadata and streams are stored in objects, data subsets whose purpose is identified by a 16-byte globally-unique ID (GUID). Some object types are nested.
- bmp
- BMP files are Windows bitmap files.
- class
- Class files are compiled Java. Each file corresponds to a separate class (a kind of module in object-oriented programming languages).
- cpio
- cpio files are archive files on unix and unix-like systems, similar to TAR but less common. The cpio format is used internally in RPM packages.
- cur
- CUR files are Windows cursor files. They share the same format as ICO files, except for the meaning of cursor-specific fields (the position of the hotspot).
- dbf
- DBF files are dBase database files. They contain a single table with the field definitions and the records, possibly with deleted records.
- deb
- Debian packages are a way to distribute software on many Linux distributions, such as Debian and Ubuntu. A .deb file is an ar archive whose members include tar archives compressed with gzip.
- dll
- Dynamic-link libraries are dynamic libraries for Windows. They contain compiled code and data which are loaded automatically when the application which requires them starts, or later when it requests them. Their format is the same as EXE files.
- dng
- The DNG format aims at replacing proprietary raw file formats produced by digital cameras with a common format based on TIFF. It's developed and promoted by Adobe.
- doc
- The DOC format was the native format of MS Word; it has been replaced by DOCX. The DOC format is based on the Microsoft Compound Document file format, where named binary streams, organized in directories, are stored in a single file.
- docx
- The DOCX format is the native format of MS Word; it's the successor of the DOC format. With XLSX and PPTX, it's also one of the formats of Office Open XML, an ISO standard. Office Open XML files are based on OPC (Open Packaging Conventions), a hierarchical file system stored in a zip file.
- eml
- E-mail messages can be stored in .eml files as they're transmitted, with their headers and attachments. EML is mostly a text format, with support for different character encodings; binary data is usually encoded as ASCII text. Mbox files contain a succession of messages, each starting with a "From " line and separated by an empty line; they come in multiple variants which differ subtly in how message content that could be mistaken for a start line is escaped.
- eps
- EPS stands for Encapsulated PostScript: PostScript programs for graphics to be embedded in larger documents. EPS files contain special comments holding metadata such as size and font dependencies. While PostScript files are usually text files, an EPS file can start with a binary header which specifies where the PostScript code and a TIFF or Windows Metafile preview are located in the file.
- epub
- EPUB is a file format based on ZIP to store electronic books. Files stored in the EPUB archive are XHTML files, other files used on the web such as images and CSS, and XML files which describe the structure of the book and store metadata.
- exe
- EXE is the format of applications in MS-DOS ("MZ" executable) and Windows. Windows applications start with code which MS-DOS executes to display an error message, but their real content, ignored by MS-DOS but recognized by Windows, is placed further in the file and is known as the Portable Executable format.
- exr
- EXR files store high-dynamic-range images, uncompressed or compressed with the deflate format.
- fits
- FITS (Flexible Image Transport System) is a format used by astronomers for images and tables. Files start with parameters stored in fixed-length text lines of 80 characters, a hint of FITS' age...
- gif
- GIF files are bitmap palette-based files with LZW compression. They can also contain simple animations with finite or infinite loops. Because of a patent on LZW and fear, uncertainty and doubt about exactly which usage would be accepted, the PNG format was designed as a replacement, based on the more efficient and patent-free deflate compression format, and overcoming some limitations of the GIF format. Yet GIF has survived, the LZW patent has expired, and GIF animations are cool again.
- gz
- Gzip files contain one, or less frequently several, files compressed with the deflate algorithm. Gzip is often used to compress a single tar archive, resulting in a .tar.gz or .tgz file.
- hdf5
- HDF5 files contain scientific data in a hierarchical tree similar to a file system. HDF means Hierarchical Data Format.
- ico
- ICO files are Windows icon files. They contain multiple images for different resolutions and pixel depths. Each image is encoded as a header-less BMP or a PNG subfile.
- ipa
- iOS application archive files are packages which contain the code and resources of iOS applications. IPA files use the ZIP format.
- jar
- JAR files are packages which contain multiple compiled Java class files, as well as metadata. They are commonly used to distribute compiled Java applications or libraries. JAR files use the ZIP format.
- jp2, jpf, jpx
- JPEG-2000 is a wavelet-based lossy image compression method, a more modern alternative to JPEG which has remained much less common. Wavelets are a way to represent data at different resolutions; for images, this allows finer details, where human vision is less sensitive, to be encoded more coarsely (with fewer bits). JPEG-2000 files are available in two variants, "core coding system" (jp2) or with "extensions" (jpf or jpx), but the file format is the same, based on the ISO base media file format like MPEG-4.
- jpg
- The JPEG compression method (Joint Photographic Experts Group) is the most common lossy compression method for digital images. Technically, JPEG refers to a family of compression methods, some of them lossless, not to a file format. The most common file format is JFIF (JPEG File Interchange Format), where data structures are stored in segments starting with a marker. Some segments are reserved for extensions, such as Exif for metadata stored in the TIFF format, or XMP for metadata stored as XML. Complete JPEG images stored as JFIF can also be found inside JFIF files themselves, as thumbnails, and in PDF files.
- mat
- MAT files are data files of MATLAB, numerical computation software by MathWorks. Peek understands versions 4 and 5.
- midi
- MIDI is an interface to connect synthesizers, keyboards, computers and similar devices. It defines the physical link as well as the protocol, a stream of events which specify the start and end of notes, channels, effects, and other parameters. Standard MIDI files essentially contain these events with timing information and metadata.
- mov
- QuickTime movie files share the format of MPEG-4, which they inspired. This common format is known as the ISO base media file format; it's also used by JPEG-2000.
- mp3
- MP3 is a lossy audio compression format developed for the MPEG-1 standard. Metadata can be added as ID3 tags, version 1 (128 bytes at the end of the file) or version 2 (at the start of the file, designed so that it isn't decoded as audio).
- mp4
- MPEG-4 movie files share the general format of QuickTime MOV files, the ISO base media file format.
- mpg
- MPG files are MPEG version 1 or 2 movie files.
- ods
- ODS files are OpenDocument files containing spreadsheet data. The OpenDocument Format is similar to Office Open XML: both are standards based on XML files stored in ZIP archives with additional manifest files. OpenDocument is better documented and simpler. ODS is the default spreadsheet format of Apache OpenOffice (previously OpenOffice.org), LibreOffice and other open-source applications, the equivalent of XLSX in MS Excel.
- odt
- ODT files are OpenDocument files containing word-processing data. It's the default format for word-processing documents in Apache OpenOffice and LibreOffice, equivalent to DOCX for MS Word.
- pdf
- PDF files are made of objects, i.e. pieces of information, with cross-references. Peek decodes RC4 encryption when the password is empty. PDF files often contain objects of other formats, such as JPEG for images or XMP for metadata; they can also embed arbitrary files which can be extracted.
- pkg
- macOS package files (.pkg) are XAR archives with some fixed file and directory components.
- png
- PNG image files are made of chunks. Pixels are compressed with the deflate method and stored in a zlib container.
- ppt, ppa
- The PPT format was the native format of MS PowerPoint; it has been replaced by PPTX. Like DOC and XLS, the PPT format is based on the Microsoft Compound Document file format. PPA files (PowerPoint add-ins) use the same internal format.
- pptx
- The PPTX format is the native format of MS PowerPoint; it's the successor of the PPT format. With DOCX and XLSX, it's also one of the formats of Office Open XML, an ISO standard.
- riff
- RIFF (Resource Interchange File Format) is a container for resources with properties. Data are stored in chunks, possibly nested, each identified by a four-character code and a size. It is used by multimedia file formats such as WAV (uncompressed audio) and AVI (video).
- rpm
- RPM files are packages for the RPM Package Manager (also known as RPM) used on Linux distributions such as Red Hat, Fedora and CentOS. Unlike DEB files, RPM files have a specific binary format, which ends with the files to install as a cpio archive compressed with gzip.
- shp
- Shapefiles are binary files which contain vector spatial shapes such as points, lines or polygons. Coordinates are geographical positions. Each file contains a set of shapes of the same type, except for null shapes which are permitted in any shapefile. Shape attributes are stored separately, usually in dbf files.
- stl
- An STL file contains the triangulated surface of a 3D object. It's used by 3D printers for rapid prototyping and can be produced by many CAD applications. Two variants exist, ASCII (text) and binary.
- svg
- SVG is a format based on XML to store vector graphics. SVG files are often compressed with gzip, with suffix .svgz (aka .svg.gz).
- tar
- TAR files are archive files on unix and unix-like systems. The plain TAR format itself is not compressed, but TAR files are often compressed, resulting in an additional suffix like .tar.gz (aka .tgz) or .tar.bz2 (not dissected by Peek).
- tiff
- TIFF files are image files, often stored without compression. The TIFF format is very flexible. It's the basis of Exif, the most common way metadata are stored in JPEG, TIFF (of course!), DNG, and other image formats; and of DNG itself.
- txt
- Text files (also known as plain text, raw text, or ascii) contain text where each character is represented by (usually) one byte, and can be displayed to a human (almost) without processing. Program source code is often stored as text. Peek reports some basic properties such as whether they're encoded as ASCII (each character is encoded in 7 bits), the control character or character sequence used as line terminator (LF, CR, CRLF or LFCR) and the number of lines. Peek also displays these statistics for files with the following suffixes: asc, bas, c, cpp, cs, f, f77, h, hpp, java, js, md, pas, py, sq, sqd, tex, text, wrl, x3dv.
- u3d
- The U3D format (Universal 3D) is a binary format for 3D models.
- vcard
- vCard is a standard to store a variety of information about individuals and other entities, such as names, contact information and photos. It's used as an exchange format for business cards and address books. Data elements are stored as name/value pairs in a UTF-8 text file; binary data is encoded as text. Files have a .vcard or .vcf suffix.
- wma
- WMA is an ASF file containing audio.
- wmf
- Windows Metafile Format is a dump of internal Windows structures for exchanging and storing vector graphics. WMF data can find its way into files saved by MS Office.
- wmv
- WMV is an ASF file containing video.
- xar
- XAR files (eXtensible ARchive format) are archives whose file components are usually compressed with zlib. Their most common use is in package files (.pkg) on macOS.
- xls
- The XLS format was the native format of MS Excel; it has been replaced by XLSX. Like DOC and PPT, the XLS format is based on the Microsoft Compound Document file format.
- xlsx
- The XLSX format is the native format of MS Excel; it's the successor of the XLS format. With DOCX and PPTX, it's also one of the formats of Office Open XML, an ISO standard.
- xmp
- XMP is a format based on XML to store metadata. It can be used as an alternative to Exif in image files and in other formats such as PDF.
- zip
- ZIP files are archives where files can be individually compressed, typically with the deflate algorithm. The ZIP format is also used by formats which aren't immediately identified as archives, such as DOCX, ODS, EPUB and JAR. In these specialized formats, the hierarchical file layout provided by ZIP, which mirrors files in a directory tree, is often complemented by a specific mechanism, such as a manifest file listing some or all of the other files with their purpose.
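The ar entry above describes flat archives, which are also the outer layer of Debian packages. Here is a minimal Python sketch (not Peek's implementation) of listing ar members, assuming the common layout: an 8-byte `!<arch>\n` signature followed by members, each preceded by a 60-byte ASCII header. The function name `ar_members` is hypothetical; BSD `#1/` long names and GNU long-name tables are not resolved.

```python
from typing import Iterator, Tuple

AR_MAGIC = b"!<arch>\n"

def ar_members(data: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (name, content) for each member of an ar archive (hypothetical helper)."""
    if not data.startswith(AR_MAGIC):
        raise ValueError("not an ar archive")
    pos = len(AR_MAGIC)
    while pos + 60 <= len(data):
        header = data[pos:pos + 60]
        if header[58:60] != b"`\n":
            raise ValueError(f"bad member header at offset {pos}")
        # Fixed-width ASCII fields: name(16) mtime(12) uid(6) gid(6) mode(8) size(10).
        name = header[0:16].decode("ascii").rstrip(" ")
        if name.endswith("/") and name not in ("/", "//"):
            name = name[:-1]  # GNU ar terminates short names with '/'
        size = int(header[48:58].decode("ascii"))  # decimal, space-padded
        start = pos + 60
        yield name, data[start:start + size]
        # Member data is padded with '\n' to an even offset.
        pos = start + size + (size % 2)
```

For a .deb package, the members are `debian-binary` followed by the control and data tar archives, which could then be unpacked with Python's `gzip` and `tarfile` modules.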
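The ASF entry above says objects are identified by a 16-byte GUID and that some are nested. The sketch below, assuming the layout of the ASF specification (GUID, then a little-endian 64-bit size that includes the 24-byte object header), walks top-level objects and the children of the Header Object. The helper names `asf_objects` and `asf_tree` are hypothetical; only the two GUID constants come from the specification.

```python
import struct
import uuid
from typing import Iterator, List, Optional, Tuple

# GUIDs from the ASF specification.
ASF_HEADER_OBJECT = uuid.UUID("75B22630-668E-11CF-A6D9-00AA0062CE6C")
ASF_DATA_OBJECT = uuid.UUID("75B22636-668E-11CF-A6D9-00AA0062CE6C")

def asf_objects(data: bytes, start: int = 0, end: Optional[int] = None) -> Iterator[Tuple[int, uuid.UUID, int]]:
    """Yield (offset, guid, size) for consecutive ASF objects (hypothetical helper)."""
    end = len(data) if end is None else end
    pos = start
    while pos + 24 <= end:
        # GUIDs are stored in Microsoft's mixed-endian layout, hence bytes_le.
        guid = uuid.UUID(bytes_le=data[pos:pos + 16])
        (size,) = struct.unpack_from("<Q", data, pos + 16)  # includes the 24-byte header
        if size < 24 or pos + size > end:
            raise ValueError(f"invalid object size {size} at offset {pos}")
        yield pos, guid, size
        pos += size

def asf_tree(data: bytes) -> List[Tuple[int, uuid.UUID, int]]:
    """Return (depth, guid, size) for top-level objects and Header Object children."""
    tree = []
    for pos, guid, size in asf_objects(data):
        tree.append((0, guid, size))
        if guid == ASF_HEADER_OBJECT:
            # Children follow the 24-byte header, a u32 child count and 2 reserved bytes.
            for _, child, child_size in asf_objects(data, pos + 30, pos + size):
                tree.append((1, child, child_size))
    return tree
```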
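For the class entry above, the first bytes of a compiled Java class are enough to tell which Java release produced it. This is a sketch assuming the header defined by the Java Virtual Machine Specification (big-endian magic `0xCAFEBABE`, minor version, major version, constant pool count); `class_header` is a hypothetical name, and the release label is derived with the rule "major version minus 44" that holds from Java 5 on.

```python
import struct

def class_header(data: bytes) -> dict:
    """Decode the fixed start of a Java class file (hypothetical helper)."""
    # u4 magic, u2 minor_version, u2 major_version, u2 constant_pool_count, big-endian.
    magic, minor, major, cp_count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    # From Java 5 (major version 49) on, the Java release number is major - 44.
    java = f"Java {major - 44}" if major >= 49 else "before Java 5"
    # constant_pool_count is one more than the highest valid constant pool index.
    return {"minor": minor, "major": major, "constant_pool_count": cp_count, "java": java}
```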
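The cur and ico entries above describe a shared format in which each image is a header-less BMP or a PNG subfile, and cursors reuse two fields for the hotspot. A minimal sketch of reading the directory, assuming the usual 6-byte header (reserved, type 1 for icons or 2 for cursors, image count) followed by 16-byte entries; `ico_entries` is a hypothetical name.

```python
import struct
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def ico_entries(data: bytes) -> Tuple[str, List[dict]]:
    """List the images of an ICO or CUR file (hypothetical helper)."""
    reserved, kind, count = struct.unpack_from("<HHH", data, 0)
    if reserved != 0 or kind not in (1, 2):
        raise ValueError("not an ICO/CUR file")
    entries = []
    for i in range(count):
        # 16-byte directory entry following the 6-byte file header.
        w, h, colors, _, a, b, size, offset = struct.unpack_from("<BBBBHHII", data, 6 + 16 * i)
        entry = {
            "width": w or 256,  # 0 means 256 pixels
            "height": h or 256,
            "colors": colors,
            "size": size,
            "offset": offset,
            # Each image is either a PNG subfile or a header-less BMP.
            "encoding": "png" if data[offset:offset + 8] == PNG_SIGNATURE else "bmp",
        }
        if kind == 1:
            entry.update(planes=a, bit_count=b)
        else:
            entry["hotspot"] = (a, b)  # CUR reuses these two fields
        entries.append(entry)
    return ("icon" if kind == 1 else "cursor"), entries
```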
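The dbf entry above mentions field definitions, records and deleted records. This sketch assumes the classic dBase III layout: record count, header size and record size at offset 4, 32-byte field descriptors from offset 32 terminated by `0x0D`, and records that start with a flag byte (`' '` or `'*'` for deleted). The name `read_dbf` is hypothetical, and the default Latin-1 text encoding is an assumption, since the real code page varies between files.

```python
import struct

def read_dbf(data: bytes, encoding: str = "latin-1"):
    """Return (fields, records) from a dBase file (hypothetical helper)."""
    # Header: record count (u32), header size (u16), record size (u16), little-endian.
    n_records, header_size, record_size = struct.unpack_from("<IHH", data, 4)
    fields = []
    pos = 32
    while data[pos] != 0x0D:  # field descriptors end with a 0x0D byte
        name = data[pos:pos + 11].split(b"\0", 1)[0].decode("ascii")
        fields.append((name, chr(data[pos + 11]), data[pos + 16]))  # name, type, length
        pos += 32
    records = []
    for i in range(n_records):
        start = header_size + i * record_size
        rec = data[start:start + record_size]
        deleted = rec[:1] == b"*"  # ' ' for live records, '*' for deleted ones
        values, off = [], 1
        for _, _, length in fields:
            values.append(rec[off:off + length].decode(encoding).strip())
            off += length
        records.append((deleted, values))
    return fields, records
```

Deleted records are still physically present until the table is packed, which is why a viewer can show them.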
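The exe and dll entries above explain that a Windows executable starts with an MS-DOS stub and that the Portable Executable part lies further in the file. A minimal sketch of finding it, assuming the documented layout (`e_lfanew` at offset 0x3C, `PE\0\0` signature, 20-byte COFF header, optional header, 40-byte section headers). The name `pe_summary` and the small machine table are illustrative; treating a missing PE signature as a plain MS-DOS program is a heuristic.

```python
import struct

MACHINES = {0x014C: "x86", 0x8664: "x64", 0xAA64: "ARM64"}
IMAGE_FILE_DLL = 0x2000  # COFF Characteristics flag

def pe_summary(data: bytes) -> dict:
    """Summarize the MZ/PE headers of an EXE or DLL (hypothetical helper)."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ header")
    if len(data) < 0x40:
        return {"kind": "MS-DOS executable"}
    # e_lfanew at offset 0x3C points to the PE signature.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        return {"kind": "MS-DOS executable"}  # heuristic: no PE part found
    # COFF header: Machine, NumberOfSections, TimeDateStamp, PointerToSymbolTable,
    # NumberOfSymbols, SizeOfOptionalHeader, Characteristics.
    machine, n_sections, timestamp, _, _, opt_size, flags = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    opt = pe_offset + 24
    (magic,) = struct.unpack_from("<H", data, opt)
    sections = []
    for i in range(n_sections):
        # Section headers follow the optional header; each name is 8 bytes.
        off = opt + opt_size + 40 * i
        sections.append(data[off:off + 8].rstrip(b"\0").decode("ascii", "replace"))
    return {
        "kind": "DLL" if flags & IMAGE_FILE_DLL else "EXE",
        "machine": MACHINES.get(machine, hex(machine)),
        "format": {0x10B: "PE32", 0x20B: "PE32+"}.get(magic, hex(magic)),
        "timestamp": timestamp,
        "sections": sections,
    }
```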
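The jpg entry above describes JFIF as a sequence of segments starting with a marker, some of which carry Exif or XMP metadata. The sketch below walks the segments up to the start of the image data, assuming the standard marker syntax (`0xFF` plus a marker byte, then a big-endian length that counts itself). `jpeg_segments` is a hypothetical name; the three payload prefixes are the usual JFIF, Exif and XMP signatures.

```python
import struct
from typing import List, Tuple

# Payload prefixes identifying common APPn segments.
APP_SIGNATURES = {
    b"JFIF\0": "JFIF",
    b"Exif\0\0": "Exif",
    b"http://ns.adobe.com/xap/1.0/\0": "XMP",
}

def jpeg_segments(data: bytes) -> List[Tuple[str, int, str]]:
    """List (marker, length, label) for segments before the image data (hypothetical helper)."""
    if data[:2] != b"\xff\xd8":  # SOI marker
        raise ValueError("not a JPEG file")
    segments, pos = [], 2
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos + 1 < len(data) and data[pos + 1] == 0xFF:
            pos += 1  # skip optional fill bytes
        if pos + 1 >= len(data):
            break
        marker = data[pos + 1]
        if marker == 0xD9 or pos + 4 > len(data):  # EOI or truncated file
            break
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack_from(">H", data, pos + 2)
        payload = data[pos + 4:pos + 2 + length]
        label = ""
        if 0xE0 <= marker <= 0xEF:  # APP0..APP15
            label = next((name for sig, name in APP_SIGNATURES.items() if payload.startswith(sig)), "")
        segments.append((f"FF{marker:02X}", length, label))
        if marker == 0xDA:  # SOS: entropy-coded data follows, stop here
            break
        pos += 2 + length
    return segments
```

An Exif payload found this way is itself a small TIFF file, so it can be handed to a TIFF parser.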
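The mp3 entry above distinguishes ID3 version 1 (128 bytes at the end of the file) from version 2 (at the start). A sketch of both, assuming the published ID3 layouts: the ID3v2 size is a 4-byte "synchsafe" integer with 7 useful bits per byte, and ID3v1 uses fixed-width fields. The function names are hypothetical, and decoding ID3v1 text as Latin-1 is an assumption.

```python
from typing import Optional

def id3v2_header(data: bytes) -> Optional[dict]:
    """Decode the 10-byte ID3v2 header at the start of an MP3 file (hypothetical helper)."""
    if len(data) < 10 or data[:3] != b"ID3":
        return None
    major, revision, flags = data[3], data[4], data[5]
    size = 0
    for b in data[6:10]:
        if b & 0x80:
            raise ValueError("tag size is not synchsafe")
        size = (size << 7) | b  # synchsafe integer: 7 useful bits per byte
    # The size excludes the 10-byte header and, in ID3v2.4, the optional footer.
    footer = 10 if major == 4 and flags & 0x10 else 0
    return {"version": f"2.{major}.{revision}", "size": size, "audio_offset": 10 + size + footer}

def id3v1_tag(data: bytes) -> Optional[dict]:
    """Decode a 128-byte ID3v1 tag at the end of the file, if any (hypothetical helper)."""
    tag = data[-128:]
    if len(tag) < 128 or not tag.startswith(b"TAG"):
        return None

    def field(start: int, end: int) -> str:
        # Fixed-width text fields padded with NUL bytes or spaces; Latin-1 assumed.
        return tag[start:end].split(b"\0", 1)[0].decode("latin-1").strip()

    return {"title": field(3, 33), "artist": field(33, 63), "album": field(63, 93),
            "year": field(93, 97), "comment": field(97, 127), "genre": tag[127]}
```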
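The png entry above says PNG files are made of chunks and that pixels are deflate-compressed in a zlib container. A minimal sketch, assuming the chunk layout of the PNG specification (big-endian length, 4-byte type, data, CRC-32 over type and data), that lists chunks, checks their CRCs, and inflates the concatenated IDAT data. `png_chunks` is a hypothetical name; the inflated bytes are still filtered scanlines, not final pixels.

```python
import struct
import zlib
from typing import List, Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def png_chunks(data: bytes) -> Tuple[List[Tuple[str, int, bool]], bytes]:
    """Return [(type, length, crc_ok)] and the inflated image data (hypothetical helper)."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    chunks, idat, pos = [], [], len(PNG_SIGNATURE)
    while pos + 12 <= len(data):
        # Chunk: u32 length, 4-byte type, data, u32 CRC of type + data (big-endian).
        length, ctype = struct.unpack_from(">I4s", data, pos)
        if pos + 12 + length > len(data):
            raise ValueError(f"truncated chunk at offset {pos}")
        body = data[pos + 8:pos + 8 + length]
        (crc,) = struct.unpack_from(">I", data, pos + 8 + length)
        chunks.append((ctype.decode("latin-1"), length, zlib.crc32(ctype + body) == crc))
        if ctype == b"IDAT":
            idat.append(body)
        pos += 12 + length
        if ctype == b"IEND":
            break
    # All IDAT chunks together form a single zlib stream of filtered scanlines.
    raw = zlib.decompress(b"".join(idat)) if idat else b""
    return chunks, raw
```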
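The riff entry above describes possibly nested chunks. This sketch assumes the usual RIFF rules: a `RIFF` header with a size and a form type (such as `WAVE` or `AVI `), chunks made of a four-character ID, a little-endian 32-bit size and data padded to an even length, and `LIST` chunks whose data starts with a list type followed by subchunks. The names `riff_tree` and `_walk_chunks` are hypothetical.

```python
import struct
from typing import List, Tuple

def _walk_chunks(data: bytes, pos: int, end: int, depth: int, out: List[Tuple[int, str, int]]) -> None:
    while pos + 8 <= end:
        # Chunk: 4-character ID, u32 size (little-endian), data, pad byte if size is odd.
        raw_id, size = struct.unpack_from("<4sI", data, pos)
        chunk_id = raw_id.decode("latin-1")
        body = pos + 8
        if chunk_id == "LIST":
            # A LIST chunk starts with a list type, followed by nested chunks.
            list_type = data[body:body + 4].decode("latin-1")
            out.append((depth, f"LIST({list_type})", size))
            _walk_chunks(data, body + 4, min(body + size, end), depth + 1, out)
        else:
            out.append((depth, chunk_id, size))
        pos = body + size + (size & 1)

def riff_tree(data: bytes) -> List[Tuple[int, str, int]]:
    """Return (depth, chunk name, size) for every chunk of a RIFF file (hypothetical helper)."""
    riff_id, size, form = struct.unpack_from("<4sI4s", data, 0)
    if riff_id != b"RIFF":
        raise ValueError("not a RIFF file")
    tree = [(0, f"RIFF({form.decode('latin-1')})", size)]
    _walk_chunks(data, 12, min(len(data), 8 + size), 1, tree)
    return tree
```

In WAV files, for instance, a `LIST(INFO)` chunk is a common place for metadata such as a title (`INAM`).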
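The tiff entry above notes that TIFF is the basis of Exif. A TIFF file starts with a byte-order mark (`II` little-endian or `MM` big-endian), the number 42 and the offset of the first image file directory (IFD); each IFD lists 12-byte tagged entries whose value is stored inline when it fits in 4 bytes. The sketch below follows the chain of IFDs under those assumptions; `tiff_ifds` and the short tag-name table are illustrative, and sub-IFDs such as the Exif IFD (tag 34665) are reported as offsets but not followed.

```python
import struct

TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 271: "Make", 272: "Model",
             305: "Software", 306: "DateTime", 315: "Artist", 34665: "ExifIFD", 34853: "GPSInfo"}
# Byte size of one value for each TIFF field type.
TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8, 6: 1, 7: 1, 8: 2, 9: 4, 10: 8, 11: 4, 12: 8}

def tiff_ifds(data: bytes) -> list:
    """Return one {tag: value} dict per IFD in the main chain (hypothetical helper)."""
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("not a TIFF file")
    magic, offset = struct.unpack_from(order + "HI", data, 2)
    if magic != 42:
        raise ValueError("bad TIFF magic number")
    ifds, seen = [], set()
    while offset and offset not in seen:  # 'seen' guards against offset loops
        seen.add(offset)
        (count,) = struct.unpack_from(order + "H", data, offset)
        entries = {}
        for i in range(count):
            tag, typ, n, raw = struct.unpack_from(order + "HHI4s", data, offset + 2 + 12 * i)
            size = TYPE_SIZES.get(typ, 1) * n
            if size > 4:  # value stored elsewhere; the 4 bytes are its offset
                (ptr,) = struct.unpack(order + "I", raw)
                raw = data[ptr:ptr + size]
            else:
                raw = raw[:size]
            if typ == 2:  # ASCII, NUL-terminated
                value = raw.split(b"\0", 1)[0].decode("latin-1")
            elif typ == 3:  # SHORT
                value = struct.unpack(order + f"{n}H", raw)
            elif typ == 4:  # LONG
                value = struct.unpack(order + f"{n}I", raw)
            else:
                value = raw
            entries[TAG_NAMES.get(tag, tag)] = value
        ifds.append(entries)
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * count)
    return ifds
```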
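The txt entry above lists the statistics Peek reports for text files: ASCII or not, line terminators (LF, CR, CRLF or LFCR) and line count. The sketch below computes similar statistics, but the exact rules are assumptions rather than Peek's own: two-byte terminators are matched before single bytes, and a final line without a terminator still counts. `text_stats` is a hypothetical name.

```python
import re
from collections import Counter

# Try two-byte terminators first so that CRLF isn't counted as CR + LF.
_EOL = re.compile(rb"\r\n|\n\r|\n|\r")
_NAMES = {b"\r\n": "CRLF", b"\n\r": "LFCR", b"\n": "LF", b"\r": "CR"}

def text_stats(data: bytes) -> dict:
    """Report ASCII-ness, line terminators and line count (hypothetical helper)."""
    counts = Counter()
    last_end = 0
    for m in _EOL.finditer(data):
        counts[_NAMES[m.group()]] += 1
        last_end = m.end()
    # A trailing line without a terminator counts as one more line.
    lines = sum(counts.values()) + (1 if last_end < len(data) else 0)
    return {"ascii": data.isascii(), "terminators": dict(counts), "lines": lines}
```

A file mixing several terminator kinds shows up with more than one key in `terminators`, which often hints that it was edited on different systems.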