First round of speed ups

I started to optimize the subband synthesis with some mild success. If i use psyco and a mono mp3 at 128kps, it 10x slower than realtime. I think I’m going to have to resort to numpy tricks. A short term goal is 100% python realtime decode of an mp3 (super common python modules are fine).

Posted in pyMP3 | 4 Comments

Decoder first pass complete!!

I finished the decoder and it can produce wav file output from an mp3. The good news is that is works, the bad news is that its 34 times to slow to play the audio runtime. (I have a core2 duo running at 1.83ghz.)

Since the spec pdf I found is out of date, I had to resort to looking at a couple mp3 decoders source to get the tables and algorithms. As such the python code is very close in form to the originals. I hate when it feels like im just porting code to python, but the exercise did help me understand the mp3 spec more. I plan to refactor alot of the code that was based on others decoders to make it myown and more python friendly.

Time to start cleaning up the code for readability and speed. Maybe there is enough fat to trim to get the decoding fast enough for direct playback.

Here’s some profile data:

Wed Aug 25 23:19:54 2010    stats.dat

         5725333 function calls (5724461 primitive calls) in 139.562 CPU seconds

   Ordered by: cumulative time
   List reduced from 413 to 20 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000  139.564  139.564 :1()
        1    0.021    0.021  139.564  139.564 {execfile}
        1    0.003    0.003  139.543  139.543 ./pyMP3.py:10()
        1    0.017    0.017  122.649  122.649 ./pyMP3.py:149(decode)
      100    0.832    0.008   61.701    0.617 ./pyMP3.py:587(polyphase_synthesis)
     7200   60.602    0.008   60.719    0.008 ./pyMP3.py:609(subBandSynthesis)
      100    1.555    0.016   29.090    0.291 ./pyMP3.py:543(hybrid_synthesis)
    12800   27.348    0.002   27.535    0.002 ./pyMP3.py:563(inv_mdct)
      100   17.204    0.172   23.309    0.233 ./pyMP3.py:205(dequantize_samples)
        1    0.000    0.000   16.655   16.655 ./pyMP3.py:78(__init__)
        1    8.702    8.702   16.589   16.589 ./bitbfr.py:2(__init__)
  4052179    7.888    0.000    7.888    0.000 {ord}
   739584    5.587    0.000    5.587    0.000 {pow}
      100    0.101    0.001    4.206    0.042 ./pyMP3.py:637(decode_main_info)
      400    1.438    0.004    4.034    0.010 ./pyMP3.py:672(decode_main_data_huffman)
    65073    1.410    0.000    2.558    0.000 ./pyMP3.py:714(huffman_decoder)
      100    1.471    0.015    1.471    0.015 ./pyMP3.py:510(antialias_samples)
   295978    1.296    0.000    1.296    0.000 ./bitbfr.py:18(read_bits)
        1    0.486    0.486    1.005    1.005 ./pyMP3.py:179(write_wav)
      100    0.898    0.009    0.898    0.009 ./pyMP3.py:273(process_stereo)

Wed Aug 25 23:19:54 2010    stats.dat

         5725333 function calls (5724461 primitive calls) in 139.562 CPU seconds

   Ordered by: internal time
   List reduced from 413 to 20 due to restriction 

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
     7200   60.602    0.008   60.719    0.008 ./pyMP3.py:609(subBandSynthesis)
    12800   27.348    0.002   27.535    0.002 ./pyMP3.py:563(inv_mdct)
      100   17.204    0.172   23.309    0.233 ./pyMP3.py:205(dequantize_samples)
        1    8.702    8.702   16.589   16.589 ./bitbfr.py:2(__init__)
  4052179    7.888    0.000    7.888    0.000 {ord}
   739584    5.587    0.000    5.587    0.000 {pow}
      100    1.555    0.016   29.090    0.291 ./pyMP3.py:543(hybrid_synthesis)
      100    1.471    0.015    1.471    0.015 ./pyMP3.py:510(antialias_samples)
      400    1.438    0.004    4.034    0.010 ./pyMP3.py:672(decode_main_data_huffman)
    65073    1.410    0.000    2.558    0.000 ./pyMP3.py:714(huffman_decoder)
   295978    1.296    0.000    1.296    0.000 ./bitbfr.py:18(read_bits)
      100    0.898    0.009    0.898    0.009 ./pyMP3.py:273(process_stereo)
      100    0.832    0.008   61.701    0.617 ./pyMP3.py:587(polyphase_synthesis)
      100    0.797    0.008    0.797    0.008 ./pyMP3.py:460(reorder_samples)
   230400    0.518    0.000    0.518    0.000 {abs}
        1    0.486    0.486    1.005    1.005 ./pyMP3.py:179(write_wav)
    35205    0.318    0.000    0.318    0.000 {numpy.core.multiarray.zeros}
   115213    0.281    0.000    0.281    0.000 {method 'write' of 'file' objects}
   115209    0.237    0.000    0.237    0.000 {_struct.pack}
     7230    0.150    0.000    0.150    0.000 {method 'extend' of 'list' objects}
Posted in Uncategorized | 1 Comment

Main data decoding complete

The scaling factor decoding was fairly straight forward, but the huffman tables gave me fits. I had an error in one of the lookup tables and it was causing the decoder to be off by a couple bits. Argh. I think it is all worked out now. The code is up on bitbucket (the first post). The bit reservoir is still not implemented yet.

The bit reservoir is a cool concept to put more bits where they are needed in the bit stream. This is different from Variable Bitrate. Basically, if a frame completed encoding its data and there is space left over, it can be donated to future frames. The reservoir is only used for the main data section.

The main data is split between scaling factors and coefficients. The coefficients are in the frequency domain and are split into regions ( big value region, one’s region, and zeros) Different coding schemes are used for each region. When a sound signal is converted to the frequency domain, it tends to have most of its energy in the lower bands. Since the values are larger towards the bottom, more bits are used to encode them. After the big value region, most of the values are -1, 0, or 1. That region is called the one’s region and can be represented with few bits. After that, all of the coefficients are 0.

Posted in Uncategorized | Leave a comment

Main data decoding and the joys of bad documentation

I got tripped up trying to make a first pass at decoding the main data section. The MP3 spec doc i got is outdated and is missing information. So the info I have for the main data section comes from random pdfs and mp3 decoder sources.

MP3 frames are composed of a syncword, header, sidedata, and then main data. The main data has 2 parts called granules. Each granule starts with a set of scaling factors and then is followed by huffman encoded coefficients.

To make sure that python code is correct, I started adding printfs to the reference decoder. So far, so good.

Posted in pyMP3 | Leave a comment

Eureka!!

I was trying to dig up more info on the mp3 format and i ran across a great reference site: www.mp3-tech.org One of the docs there is the offical MP3 spec. Now I can code against the reference instead of looking at various decoder implementations

Posted in pyMP3 | Leave a comment

Stage II: Sidedata parsing

The next step is to parse the sidedata. The sidedata has the parameters that are needed to decode the main data that is to follow. Since I don’t have the ISO doc, this is where things get dicey. I have the source of a couple open source mp3 decoders and some random docs on mp3s, but I can’t really vouch for the correctness.

If the protection bit was set in the header, there is a 16 bit CRC word before the start of the sidedata.

pseudo code of sidedata format (primarily based fraunhofer ref source):

main_data_begin = getbits(9)

if  mono
    private_bits = getbits(5)
else
    private_bits = getbits(3)

for ch from 0 upto num_channels
    for i from 0 upto 4
        scfsi[ch][i] = getbits(1)

for gr from 0 upto 2
     for ch from upto num_channels
          part2_3_length[ch][gr] = getbits(12)
          big_values[ch][gr] = getbits(9)
          global_gain[ch][gr] = getbits(8)
          scalefac_compress[ch][gr] = getbits(4)
          window_switching_flag[ch][gr] = getbits(1)
          if window_switching_flag[ch][gr]
              block_type[ch][gr] = getbits(2)
              mixed_block_flag[ch][gr] = getbits(1)
              for i from 0 upto 2
                   table_select[ch][gr][i] = getbits(5)
              for i from 0 upto 3
                   subblock_gain[ch][gr][i] = getbits(3)
              if block_type[ch][gr] == 2 and mixed_block_flag[ch][gr] == 0
                  region0_count[ch][gr] = 8
              else
                  region0_count[ch][gr] = 7
              region1_count[ch][gr] = 20 - region0_count[ch][gr]
          else
               for i from 0 upto 3
                   table_select[ch][gr][i] = getbits(5)
               region0_count[ch][gr] = getbits(4)
               region1_count[ch][gr] = getbits(3)
               block_type[ch][gr] = 0
          preflag[ch][gr] = getbits(1)
          scalefac_scale[ch][gr] = getbits(1)
          count1table_select[ch][gr] = getbits(1)

gr = granule. Each frame has 2 granules each with 18*32 subband samples
main_data_end = negative offset in bytes from next frames header to the end of main data
private_bits = standard has no meaning for these.
scfsi= scale factor selection info. ‘0’ means there are scalefactors for each granule, ‘1’ means one set of scalefactors
part2_3_length=number of bits used for scalefactors and huffman data
big_values= size of big value section
global_gain=quantize step size.
scalefac_compress = used to lookup number of bits in scalefactors
window_switching_flag = true means a block other than normal(0) is used
block_type = used to select window type (len and style)
mixed_block_flag= true if different windows for high and low freq bands
table_select = which huffman table to use.
subblock_gain=offset from global gain for each short block
region0_count = number of scalefactors in region 0 minus one
region1_count = number of scalefactors in region 1 minus one
preflag = if set, a table of values are added to the scalefactors
scalefac_scale = chooses step size of scalefactor quantizer
count1table_select =huffman table selection for “ones” region

Later today I’m going to post python code that decodes the sidedata and also descriptions of each of the values parsed.

Posted in pyMP3 | Leave a comment

Stage I: Sync bits and Headers

Time to get roll’n. First a standard disclaimer. The code is meant to be readable and written quickly. I’ll go back later and optimize. Also I make mistakes, but once discovered I’ll correct them as fast as possible. whew…

MP3s are composed of a series of frames. And each frame starts with a sync word. The sync word is 12 1’s (0xFFF). If reading an MP3 stream, one would wait for the sync word and then try to decode the rest of the header and maybe wait for the next frame to make sure that the syncword that was found was real and not just a random occurrence.

Bit start bit pos bit length Description
0 12 Frame sync word 0xFFF
12 1 MPEG version, 0=v2 1=v1
13 2 Layer, 00=reserved 01=layer III 10=layer II 11=layer I
15 1 Protection bit, 0=has 16bit CRC after header 1=no protection
16 4 Bitrate index

bits bitrate
0000 free
0001 32
0010 40
0011 48
0100 56
0101 64
0110 80
0111 96
1000 112
1001 128
1010 160
1011 192
1100 224
1101 256
1110 320
1111 bad
20 2 Sampling rate 00=44100 01=48000 10=32000 11=reserved
22 1 Padding bit 0=no pad 1=padded
23 1 Private bit
24 2 Channel mode 00=stereo 01=joint stereo 10=dual channel 11= single
26 2 Mode extention (joint stereo)
28 1 Copyrighted? 1=true
29 1 Original? 1=true
30 2 Emphasis 00=none 01=50/15ms 10=reserved 11=CCIT J.17

I whipped up some code to find the sync and parse the header from a standard MP3 and created a new bitbucket repo.

Bitbucket home of pyMP3
bitbfr.py
pyMP3.py

Posted in pyMP3 | 1 Comment