ValueError: could not convert string to float #20

Open
debboutr opened this issue Jul 28, 2017 · 2 comments

@debboutr

Traceback (most recent call last):

  File "<ipython-input-129-655019aebcfc>", line 1, in <module>
    for rec in dbf:

  File "C:\Users\Rdebbout\AppData\Local\Continuum\Anaconda2\envs\cdi3\lib\site-packages\dbfread\dbf.py", line 316, in _iter_records
    for field in self.fields]

  File "C:\Users\Rdebbout\AppData\Local\Continuum\Anaconda2\envs\cdi3\lib\site-packages\dbfread\dbf.py", line 316, in <listcomp>
    for field in self.fields]

  File "C:\Users\Rdebbout\AppData\Local\Continuum\Anaconda2\envs\cdi3\lib\site-packages\dbfread\field_parser.py", line 79, in parse
    return func(field, data)

  File "C:\Users\Rdebbout\AppData\Local\Continuum\Anaconda2\envs\cdi3\lib\site-packages\dbfread\field_parser.py", line 174, in parseN
    return float(data.replace(b',', b'.'))

ValueError: could not convert string to float: b'\x00\x00\x00\x00\x00\x00\x00\x00\x00

Through reading other issues I have found a way to fix the problem, but I'm not sure of the best way to implement it.

class TestFieldParser(FieldParser):    
    def parseN(self, field, data):
        """Parse numeric field (N)

        Returns int, float or None if the field is empty.
        """
        # In some files * is used for padding.
        data = data.strip().strip(b'*')

        try:
            return int(data)
        except ValueError:
            if not data.strip():
                return None
            elif isinstance(data, (bytes, bytearray)):  # I added these 2 lines
                return int.from_bytes(data, byteorder='big', signed=True)
            else:
                # Account for , in numeric fields
                return float(data.replace(b',', b'.')) 

This works in my case, but there may be a better way to implement it. I'm not sure of the best way to check whether the data object is a byte literal. I had also tried making the comparison like this:

data == b'\x00\x00\x00\x00\x00\x00\x00\x00\x00'
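A literal comparison like the one above only matches fields that are exactly nine bytes wide. A minimal sketch of a width-independent check follows; the helper name `is_all_nulls` is hypothetical and is not part of dbfread.

```python
def is_all_nulls(data):
    """Return True if a non-empty byte string consists only of NUL bytes.

    Hypothetical helper for illustration; not part of dbfread.
    """
    # Stripping NUL bytes from both ends leaves b'' only when every byte is NUL.
    return len(data) > 0 and not data.strip(b'\x00')


nine_nulls = b'\x00' * 9   # the value from the traceback above
two_nulls = b'\x00' * 2    # a narrower field padded the same way
text_value = b' 3.14'      # an ordinary text-encoded number

results = [is_all_nulls(v) for v in (nine_nulls, two_nulls, text_value)]
```

Under these assumptions the check depends only on the content of the field, not on its declared length.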

Hope this helps

@olemb
Owner

olemb commented Aug 18, 2017

Thanks for reporting this. This is the first time I've seen a binary N field.

I would add your changes but I'm worried that it would cause other files to fail. For example, the value b' 3.14' could be parsed either as text (3.14) or as binary (858665268), but without any context there's no way to tell which one to choose. (parseN() will always get data as a byte string so you can't tell by its type.)
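The ambiguity can be reproduced with plain Python, independent of dbfread. This is only an illustration of the point above: the same stripped bytes yield a valid result under both interpretations, so neither one fails in a way that would signal which was intended.

```python
raw = b' 3.14'
data = raw.strip()  # parseN strips padding before converting

# Interpretation 1: the bytes are ASCII text for a decimal number.
as_text = float(data)

# Interpretation 2: the bytes are a big-endian binary integer,
# as in the int.from_bytes() branch proposed in this issue.
as_binary = int.from_bytes(data, byteorder='big', signed=True)

print(as_text)    # 3.14
print(as_binary)  # 858665268
```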

Do you know what software was used to create the file?

@farwestdev

farwestdev commented May 7, 2019

My solution was to simply strip out all null \x00 characters before processing numbers and dates. I've added this code in my own custom parser that overrides parseN and parseD:

from dbfread import DBF, FieldParser

class MyFieldParser(FieldParser):
    def parseN(self, field, data):
        data = data.strip().strip(b'*\x00')  # Had to strip out the other characters first before \x00, as per super function specs.
        return super(MyFieldParser, self).parseN(field, data)

    def parseD(self, field, data):
        data = data.strip(b'\x00')
        return super(MyFieldParser, self).parseD(field, data)

# Usage
db = DBF("mydbf.dbf", parserclass=MyFieldParser)
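To see what the extra stripping does to raw field values without needing a DBF file, here is a standalone sketch. The function `clean_numeric` is hypothetical; it only mirrors the parseN logic quoted earlier in this thread and is not dbfread's actual implementation.

```python
def clean_numeric(data):
    """Standalone approximation of the null-stripping parseN override.

    Hypothetical helper for illustration; mirrors the logic quoted in
    this thread rather than calling dbfread itself.
    """
    # Strip whitespace first, then '*' padding and NUL bytes.
    data = data.strip().strip(b'*\x00')
    if not data:
        return None  # empty or fully padded field
    try:
        return int(data)
    except ValueError:
        # Account for ',' used as a decimal separator.
        return float(data.replace(b',', b'.'))


samples = [b'\x00' * 9, b' 42\x00\x00', b'3,14', b'***']
parsed = [clean_numeric(s) for s in samples]
```

With this approach an all-null field is treated as empty and returns None, which is a different choice from the int.from_bytes() proposal, where the same bytes would be read as the integer 0.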

I'm converting some old-school point-of-sale databases over to a newer database. I'm not sure if the data is simply corrupt in the DBF (although Excel doesn't seem to mind opening them without warning). The newer sets of DBFs in my collection don't seem to have this issue, so I'm thinking it could just be improper file handling by the older software.

Hope this helps someone.

Edit: Added import line to code
