Please reach us at info@napadb.com if you cannot find an answer to your question.
NapaDB Data Lake serves clients in a wide range of industries where there is a need to manage trillions of rows of structured data, including healthcare, data center logs, semiconductor manufacturing, retail, and more. We have experience working with clients of all sizes, from small startups to large enterprises.
Working with NapaDB Data Lake provides numerous benefits, including access to a team of experienced professionals, the ability to manage very large amounts of structured data, and a commitment to delivering high-quality products and services that drive results.
NapaDB uses a client-server model. A client with an SQL-like interface is shipped with the product.
Both the NapaDB server and client have been tested on Ubuntu 20.04 running on Intel 64-bit hardware.
NapaDB is designed to handle very large tables. A JOIN operation involves the Cartesian product of two tables; joining two tables of one trillion (10^12) rows each would mean considering 10^24 candidate row pairs, which is not practical. That is why there is no JOIN operation in NapaDB. INNER JOIN is under development.
If your data analysis or machine learning application deals with tables of hundreds of billions or trillions of rows and you are experiencing very slow performance, NapaDB can store the large structured data in 2-D tables and extract a small subset that can be fed into your existing applications to expedite processing.
If you are unable to develop applications related to genomics, machine learning, data analysis, etc. because of the sheer size of your data, NapaDB can help you succeed by managing the huge volume of data and feeding small subsets to your application on demand for further processing. The first step is to determine whether NapaDB suits your application: estimate the volume of data your application will deal with. If the data size exceeds, say, 1 billion rows and is expected to reach hundreds of billions or trillions of rows, then you need NapaDB. If that is the case, first develop your application using in-house tools or other commercial or open-source data analysis software. Once the application is done, plug NapaDB in at the back end, and your application will suddenly be able to handle an effectively unlimited amount of data.
NapaDB supports BYTE, SHORT, INT, LONG, LONG LONG, FLOAT, DOUBLE, LONG DOUBLE, DATE, TIMESTAMP, and CHAR (string) data types to define table columns. Users can create an INDEX on columns of any data type for fast access.
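As an illustration, a table definition in NapaDB's SQL-like client might look like the following. The exact syntax is an assumption (NapaDB's dialect is not shown here), and the table and column names are invented; only the column types are the documented NapaDB types.

```sql
-- Hypothetical table for data-center log records (assumed syntax).
CREATE TABLE sensor_log (
    node_id    INT,
    reading    DOUBLE,
    status     BYTE,
    recorded   TIMESTAMP,
    message    CHAR(256)
);

-- An INDEX can be created on a column of any data type for fast access.
CREATE INDEX ON sensor_log (node_id);
CREATE INDEX ON sensor_log (recorded);
```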
Mixing unsigned and signed values in Boolean and arithmetic expressions may lead to unpredictable results. That is why NapaDB supports only signed values.
The search time for a single value in a column filled with random values should remain approximately constant as the number of rows increases. NapaDB meets this requirement.
The main strength of NapaDB is the SQL SELECT operation on a single column of a table with an unlimited number of rows.
The output of a SELECT statement can be redirected to a new or existing NapaDB table. To search a table quickly on multiple columns, write the first SELECT with its output redirected to a new table, then run the subsequent SELECT statements against the newly created tables one after another. If a single SELECT statement contains Boolean expressions over multiple columns, indexes will not be used and the search will be very slow.
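The workflow above can be sketched as a chain of single-column SELECTs. The redirection syntax shown here is an assumption about NapaDB's SQL-like dialect, and the table and column names are invented for the example.

```sql
-- Step 1: filter on one indexed column, redirecting the result
-- into a new, much smaller table (assumed syntax).
SELECT * FROM sales INTO sales_west WHERE region = 'WEST';

-- Step 2: search the smaller intermediate table on a second column.
SELECT * FROM sales_west INTO sales_west_2024 WHERE year = 2024;

-- Avoid this: a Boolean expression over multiple columns bypasses
-- the indexes and forces a slow scan.
-- SELECT * FROM sales WHERE region = 'WEST' AND year = 2024;
```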
Suppose the key is a UUID 80 bytes long, formed using only decimal digits, and the key-value store is implemented as a sparse table (a common method). The table would need 10^80 rows, a number that exceeds the number of atoms in the known Universe, so a key-value store cannot handle such UUIDs. In NapaDB, a UUID can be, say, 1,000 bytes long and formed from any characters and digits. Since NapaDB neither searches by keys nor has a concept of keys, it can handle such a wide UUID (key) and deliver the same search performance on every indexed column that a key-value store would deliver on its key.
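A key-value workload could therefore be modeled as an ordinary two-column NapaDB table with an index on the key column. The syntax, names, and column widths below are assumptions for illustration only.

```sql
-- A wide key is just an indexed CHAR column; no sparse table is needed.
CREATE TABLE kv_store (
    uuid   CHAR(1000),  -- key: up to 1000 bytes, any characters
    value  CHAR(4000)   -- payload
);
CREATE INDEX ON kv_store (uuid);

-- Lookup by key becomes an indexed single-column SELECT.
SELECT value FROM kv_store WHERE uuid = 'example-key-0001';
```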
NapaDB is an on-disk Data Lake, yet it is ~300 times faster than in-memory databases for a search operation on a single column. This means that NapaDB does not need costly servers to operate.
Vector databases have application-specific APIs to operate on entities in N-dimensional space; for example, "return all entities within a given Hamming distance of HORSE" to retrieve related animals. NapaDB has a generic interface based on the relational model, but with very fast speed. If a vector database has performance problems due to a large number of entities, its APIs can be built on top of NapaDB for fast response times.
1) Make queries involving more than one column super fast.
2) INNER JOIN
3) Introduce a rule engine that can be used to define rules for detecting certain patterns or conditions in the data, which will help with automation. An implementation of a rule engine already exists; it needs to be plugged into NapaDB.
Copyright © 2024 NapaDB Data Lake - All Rights Reserved.