Advanced Features
Debugging & Validation
In Section 1 of the project configuration, there are three parameters supporting the development of projects and the testing of prompt configurations. They are:

- `log_level`: [`low`], `medium`, or `high`.
- `duplication`: [`no`], or `yes`.
- `cot_justification`: [`no`], or `yes`.

First, if the logging level is higher than `low`, all API responses can be inspected in detail. This means that, besides the output files, users can access the complete responses and eventual errors from the API and the prismAId execution, either on the terminal (stdout, `log_level: medium`) or in a log file (`log_level: high`).
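These parameters live in the project configuration file. A minimal sketch of the relevant fragment is shown below, assuming the TOML layout used by prismAId project files (section and key names may differ slightly across versions):

```toml
[project.configuration]
log_level = "medium"       # "low" (default), "medium" (stdout), or "high" (log file)
duplication = "no"         # "yes" runs each manuscript twice
cot_justification = "no"   # "yes" saves a CoT justification per manuscript
```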
Second, duplication makes it possible to test whether a prompt definition is clear enough. If running the same prompt twice generates different outputs, it is very likely that the prompt does not define the model's reviewing task clearly enough. Setting `duplication: yes` and then checking whether the answers differ between the two analyses of the same manuscript is a good way to assess whether the prompt is clear enough to be used for the review project.
Duplicating manuscripts increases the cost of the project run, but the total cost presented at the beginning of the analysis is updated accordingly, letting researchers assess the cost to be incurred. For instance, with Google AI as the provider and the Gemini 1.5 Flash model, without duplication:
```
Unless you are using a free tier with Google AI, the total cost (USD - $) to run this review is at least: 0.0005352
This value is an estimate of the total cost of input tokens only.
Do you want to continue? (y/n): y
Processing file #1/1: lit_test
```
With duplication active:
```
Unless you are using a free tier with Google AI, the total cost (USD - $) to run this review is at least: 0.0010704
This value is an estimate of the total cost of input tokens only.
Do you want to continue? (y/n): y
Processing file #1/2: lit_test
Waiting... 30 seconds remaining
Waiting... 25 seconds remaining
Waiting... 20 seconds remaining
Waiting... 15 seconds remaining
Waiting... 10 seconds remaining
Waiting... 5 seconds remaining
Wait completed.
Processing file #2/2: lit_test_duplicate
```
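Once both runs have completed, the two sets of answers can be compared key by key. The sketch below is only an illustration: it assumes the answers have already been loaded into dictionaries, since the loading step depends on your output format configuration.

```python
def differing_answers(original: dict, duplicate: dict) -> dict:
    """Return the keys whose answers differ between the two runs.

    Both arguments map reviewed items (e.g. "clustering") to the
    answer extracted by the model (e.g. "yes" / "no").
    """
    keys = set(original) | set(duplicate)
    return {k: (original.get(k), duplicate.get(k)) for k in keys
            if original.get(k) != duplicate.get(k)}

# Hypothetical answers from the two analyses of the same manuscript:
run_a = {"clustering": "no", "copulas": "yes", "forecasting": "yes"}
run_b = {"clustering": "no", "copulas": "yes", "forecasting": "no"}
print(differing_answers(run_a, run_b))  # only "forecasting" differs
```

A non-empty result signals that the prompt may need to define the reviewing task more precisely.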
Third, to assess whether the prompt definitions are not only clear but also effective in extracting the information the researcher is looking for, it is possible to use `cot_justification: yes`. This will create an output `.txt` file for each manuscript containing the chain-of-thought (CoT) justification for the answers provided. Technically, the justification is provided by the model in the same chat as the answer, right after it.
The justification output reports the information requested, the answer provided, the model's CoT, and eventually the relevant sentences in the reviewed manuscript, as in:
- **clustering**: "no" - The text does not mention any clustering techniques or grouping of data points based on similarities.
- **copulas**: "yes" - The text explicitly mentions the use of copulas to model the joint distribution of multiple flooding indicators (maximum soil moisture, runoff, and precipitation). "The multidimensional representation of the joint distributions of relevant hydrological climate impacts is based on the concept of statistical copulas [43]."
- **forecasting**: "yes" - The text explicitly mentions the use of models to predict future scenarios of flooding hazards and damage. "Future scenarios use hazard and damage data predicted for the period 2018–2100."
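Justification files in this shape are easy to post-process. The sketch below parses lines in the format shown above; the exact layout of the `.txt` output may vary, so treat the regular expression as an assumption to adapt:

```python
import re

# Matches lines like: - **clustering**: "no" - justification text
LINE_RE = re.compile(r'^- \*\*(?P<key>\w+)\*\*: "(?P<answer>[^"]*)" - (?P<just>.*)$')

def parse_justification(text: str) -> dict:
    """Parse a CoT justification file into {key: (answer, justification)}."""
    results = {}
    for line in text.splitlines():
        m = LINE_RE.match(line.strip())
        if m:
            results[m.group("key")] = (m.group("answer"), m.group("just"))
    return results

sample = '- **clustering**: "no" - The text does not mention any clustering techniques.'
print(parse_justification(sample))
```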
Rate Limits
We enforce usage limits for models through two primary parameters specified in Section 1 of the project configuration:
- `tpm_limit`: Defines the maximum number of tokens that the model can process per minute.
- `rpm_limit`: Specifies the maximum number of requests that the model can handle per minute.

For both parameters, a value of `0` is the default and is used if the parameter is not specified in the configuration file. The default value has a special meaning: no delay will be applied. However, if positive numbers are provided, the algorithm will compute delays and wait times between requests to the API accordingly.
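To illustrate how such limits translate into wait times, the minimal delay between two consecutive requests can be derived from both parameters. This is a sketch of the underlying reasoning, not prismAId's actual implementation:

```python
def min_delay_seconds(tokens_in_request: int, tpm_limit: int, rpm_limit: int) -> float:
    """Smallest wait between two API calls that respects both limits.

    A limit of 0 means "no limit", mirroring the default behaviour
    described above.
    """
    delays = [0.0]
    if rpm_limit > 0:
        delays.append(60.0 / rpm_limit)                      # spread requests evenly over a minute
    if tpm_limit > 0:
        delays.append(60.0 * tokens_in_request / tpm_limit)  # stay within the per-minute token budget
    return max(delays)

# E.g. a 4,000-token request under 32,000 TPM and 2 RPM:
print(min_delay_seconds(4000, 32000, 2))  # 30.0 seconds between requests
```

The binding constraint is whichever limit implies the longer wait; with 2 RPM, the request rate dominates the token rate in this example.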
Please note that we do not support automatic enforcement of daily request limits. If your usage tier includes a maximum number of requests per day, you will need to monitor and manage this limit manually.
On OpenAI, for example, as of August 2024, tier 1 users are subject to the following rate limits:
Model | RPM | RPD | TPM | Batch Queue Limit |
---|---|---|---|---|
gpt-4o | 500 | - | 30,000 | 90,000 |
gpt-4o-mini | 500 | 10,000 | 200,000 | 2,000,000 |
gpt-4-turbo | 500 | - | 30,000 | 90,000 |
gpt-3.5-turbo | 3,500 | 10,000 | 200,000 | 2,000,000 |
On Google AI, as of August 2024, free-tier users are subject to the following limits:
Model | RPM | RPD | TPM |
---|---|---|---|
Gemini 1.5 Flash | 15 | 1,500 | 1,000,000 |
Gemini 1.5 Pro | 2 | 50 | 32,000 |
Gemini 1.0 Pro | 15 | 1,500 | 32,000 |
while pay-as-you-go users are subject to:
Model | RPM | RPD | TPM |
---|---|---|---|
Gemini 1.5 Flash | 1000 | - | 4,000,000 |
Gemini 1.5 Pro | 360 | - | 4,000,000 |
Gemini 1.0 Pro | 360 | 30,000 | 120,000 |
PLEASE NOTE: If you choose the cost minimization approach described below, you must report in the configuration file the smallest `tpm_limit` and `rpm_limit` among the models offered by the provider you selected. This is the only way to ensure limits are respected, since prismAId performs no automatic check on them and the selected model varies with the number of tokens in each request and the models' usage prices.
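For example, in the Google AI free tier shown above, the smallest limits across models are those of Gemini 1.5 Pro, so a cost-minimizing configuration would set (assuming the TOML key layout of prismAId project files):

```toml
[project.configuration]
tpm_limit = 32000  # smallest TPM across free-tier Gemini models (Gemini 1.5 Pro)
rpm_limit = 2      # smallest RPM across free-tier Gemini models (Gemini 1.5 Pro)
```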
Cost Minimization
In Section 1 of the project configuration:

- `model`: Determines the model to use. Leaving it empty (`''`) activates cost minimization: for each request, prismAId selects the model from the chosen provider on the basis of the number of tokens in the request and the models' usage prices.
- The cost of using OpenAI models is calculated based on tokens.
- prismAId utilizes a library to compute the input tokens for each single-shot prompt before actually executing the call using another library. Based on the information provided by OpenAI, the cost of each input token for the different models is used to compute the total cost of the inputs to be used in the review. This estimated cost is presented to the user, allowing them to decide whether to proceed with the analysis and incur the associated cost.
- prismAId calls the Google CountTokens API to compute the input tokens for each single-shot prompt before actually executing the call using a library. Based on the information provided by Google AI, the cost of each input token for the different models is used to compute the total cost of the inputs to be used in the review.
- Concise but complete prompts are both cost-effective and efficient in information extraction. Unnecessary text increases costs and may introduce noise, negatively affecting the performance of AI models. While additional explanations and definitions in the prompt engineering part may seem superfluous, they are generally limited in size and do not significantly impact costs.
- By using a project API key, it is possible to track the cost of each project on the OpenAI dashboard or the Google AI dashboard.
- The cost assessment function is indicative.
- We strive to maintain up-to-date data for cost estimation, though our estimations currently pertain only to the input aspect of AI model usage. As such, we cannot guarantee precise assessments.
- Tests should be conducted first, and costs should be estimated more precisely by analyzing the data from the OpenAI dashboard or the Google AI dashboard.
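The input-cost estimate described above boils down to multiplying the input token count by the per-token price of the selected model. A minimal sketch follows; the prices in the table are illustrative placeholders, not the providers' actual rates:

```python
# Illustrative per-million-input-token prices in USD (placeholders, not real rates).
PRICE_PER_MILLION_INPUT_TOKENS = {
    "gpt-4o-mini": 0.15,
    "gemini-1.5-flash": 0.075,
}

def estimate_input_cost(model: str, input_tokens: int) -> float:
    """Estimated USD cost of the input tokens for one review run."""
    price = PRICE_PER_MILLION_INPUT_TOKENS[model]
    return input_tokens * price / 1_000_000

# With duplication, every manuscript is processed twice, doubling the estimate.
single = estimate_input_cost("gemini-1.5-flash", 7000)
print(f"{single:.7f} without duplication, {2 * single:.7f} with duplication")
```

As noted above, this kind of estimate covers input tokens only; output-token costs must be assessed from the provider dashboards.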